Compare commits

154 Commits

Author SHA1 Message Date
Ian Romanick
f836ef63fd Bump version to 10.2 (final)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-06-06 20:40:00 -07:00
Ilia Mirkin
99b9a0973a gk110/ir: fix slct emission
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9fef8b3d81)
2014-06-06 20:40:00 -07:00
Ilia Mirkin
d36d53b564 gk110/ir: fix interp mode emission
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d588a4919b)
2014-06-06 18:40:58 -07:00
Ilia Mirkin
283cd12933 nvc0: don't bother trying to set up compute for gk110+
The nouveau fw currently prints a bunch of errors. No point in seeing
those all the time, esp since compute doesn't really work in the first
place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>

Conflicts:
	src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
(cherry picked from commit ca65fc418f)
2014-06-06 18:40:21 -07:00
Ilia Mirkin
aa8ea648f4 gk110: add in forgotten code for gk110 isa
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>

Conflicts:
	src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
(cherry picked from commit b9ec766bd0)
2014-06-06 18:37:07 -07:00
Ilia Mirkin
e901f40764 gk110/ir: fix ISAD emission with register args
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ed1b9e5721)
2014-06-06 18:19:45 -07:00
Ilia Mirkin
d5e47ee66b gk110/ir: fix quadon opcode emission
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6e046508a1)
2014-06-06 18:19:10 -07:00
Ilia Mirkin
932a5dadda gk110/ir: emit texbar the same way that the blob does
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 73eec47ef8)
2014-06-06 18:14:50 -07:00
Tobias Klausmann
203bc289a0 nv50/ir: clear subop when folding constant expressions
Some operations (e.g. OP_MUL/OP_MAD/OP_EXTBF) might have a subop set.
After folding, make sure that it is cleared

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3164bfc734)
2014-06-06 18:14:22 -07:00
Kenneth Graunke
11b3011805 i965: Support GL_CLAMP natively on Broadwell.
The new hardware actually supports this OpenGL 1.x feature natively,
so we can finally drop our shader workarounds.

Not many applications use GL_CLAMP, and most use it unintentionally, but
it's trivial to do right, so we should.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 221169693b)
2014-06-06 18:13:03 -07:00
Kenneth Graunke
c62bc58cce i965: Pass brw to translate_wrap_mode().
This lets us do generation checks.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7f3d64a77b)
2014-06-06 18:12:20 -07:00
Kenneth Graunke
304e80e356 i965: Fix copy and pasted values in Broadwell code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7913b4b97b)
2014-06-06 18:11:54 -07:00
Sinclair Yeh
f4aca6868a egl: Check for NULL native_window in eglCreateWindowSurface
We have customers using NULL as a way to test the robustness of the API.
Without this check, EGL will segfault trying to dereference
dri2_surf->wl_win->private because wl_win is NULL.

This fix adds a check and sets EGL_BAD_NATIVE_WINDOW

v2: Incorporated feedback from idr - moved the check to a higher level
function.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 91ff0d4c65)
2014-06-06 18:11:30 -07:00
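The check described in this commit can be sketched as follows. This is a minimal, self-contained illustration and not Mesa's code: every `sketch_` name and the error enum are hypothetical stand-ins for the real EGL types. Only the idea comes from the commit message: reject a NULL native window in the higher-level entry point, before any driver code dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EGL error codes; not the real EGL values. */
enum sketch_error { SKETCH_SUCCESS, SKETCH_BAD_NATIVE_WINDOW };

/* Hypothetical surface object that remembers its native window. */
struct sketch_surface { void *native_window; };

/* Hypothetical driver hook. It assumes 'win' is valid, which is why the
 * caller has to validate the pointer first. */
static void sketch_driver_create(struct sketch_surface *surf, void *win)
{
    surf->native_window = win;
}

/* Higher-level entry point: returns 1 on success, 0 on failure, and
 * reports the reason through 'err'. */
static int sketch_create_window_surface(struct sketch_surface *surf,
                                        void *win,
                                        enum sketch_error *err)
{
    assert(surf != NULL && err != NULL);

    /* Reject NULL here so that no driver ever sees it. */
    if (win == NULL) {
        *err = SKETCH_BAD_NATIVE_WINDOW;
        return 0;
    }

    sketch_driver_create(surf, win);
    *err = SKETCH_SUCCESS;
    return 1;
}
```

Placing the check in the shared entry point rather than in each platform backend is the design choice the v2 note refers to.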
Marek Olšák
5ab9a9c0cc r600g,radeonsi: don't use hardware MSAA resolve if dst is fast-cleared
It doesn't work and our docs say so too.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit d226191820)
2014-06-06 18:08:23 -07:00
Marek Olšák
ae16f443c2 r600g,radeonsi: disable fast clear if render condition is on
For some reason, CP DMA doesn't follow the predicate bit if I enable it,
so this is the only option.

This fixes piglit: spec/NV_conditional_render/clear

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit bf701a84eb)
2014-06-06 18:03:10 -07:00
José Fonseca
b8241bb3f2 mesa: Make glGetIntegerv(GL_*_ARRAY_SIZE) return GL_BGRA.
Same as b026b6bbfe, but
COLOR_ARRAY_SIZE/SECONDARY_COLOR_ARRAY_SIZE.

Ideally we wouldn't munge the incoming state, so that we wouldn't need
to unmunge it back on glGet*.  But the array size state is copied and
referred to in many places, many of which couldn't take a GLenum like
GL_BGRA instead of a plain integer.  So just hack around on glGet*,
to ensure there is no risk of introducing regressions elsewhere.

This bug causes problems for Apitrace, resulting in wrong traces.  See
https://github.com/apitrace/apitrace/issues/261 for details.

Tested with piglit arb_vertex_array_bgra-get, which was created for this
purpose.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e3e13d6b85)
2014-06-06 17:54:32 -07:00
José Fonseca
224c193237 mesa/main: Make get_hash.c values constant.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 53468dee03)
2014-06-06 17:35:45 -07:00
Beren Minor
494f916125 egl/main: Fix eglMakeCurrent when releasing context from current thread.
EGL 1.4 Specification says that
eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT)
can be used to release the current thread's ownership on the surfaces
and context.

Mesa's EGL implementation accepted these parameters only when the
KHR_surfaceless_context extension was supported.

[chadv] Add quote from the EGL 1.4 spec.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 0ca0d5743f)
2014-06-06 17:15:51 -07:00
Marek Olšák
767bc05309 Revert "glx: load dri driver with RTLD_LOCAL so dlclose never fails to unload"
This reverts commit e3cc0d90e1.

It breaks too many apps and completely breaks my desktop too.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79469

We'll probably need to re-release all stable versions after this is committed.

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0d5ec2c615)
2014-06-06 17:13:03 -07:00
Roland Scheidegger
3aaae6056e llvmpipe: fix crash when not all attachments are populated in a fb
Framebuffers have been able to have NULL attachments for a while. llvmpipe handled
that properly for lp_rast_shade_quads_mask but it seems the change didn't
make it to lp_rast_shade_tile.
This fixes piglit fbo-drawbuffers-none test (though I need to increase
the FB_SIZE from 32 to 256 so the tris cover some tiles fully).
https://bugs.freedesktop.org/show_bug.cgi?id=79421

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 576868140b)
2014-06-06 17:06:55 -07:00
Ian Romanick
8b71741222 Bump version to 10.2-rc5
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-30 17:11:47 -07:00
Lubomir Rintel
15ec4ef0da i915: add a missing NULL pointer check
mesaVisual can be NULL with configless context since this commit:

    commit 551d459af4
    Author: Neil Roberts <neil@linux.intel.com>
    Date:   Fri Mar 7 18:05:47 2014 +0000

    Add the EGL_MESA_configless_context extension
...
    Previously the i965 and i915 drivers were explicitly creating a zeroed visual
    whenever 0 is passed for the EGLConfig.

We attempt to dereference the visual in i915, and now that we don't create a
zeroed-out one, it crashes, breaking at least weston on i915. There's
no point in doing so as it would be zero anyway.

v2: Fixed a typo in commit message.  Added some tags.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1100967
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 90b5747856)
2014-05-30 17:11:47 -07:00
Ian Romanick
9fde5670e2 glapi: Duplicate GLES1 prototypes in glapi_dispatch.c
These prototypes are necessary because GLES1 library builds will create
dispatch functions for them.  We can't directly include GLES/gl.h
because it would conflict with the previously-included GL/gl.h.  Since the GLES1
ABI is not expected to ever add more functions, the path of least
resistance is to just duplicate the prototypes for the functions that
aren't already in desktop OpenGL.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79294
Acked-by: Matt Turner <mattst88@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7b1aeec9cd)
2014-05-30 17:11:47 -07:00
Ilia Mirkin
76e112380a nvc0: revert mistaken logic to collapse color outputs to the beginning
In commit af38ef907, I added a "fix" to color outputs not being assigned
correctly when sample mask was being output. This was totally wrong --
the color indices (i.e. "si" values) were the ones that were wrong. Undo
that hunk.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 0d699530ff)

Requested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-05-30 17:11:15 -07:00
Ilia Mirkin
8ac81e5b66 mesa/st: fix color outputs in presence of sample mask output
Commit c5d822dad9 added support for sample mask incorrectly. It became
treated as a color output, and messed up the color output indices.
Revert the hunk that did that, and add explicit support just like for
depth/stencil writes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ab7bd7093d)

Requested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-05-30 17:11:15 -07:00
Rob Clark
6d23a0b2a6 configure: fix build error with XA
Fixes:

xa_tracker.c: In function 'xa_tracker_create':
 xa_tracker.c:147:5: error: implicit declaration of function 'pipe_loader_drm_probe_fd' [-Werror=implicit-function-declaration]

in some build configurations, as XA now implicitly depends on
gallium_drm_loader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
(cherry picked from commit 20d14ef263)

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=511700
Requested-by: Matt Turner <mattst88@gmail.com>
2014-05-30 17:11:15 -07:00
Pavel Popov
8f984928cc i965: Fix Line Stipple enable bit in 3DSTATE_SF for Haswell.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
(cherry picked from commit d292d40207)
2014-05-30 17:11:15 -07:00
Jerome Glisse
7ab2363c11 glx: load dri driver with RTLD_LOCAL so dlclose never fails to unload
There is no longer any reason to load with RTLD_GLOBAL, and for some drivers
this even results in dlclose failing to unload, leading to catastrophic
failure with the swrast fallback.

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
(cherry picked from commit e3cc0d90e1)
2014-05-29 15:48:53 -07:00
Brian Paul
55b9effa4a glsl: fix use-after free bug/crash in ast_declarator_list::hir()
The call to get_variable_being_redeclared() may delete 'var' so we
can't reference var->name afterward.  We fix that by examining the
var's name before making that call.

Fixes valgrind warnings and possible crash when running the piglit
tests/spec/glsl-1.30/execution/clipping/vs-clip-distance-in-param.shader_test
test (and probably others).

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit f9cecca7a6)
2014-05-29 15:48:02 -07:00
Kenneth Graunke
5347fc5295 i965: Fix repeated usage of rectangle texture coordinate scaling.
Previously, we set up new entries in the params[] array on every access
of a rectangle texture.  Unfortunately, we only reserve space for
(2 * MaxTextureImageUnits) extra entries, so programs which accessed
rectangle textures more times than that would write off the end of the
array and likely crash.

We don't really have a decent mapping between the index returned by
_mesa_add_state_reference and our index into the params array, so we
have to manually search for it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78691
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit bb9623a1a8)
2014-05-29 15:47:29 -07:00
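The fix described in this commit amounts to "look up before you append". A minimal sketch under stated assumptions: the table type, its capacity, and the `sketch_` names are hypothetical, and a pointer comparison stands in for however the driver identifies a state reference. The commit message only says that the existing entry has to be searched for manually.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed capacity, standing in for the reserved extra entries. */
#define SKETCH_MAX_PARAMS 4

struct sketch_params {
    const float *entry[SKETCH_MAX_PARAMS];
    size_t count;
};

/* Return the index of 'state' in the table, adding it only if it is not
 * already present. Returns -1 when the table is full. */
static int sketch_find_or_add(struct sketch_params *p, const float *state)
{
    assert(p != NULL);

    /* Linear search: repeated accesses to the same texture reuse the slot. */
    for (size_t i = 0; i < p->count; i++) {
        if (p->entry[i] == state)
            return (int)i;
    }

    /* Never write past the end of the array. */
    if (p->count == SKETCH_MAX_PARAMS)
        return -1;

    p->entry[p->count] = state;
    return (int)p->count++;
}
```

With this shape, the number of entries grows with the number of distinct textures rather than with the number of accesses.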
Topi Pohjolainen
e8e48889e6 meta/blit: Use gl_FragColor also in the msaa blit shader
Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit
es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use
the meta path.

No piglit regressions on IVB.

Further input from Ken:

 "Unfortunately, this doesn't fix MRT for integer data.

  In the single-sampled case, since we're directly copying data, we were
  reading, copying, and writing data as "float" values, which actually contained the
  integer bits.  Here, we can't do that since we need to process the
  actual integer data.

  I do wonder if we could use intBitsToFloat/uintBitsToFloat to stuff the
  integer bits in the float gl_FragColor output.  Just a crazy idea.

  In the long term (post 10.2), I think we should draft an extension that
  allows you to do "layout(location = all)" on user-defined fragment
  shader outputs.  (Or some similar syntax.)"

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a6022e5405)
2014-05-29 15:46:26 -07:00
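The intBitsToFloat/uintBitsToFloat idea quoted in this commit is a bit-for-bit reinterpretation, not a numeric conversion. A rough C analogue is sketched below, with hypothetical `sketch_` helper names; whether the trick would actually survive a trip through gl_FragColor is left open in the quote, and hardware that canonicalizes NaNs or flushes denormals could alter some bit patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer as a float (no value conversion). */
static float sketch_int_bits_to_float(int32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Inverse operation: recover the integer bits stored in a float. */
static int32_t sketch_float_bits_to_int(float f)
{
    int32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Using memcpy rather than a pointer cast keeps the reinterpretation within what the C aliasing rules allow.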
Topi Pohjolainen
af3d4eddc1 i965/meta: Store stencil texturing mode
Meta path needs to keep the current texture object's state. Fixes
the following gles3 cts tests on bdw:

framebuffer_blit_functionality_negative_width_blit.test: fail
framebuffer_blit_functionality_all_buffer_blit.test: fail
framebuffer_blit_functionality_negative_height_blit.test: fail
framebuffer_blit_functionality_missing_buffers_blit.test: fail
framebuffer_blit_functionality_negative_dimensions_blit.test: fail
framebuffer_blit_functionality_minifying_blit.test: fail
framebuffer_blit_functionality_magnifying_blit.test: fail

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 57730d67f6)
2014-05-29 15:45:43 -07:00
Topi Pohjolainen
75ae4fff35 meta/blit: Add stencil texturing mode save and restore
v2 (Ken): Only restore the mode if it has changed.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c246828c4d)
2014-05-29 15:44:45 -07:00
Matt Turner
c984e5bd2e Revert "i965: Don't make instructions with a null dest a barrier to scheduling."
This reverts commit 42a26cb5e4.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78648
(cherry picked from commit 0d3f83f4ad)
2014-05-29 15:44:09 -07:00
Matt Turner
ca6b38b80a Revert "i965/fs: Simplify interference scan in register coalescing."
This reverts commit 5ff1e446d4.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77704
(cherry picked from commit a39428cf5c)
2014-05-29 15:42:43 -07:00
Matt Turner
b814afeb6c Revert "i965/fs: Give up in interference check if we see a WHILE."
This reverts commit 55de1c035c.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fc025a6719)
2014-05-29 15:41:53 -07:00
Matt Turner
17c7ead727 Revert "i965/fs: Reduce restrictions on interference in register coalescing."
This reverts commit f770123f58.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78692
(cherry picked from commit ccb1ea8a15)
2014-05-29 15:40:55 -07:00
Emil Velikov
2a29dbdc6e glx: do not leak dri3Display
v2: Do not wrap the code in ifdef HAVE_DRI3 (suggested by Keith)

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit eb2241f8a9)
2014-05-29 15:40:09 -07:00
Matt Turner
03e93f6079 Revert "i965/fs: Change fs_visitor::emit_lrp to use MAC for gen<6"
This reverts commit a6860100b8.

Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.

Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77707
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c2c639ecf6)
2014-05-29 15:17:53 -07:00
Matt Turner
bc4b9467af Revert "i965/vec4: Change vec4_visitor::emit_lrp to use MAC for gen<6"
This reverts commit 2dfbbeca50 with the
comment about MAC and implicit accumulator removed.

Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.

Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77703
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit db42dd8952)
2014-05-29 15:17:28 -07:00
Christoph Bumiller
7efdc55f5f nv50/ir/tgsi: optimize KIL
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d479713d25)
2014-05-29 15:16:56 -07:00
Christoph Bumiller
9ea859931e nv50/ir: fix lowering of predicated instructions (without defs)
Note that predicated instructions with defs are still not supported
because transformation to SSA doesn't handle them yet.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 452a4151aa)
2014-05-29 15:16:24 -07:00
Christoph Bumiller
4e5296208d nv50/ir/opt: fix constant folding with saturate modifier
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3b0867f35b)
2014-05-29 15:16:03 -07:00
Christoph Bumiller
1ced952686 nv50/ir/tgsi: TGSI_OPCODE_POW replicates its result
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2f2d1b3d9b)
2014-05-29 15:15:59 -07:00
Christoph Bumiller
afe723ce5f nv50,nvc0: set constbufs dirty on pipe context switch
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 49eccef06b)
2014-05-29 15:15:39 -07:00
Christoph Bumiller
8b74c2bdbd nv50: setup scissors on clear_render_target/depth_stencil
[imirkin: add logic to also clear the "regular" scissors]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 200382be85)
2014-05-29 15:15:10 -07:00
Christoph Bumiller
4afbd9b0e2 nv50,nvc0: always pull out bufctx on context destruction
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d11b761f2)
2014-05-29 15:01:49 -07:00
Ian Romanick
697316fe06 Bump version to 10.2-rc4
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-23 17:36:42 -07:00
Ian Romanick
bfaee5277a Merge remote-tracking branch 'robclark/freedreno-10.2' into 10.2
2014-05-23 17:21:59 -07:00
Pavel Popov
9a8f12ae03 i965: Properly return *RESET* status in glGetGraphicsResetStatusARB
The glGetGraphicsResetStatusARB from ARB_robustness extension always
returns GUILTY_CONTEXT_RESET_ARB and never returns NO_ERROR for a guilty
context with LOSE_CONTEXT_ON_RESET_ARB strategy.  This is because Mesa
returns GUILTY_CONTEXT_RESET_ARB if batch_active != 0, whereas the kernel
driver never resets batch_active, so this variable is always > 0 for a guilty
context.  The same behaviour can also be observed for batch_pending and
INNOCENT_CONTEXT_RESET_ARB.

But ARB_robustness spec says:

  If a reset status other than NO_ERROR is returned and subsequent calls
  return NO_ERROR, the context reset was encountered and completed. If a
  reset status is repeatedly returned, the context may be in the process
  of resetting.

  8. How should the application react to a reset context event?
  RESOLVED: For this extension, the application is expected to query the
  reset status until NO_ERROR is returned. If a reset is encountered, at
  least one *RESET* status will be returned. Once NO_ERROR is
  encountered, the application can safely destroy the old context and
  create a new one.

The main problem is the context may be in the process of resetting and
in this case a reset status should be repeatedly returned.  But it looks
like the kernel driver returns nonzero active/pending only if the
context reset has already been encountered and completed.  For this
reason the *RESET* status cannot be repeatedly returned and should be
returned only once.

The reset_count and brw->reset_count variables can be used to control
that glGetGraphicsResetStatusARB returns *RESET* status only once for
each context.  Note the i915 triggers reset_count twice which allows to
return correct reset count immediately after active/pending have been
incremented.

v2 (idr): Trivial reformatting of comments.

Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8dc4a98c44)
2014-05-23 09:57:18 -07:00
Emil Velikov
a31062fcb3 targets/egl-static: add missing line break in ldflags
Accidentally omitted by commit 7b7944ee1c.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
(cherry picked from commit e0372239a5)
2014-05-23 09:57:15 -07:00
James Legg
a1fff38c96 mesa: Fix unbinding GL_DEPTH_STENCIL_ATTACHMENT
glFramebufferRender(..., GL_DEPTH_STENCIL_ATTACHMENT, ..., 0) only
detached the depth buffer and not the stencil buffer.

Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=79115
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 846c715abb)
2014-05-23 09:56:26 -07:00
Jordan Justen
1db3ebd8a5 meta blit: Set Z texcoord during meta blit to sample the correct layer
If the source renderbuffer has a depth > 0, then send a Z texcoord
which is set to the source attachment Z offset.

This fixes piglit's gl-3.2-layered-rendering-gl-layer-render with the
GL_TEXTURE_2D_MULTISAMPLE_ARRAY case test on i965/gen8.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 57876fee38)
2014-05-23 09:55:23 -07:00
Kenneth Graunke
7cf3a674ea i965: Listen to BRW_NEW_FRAGMENT_PROGRAM for 3DSTATE_PS_BLEND.
brw_color_buffer_write_enabled depends on brw->fragment_program, which
means we have to listen to BRW_NEW_FRAGMENT_PROGRAM.

On most generations, this was only called from a function that already
subscribed.  However, on Broadwell, we failed to listen to the necessary
event in the atom that emits 3DSTATE_PS_BLEND.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 746921cbb4)
2014-05-23 09:54:41 -07:00
Kenneth Graunke
d2521a44af i965: Use WE_all for FB write header setup on Broadwell.
I forgot to disable writemasking on the OR and MOV which set the render
target index and "source 0 alpha present to render target" bit.

Using get_element_ud is equivalent and avoids a line-wrap.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d3985ca6c)
2014-05-23 09:54:15 -07:00
Anuj Phogat
00f2dcb791 meta: Use gl_FragColor to output color values to all the draw buffers
_mesa_meta_setup_blit_shader() currently generates a fragment shader
which, irrespective of the number of draw buffers, writes the color
to only one 'out' variable. The current shader relies on undefined
behavior and possibly works only by chance.

From OpenGL 4.0  spec, page 256:
  "If a fragment shader writes to gl_FragColor, DrawBuffers specifies a
   set of draw buffers into which the single fragment color defined by
   gl_FragColor is written. If a fragment shader writes to gl_FragData,
   or a user-defined varying out variable, DrawBuffers specifies a set
   of draw buffers into which each of the multiple output colors defined
   by these variables are separately written. If a fragment shader writes
   to none of gl_FragColor, gl_FragData, nor any user defined varying out
   variables, the values of the fragment colors following shader execution
   are undefined, and may differ for each fragment color."

OpenGL 4.4 spec, page 463, added an additional line in this section:
  "If some, but not all user-defined output variables are written, the
   values of fragment colors corresponding to unwritten variables are
   similarly undefined."

V2: Write color output to gl_FragColor instead of writing to multiple
    'out' variables. This'll avoid recompiling the shader every time
    draw buffers count is updated.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 46737cebd3)
2014-05-23 09:53:42 -07:00
Anuj Phogat
ed1ffa0197 meta: Refactor _mesa_meta_setup_blit_shader() to avoid duplicate shader code
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bee2915210)
2014-05-23 09:52:29 -07:00
Ilia Mirkin
5d056f51ab tgsi: add GS_INVOCATIONS to property names array
In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' being printed instead when
converting a GS shader to text.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit cdeb7004e0)
2014-05-23 09:51:49 -07:00
Ilia Mirkin
6be7789e11 nv50,nvc0: fix 3d blits with mipmap levels
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit 28360fcad7)
2014-05-23 09:51:26 -07:00
Ilia Mirkin
d6a4c3c29c nv50/ir: fix constant folding for OP_MUL subop HIGH
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.

Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit d2a3de19c6)
2014-05-23 09:51:06 -07:00
Ilia Mirkin
9028b94670 nv50/ir: fix s32 x s32 -> high s32 multiply logic
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have to perform the 2's complement
negation "by hand".

This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit d3a5cf052c)
2014-05-23 09:50:26 -07:00
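The approach described in this commit (take absolute values, do an unsigned 32 x 32 -> 64 multiply, then negate the 64-bit result by hand when the signs differ) can be written out in plain C. This is a sketch of the arithmetic only, not the nv50 IR lowering itself; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a signed 32 x 32 multiply, built from an unsigned
 * multiply plus a manual two's complement negation of the (hi:lo) pair. */
static int32_t mul_high_s32(int32_t a, int32_t b)
{
    /* Absolute values as unsigned; 0u - x is well defined for unsigned
     * arithmetic and also handles INT32_MIN. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;

    uint64_t prod = (uint64_t)ua * ub;
    uint32_t lo = (uint32_t)prod;
    uint32_t hi = (uint32_t)(prod >> 32);

    if ((a < 0) != (b < 0)) {
        /* Negate by hand: invert both halves, add 1 to the low half,
         * and carry into the high half when the low half wraps to 0. */
        lo = ~lo + 1u;
        hi = ~hi + (lo == 0u ? 1u : 0u);
    }

    /* Conversion of values above INT32_MAX wraps on common compilers. */
    return (int32_t)hi;
}

/* Small self-check with values worked out by hand. */
static void mul_high_s32_selfcheck(void)
{
    assert(mul_high_s32(0x40000000, 4) == 1);    /*  2^32 -> high word  1 */
    assert(mul_high_s32(-0x40000000, 4) == -1);  /* -2^32 -> high word -1 */
    assert(mul_high_s32(-1, 1) == -1);           /* -1    -> high word -1 */
    assert(mul_high_s32(0, -5) == 0);
}
```

The carry step matters: for a product such as -2^32 the low word is zero after negation, and forgetting to propagate the carry would give a high word of -2 instead of -1.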
Kenneth Graunke
085d6bd5e7 meta: Avoid _swrast_BlitFramebuffer in the meta CopyTexSubImage code.
This is a replacement for bd44ac8b5c
that should actually work.

Fixes Piglit's copyteximage-border on swrast, as well as one of
es3conform's packed_pixels_pixelstore tests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78546
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2ecc7268ba)
2014-05-23 09:49:28 -07:00
Kenneth Graunke
fd0ea5be9d meta: Split _swrast_BlitFramebuffer out of the meta blit path.
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.

For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort.  Splitting it makes that possible.

This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 54540ea691)
2014-05-23 09:48:13 -07:00
Kenneth Graunke
27d4836f35 meta: Drop unnecessary early returns in _mesa_meta_BlitFramebuffer.
These aren't necessary - all of the following code is predicated on mask
being non-zero, so no code will get executed anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Courtney Goeltzenleuchter <courtney@lunarg.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d89ce333cc)
2014-05-23 09:47:37 -07:00
Kenneth Graunke
e306ba9a9b Revert "i965: Don't _swrast_BlitFramebuffer when doing CopyTexSubImage."
This reverts commit bd44ac8b5c.

Fixes:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78842
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78843

Re-breaks:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
but that will be fixed properly in a few commits.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2fa3796bc1)
2014-05-23 09:46:57 -07:00
Topi Pohjolainen
81fb9ef112 i965/fbo: Only try stencil meta blits on gen >= 8
I don't have an ILK at hand but the fix should be trivial.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78872
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 21dddb22c1)
2014-05-23 09:46:28 -07:00
Kenneth Graunke
32549f3f17 mesa: Disable GL_EXT_framebuffer_multisample_blit_scaled on Broadwell.
It's not properly implemented in the meta code, and we don't have time
to fix it for 10.2.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0b96d362bf)
2014-05-23 09:45:52 -07:00
Ilia Mirkin
9576e17804 nv50/ir: fix integer mul lowering for u32 x u32 -> high u32
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
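
To illustrate why division by a constant depends on a correct high
multiply, here is a small C sketch. The function names are hypothetical,
and the constant is the standard reciprocal for 255; it is not necessarily
what the peephole pass itself emits.

```c
#include <stdint.h>

/* High 32 bits of an unsigned 32x32 multiply, i.e. what UMUL_HI
 * is expected to produce (hypothetical helper). */
static uint32_t umul_hi_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 255 without a divide: multiply by ceil(2^39 / 255) = 0x80808081,
 * keep the high word (a shift by 32), then shift by the remaining 7. */
static uint32_t div255_sketch(uint32_t x)
{
    return umul_hi_sketch(x, 0x80808081u) >> 7;
}
```

If the high word comes back wrong, every quotient computed this way is
wrong too, which is how a lowering bug surfaces as a broken division.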

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit 5b8f1a0f7c)
2014-05-23 09:45:13 -07:00
Ilia Mirkin
cc65bc4d15 nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit 4ebaabcccb)
2014-05-23 09:40:26 -07:00
Jeremy Huddleston Sequoia
25e641213f darwin: Fix test for kCGLPFAOpenGLProfile support at runtime
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
(cherry picked from commit 7a109268ab)
2014-05-20 10:55:12 -07:00
Rob Clark
e084f71548 freedreno: don't advertise texture arrays for now
I think a3xx and later should support them (it is part of GLES3), but this
isn't needed for the time being and still needs to be reverse-engineered.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 10:55:54 -04:00
Rob Clark
cdd328639f freedreno/a3xx: shadow sampler support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:49 -04:00
Rob Clark
6440561737 freedreno/a3xx/compiler: refactor trans_samp()
Split it up into some smaller fxns so it doesn't grow into a huge
monster as we add things.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:49 -04:00
Rob Clark
fb4461b7dc freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
fec2b45d02 freedreno/a3xx: use util_format_compose_swizzles()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
d0c813c40a freedreno/a3xx/compiler: 1D textures
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture.  We
just need to supply the .y coord.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
a05c073d79 freedreno: fix caps
In particular, we want mesa to emulate primitive restart for us.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
031ee21961 freedreno: fix index buffer offset
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
b7604eff4c freedreno/a3xx: add sRGB texture support
That was easy.  Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
80da86c650 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:48:20 -04:00
Rob Clark
3c0ca023dd freedreno/a3xx: fix write to bogus register
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last one
required.  Which is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.

Fix loop termination condition so that this does not happen.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:47:20 -04:00
Rob Clark
516db26e1e freedreno/a3xx: account for special inputs/outputs
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc) which, while they don't
count towards the hw limit of 16 attributes or 16 varyings, we do
still need to track them all the same.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:47:19 -04:00
Rob Clark
d5d9984c2b freedreno/a3xx: fix MAX_INPUTS shader cap
Hardware only supports 16.  Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).

Also a bunch of asserts to make problems like this easier to see.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:47:19 -04:00
Ryan Houdek
6db6f05fae freedreno/a3xx/compiler: add KILL_IF
The KILL_IF opcode could potentially be merged in to the regular KILL
opcode function.  It was a pain to do so, so I've left it separate
for cleanliness.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:47:19 -04:00
Ryan Houdek
c338759051 freedreno/a3xx/compiler: start adding integer support
Adds a large number of TGSI opcodes to the a3xx compiler.

For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes

If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:46:38 -04:00
Rob Clark
47a6830e22 freedreno/a3xx: occlusion query support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:46:38 -04:00
Rob Clark
3ffc507c94 freedreno: add support for hw queries
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results.  But fortunately this can be shared across GPU
generations.

See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:46:38 -04:00
Rob Clark
c94e339adc freedreno/query: allow multiple query implementations
Split out fd_query into an abstract base class, to allow multiple
implementations.  The current sw based queries are moved into
fd_sw_query.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:45:50 -04:00
Rob Clark
a5951d09a5 freedreno/a3xx: add point-size
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:45:50 -04:00
Rob Clark
3475ca1f00 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:45:50 -04:00
Rob Clark
3733cc3e8f freedreno/a2xx: fix compiler warning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-20 08:45:50 -04:00
Jeremy Huddleston Sequoia
ac49f97f12 glapi: Avoid heap corruption in _glapi_table
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
(cherry picked from commit ff5456d1ac)
2014-05-20 01:39:17 -07:00
Ian Romanick
d0aa394741 Bump version to 10.2-rc3
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-16 23:48:44 -07:00
Brian Paul
4baf6f12a5 mesa: fix double-freeing of dispatch tables inside glBegin/End.
We allocate dispatch tables for BeginEnd and OutsideBeginEnd.  But
when we destroy the context we were freeing the BeginEnd and Exec
tables.  If Exec==BeginEnd we did a double-free.  This would happen
if the context was destroyed while inside a glBegin/End pair.  Now
free the BeginEnd and OutsideBeginEnd pointers.
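
The ownership rule behind this fix can be sketched in C as follows. The
struct and function names are hypothetical stand-ins, not Mesa's context
code: two pointers own allocations, and a third only aliases one of them.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a context with two owned dispatch tables
 * and one "current" pointer that aliases one of them. */
struct ctx_sketch {
    void *begin_end;         /* owned allocation */
    void *outside_begin_end; /* owned allocation */
    void *exec;              /* alias, never owns memory */
};

static struct ctx_sketch *ctx_create_sketch(void)
{
    struct ctx_sketch *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->begin_end = malloc(64);
    c->outside_begin_end = malloc(64);
    if (!c->begin_end || !c->outside_begin_end) {
        free(c->begin_end);
        free(c->outside_begin_end);
        free(c);
        return NULL;
    }
    c->exec = c->outside_begin_end;
    return c;
}

static void ctx_destroy_sketch(struct ctx_sketch *c)
{
    /* Free each allocation once, through its owning pointer. Freeing
     * c->exec as well would double-free whichever table it aliases. */
    free(c->begin_end);
    free(c->outside_begin_end);
    free(c);
}
```

Destroying the context while exec points at begin_end (the "inside
glBegin/End" case) is safe here because exec is never passed to free().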

Cc: "10.1", "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit ef6b6658f9)
2014-05-16 23:46:34 -07:00
Michel Dänzer
21792665c7 glsl_to_tgsi: Make sure the 'shader' member is always initialized
Fixes the valgrind report below and random crashes with piglit on radeonsi.

==30005== Conditional jump or move depends on uninitialised value(s)
==30005==    at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100)
==30005==    by 0xB14698B: st_translate_fragment_program (st_program.c:747)
==30005==    by 0xB14777D: st_get_fp_variant (st_program.c:824)
==30005==    by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042)
==30005==    by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154)
==30005==    by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162)
==30005==    by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640)
==30005==    by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574)
==30005==    by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733)
==30005==    by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854)
==30005==    by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117)
==30005==    by 0x4EA7168: run_test (piglit_fbo_framework.c:52)

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 2bab95973d)
2014-05-16 23:45:50 -07:00
Topi Pohjolainen
872ea423ac i965/fb: Use meta path for stencil up/downsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit d45fadf11a)
2014-05-16 23:45:24 -07:00
Topi Pohjolainen
ad8ad99eff i965/meta: Stencil blit for miptree updownsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 475216a4f0)
2014-05-16 23:43:16 -07:00
Topi Pohjolainen
62f1509070 i965/fb: Use meta path for stencil blits
This is effective only on gen8 for now as previous generations still
go through blorp.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b18f6b9b86)
2014-05-16 23:42:08 -07:00
Topi Pohjolainen
eb2ef1641c i965/meta: Stencil blits
v2: Create the intel renderbuffer with level hardcoded to zero instead
    of overriding it in the surface state configuration. Also moved the
    dimension adjustments for tiling, mip level, msaa into the render
    buffer creation. Finally prepares for another blit path needed for
    miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit d1829badf5)
2014-05-16 23:41:56 -07:00
Topi Pohjolainen
947b60d19e meta: Refactor state save/restore for framebuffer texture blits
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 2a549c43a8)

Note: This patch was cherry picked so that the next patch would build.
2014-05-16 23:41:40 -07:00
Topi Pohjolainen
cb37016f89 i965: Extend brw_get_rb_for_first_slice() for specified level/layer
v2: Configure stencil directly for final dimensions instead of
    adjusting bit by bit for tiling, mip level and msaa.
v3 (Ken): Used non-static constant for horizontal alignment

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d752c098c)
2014-05-16 23:31:05 -07:00
Topi Pohjolainen
43ea5f9347 i965/gen8: Surface state overriding for stencil
v2: Allow hardware to offset accesses to individual layers. Also leave
    the mip-level overriding for the creator of the intel renderbuffer
    to handle. Merged with "i965/gen8: Allow stencil buffers to be
    configured as single sampled"

Ken: I left the "_mesa_problem()" still in place. I think it is clearer
     to remove it in a separate patch.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 36caae48b2)
2014-05-16 23:28:04 -07:00
Topi Pohjolainen
b5e717a618 i965/wm: Surface state overrides for configuring w-tiled as y-tiled
v2: Use intel_mipmap_tree::total_width in order to get correct alignment
    automatically. Also use "mt->total_height / mt->physical_depth0" as
    surface height allowing hardware to offset to correct slice.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6aefaa4eb2)
2014-05-16 23:27:29 -07:00
Jordan Justen
f5848ec2e4 i965 meta up/downsample: Fix renderbuffer _BaseFormat
mt->format is of type mesa_format, and therefore can't be
used with _mesa_base_fbo_format which requires a GLenum input.

On gen8, this fixes various piglit fbo-depthstencil tests with
samples > 1.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 103057b2b7)
2014-05-16 23:26:58 -07:00
Roland Scheidegger
79a34441d5 mesa/st: fix number of ubos being declared in a shader
Previously the code used the total number of ubos being declared in the
linked program (so the ubos of all shaders combined). Use the number
from the particular shader instead.
This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks
seen in llvmpipe since 8a9f5ecdb1 as it now emits
code for each declared buffer, not just the ones actually used.

CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 3e817e7e56)
2014-05-16 23:17:47 -07:00
Emil Velikov
1041fb86c0 docs: Add a note about llvm-shared-libs and libxatracker
Both changes landed in 10.2, and for people not following the
development cycle these will come as a surprise. Note that the
pipe_* interface is not stable.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
(cherry picked from commit e48054d036)
2014-05-16 23:15:14 -07:00
Emil Velikov
b1aa25907a configure: correctly set LD_NO_UNDEFINED
Commit 11623be934 was meant to have this hunk, which
I accidentally dropped during git rebase.

Cc: 10.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
(cherry picked from commit f57d092199)
2014-05-16 23:15:09 -07:00
Michel Dänzer
5d6e822d03 radeonsi: Fix anisotropic filtering state setup
Bring it back in line with r600g. I broke this in the original radeonsi
bringup. :(

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78537

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c5828b0599)
2014-05-16 23:14:36 -07:00
Jonathan Gray
26d5b22039 glsl: simplify the M_PI*f macros, fixes build on OpenBSD
The M_PI*f macros used a preprocessor paste to append 'f'
to M_PI defines, which works if the values are only numbers
but breaks on OpenBSD where M_PI definitions have casts
and brackets to meet requirements of a future version of POSIX:

http://austingroupbugs.net/view.php?id=801
http://austingroupbugs.net/view.php?id=828

Simplify the M_PI*f macros by using casts directly in the defines
as suggested by Kenneth Graunke.
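
A small C sketch of the difference, using hypothetical SKETCH_* names
instead of the real M_PI definitions (which vary between libcs):

```c
/* Stand-in for a libc M_PI that wraps the literal in a cast and
 * parentheses (hypothetical; not copied from any real header). */
#define SKETCH_PI ((double)3.14159265358979323846)

/* Pasting an 'f' suffix needs a bare numeric literal to paste onto;
 * it cannot work once the expansion ends in ')'. A cast works with
 * either style of definition. */
#define SKETCH_PIf   ((float)(SKETCH_PI))
#define SKETCH_PI_2f ((float)(SKETCH_PI / 2.0))

static float pi_as_float_sketch(void)
{
    return SKETCH_PIf;
}
```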

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78665
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
(cherry picked from commit 0c0bbe77d0)
2014-05-16 23:13:37 -07:00
Kenneth Graunke
3171da3402 i965: Don't _swrast_BlitFramebuffer when doing CopyTexSubImage.
The point of copytexsubimage_using_blit_framebuffer is to use a hardware
accelerated BlitFramebuffer path.  If that fails, we shouldn't do a
swrast blit---we should try our CTSI fallback code.

This is especially important for i965 and GLES, where we don't even
create a swrast context.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bd44ac8b5c)
2014-05-16 23:13:04 -07:00
Kristian Høgsberg
875fd92d16 wayland: Move version 2 request to end of interface specification
We're moving towards requiring interface additions to be appended to the
end of the interface block.  No functional change, opcodes are assigned as
before, but version 2 additions are now grouped together, which prevents
a scanner warning.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 06842d436e)
2014-05-16 23:12:45 -07:00
Topi Pohjolainen
fb5c68d312 meta: Refactor configuration of renderbuffer sampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4dc9c314c8)
2014-05-16 23:07:02 -07:00
Topi Pohjolainen
0e7b0f2a0a meta: Refactor binding of renderbuffer as texture image
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a2952315ac)
2014-05-16 23:05:22 -07:00
Topi Pohjolainen
5f495b85a0 meta: Merge compiling and linking of blit program
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit ac4db0aa55)
2014-05-16 23:04:22 -07:00
Topi Pohjolainen
253834cbf6 i965/blorp: Expose coordinate scissoring and mirroring
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 3a43cd0c3e)
2014-05-16 23:00:40 -07:00
Topi Pohjolainen
f5c083dbc3 i965/gen8: Use helper variables for surface parameters
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4a92ad5531)
2014-05-16 22:55:44 -07:00
Jordan Justen
2b4a871e05 i965/gen8: Set depth extent field
The depth extent field is used to limit the allowed slice range that
can be rendered to.

With the previous setting, only slice 0 could be rendered.

This fixes piglit amd_vertex_shader_layer-layered-depth-texture-render.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit c51c192891)
2014-05-14 12:19:16 -07:00
Jordan Justen
27da0bbeb4 i965/gen8 depth: Set depth size based on LOD0 for 3D textures
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit 294ada2fef)
2014-05-14 12:19:14 -07:00
Jordan Justen
91e2808c41 i965/gen7 depth: Set depth size based on LOD0 for 3D textures
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit e6d6ed55ab)
2014-05-14 12:19:13 -07:00
Jordan Justen
6cad93daab i965/gen8 renderbuffer: Set depth size based on LOD0 for 3D textures
Fixes piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit e47d08adef)
2014-05-14 12:19:12 -07:00
Jordan Justen
71f78bb87e i965/gen7 renderbuffer: Set depth size based on LOD0 for 3D textures
If blorp is disabled for color clears, then piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
will fail.

Currently, gen8 fails similarly on this test because gen8
does not use blorp.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit b875f39e29)
2014-05-14 12:19:08 -07:00
Chris Forbes
ab43a98fcf i965/Gen8: Set up layer constraints properly for depth buffers
Same issues as the previous commit fixed for Gen7:
- Bogus physical->logical layer conversion; depth/stencil surfaces
  are still IMS layout on Gen8.
- mt_layer ignored in layered rendering case, which breaks handling
  of views with MinLayer.
- Render target array extent not set correctly for arrays.

I'm not able to test this one since I can't get a Broadwell yet, but
it's the same set of fixes as for Gen7.

V2: Restore the MAX2() to account for zero depth/layer_count.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 23e9f06569)
2014-05-14 12:16:54 -07:00
Chris Forbes
af228e999c i965/Gen7: Set up layer constraints properly for depth buffers
Again, a few problems:
- Layered attachments did not honor MinLayer.
- Non-layered MSAA attachments rendered to the wrong layer due to
  dividing by the layer count. All depth buffers use the IMS layout, so
  the physical layer count == logical layer count.
- Layered attachments were not limited to irb->layer_count, so we could
  render off the end of the texture.

V2: Restore the MAX2() to account for zero depth/layer_count.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 77d55ef481)
2014-05-14 12:16:51 -07:00
Chris Forbes
725a27e04d i965/Gen8: Set up layer constraints properly for renderbuffers
Fixing the same issues the previous commit does for Gen7.

Note that I can't test this one, since I don't have a Broadwell.

V2: Restore the MAX2() to account for zero depth/layer_count.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9269ea599c)
2014-05-14 12:16:50 -07:00
Chris Forbes
b0609b715b i965/Gen7: Set up layer constraints properly for renderbuffers
There were a few problems here, which mostly just broke layered
rendering into a view:

- Render target view extent was always set to be == depth. This is
  benign for non-layered-rendering, but allows writes off the end of the
  render target for layered rendering, which ends badly.
- Layered rendering did not honor the mt_layer setting, so would not
  properly handle MinLayer being set on a view.

V2: Restore the MAX2() to account for zero depth/layer_count.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit dd43900b7b)
2014-05-14 12:16:47 -07:00
Ilia Mirkin
ca549a0f19 nv50,nvc0: fix blit 3d path for 1d array textures
Need to adjust coordinates since the shader receives the array index as
depth in z, but the TEX instruction expects it to be the second
coordinate for a 1D array texture. This fixes fbo-generatemipmap-array.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8baed87212)
2014-05-13 10:19:04 -07:00
Ilia Mirkin
407bff9db0 nv50,nvc0: leave queries on during blit, turn them on for 2d engine
Fixes the new logic of the conditional rendering piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4467c0c9fb)
2014-05-13 10:18:05 -07:00
Ilia Mirkin
0e14b19492 mesa/st: leave current query enabled during glBlitFramebuffer
Also make sure that pipe_blit_info gets zero'd out so that query isn't
accidentally left enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 64a7ddf40d)
2014-05-13 10:11:00 -07:00
Ilia Mirkin
a233f4c303 gallium: add bit to pipe_blit_info to leave current query enabled
Previously the implication was that queries should be disabled during
blits. However glBlitFramebuffer() is supposed to obey the current
query, and this new bit will indicate that to the driver.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 752ce0affb)
2014-05-13 10:08:33 -07:00
Ilia Mirkin
7a81788c67 nv50: fix setting of texture ms info to be per-stage
Different textures may be bound to each slot for each stage. So we need
to be able to upload ms parameters for each one without stages
overwriting each other.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 863573b9cb)
2014-05-13 10:08:01 -07:00
Ilia Mirkin
13bb2bc84b nv50/ir: make sure to reverse cond codes on all the OP_SET variants
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 68f47cad0d)
2014-05-13 09:57:28 -07:00
Ian Romanick
98b66e8d96 Add .cherry-ignore file
e696727 adds a change, and 155f98d reverts that change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-13 09:55:23 -07:00
Ian Romanick
0b3126bddd mesa: Bump version to 10.2-rc2
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-09 20:10:38 -07:00
Emil Velikov
f2682b3b9f glx/tests: Partially revert commit 51e3569573
C++ does not support designated initializers, thus compilation
is not guaranteed to succeed. Surprisingly gcc 4.6.3 fails to
build the code, while version 4.9.0 compiles it without a hitch.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78403
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit 326b8e253e)
2014-05-09 20:10:38 -07:00
Emil Velikov
d259928a56 configure: error out if building GBM without dri
Both backends require --enable-dri, and building an empty libgbm
makes little to no sense. Error out at configure to prevent the
user from shooting themselves in the foot.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78225
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e477d12c33)
2014-05-09 20:10:38 -07:00
Kenneth Graunke
ec6bd21162 i965: Fix GPU hangs on Broadwell in shaders with some control flow.
According to the documentation, we need to set the source 0 register
type to IMM for flow control instructions that have both JIP and UIP.

Fixes GPU hangs in approximately 10 Piglit tests, 5 es3conform tests,
Unigine Crypt, a WebGL raytracer demo, and several Steam titles.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75478
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75878
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76939
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 9584959123)
2014-05-09 20:10:37 -07:00
Tom Stellard
53a0f9d0ba radeonsi: Enable geometry shaders with LLVM 3.4.1
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 93c2ebbd83)
2014-05-09 20:10:37 -07:00
Tom Stellard
0f0f1106b6 configure.ac: Add LLVM_VERSION_PATCH to DEFINES
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c5d0008325)
2014-05-09 20:10:37 -07:00
Thomas Hellstrom
2b34277bbd st/xa: Fix performance regression introduced by commit "Cache render target surface"
The mentioned commit has the nasty side-effect of turning off accelerated
copies.

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 9306b7c171)
2014-05-09 20:10:37 -07:00
Tom Stellard
e29daf82cc clover: Destroy pipe_screen when device does not support compute v2
v2:
  - Make sure screen was successfully created before destroying it.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit c5f0c98c49)
2014-05-09 20:10:37 -07:00
Tom Stellard
03673bcf6c pipe-loader: Don't destroy the winsys in the sw loader
The screen takes ownership of the winsys, and is responsible for
destroying it.  Users of pipe-loader should make sure they destroy
any screens they've created to avoid memory leaks.

This fixes a crash in clover introduced by
ce6c17c083 where the pipe-loader was
destroying the winsys while a screen was still using it.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c650033b86)
2014-05-09 20:10:37 -07:00
Roland Scheidegger
af47859aed draw: do not use draw_get_option_use_llvm() inside draw execution paths
1c73e919a4 made it possible to not allocate
the tgsi machine if llvm was used. However, draw_get_option_use_llvm() is
not reliable after draw context creation, since drivers can explicitly
request a non-llvm draw context even if draw_get_option_use_llvm() would
return true (and softpipe does just that) which leads to crashes.
Thus use draw->llvm to determine if we're using llvm or not instead (and
make draw->llvm available even if HAVE_LLVM is false so we don't have to put
even more ifdefs).

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 9af68e9b1d)
2014-05-09 20:10:37 -07:00
Kenneth Graunke
e120f1a958 mesa: Fix MaxNumLayers for 1D array textures.
1D array targets store the number of slices in the Height field.

Cc: "10.2 10.1 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 5c399ca8e4)
2014-05-09 18:27:26 -07:00
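The rule stated in the commit message above can be sketched in a few lines of C. This is only an illustration: the enum, struct and function names are hypothetical and are not Mesa's. The one fact taken from the commit is that a 1D array texture keeps its slice count in the height field; the 2D array case using depth is an assumption made for contrast.

```c
#include <assert.h>

/* Hypothetical texture targets, for illustration only. */
enum tex_target { TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_2D };

/* Hypothetical image description. */
struct tex_image {
    enum tex_target target;
    unsigned width, height, depth;
};

/* Return the number of array layers an image contributes. */
unsigned num_layers(const struct tex_image *img)
{
    switch (img->target) {
    case TEX_1D_ARRAY:
        return img->height;   /* slices live in the height field */
    case TEX_2D_ARRAY:
        return img->depth;    /* assumed: slices live in the depth field */
    default:
        return 1;             /* non-array targets have a single layer */
    }
}

/* Usage: a 64-texel-wide 1D array with 5 slices reports 5 layers. */
void example_num_layers(void)
{
    struct tex_image img = { TEX_1D_ARRAY, 64, 5, 1 };
    assert(num_layers(&img) == 5);
}
```

Reading the depth field for a 1D array would report a single layer in this sketch, which is the kind of mismatch the commit describes.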
Kenneth Graunke
cc92276cb8 i965: Enable GL_ARB_texture_view on Broadwell.
This is a port of commit c9c08867ed.
A tiny bit of extra work was necessary to not break stencil texturing.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit ecfc418b68)
2014-05-08 14:57:12 -07:00
Ilia Mirkin
fac042fa05 nv50/ir/gk110: fix set with f32 dest
Should fix comparison opcodes like SGE/SLT/etc which expected a float to
be returned. These were previously getting integer 0/-1 values.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: 10.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e7047f2917)
2014-05-08 14:50:33 -07:00
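The difference between the two result conventions mentioned in the commit message above can be shown with a minimal sketch. The function names are hypothetical and this is not the code generator itself; it only contrasts a float result (1.0/0.0) with an integer result (-1/0) and shows why the integer bit pattern is a poor substitute when a float is expected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer-style "set": all bits set (-1) when true, 0 when false. */
int32_t sge_s32(float a, float b) { return (a >= b) ? -1 : 0; }

/* Float-style "set": 1.0f when true, 0.0f when false. */
float sge_f32(float a, float b) { return (a >= b) ? 1.0f : 0.0f; }

/* Reinterpret the bits of an integer result as a float. */
float bits_as_float(int32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan(float f) { return f != f; }

/* Usage: the all-ones pattern read as an IEEE-754 float is a NaN, not 1.0. */
void example_set_results(void)
{
    assert(sge_f32(2.0f, 1.0f) == 1.0f);
    assert(is_nan(bits_as_float(sge_s32(2.0f, 1.0f))));
}
```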
Ian Romanick
d26b59ec27 linker: Fix consumer_inputs_with_locations indexing
In an earlier incarnation of populate_consumer_input_sets and
get_matching_input, the consumer_inputs_with_locations array was indexed
using the user-specified location.  In that version, only user-defined
varyings were included in the array.

In the current incarnation, the Mesa location is used to index the
array, and built-in varyings are included.

This change fixes the unit test to expect gl_ClipDistance in the array,
and it resizes the arrays to actually be big enough.  It's just dumb
luck that the existing piglit tests use small enough locations to not
stomp the stack. :(

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78258
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Cc: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit f7bf37cb13)
2014-05-07 09:50:52 -07:00
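The sizing problem described in the commit message above can be illustrated with a small sketch. All names and slot counts here are hypothetical and are not the linker's real data structures. The point is only that a table indexed by a location covering both built-in and user-defined varyings has to be sized for the whole range, and that an out-of-range index should be rejected rather than written past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot layout: built-ins first, then generic varyings. */
enum {
    SLOT_BUILTIN_COUNT = 4,
    SLOT_GENERIC_COUNT = 8,
    SLOT_TOTAL = SLOT_BUILTIN_COUNT + SLOT_GENERIC_COUNT
};

/* Table sized for every slot that can be used as an index. */
struct input_table {
    const char *by_slot[SLOT_TOTAL];
};

/* Store a name by slot; returns 1 on success, 0 if the slot is out of range. */
int table_set(struct input_table *t, unsigned slot, const char *name)
{
    if (slot >= SLOT_TOTAL)
        return 0;
    t->by_slot[slot] = name;
    return 1;
}

/* Usage: the last valid slot is accepted, one past the end is refused. */
void example_table_set(void)
{
    struct input_table t = {{0}};
    assert(table_set(&t, SLOT_TOTAL - 1, "last") == 1);
    assert(table_set(&t, SLOT_TOTAL, "overflow") == 0);
}
```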
Kenneth Graunke
c2c15a9a37 meta: Only clear the requested color buffers.
This path is used to implement both glClear and glClearBuffer; the
latter is only supposed to clear particular buffers.  Core Mesa provides
us that information in the buffers bitmask; we must only clear buffers
mentioned there.

To accomplish this, we save/restore the color draw buffers state, and
use glDrawBuffers to restrict drawing to the relevant buffers.

Fixes Piglit's spec/!OpenGL 3.0/clearbuffer-mixed-formats and
spec/ARB_framebuffer_object/fbo-drawbuffers-none glClearBuffer tests
for drivers using meta clears (such as Broadwell).

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77852
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77856
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 9701c6984d)
2014-05-07 09:49:13 -07:00
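The approach described in the commit message above, restricting drawing to the buffers named in a bitmask, can be sketched as follows. This is a simplified illustration and not the meta code: the limit, the function name and the use of -1 to stand for "no buffer" are all assumptions.

```c
#include <assert.h>

#define MAX_DRAW_BUFFERS 8   /* hypothetical limit */

/*
 * Build the list of draw buffers to enable for a clear.
 * Bit i of `requested` is set when color buffer i should be cleared.
 * Slots that are not requested are set to -1, standing in for "none".
 * Returns the number of buffers that will actually be cleared.
 */
unsigned select_clear_buffers(unsigned requested, unsigned num_bufs,
                              int out[MAX_DRAW_BUFFERS])
{
    unsigned i, count = 0;

    for (i = 0; i < MAX_DRAW_BUFFERS; i++) {
        if (i < num_bufs && (requested & (1u << i))) {
            out[i] = (int)i;
            count++;
        } else {
            out[i] = -1;
        }
    }
    return count;
}

/* Usage: with four buffers bound and mask 0x5, only buffers 0 and 2 are cleared. */
void example_select_clear_buffers(void)
{
    int out[MAX_DRAW_BUFFERS];
    assert(select_clear_buffers(0x5, 4, out) == 2);
    assert(out[0] == 0 && out[1] == -1 && out[2] == 2 && out[3] == -1);
}
```

In the real change the previous draw buffer state is saved before the clear and restored afterwards; the sketch leaves that out.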
Kenneth Graunke
e6c98309c6 meta: Add infrastructure for saving/restoring the DrawBuffers state.
Sometimes we need to configure what draw buffers we render to, without
creating a new FBO.  This path will make that possible.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit c1c1cf5f92)
2014-05-07 09:48:34 -07:00
Kenneth Graunke
ffc0cc027a meta: Add a new MESA_META_DRAW_BUFFERS bit.
This will be used for saving/restoring the glDrawBuffers state.
For now, make sure that existing users of MESA_META_ALL don't get
the new bit, since they probably won't want it.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit e526ebf35c)
2014-05-07 09:48:34 -07:00
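One way to add a state bit without changing what an existing "all" mask means is sketched below. The names and bit values are hypothetical and do not reflect how Mesa actually defines MESA_META_ALL; the sketch only shows the idea from the commit message above that existing users of the "all" mask should not pick up the new bit.

```c
#include <assert.h>

/* Hypothetical state bits; names and values are illustrative only. */
enum {
    META_BLEND        = 1u << 0,
    META_SCISSOR      = 1u << 1,
    META_VIEWPORT     = 1u << 2,
    META_DRAW_BUFFERS = 1u << 3,   /* newly added bit */

    /* "ALL" is spelled as the pre-existing set, so the new bit is excluded. */
    META_ALL = META_BLEND | META_SCISSOR | META_VIEWPORT
};

/* True when the caller asked for draw buffer state to be saved. */
int saves_draw_buffers(unsigned state)
{
    return (state & META_DRAW_BUFFERS) != 0;
}

/* Usage: callers must opt in to the new bit explicitly. */
void example_meta_flags(void)
{
    assert(!saves_draw_buffers(META_ALL));
    assert(saves_draw_buffers(META_ALL | META_DRAW_BUFFERS));
}
```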
Kenneth Graunke
658d0410d0 meta: Unify the GLSL and fixed-function clear paths.
The majority of _mesa_meta_Clear and _mesa_meta_glsl_Clear was the same;
adding a boolean for whether to use GLSL allows us to share most of it
without polluting either path too much.

Tested for regressions by hacking i965 to always use the non-GLSL path.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 7c8df60f31)
2014-05-07 09:48:34 -07:00
Kenneth Graunke
a1dd1e62fa i965: Always intel_prepare_render() after invalidating front buffers.
Fixes glean/texture_srgb, which hit recursive-flush prevention
assertions in vbo_exec_FlushVertices.

This probably hurts the performance of front buffer rendering, but
very few people in their right mind do front buffer rendering.

Fixes Glean's texture_srgb test.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit cde8bad1c9)
2014-05-07 09:48:34 -07:00
Tapani Pälli
c7a3c2d29d glsl: fix bogus layout qualifier warnings
Print out GL_ARB_explicit_attrib_location warnings only
when parsing attribute that uses "location" qualifier.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77245
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e65917f94e)
2014-05-07 09:48:34 -07:00
Kenneth Graunke
0a5034517a i965: Set miptree target field when creating from a BO.
Prior to commit 8435b60a35, the region
equivalent of this function called intel_miptree_create_layout, which
set mt->target to target.  With that commit, it no longer copied target.

Piglit's ext_image_dma_buf_import-sample_[xa]rgb8888 tests would then
hit an assertion failure, where image->TexObject->Target was
GL_TEXTURE_EXTERNAL_OES, and mt->target was GL_TEXTURE_2D.

Copying the target fixes this assertion failure.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 829cb0423d)
2014-05-05 10:10:54 -07:00
Ian Romanick
e8f6150320 mesa: Bump version to 10.2-rc1
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-05-02 21:17:00 -07:00
136 changed files with 3898 additions and 975 deletions

@@ -1 +1 @@
10.2.0-devel
10.2.0

bin/.cherry-ignore (new file)

@@ -0,0 +1,3 @@
# The first is the change, and the second is the revert of that change.
e6967270c75a5b669152127bb7a746d55f4407a6 i965: Fix depth (array slices) computation for 1D_ARRAY render targets.
155f98d49fdc2f46c760f8214327b3804ee60079 Revert "i965: Fix depth (array slices) computation for 1D_ARRAY render targets."

@@ -331,6 +331,19 @@ LDFLAGS=$save_LDFLAGS
AC_SUBST([GC_SECTIONS])
dnl
dnl OpenBSD does not have DT_NEEDED entries for libc by design
dnl so when these flags are passed to ld via libtool the checks will fail
dnl
case "$host_os" in
openbsd*)
LD_NO_UNDEFINED="" ;;
*)
LD_NO_UNDEFINED="-Wl,--no-undefined" ;;
esac
AC_SUBST([LD_NO_UNDEFINED])
dnl
dnl compatibility symlinks
dnl
@@ -1179,6 +1192,13 @@ if test "x$enable_gbm" = xyes; then
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([gbm_dri requires --enable-shared-glapi])
fi
else
# Strictly speaking libgbm does not require --enable-dri, although
# both of its backends do. Thus one can build libgbm without any
# backends if --disable-dri is set.
# To avoid unnecessary complexity of checking if at least one backend
# is available when building, just mandate --enable-dri.
AC_MSG_ERROR([gbm requires --enable-dri])
fi
fi
AM_CONDITIONAL(HAVE_GBM, test "x$enable_gbm" = xyes)
@@ -1273,6 +1293,7 @@ if test "x$enable_xa" = xyes; then
fi
GALLIUM_STATE_TRACKERS_DIRS="xa $GALLIUM_STATE_TRACKERS_DIRS"
enable_gallium_loader=yes
enable_gallium_drm_loader=yes
fi
AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes)
@@ -1605,6 +1626,12 @@ if test "x$enable_gallium_llvm" = xyes; then
AC_COMPUTE_INT([LLVM_VERSION_MINOR], [LLVM_VERSION_MINOR],
[#include "${LLVM_INCLUDEDIR}/llvm/Config/llvm-config.h"])
dnl In LLVM 3.4.1 patch level was defined in config.h and not
dnl llvm-config.h
AC_COMPUTE_INT([LLVM_VERSION_PATCH], [LLVM_VERSION_PATCH],
[#include "${LLVM_INCLUDEDIR}/llvm/Config/config.h"],
LLVM_VERSION_PATCH=0) dnl Default if LLVM_VERSION_PATCH not found
if test -n "${LLVM_VERSION_MAJOR}"; then
LLVM_VERSION_INT="${LLVM_VERSION_MAJOR}0${LLVM_VERSION_MINOR}"
else
@@ -1627,7 +1654,7 @@ if test "x$enable_gallium_llvm" = xyes; then
LLVM_COMPONENTS="${LLVM_COMPONENTS} option"
fi
fi
DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT"
DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT -DLLVM_VERSION_PATCH=$LLVM_VERSION_PATCH"
MESA_LLVM=1
dnl Check for Clang internal headers
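The hunks above build HAVE_LLVM by string concatenation ("0x0", the major version, "0", the minor version) and pass LLVM_VERSION_PATCH as a separate define. A small C sketch of that arithmetic follows. The helper names are hypothetical, and the encoding only stays unambiguous while the minor version is a single digit, which is an assumption of this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Mirror the configure-time concatenation "0x0" MAJOR "0" MINOR. */
long have_llvm_value(int major, int minor)
{
    char text[32];

    snprintf(text, sizeof(text), "0x0%d0%d", major, minor);
    return strtol(text, NULL, 16);
}

/* Hypothetical helper: true when (version, patch) is at least 3.4.1. */
int at_least_3_4_1(long have_llvm, int patch)
{
    return have_llvm > 0x0304 || (have_llvm == 0x0304 && patch >= 1);
}

/* Usage: LLVM 3.4 yields 0x0304; the patch level separates 3.4.0 from 3.4.1. */
void example_llvm_version(void)
{
    assert(have_llvm_value(3, 4) == 0x0304);
    assert(!at_least_3_4_1(have_llvm_value(3, 4), 0));
    assert(at_least_3_4_1(have_llvm_value(3, 4), 1));
}
```

In C sources the same comparison would normally be written with the preprocessor against the two defines rather than at run time.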

@@ -67,6 +67,25 @@ TBD.
<h2>Changes</h2>
<ul>
<li>Renamed <i>--with-llvm-shared-libs</i> to <i>--enable-llvm-shared-libs</i></li>
<p>
The option is used to control how mesa is linked against LLVM, and now
defaults to enabled (shared linking).
</p>
<li>Split <i>libxatracker.so</i> into a standalone library which can be used
with any gallium driver.</li>
<p>
Previously the library was linked statically against vmware's virtual gpu
driver(svga), whereas now it loads a shared pipe_*.so driver. Provide the
following options during configure, if you would like support for svga driver
<i>--enable-xa --with-gallium-drivers=svga</i>
</p>
<p>
Note: The files are installed in $(libdir)/gallium-pipe/ and the interface
between them and libxatracker.so is <strong>not</strong> stable.
</p>
</ul>
</div>

@@ -524,8 +524,12 @@ eglMakeCurrent(EGLDisplay dpy, EGLSurface draw, EGLSurface read,
if (!context && ctx != EGL_NO_CONTEXT)
RETURN_EGL_ERROR(disp, EGL_BAD_CONTEXT, EGL_FALSE);
if (!draw_surf || !read_surf) {
/* surfaces may be NULL if surfaceless */
if (!disp->Extensions.KHR_surfaceless_context)
/* From the EGL 1.4 (20130211) spec:
*
* To release the current context without assigning a new one, set ctx
* to EGL_NO_CONTEXT and set draw and read to EGL_NO_SURFACE.
*/
if (!disp->Extensions.KHR_surfaceless_context && ctx != EGL_NO_CONTEXT)
RETURN_EGL_ERROR(disp, EGL_BAD_SURFACE, EGL_FALSE);
if ((!draw_surf && draw != EGL_NO_SURFACE) ||
@@ -567,6 +571,10 @@ _eglCreateWindowSurfaceCommon(_EGLDisplay *disp, EGLConfig config,
EGLSurface ret;
_EGL_CHECK_CONFIG(disp, conf, EGL_NO_SURFACE, drv);
if (native_window == NULL)
RETURN_EGL_ERROR(disp, EGL_BAD_NATIVE_WINDOW, EGL_NO_SURFACE);
surf = drv->API.CreateWindowSurface(drv, disp, conf, native_window,
attrib_list);
ret = (surf) ? _eglLinkSurface(surf) : EGL_NO_SURFACE;
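The changed surface check in the first hunk above can be restated as a standalone decision function. This is a sketch under stated assumptions: the result codes and parameter names are hypothetical stand-ins rather than the EGL types, and only the single condition touched by the hunk is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes standing in for EGL error values. */
enum check_result { CHECK_OK, CHECK_BAD_SURFACE };

/*
 * have_draw / have_read:  whether the surface handles resolved to objects
 * surfaceless_supported:  whether the display supports surfaceless contexts
 * ctx_is_none:            whether the caller passed "no context"
 */
enum check_result check_surfaces(bool have_draw, bool have_read,
                                 bool surfaceless_supported, bool ctx_is_none)
{
    if (!have_draw || !have_read) {
        /* Missing surfaces are an error unless surfaceless contexts are
         * supported or the caller is releasing the current context. */
        if (!surfaceless_supported && !ctx_is_none)
            return CHECK_BAD_SURFACE;
    }
    return CHECK_OK;
}

/* Usage: releasing the context with no surfaces is accepted. */
void example_check_surfaces(void)
{
    assert(check_surfaces(false, false, false, true) == CHECK_OK);
    assert(check_surfaces(false, false, false, false) == CHECK_BAD_SURFACE);
}
```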

@@ -135,22 +135,6 @@
<arg name="stride2" type="int"/>
</request>
<!-- Create a wayland buffer for the prime fd. Use for regular and planar
buffers. Pass 0 for offset and stride for unused planes. -->
<request name="create_prime_buffer" since="2">
<arg name="id" type="new_id" interface="wl_buffer"/>
<arg name="name" type="fd"/>
<arg name="width" type="int"/>
<arg name="height" type="int"/>
<arg name="format" type="uint"/>
<arg name="offset0" type="int"/>
<arg name="stride0" type="int"/>
<arg name="offset1" type="int"/>
<arg name="stride1" type="int"/>
<arg name="offset2" type="int"/>
<arg name="stride2" type="int"/>
</request>
<!-- Notification of the path of the drm device which is used by
the server. The client should use this device for creating
local buffers. Only buffers created from this device should
@@ -177,6 +161,25 @@
<event name="capabilities">
<arg name="value" type="uint"/>
</event>
<!-- Version 2 additions -->
<!-- Create a wayland buffer for the prime fd. Use for regular and planar
buffers. Pass 0 for offset and stride for unused planes. -->
<request name="create_prime_buffer" since="2">
<arg name="id" type="new_id" interface="wl_buffer"/>
<arg name="name" type="fd"/>
<arg name="width" type="int"/>
<arg name="height" type="int"/>
<arg name="format" type="uint"/>
<arg name="offset0" type="int"/>
<arg name="stride0" type="int"/>
<arg name="offset1" type="int"/>
<arg name="stride1" type="int"/>
<arg name="offset2" type="int"/>
<arg name="stride2" type="int"/>
</request>
</interface>
</protocol>

@@ -1000,6 +1000,8 @@ draw_get_shader_param_no_llvm(unsigned shader, enum pipe_shader_cap param)
/**
* XXX: Results for PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS because there are two
* different ways of setting textures, and drivers typically only support one.
* Drivers requesting a draw context explicitly without llvm must call
* draw_get_shader_param_no_llvm instead.
*/
int
draw_get_shader_param(unsigned shader, enum pipe_shader_cap param)

@@ -597,7 +597,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
#ifdef HAVE_LLVM
if (draw_get_option_use_llvm()) {
if (shader->draw->llvm) {
shader->gs_output = output_verts->verts;
if (max_out_prims > shader->max_out_prims) {
unsigned i;
@@ -674,7 +674,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
void draw_geometry_shader_prepare(struct draw_geometry_shader *shader,
struct draw_context *draw)
{
boolean use_llvm = draw_get_option_use_llvm();
boolean use_llvm = draw->llvm != NULL;
if (!use_llvm && shader && shader->machine->Tokens != shader->state.tokens) {
tgsi_exec_machine_bind_shader(shader->machine,
shader->state.tokens,
@@ -686,7 +686,7 @@ void draw_geometry_shader_prepare(struct draw_geometry_shader *shader,
boolean
draw_gs_init( struct draw_context *draw )
{
if (!draw_get_option_use_llvm()) {
if (!draw->llvm) {
draw->gs.tgsi.machine = tgsi_exec_machine_create();
if (!draw->gs.tgsi.machine)
return FALSE;
@@ -715,7 +715,7 @@ draw_create_geometry_shader(struct draw_context *draw,
const struct pipe_shader_state *state)
{
#ifdef HAVE_LLVM
boolean use_llvm = draw_get_option_use_llvm();
boolean use_llvm = draw->llvm != NULL;
struct llvm_geometry_shader *llvm_gs;
#endif
struct draw_geometry_shader *gs;
@@ -870,7 +870,7 @@ void draw_delete_geometry_shader(struct draw_context *draw,
return;
}
#ifdef HAVE_LLVM
if (draw_get_option_use_llvm()) {
if (draw->llvm) {
struct llvm_geometry_shader *shader = llvm_geometry_shader(dgs);
struct draw_gs_llvm_variant_list_item *li;
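The pattern applied throughout the hunks above, asking the context whether it has an LLVM backend instead of consulting a global option, can be sketched as follows. The types and functions are hypothetical stand-ins and not the real draw module API; the sketch only shows why the two answers can differ once a caller opts out of LLVM at context creation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real draw module types. */
struct llvm_backend { int unused; };

struct context {
    struct llvm_backend *llvm;   /* NULL when the interpreter is used */
};

/* Process-wide preference, for example read from an environment option. */
static bool global_option_use_llvm = true;

/* Create a context; the caller may opt out of LLVM regardless of the option. */
void context_init(struct context *ctx, struct llvm_backend *backend,
                  bool try_llvm)
{
    ctx->llvm = (try_llvm && global_option_use_llvm) ? backend : NULL;
}

/* Execution paths ask the context, not the global option. */
bool context_uses_llvm(const struct context *ctx)
{
    return ctx->llvm != NULL;
}

/* Usage: the option says "yes", but this context was created without LLVM. */
void example_context_llvm(void)
{
    struct llvm_backend backend = { 0 };
    struct context ctx;

    context_init(&ctx, &backend, false);
    assert(global_option_use_llvm);
    assert(!context_uses_llvm(&ctx));
}
```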

@@ -47,7 +47,6 @@
#include "tgsi/tgsi_scan.h"
#ifdef HAVE_LLVM
struct draw_llvm;
struct gallivm_state;
#endif
@@ -69,6 +68,7 @@ struct tgsi_exec_machine;
struct tgsi_sampler;
struct draw_pt_front_end;
struct draw_assembler;
struct draw_llvm;
/**
@@ -318,9 +318,7 @@ struct draw_context
unsigned start_instance;
unsigned start_index;
#ifdef HAVE_LLVM
struct draw_llvm *llvm;
#endif
/** Texture sampler and sampler view state.
* Note that we have arrays indexed by shader type. At this time

@@ -149,7 +149,7 @@ draw_vs_init( struct draw_context *draw )
{
draw->dump_vs = debug_get_option_gallium_dump_vs();
if (!draw_get_option_use_llvm()) {
if (!draw->llvm) {
draw->vs.tgsi.machine = tgsi_exec_machine_create();
if (!draw->vs.tgsi.machine)
return FALSE;
@@ -175,7 +175,7 @@ draw_vs_destroy( struct draw_context *draw )
if (draw->vs.emit_cache)
translate_cache_destroy(draw->vs.emit_cache);
if (!draw_get_option_use_llvm())
if (!draw->llvm)
tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
}

@@ -63,7 +63,7 @@ vs_exec_prepare( struct draw_vertex_shader *shader,
{
struct exec_vertex_shader *evs = exec_vertex_shader(shader);
debug_assert(!draw_get_option_use_llvm());
debug_assert(!draw->llvm);
/* Specify the vertex program to interpret/execute.
* Avoid rebinding when possible.
*/
@@ -97,7 +97,7 @@ vs_exec_run_linear( struct draw_vertex_shader *shader,
unsigned slot;
boolean clamp_vertex_color = shader->draw->rasterizer->clamp_vertex_color;
debug_assert(!draw_get_option_use_llvm());
debug_assert(!shader->draw->llvm);
tgsi_exec_set_constant_buffers(machine, PIPE_MAX_CONSTANT_BUFFERS,
constants, const_size);

@@ -145,9 +145,6 @@ pipe_loader_sw_release(struct pipe_loader_device **dev)
{
struct pipe_loader_sw_device *sdev = pipe_loader_sw_device(*dev);
if (sdev->ws && sdev->ws->destroy)
sdev->ws->destroy(sdev->ws);
if (sdev->lib)
util_dl_close(sdev->lib);
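The ownership rule behind the hunk above (the screen owns the winsys, so the loader must not destroy it) can be illustrated with a minimal sketch. The structs and functions are hypothetical and are not the gallium pipe-loader API; a counter stands in for the actual destruction so the effect is observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types illustrating the ownership rule. */
struct winsys { int *destroy_count; };
struct screen { struct winsys *ws; };
struct loader_device { struct winsys *ws; };

/* Stand-in for real teardown: just count how often it happens. */
void winsys_destroy(struct winsys *ws)
{
    (*ws->destroy_count)++;
}

/* The screen owns the winsys and destroys it exactly once. */
void screen_destroy(struct screen *s)
{
    if (s->ws) {
        winsys_destroy(s->ws);
        s->ws = NULL;
    }
}

/* The loader only drops its pointer; it must not destroy the winsys. */
void loader_release(struct loader_device *dev)
{
    dev->ws = NULL;
}

/* Usage: releasing the loader first leaves the winsys alive for the screen. */
void example_ownership(void)
{
    int destroyed = 0;
    struct winsys ws = { &destroyed };
    struct screen scr = { &ws };
    struct loader_device dev = { &ws };

    loader_release(&dev);
    assert(destroyed == 0);
    screen_destroy(&scr);
    assert(destroyed == 1);
}
```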

@@ -120,7 +120,8 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] =
"FS_COORD_PIXEL_CENTER",
"FS_COLOR0_WRITES_ALL_CBUFS",
"FS_DEPTH_LAYOUT",
"VS_PROHIBIT_UCPS"
"VS_PROHIBIT_UCPS",
"GS_INVOCATIONS",
};
const char *tgsi_type_names[5] =

@@ -3,6 +3,8 @@ C_SOURCES := \
freedreno_lowering.c \
freedreno_program.c \
freedreno_query.c \
freedreno_query_hw.c \
freedreno_query_sw.c \
freedreno_fence.c \
freedreno_resource.c \
freedreno_surface.c \
@@ -38,6 +40,7 @@ a3xx_SOURCES := \
a3xx/fd3_emit.c \
a3xx/fd3_gmem.c \
a3xx/fd3_program.c \
a3xx/fd3_query.c \
a3xx/fd3_rasterizer.c \
a3xx/fd3_screen.c \
a3xx/fd3_texture.c \

@@ -10,11 +10,11 @@ git clone https://github.com/freedreno/envytools.git
The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2013-11-30 14:47:15)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2013-03-31 16:51:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32840 bytes, from 2014-01-05 14:44:21)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 9009 bytes, from 2014-01-11 16:56:35)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 12362 bytes, from 2014-01-07 14:47:36)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 56545 bytes, from 2014-02-26 16:32:11)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 8344 bytes, from 2013-11-30 14:49:47)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32580 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10186 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14477 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 57831 bytes, from 2014-05-19 21:02:34)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 26293 bytes, from 2014-05-16 11:51:57)
Copyright (C) 2013-2014 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)

@@ -125,7 +125,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx,
{
unsigned const_idx = fd2_get_const_idx(ctx, tex, samp_id);
static const struct fd2_sampler_stateobj dummy_sampler = {};
struct fd2_sampler_stateobj *sampler;
const struct fd2_sampler_stateobj *sampler;
struct fd2_pipe_sampler_view *view;
if (emitted & (1 << const_idx))

@@ -10,11 +10,11 @@ git clone https://github.com/freedreno/envytools.git
The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2013-11-30 14:47:15)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2013-03-31 16:51:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32840 bytes, from 2014-01-05 14:44:21)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 9009 bytes, from 2014-01-11 16:56:35)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 12362 bytes, from 2014-01-07 14:47:36)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 56545 bytes, from 2014-02-26 16:32:11)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 8344 bytes, from 2013-11-30 14:49:47)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32580 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10186 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14477 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 57831 bytes, from 2014-05-19 21:02:34)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 26293 bytes, from 2014-05-16 11:51:57)
Copyright (C) 2013-2014 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -41,31 +41,11 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
enum a3xx_render_mode {
RB_RENDERING_PASS = 0,
RB_TILING_PASS = 1,
RB_RESOLVE_PASS = 2,
};
enum a3xx_tile_mode {
LINEAR = 0,
TILE_32X32 = 2,
};
enum a3xx_threadmode {
MULTI = 0,
SINGLE = 1,
};
enum a3xx_instrbuffermode {
BUFFER = 1,
};
enum a3xx_threadsize {
TWO_QUADS = 0,
FOUR_QUADS = 1,
};
enum a3xx_state_block_id {
HLSQ_BLOCK_ID_TP_TEX = 2,
HLSQ_BLOCK_ID_TP_MIPMAP = 3,
@@ -180,12 +160,6 @@ enum a3xx_color_swap {
XYZW = 3,
};
enum a3xx_msaa_samples {
MSAA_ONE = 0,
MSAA_TWO = 1,
MSAA_FOUR = 2,
};
enum a3xx_sp_perfcounter_select {
SP_FS_CFLOW_INSTRUCTIONS = 12,
SP_FS_FULL_ALU_INSTRUCTIONS = 14,
@@ -212,11 +186,6 @@ enum a3xx_rop_code {
ROP_SET = 15,
};
enum adreno_rb_copy_control_mode {
RB_COPY_RESOLVE = 1,
RB_COPY_DEPTH_STENCIL = 5,
};
enum a3xx_tex_filter {
A3XX_TEX_NEAREST = 0,
A3XX_TEX_LINEAR = 1,
@@ -337,6 +306,7 @@ enum a3xx_tex_type {
#define REG_A3XX_RBBM_INT_0_STATUS 0x00000064
#define REG_A3XX_RBBM_PERFCTR_CTL 0x00000080
#define A3XX_RBBM_PERFCTR_CTL_ENABLE 0x00000001
#define REG_A3XX_RBBM_PERFCTR_LOAD_CMD0 0x00000081
@@ -570,6 +540,10 @@ static inline uint32_t REG_A3XX_CP_PROTECT_REG(uint32_t i0) { return 0x00000460
#define REG_A3XX_CP_AHB_FAULT 0x0000054d
#define REG_A3XX_SP_GLOBAL_MEM_SIZE 0x00000e22
#define REG_A3XX_SP_GLOBAL_MEM_ADDR 0x00000e23
#define REG_A3XX_GRAS_CL_CLIP_CNTL 0x00002040
#define A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER 0x00001000
#define A3XX_GRAS_CL_CLIP_CNTL_CLIP_DISABLE 0x00010000
@@ -644,8 +618,26 @@ static inline uint32_t A3XX_GRAS_CL_VPORT_ZSCALE(float val)
}
#define REG_A3XX_GRAS_SU_POINT_MINMAX 0x00002068
#define A3XX_GRAS_SU_POINT_MINMAX_MIN__MASK 0x0000ffff
#define A3XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT 0
static inline uint32_t A3XX_GRAS_SU_POINT_MINMAX_MIN(float val)
{
return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MIN__MASK;
}
#define A3XX_GRAS_SU_POINT_MINMAX_MAX__MASK 0xffff0000
#define A3XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT 16
static inline uint32_t A3XX_GRAS_SU_POINT_MINMAX_MAX(float val)
{
return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MAX__MASK;
}
#define REG_A3XX_GRAS_SU_POINT_SIZE 0x00002069
#define A3XX_GRAS_SU_POINT_SIZE__MASK 0xffffffff
#define A3XX_GRAS_SU_POINT_SIZE__SHIFT 0
static inline uint32_t A3XX_GRAS_SU_POINT_SIZE(float val)
{
return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_SIZE__SHIFT) & A3XX_GRAS_SU_POINT_SIZE__MASK;
}
#define REG_A3XX_GRAS_SU_POLY_OFFSET_SCALE 0x0000206c
#define A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__MASK 0x00ffffff
@@ -992,6 +984,12 @@ static inline uint32_t A3XX_RB_COPY_CONTROL_MODE(enum adreno_rb_copy_control_mod
{
return ((val) << A3XX_RB_COPY_CONTROL_MODE__SHIFT) & A3XX_RB_COPY_CONTROL_MODE__MASK;
}
#define A3XX_RB_COPY_CONTROL_FASTCLEAR__MASK 0x00000f00
#define A3XX_RB_COPY_CONTROL_FASTCLEAR__SHIFT 8
static inline uint32_t A3XX_RB_COPY_CONTROL_FASTCLEAR(uint32_t val)
{
return ((val) << A3XX_RB_COPY_CONTROL_FASTCLEAR__SHIFT) & A3XX_RB_COPY_CONTROL_FASTCLEAR__MASK;
}
#define A3XX_RB_COPY_CONTROL_GMEM_BASE__MASK 0xffffc000
#define A3XX_RB_COPY_CONTROL_GMEM_BASE__SHIFT 14
static inline uint32_t A3XX_RB_COPY_CONTROL_GMEM_BASE(uint32_t val)
@@ -1034,6 +1032,12 @@ static inline uint32_t A3XX_RB_COPY_DEST_INFO_SWAP(enum a3xx_color_swap val)
{
return ((val) << A3XX_RB_COPY_DEST_INFO_SWAP__SHIFT) & A3XX_RB_COPY_DEST_INFO_SWAP__MASK;
}
#define A3XX_RB_COPY_DEST_INFO_DITHER_MODE__MASK 0x00000c00
#define A3XX_RB_COPY_DEST_INFO_DITHER_MODE__SHIFT 10
static inline uint32_t A3XX_RB_COPY_DEST_INFO_DITHER_MODE(enum adreno_rb_dither_mode val)
{
return ((val) << A3XX_RB_COPY_DEST_INFO_DITHER_MODE__SHIFT) & A3XX_RB_COPY_DEST_INFO_DITHER_MODE__MASK;
}
#define A3XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__MASK 0x0003c000
#define A3XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__SHIFT 14
static inline uint32_t A3XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE(uint32_t val)
@@ -1202,6 +1206,8 @@ static inline uint32_t A3XX_RB_WINDOW_OFFSET_Y(uint32_t val)
}
#define REG_A3XX_RB_SAMPLE_COUNT_CONTROL 0x00002110
#define A3XX_RB_SAMPLE_COUNT_CONTROL_RESET 0x00000001
#define A3XX_RB_SAMPLE_COUNT_CONTROL_COPY 0x00000002
#define REG_A3XX_RB_SAMPLE_COUNT_ADDR 0x00002111
@@ -1366,10 +1372,36 @@ static inline uint32_t A3XX_HLSQ_CONST_FSPRESV_RANGE_REG_ENDENTRY(uint32_t val)
}
#define REG_A3XX_HLSQ_CL_NDRANGE_0_REG 0x0000220a
#define A3XX_HLSQ_CL_NDRANGE_0_REG_WORKDIM__MASK 0x00000003
#define A3XX_HLSQ_CL_NDRANGE_0_REG_WORKDIM__SHIFT 0
static inline uint32_t A3XX_HLSQ_CL_NDRANGE_0_REG_WORKDIM(uint32_t val)
{
return ((val) << A3XX_HLSQ_CL_NDRANGE_0_REG_WORKDIM__SHIFT) & A3XX_HLSQ_CL_NDRANGE_0_REG_WORKDIM__MASK;
}
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE0__MASK 0x00000ffc
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE0__SHIFT 2
static inline uint32_t A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE0(uint32_t val)
{
return ((val) << A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE0__SHIFT) & A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE0__MASK;
}
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE1__MASK 0x003ff000
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE1__SHIFT 12
static inline uint32_t A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE1(uint32_t val)
{
return ((val) << A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE1__SHIFT) & A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE1__MASK;
}
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE2__MASK 0xffc00000
#define A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE2__SHIFT 22
static inline uint32_t A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE2(uint32_t val)
{
return ((val) << A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE2__SHIFT) & A3XX_HLSQ_CL_NDRANGE_0_REG_LOCALSIZE2__MASK;
}
#define REG_A3XX_HLSQ_CL_NDRANGE_1_REG 0x0000220b
static inline uint32_t REG_A3XX_HLSQ_CL_GLOBAL_WORK(uint32_t i0) { return 0x0000220b + 0x2*i0; }
#define REG_A3XX_HLSQ_CL_NDRANGE_2_REG 0x0000220c
static inline uint32_t REG_A3XX_HLSQ_CL_GLOBAL_WORK_SIZE(uint32_t i0) { return 0x0000220b + 0x2*i0; }
static inline uint32_t REG_A3XX_HLSQ_CL_GLOBAL_WORK_OFFSET(uint32_t i0) { return 0x0000220c + 0x2*i0; }
#define REG_A3XX_HLSQ_CL_CONTROL_0_REG 0x00002211
@@ -1377,7 +1409,9 @@ static inline uint32_t A3XX_HLSQ_CONST_FSPRESV_RANGE_REG_ENDENTRY(uint32_t val)
#define REG_A3XX_HLSQ_CL_KERNEL_CONST_REG 0x00002214
#define REG_A3XX_HLSQ_CL_KERNEL_GROUP_X_REG 0x00002215
static inline uint32_t REG_A3XX_HLSQ_CL_KERNEL_GROUP(uint32_t i0) { return 0x00002215 + 0x1*i0; }
static inline uint32_t REG_A3XX_HLSQ_CL_KERNEL_GROUP_RATIO(uint32_t i0) { return 0x00002215 + 0x1*i0; }
#define REG_A3XX_HLSQ_CL_KERNEL_GROUP_Y_REG 0x00002216
@@ -1624,6 +1658,7 @@ static inline uint32_t A3XX_SP_VS_CTRL_REG0_THREADSIZE(enum a3xx_threadsize val)
}
#define A3XX_SP_VS_CTRL_REG0_SUPERTHREADMODE 0x00200000
#define A3XX_SP_VS_CTRL_REG0_PIXLODENABLE 0x00400000
#define A3XX_SP_VS_CTRL_REG0_COMPUTEMODE 0x00800000
#define A3XX_SP_VS_CTRL_REG0_LENGTH__MASK 0xff000000
#define A3XX_SP_VS_CTRL_REG0_LENGTH__SHIFT 24
static inline uint32_t A3XX_SP_VS_CTRL_REG0_LENGTH(uint32_t val)
@@ -1797,6 +1832,7 @@ static inline uint32_t A3XX_SP_FS_CTRL_REG0_THREADSIZE(enum a3xx_threadsize val)
}
#define A3XX_SP_FS_CTRL_REG0_SUPERTHREADMODE 0x00200000
#define A3XX_SP_FS_CTRL_REG0_PIXLODENABLE 0x00400000
#define A3XX_SP_FS_CTRL_REG0_COMPUTEMODE 0x00800000
#define A3XX_SP_FS_CTRL_REG0_LENGTH__MASK 0xff000000
#define A3XX_SP_FS_CTRL_REG0_LENGTH__SHIFT 24
static inline uint32_t A3XX_SP_FS_CTRL_REG0_LENGTH(uint32_t val)
@@ -1976,6 +2012,42 @@ static inline uint32_t A3XX_TPL1_TP_FS_TEX_OFFSET_BASETABLEPTR(uint32_t val)
#define REG_A3XX_VBIF_OUT_AXI_AOOO 0x0000305f
#define REG_A3XX_VBIF_PERF_CNT_EN 0x00003070
#define A3XX_VBIF_PERF_CNT_EN_CNT0 0x00000001
#define A3XX_VBIF_PERF_CNT_EN_CNT1 0x00000002
#define A3XX_VBIF_PERF_CNT_EN_PWRCNT0 0x00000004
#define A3XX_VBIF_PERF_CNT_EN_PWRCNT1 0x00000008
#define A3XX_VBIF_PERF_CNT_EN_PWRCNT2 0x00000010
#define REG_A3XX_VBIF_PERF_CNT_CLR 0x00003071
#define A3XX_VBIF_PERF_CNT_CLR_CNT0 0x00000001
#define A3XX_VBIF_PERF_CNT_CLR_CNT1 0x00000002
#define A3XX_VBIF_PERF_CNT_CLR_PWRCNT0 0x00000004
#define A3XX_VBIF_PERF_CNT_CLR_PWRCNT1 0x00000008
#define A3XX_VBIF_PERF_CNT_CLR_PWRCNT2 0x00000010
#define REG_A3XX_VBIF_PERF_CNT_SEL 0x00003072
#define REG_A3XX_VBIF_PERF_CNT0_LO 0x00003073
#define REG_A3XX_VBIF_PERF_CNT0_HI 0x00003074
#define REG_A3XX_VBIF_PERF_CNT1_LO 0x00003075
#define REG_A3XX_VBIF_PERF_CNT1_HI 0x00003076
#define REG_A3XX_VBIF_PERF_PWR_CNT0_LO 0x00003077
#define REG_A3XX_VBIF_PERF_PWR_CNT0_HI 0x00003078
#define REG_A3XX_VBIF_PERF_PWR_CNT1_LO 0x00003079
#define REG_A3XX_VBIF_PERF_PWR_CNT1_HI 0x0000307a
#define REG_A3XX_VBIF_PERF_PWR_CNT2_LO 0x0000307b
#define REG_A3XX_VBIF_PERF_PWR_CNT2_HI 0x0000307c
#define REG_A3XX_VSC_BIN_SIZE 0x00000c01
#define A3XX_VSC_BIN_SIZE_WIDTH__MASK 0x0000001f
#define A3XX_VSC_BIN_SIZE_WIDTH__SHIFT 0
@@ -2249,6 +2321,12 @@ static inline uint32_t A3XX_TEX_SAMP_0_WRAP_R(enum a3xx_tex_clamp val)
{
return ((val) << A3XX_TEX_SAMP_0_WRAP_R__SHIFT) & A3XX_TEX_SAMP_0_WRAP_R__MASK;
}
#define A3XX_TEX_SAMP_0_COMPARE_FUNC__MASK 0x00700000
#define A3XX_TEX_SAMP_0_COMPARE_FUNC__SHIFT 20
static inline uint32_t A3XX_TEX_SAMP_0_COMPARE_FUNC(enum adreno_compare_func val)
{
return ((val) << A3XX_TEX_SAMP_0_COMPARE_FUNC__SHIFT) & A3XX_TEX_SAMP_0_COMPARE_FUNC__MASK;
}
#define A3XX_TEX_SAMP_0_UNNORM_COORDS 0x80000000
#define REG_A3XX_TEX_SAMP_1 0x00000001
@@ -2267,6 +2345,7 @@ static inline uint32_t A3XX_TEX_SAMP_1_MIN_LOD(float val)
#define REG_A3XX_TEX_CONST_0 0x00000000
#define A3XX_TEX_CONST_0_TILED 0x00000001
#define A3XX_TEX_CONST_0_SRGB 0x00000004
#define A3XX_TEX_CONST_0_SWIZ_X__MASK 0x00000070
#define A3XX_TEX_CONST_0_SWIZ_X__SHIFT 4
static inline uint32_t A3XX_TEX_CONST_0_SWIZ_X(enum a3xx_tex_swiz val)
@@ -2303,6 +2382,7 @@ static inline uint32_t A3XX_TEX_CONST_0_FMT(enum a3xx_tex_fmt val)
{
return ((val) << A3XX_TEX_CONST_0_FMT__SHIFT) & A3XX_TEX_CONST_0_FMT__MASK;
}
#define A3XX_TEX_CONST_0_NOCONVERT 0x20000000
#define A3XX_TEX_CONST_0_TYPE__MASK 0xc0000000
#define A3XX_TEX_CONST_0_TYPE__SHIFT 30
static inline uint32_t A3XX_TEX_CONST_0_TYPE(enum a3xx_tex_type val)
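The generated helpers in this header all follow the same shift-and-mask pattern, and the point-size helpers additionally scale a float by 8 before packing. A worked sketch of that arithmetic follows, using shortened hypothetical names in place of the generated A3XX_GRAS_SU_POINT_MINMAX_* macros. The mask and shift values are copied from the hunk above; reading the scale factor as a fixed-point format with three fractional bits is an inference rather than something the header states.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the POINT_MINMAX definitions; names shortened. */
#define POINT_MIN_MASK  0x0000ffffu
#define POINT_MIN_SHIFT 0
#define POINT_MAX_MASK  0xffff0000u
#define POINT_MAX_SHIFT 16

/* Scale by 8, truncate to an integer, then shift and mask into the field. */
uint32_t pack_point_min(float val)
{
    return (((uint32_t)(val * 8.0)) << POINT_MIN_SHIFT) & POINT_MIN_MASK;
}

uint32_t pack_point_max(float val)
{
    return (((uint32_t)(val * 8.0)) << POINT_MAX_SHIFT) & POINT_MAX_MASK;
}

/* Combine both fields into one register value. */
uint32_t pack_point_minmax(float min, float max)
{
    return pack_point_min(min) | pack_point_max(max);
}

/* Usage: min 1.0 becomes 8 in the low half, max 2.0 becomes 16 in the high half. */
void example_point_minmax(void)
{
    assert(pack_point_min(1.0f) == 8u);
    assert(pack_point_max(2.0f) == 0x00100000u);
    assert(pack_point_minmax(1.0f, 2.0f) == 0x00100008u);
}
```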

@@ -1074,77 +1074,154 @@ trans_arl(const struct instr_translater *t,
add_src_reg(ctx, instr, tmp_src, chan)->flags |= IR3_REG_HALF;
}
/* texture fetch/sample instructions: */
static void
trans_samp(const struct instr_translater *t,
struct fd3_compile_context *ctx,
/*
* texture fetch/sample instructions:
*/
struct tex_info {
int8_t order[4];
unsigned src_wrmask, flags;
};
static const struct tex_info *
get_tex_info(struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst)
{
struct ir3_instruction *instr;
struct tgsi_src_register *coord = &inst->Src[0].Register;
struct tgsi_src_register *samp = &inst->Src[1].Register;
unsigned tex = inst->Texture.Texture;
int8_t *order;
unsigned i, flags = 0, src_wrmask;
bool needs_mov = false;
static const struct tex_info tex1d = {
.order = { 0, -1, -1, -1 }, /* coord.x */
.src_wrmask = TGSI_WRITEMASK_XY,
.flags = 0,
};
static const struct tex_info tex1ds = {
.order = { 0, -1, 2, -1 }, /* coord.xz */
.src_wrmask = TGSI_WRITEMASK_XYZ,
.flags = IR3_INSTR_S,
};
static const struct tex_info tex2d = {
.order = { 0, 1, -1, -1 }, /* coord.xy */
.src_wrmask = TGSI_WRITEMASK_XY,
.flags = 0,
};
static const struct tex_info tex2ds = {
.order = { 0, 1, 2, -1 }, /* coord.xyz */
.src_wrmask = TGSI_WRITEMASK_XYZ,
.flags = IR3_INSTR_S,
};
static const struct tex_info tex3d = {
.order = { 0, 1, 2, -1 }, /* coord.xyz */
.src_wrmask = TGSI_WRITEMASK_XYZ,
.flags = IR3_INSTR_3D,
};
static const struct tex_info tex3ds = {
.order = { 0, 1, 2, 3 }, /* coord.xyzw */
.src_wrmask = TGSI_WRITEMASK_XYZW,
.flags = IR3_INSTR_S | IR3_INSTR_3D,
};
static const struct tex_info txp1d = {
.order = { 0, -1, 3, -1 }, /* coord.xw */
.src_wrmask = TGSI_WRITEMASK_XYZ,
.flags = IR3_INSTR_P,
};
static const struct tex_info txp1ds = {
.order = { 0, -1, 2, 3 }, /* coord.xzw */
.src_wrmask = TGSI_WRITEMASK_XYZW,
.flags = IR3_INSTR_P | IR3_INSTR_S,
};
static const struct tex_info txp2d = {
.order = { 0, 1, 3, -1 }, /* coord.xyw */
.src_wrmask = TGSI_WRITEMASK_XYZ,
.flags = IR3_INSTR_P,
};
static const struct tex_info txp2ds = {
.order = { 0, 1, 2, 3 }, /* coord.xyzw */
.src_wrmask = TGSI_WRITEMASK_XYZW,
.flags = IR3_INSTR_P | IR3_INSTR_S,
};
static const struct tex_info txp3d = {
.order = { 0, 1, 2, 3 }, /* coord.xyzw */
.src_wrmask = TGSI_WRITEMASK_XYZW,
.flags = IR3_INSTR_P | IR3_INSTR_3D,
};
switch (t->arg) {
unsigned tex = inst->Texture.Texture;
switch (inst->Instruction.Opcode) {
case TGSI_OPCODE_TEX:
switch (tex) {
case TGSI_TEXTURE_1D:
return &tex1d;
case TGSI_TEXTURE_SHADOW1D:
return &tex1ds;
case TGSI_TEXTURE_2D:
case TGSI_TEXTURE_RECT:
order = (int8_t[4]){ 0, 1, -1, -1 };
src_wrmask = TGSI_WRITEMASK_XY;
break;
return &tex2d;
case TGSI_TEXTURE_SHADOW2D:
case TGSI_TEXTURE_SHADOWRECT:
return &tex2ds;
case TGSI_TEXTURE_3D:
case TGSI_TEXTURE_CUBE:
order = (int8_t[4]){ 0, 1, 2, -1 };
src_wrmask = TGSI_WRITEMASK_XYZ;
flags |= IR3_INSTR_3D;
break;
return &tex3d;
case TGSI_TEXTURE_SHADOWCUBE:
return &tex3ds;
default:
compile_error(ctx, "unknown texture type: %s\n",
tgsi_texture_names[tex]);
break;
return NULL;
}
break;
case TGSI_OPCODE_TXP:
switch (tex) {
case TGSI_TEXTURE_1D:
return &txp1d;
case TGSI_TEXTURE_SHADOW1D:
return &txp1ds;
case TGSI_TEXTURE_2D:
case TGSI_TEXTURE_RECT:
order = (int8_t[4]){ 0, 1, 3, -1 };
src_wrmask = TGSI_WRITEMASK_XYZ;
break;
return &txp2d;
case TGSI_TEXTURE_SHADOW2D:
case TGSI_TEXTURE_SHADOWRECT:
return &txp2ds;
case TGSI_TEXTURE_3D:
case TGSI_TEXTURE_CUBE:
order = (int8_t[4]){ 0, 1, 2, 3 };
src_wrmask = TGSI_WRITEMASK_XYZW;
flags |= IR3_INSTR_3D;
break;
return &txp3d;
default:
compile_error(ctx, "unknown texture type: %s\n",
tgsi_texture_names[tex]);
break;
}
flags |= IR3_INSTR_P;
break;
default:
compile_assert(ctx, 0);
break;
}
compile_assert(ctx, 0);
return NULL;
}
static struct tgsi_src_register *
get_tex_coord(struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst,
const struct tex_info *tinf)
{
struct tgsi_src_register *coord = &inst->Src[0].Register;
struct ir3_instruction *instr;
unsigned tex = inst->Texture.Texture;
bool needs_mov = false;
unsigned i;
/* cat5 instruction cannot seem to handle const or relative: */
if (is_rel_or_const(coord))
needs_mov = true;
/* 1D textures we fix up w/ 0.5 as 2nd coord: */
if ((tex == TGSI_TEXTURE_1D) || (tex == TGSI_TEXTURE_SHADOW1D))
needs_mov = true;
/* The texture sample instructions need the coord in successive
* registers/components (ie. src.xy but not src.yx). And TXP
* needs the .w component in .z for 2D.. so in some cases we
* might need to emit some mov instructions to shuffle things
* around:
*/
for (i = 1; (i < 4) && (order[i] >= 0) && !needs_mov; i++)
if (src_swiz(coord, i) != (src_swiz(coord, 0) + order[i]))
for (i = 1; (i < 4) && (tinf->order[i] >= 0) && !needs_mov; i++)
if (src_swiz(coord, i) != (src_swiz(coord, 0) + tinf->order[i]))
needs_mov = true;
if (needs_mov) {
@@ -1157,28 +1234,55 @@ trans_samp(const struct instr_translater *t,
/* need to move things around: */
tmp_src = get_internal_temp(ctx, &tmp_dst);
for (j = 0; (j < 4) && (order[j] >= 0); j++) {
instr = instr_create(ctx, 1, 0);
for (j = 0; j < 4; j++) {
if (tinf->order[j] < 0)
continue;
instr = instr_create(ctx, 1, 0); /* mov */
instr->cat1.src_type = type_mov;
instr->cat1.dst_type = type_mov;
add_dst_reg(ctx, instr, &tmp_dst, j);
add_src_reg(ctx, instr, coord,
src_swiz(coord, order[j]));
src_swiz(coord, tinf->order[j]));
}
/* fix up .y coord: */
if ((tex == TGSI_TEXTURE_1D) ||
(tex == TGSI_TEXTURE_SHADOW1D)) {
instr = instr_create(ctx, 1, 0); /* mov */
instr->cat1.src_type = type_mov;
instr->cat1.dst_type = type_mov;
add_dst_reg(ctx, instr, &tmp_dst, 1); /* .y */
ir3_reg_create(instr, 0, IR3_REG_IMMED)->fim_val = 0.5;
}
coord = tmp_src;
}
return coord;
}
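The swizzle check in `get_tex_coord()` can be read in isolation. The sketch below assumes that `swiz[i]` holds the source component (0 = x .. 3 = w) selected for coordinate slot `i`, and that `order[]` has the same meaning as `tex_info.order`; `coord_needs_mov` is a hypothetical helper, and it leaves out the const/relative and 1D cases that force a mov unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the selected components are not laid out in the
 * successive order the sample instruction expects, i.e. when mov
 * instructions would be needed to shuffle them into a temporary. */
static bool coord_needs_mov(const uint8_t swiz[4], const int8_t order[4])
{
	/* Like the loop in the diff, stop at the first unused (negative) slot. */
	for (unsigned i = 1; i < 4 && order[i] >= 0; i++)
		if (swiz[i] != swiz[0] + order[i])
			return true;
	return false;
}
```

For a 2D lookup (`order = {0, 1, -1, -1}`), `src.xy` and `src.yz` pass unchanged while `src.yx` needs a shuffle. For 2D TXP (`order = {0, 1, 3, -1}`), a plain `.xyzw` source needs a shuffle because `.w` has to end up in the third slot.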
static void
trans_samp(const struct instr_translater *t,
struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst)
{
struct ir3_instruction *instr;
struct tgsi_dst_register *dst = &inst->Dst[0].Register;
struct tgsi_src_register *coord;
struct tgsi_src_register *samp = &inst->Src[1].Register;
const struct tex_info *tinf;
tinf = get_tex_info(ctx, inst);
coord = get_tex_coord(ctx, inst, tinf);
instr = instr_create(ctx, 5, t->opc);
instr->cat5.type = get_ftype(ctx);
instr->cat5.samp = samp->Index;
instr->cat5.tex = samp->Index;
instr->flags |= flags;
instr->flags |= tinf->flags;
add_dst_reg_wrmask(ctx, instr, &inst->Dst[0].Register, 0,
inst->Dst[0].Register.WriteMask);
add_src_reg_wrmask(ctx, instr, coord, coord->SwizzleX, src_wrmask);
add_dst_reg_wrmask(ctx, instr, dst, 0, dst->WriteMask);
add_src_reg_wrmask(ctx, instr, coord, coord->SwizzleX, tinf->src_wrmask);
}
/*
@@ -1231,15 +1335,19 @@ trans_cmp(const struct instr_translater *t,
switch (t->tgsi_opc) {
case TGSI_OPCODE_SEQ:
case TGSI_OPCODE_FSEQ:
condition = IR3_COND_EQ;
break;
case TGSI_OPCODE_SNE:
case TGSI_OPCODE_FSNE:
condition = IR3_COND_NE;
break;
case TGSI_OPCODE_SGE:
case TGSI_OPCODE_FSGE:
condition = IR3_COND_GE;
break;
case TGSI_OPCODE_SLT:
case TGSI_OPCODE_FSLT:
condition = IR3_COND_LT;
break;
case TGSI_OPCODE_SLE:
@@ -1269,11 +1377,15 @@ trans_cmp(const struct instr_translater *t,
switch (t->tgsi_opc) {
case TGSI_OPCODE_SEQ:
case TGSI_OPCODE_FSEQ:
case TGSI_OPCODE_SGE:
case TGSI_OPCODE_FSGE:
case TGSI_OPCODE_SLE:
case TGSI_OPCODE_SNE:
case TGSI_OPCODE_FSNE:
case TGSI_OPCODE_SGT:
case TGSI_OPCODE_SLT:
case TGSI_OPCODE_FSLT:
/* cov.u16f16 dst, tmp0 */
instr = instr_create(ctx, 1, 0);
instr->cat1.src_type = get_utype(ctx);
@@ -1293,6 +1405,96 @@ trans_cmp(const struct instr_translater *t,
put_dst(ctx, inst, dst);
}
/*
* USNE(a,b) = (a != b) ? 1 : 0
* cmps.u32.ne dst, a, b
*
* USEQ(a,b) = (a == b) ? 1 : 0
* cmps.u32.eq dst, a, b
*
* ISGE(a,b) = (a >= b) ? 1 : 0
* cmps.s32.ge dst, a, b
*
* USGE(a,b) = (a >= b) ? 1 : 0
* cmps.u32.ge dst, a, b
*
* ISLT(a,b) = (a < b) ? 1 : 0
* cmps.s32.lt dst, a, b
*
* USLT(a,b) = (a < b) ? 1 : 0
* cmps.u32.lt dst, a, b
*
* UCMP(a,b,c) = (a < 0) ? b : c
* cmps.u32.lt tmp0, a, {0}
* sel.b16 dst, b, tmp0, c
*/
static void
trans_icmp(const struct instr_translater *t,
struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst)
{
struct ir3_instruction *instr;
struct tgsi_dst_register *dst = get_dst(ctx, inst);
struct tgsi_src_register constval0;
struct tgsi_src_register *a0, *a1, *a2;
unsigned condition;
a0 = &inst->Src[0].Register; /* a */
a1 = &inst->Src[1].Register; /* b */
switch (t->tgsi_opc) {
case TGSI_OPCODE_USNE:
condition = IR3_COND_NE;
break;
case TGSI_OPCODE_USEQ:
condition = IR3_COND_EQ;
break;
case TGSI_OPCODE_ISGE:
case TGSI_OPCODE_USGE:
condition = IR3_COND_GE;
break;
case TGSI_OPCODE_ISLT:
case TGSI_OPCODE_USLT:
condition = IR3_COND_LT;
break;
case TGSI_OPCODE_UCMP:
get_immediate(ctx, &constval0, 0);
a0 = &inst->Src[0].Register; /* a */
a1 = &constval0; /* {0} */
condition = IR3_COND_LT;
break;
default:
compile_assert(ctx, 0);
return;
}
if (is_const(a0) && is_const(a1))
a0 = get_unconst(ctx, a0);
if (t->tgsi_opc == TGSI_OPCODE_UCMP) {
struct tgsi_dst_register tmp_dst;
struct tgsi_src_register *tmp_src;
tmp_src = get_internal_temp(ctx, &tmp_dst);
/* cmps.u32.lt tmp, a0, a1 */
instr = instr_create(ctx, 2, t->opc);
instr->cat2.condition = condition;
vectorize(ctx, instr, &tmp_dst, 2, a0, 0, a1, 0);
a1 = &inst->Src[1].Register;
a2 = &inst->Src[2].Register;
/* sel.{b32,b16} dst, src2, tmp, src1 */
instr = instr_create(ctx, 3, OPC_SEL_B32);
vectorize(ctx, instr, dst, 3, a1, 0, tmp_src, 0, a2, 0);
} else {
/* cmps.{u32,s32}.<cond> dst, a0, a1 */
instr = instr_create(ctx, 2, t->opc);
instr->cat2.condition = condition;
vectorize(ctx, instr, dst, 2, a0, 0, a1, 0);
}
put_dst(ctx, inst, dst);
}
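The only difference between the `I*` and `U*` opcodes handled by `trans_icmp()` is how the same 32-bit operands are interpreted. A small reference sketch in plain C follows; the `ref_*` names are hypothetical, and the 1/0 result encoding follows the comment block above `trans_icmp()` rather than any confirmed hardware behaviour.

```c
#include <stdint.h>

/* Signed compares: operands are treated as two's-complement integers. */
static inline uint32_t ref_isge(int32_t a, int32_t b) { return (a >= b) ? 1 : 0; }
static inline uint32_t ref_islt(int32_t a, int32_t b) { return (a <  b) ? 1 : 0; }

/* Unsigned compares: the same bit patterns are treated as non-negative. */
static inline uint32_t ref_usge(uint32_t a, uint32_t b) { return (a >= b) ? 1 : 0; }
static inline uint32_t ref_uslt(uint32_t a, uint32_t b) { return (a <  b) ? 1 : 0; }
```

With the bit pattern `0xffffffff` as the first operand and zero as the second, the signed forms see -1 (so "less than" holds) while the unsigned forms see the largest 32-bit value (so "greater or equal" holds). This is why the translater table selects `OPC_CMPS_S` for the `I*` opcodes and `OPC_CMPS_U` for the `U*` opcodes.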
/*
* Conditional / Flow control
*/
@@ -1533,7 +1735,7 @@ trans_endif(const struct instr_translater *t,
}
/*
* Kill / Kill-if
* Kill
*/
static void
@@ -1579,6 +1781,76 @@ trans_kill(const struct instr_translater *t,
ctx->kill[ctx->kill_count++] = instr;
}
/*
* Kill-If
*/
static void
trans_killif(const struct instr_translater *t,
struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst)
{
struct tgsi_src_register *src = &inst->Src[0].Register;
struct ir3_instruction *instr, *immed, *cond = NULL;
bool inv = false;
immed = create_immed(ctx, 0.0);
/* cmps.f.ne p0.x, cond, {0.0} */
instr = instr_create(ctx, 2, OPC_CMPS_F);
instr->cat2.condition = IR3_COND_NE;
ir3_reg_create(instr, regid(REG_P0, 0), 0);
ir3_reg_create(instr, 0, IR3_REG_SSA)->instr = immed;
add_src_reg(ctx, instr, src, src->SwizzleX);
cond = instr;
/* kill p0.x */
instr = instr_create(ctx, 0, OPC_KILL);
instr->cat0.inv = inv;
ir3_reg_create(instr, 0, 0); /* dummy dst */
ir3_reg_create(instr, 0, IR3_REG_SSA)->instr = cond;
ctx->kill[ctx->kill_count++] = instr;
}
/*
* I2F / U2F / F2I / F2U
*/
static void
trans_cov(const struct instr_translater *t,
struct fd3_compile_context *ctx,
struct tgsi_full_instruction *inst)
{
struct ir3_instruction *instr;
struct tgsi_dst_register *dst = get_dst(ctx, inst);
struct tgsi_src_register *src = &inst->Src[0].Register;
/* cov.f32s32 dst, tmp0 */
instr = instr_create(ctx, 1, 0);
switch (t->tgsi_opc) {
case TGSI_OPCODE_U2F:
instr->cat1.src_type = TYPE_U32;
instr->cat1.dst_type = TYPE_F32;
break;
case TGSI_OPCODE_I2F:
instr->cat1.src_type = TYPE_S32;
instr->cat1.dst_type = TYPE_F32;
break;
case TGSI_OPCODE_F2U:
instr->cat1.src_type = TYPE_F32;
instr->cat1.dst_type = TYPE_U32;
break;
case TGSI_OPCODE_F2I:
instr->cat1.src_type = TYPE_F32;
instr->cat1.dst_type = TYPE_S32;
break;
}
vectorize(ctx, instr, dst, 1, src, 0);
}
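The four conversions in `trans_cov()` differ only in the source/destination type pair given to the `cov` instruction. As a rough reference, the same conversions can be written in C as below. The `ref_*` names are hypothetical, and the float-to-int rounding shown is C's truncation toward zero; whether the hardware `cov` rounds identically is an assumption, not something the diff states.

```c
#include <stdint.h>
#include <string.h>

/* I2F: reinterpret the 32 bits as signed, then convert to float. */
static inline float ref_i2f(uint32_t bits)
{
	int32_t s;
	memcpy(&s, &bits, sizeof(s));
	return (float)s;
}

/* U2F: treat the same 32 bits as unsigned, then convert to float. */
static inline float ref_u2f(uint32_t bits)
{
	return (float)bits;
}

/* F2I: C conversion truncates toward zero (assumed to match). */
static inline int32_t ref_f2i(float f)
{
	return (int32_t)f;
}
```

The same input bits give very different floats depending on the chosen source type, which is why `I2F` and `U2F` need separate `src_type` settings even though both produce `TYPE_F32`.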
/*
* Handlers for TGSI instructions which do have 1:1 mapping to native
* instructions:
@@ -1616,9 +1888,11 @@ instr_cat2(const struct instr_translater *t,
switch (t->tgsi_opc) {
case TGSI_OPCODE_ABS:
case TGSI_OPCODE_IABS:
src0_flags = IR3_REG_ABS;
break;
case TGSI_OPCODE_SUB:
case TGSI_OPCODE_INEG:
src1_flags = IR3_REG_NEGATE;
break;
}
@@ -1724,6 +1998,22 @@ static const struct instr_translater translaters[TGSI_OPCODE_LAST] = {
INSTR(SUB, instr_cat2, .opc = OPC_ADD_F),
INSTR(MIN, instr_cat2, .opc = OPC_MIN_F),
INSTR(MAX, instr_cat2, .opc = OPC_MAX_F),
INSTR(UADD, instr_cat2, .opc = OPC_ADD_U),
INSTR(IMIN, instr_cat2, .opc = OPC_MIN_S),
INSTR(UMIN, instr_cat2, .opc = OPC_MIN_U),
INSTR(IMAX, instr_cat2, .opc = OPC_MAX_S),
INSTR(UMAX, instr_cat2, .opc = OPC_MAX_U),
INSTR(AND, instr_cat2, .opc = OPC_AND_B),
INSTR(OR, instr_cat2, .opc = OPC_OR_B),
INSTR(NOT, instr_cat2, .opc = OPC_NOT_B),
INSTR(XOR, instr_cat2, .opc = OPC_XOR_B),
INSTR(UMUL, instr_cat2, .opc = OPC_MUL_U),
INSTR(SHL, instr_cat2, .opc = OPC_SHL_B),
INSTR(USHR, instr_cat2, .opc = OPC_SHR_B),
INSTR(ISHR, instr_cat2, .opc = OPC_ASHR_B),
INSTR(IABS, instr_cat2, .opc = OPC_ABSNEG_S),
INSTR(INEG, instr_cat2, .opc = OPC_ABSNEG_S),
INSTR(AND, instr_cat2, .opc = OPC_AND_B),
INSTR(MAD, instr_cat3, .opc = OPC_MAD_F32, .hopc = OPC_MAD_F16),
INSTR(TRUNC, instr_cat2, .opc = OPC_TRUNC_F),
INSTR(CLAMP, trans_clamp),
@@ -1741,16 +2031,33 @@ static const struct instr_translater translaters[TGSI_OPCODE_LAST] = {
INSTR(TXP, trans_samp, .opc = OPC_SAM, .arg = TGSI_OPCODE_TXP),
INSTR(SGT, trans_cmp),
INSTR(SLT, trans_cmp),
INSTR(FSLT, trans_cmp),
INSTR(SGE, trans_cmp),
INSTR(FSGE, trans_cmp),
INSTR(SLE, trans_cmp),
INSTR(SNE, trans_cmp),
INSTR(FSNE, trans_cmp),
INSTR(SEQ, trans_cmp),
INSTR(FSEQ, trans_cmp),
INSTR(CMP, trans_cmp),
INSTR(USNE, trans_icmp, .opc = OPC_CMPS_U),
INSTR(USEQ, trans_icmp, .opc = OPC_CMPS_U),
INSTR(ISGE, trans_icmp, .opc = OPC_CMPS_S),
INSTR(USGE, trans_icmp, .opc = OPC_CMPS_U),
INSTR(ISLT, trans_icmp, .opc = OPC_CMPS_S),
INSTR(USLT, trans_icmp, .opc = OPC_CMPS_U),
INSTR(UCMP, trans_icmp, .opc = OPC_CMPS_U),
INSTR(IF, trans_if),
INSTR(UIF, trans_if),
INSTR(ELSE, trans_else),
INSTR(ENDIF, trans_endif),
INSTR(END, instr_cat0, .opc = OPC_END),
INSTR(KILL, trans_kill, .opc = OPC_KILL),
INSTR(KILL_IF, trans_killif, .opc = OPC_KILL),
INSTR(I2F, trans_cov),
INSTR(U2F, trans_cov),
INSTR(F2I, trans_cov),
INSTR(F2U, trans_cov),
};
static fd3_semantic
@@ -1935,6 +2242,8 @@ decl_in(struct fd3_compile_context *ctx, struct tgsi_full_declaration *decl)
DBG("decl in -> r%d", i);
compile_assert(ctx, n < ARRAY_SIZE(so->inputs));
so->inputs[n].semantic = decl_semantic(&decl->Semantic);
so->inputs[n].compmask = (1 << ncomp) - 1;
so->inputs[n].regid = r;
@@ -2024,6 +2333,8 @@ decl_out(struct fd3_compile_context *ctx, struct tgsi_full_declaration *decl)
ncomp = 4;
compile_assert(ctx, n < ARRAY_SIZE(so->outputs));
so->outputs[n].semantic = decl_semantic(&decl->Semantic);
so->outputs[n].regid = regid(i, comp);
@@ -2147,6 +2458,7 @@ compile_instructions(struct fd3_compile_context *ctx)
struct tgsi_full_immediate *imm =
&ctx->parser.FullToken.FullImmediate;
unsigned n = ctx->so->immediates_count++;
compile_assert(ctx, n < ARRAY_SIZE(ctx->so->immediates));
memcpy(ctx->so->immediates[n].val, imm->u, 16);
break;
}


@@ -1324,6 +1324,8 @@ decl_in(struct fd3_compile_context *ctx, struct tgsi_full_declaration *decl)
DBG("decl in -> r%d", i + base); // XXX
compile_assert(ctx, n < ARRAY_SIZE(so->inputs));
so->inputs[n].semantic = decl_semantic(&decl->Semantic);
so->inputs[n].compmask = (1 << ncomp) - 1;
so->inputs[n].ncomp = ncomp;
@@ -1410,6 +1412,7 @@ decl_out(struct fd3_compile_context *ctx, struct tgsi_full_declaration *decl)
for (i = decl->Range.First; i <= decl->Range.Last; i++) {
unsigned n = so->outputs_count++;
compile_assert(ctx, n < ARRAY_SIZE(so->outputs));
so->outputs[n].semantic = decl_semantic(&decl->Semantic);
so->outputs[n].regid = regid(i + base, comp);
}


@@ -33,6 +33,7 @@
#include "fd3_emit.h"
#include "fd3_gmem.h"
#include "fd3_program.h"
#include "fd3_query.h"
#include "fd3_rasterizer.h"
#include "fd3_texture.h"
#include "fd3_zsa.h"
@@ -134,5 +135,7 @@ fd3_context_create(struct pipe_screen *pscreen, void *priv)
fd3_ctx->solid_vbuf = create_solid_vertexbuf(pctx);
fd3_ctx->blit_texcoord_vbuf = create_blit_texcoord_vertexbuf(pctx);
fd3_query_context_init(pctx);
return pctx;
}


@@ -406,7 +406,7 @@ fd3_program_emit(struct fd_ringbuffer *ring,
A3XX_SP_VS_PARAM_REG_PSIZEREGID(psize_regid) |
A3XX_SP_VS_PARAM_REG_TOTALVSOUTVAR(align(fp->total_in, 4) / 4));
for (i = 0, j = -1; j < (int)fp->inputs_count; i++) {
for (i = 0, j = -1; (i < 8) && (j < (int)fp->inputs_count); i++) {
uint32_t reg = 0;
OUT_PKT0(ring, REG_A3XX_SP_VS_OUT_REG(i), 1);
@@ -428,7 +428,7 @@ fd3_program_emit(struct fd_ringbuffer *ring,
OUT_RING(ring, reg);
}
for (i = 0, j = -1; j < (int)fp->inputs_count; i++) {
for (i = 0, j = -1; (i < 4) && (j < (int)fp->inputs_count); i++) {
uint32_t reg = 0;
OUT_PKT0(ring, REG_A3XX_SP_VS_VPC_DST_REG(i), 1);


@@ -91,7 +91,7 @@ struct fd3_shader_variant {
struct {
fd3_semantic semantic;
uint8_t regid;
} outputs[16];
} outputs[16 + 2]; /* +POSITION +PSIZE */
bool writes_pos, writes_psize;
/* vertices/inputs: */
@@ -104,7 +104,7 @@ struct fd3_shader_variant {
/* in theory inloc of fs should match outloc of vs: */
uint8_t inloc;
uint8_t bary;
} inputs[16];
} inputs[16 + 2]; /* +POSITION +FACE */
unsigned total_in; /* sum of inputs (scalar) */
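The array growth here pairs with the `compile_assert(ctx, n < ARRAY_SIZE(...))` checks added in the compiler. A minimal sketch of the same idea outside the driver follows; `struct shader_io` and `add_input` are hypothetical, and unlike the diff (which increments the count first and then asserts) this version checks before writing and reports failure to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, trimmed-down stand-in for the shader variant struct. */
struct shader_io {
	unsigned inputs_count;
	struct {
		uint8_t regid;
	} inputs[16 + 2]; /* +POSITION +FACE */
};

/* Append an input; returns false instead of writing past the array. */
static bool add_input(struct shader_io *so, uint8_t regid)
{
	unsigned n = so->inputs_count;

	if (n >= ARRAY_SIZE(so->inputs))
		return false;

	so->inputs[n].regid = regid;
	so->inputs_count = n + 1;
	return true;
}
```

Deriving the limit from `ARRAY_SIZE` rather than a literal keeps the check correct when the array is resized, as it is in this hunk.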


@@ -0,0 +1,139 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#include "freedreno_query_hw.h"
#include "freedreno_context.h"
#include "freedreno_util.h"
#include "fd3_query.h"
#include "fd3_util.h"
struct fd_rb_samp_ctrs {
uint64_t ctr[16];
};
/*
* Occlusion Query:
*
* OCCLUSION_COUNTER and OCCLUSION_PREDICATE differ only in how they
* interpret results
*/
static struct fd_hw_sample *
occlusion_get_sample(struct fd_context *ctx, struct fd_ringbuffer *ring)
{
struct fd_hw_sample *samp =
fd_hw_sample_init(ctx, sizeof(struct fd_rb_samp_ctrs));
/* Set RB_SAMPLE_COUNT_ADDR to samp->offset plus value of
* HW_QUERY_BASE_REG register:
*/
OUT_PKT3(ring, CP_SET_CONSTANT, 3);
OUT_RING(ring, CP_REG(REG_A3XX_RB_SAMPLE_COUNT_ADDR) | 0x80000000);
OUT_RING(ring, HW_QUERY_BASE_REG);
OUT_RING(ring, samp->offset);
OUT_PKT0(ring, REG_A3XX_RB_SAMPLE_COUNT_CONTROL, 1);
OUT_RING(ring, A3XX_RB_SAMPLE_COUNT_CONTROL_COPY);
OUT_PKT3(ring, CP_DRAW_INDX, 3);
OUT_RING(ring, 0x00000000);
OUT_RING(ring, DRAW(DI_PT_POINTLIST_A2XX, DI_SRC_SEL_AUTO_INDEX,
INDEX_SIZE_IGN, USE_VISIBILITY));
OUT_RING(ring, 0); /* NumIndices */
OUT_PKT3(ring, CP_EVENT_WRITE, 1);
OUT_RING(ring, ZPASS_DONE);
OUT_PKT0(ring, REG_A3XX_RBBM_PERFCTR_CTL, 1);
OUT_RING(ring, A3XX_RBBM_PERFCTR_CTL_ENABLE);
OUT_PKT0(ring, REG_A3XX_VBIF_PERF_CNT_EN, 1);
OUT_RING(ring, A3XX_VBIF_PERF_CNT_EN_CNT0 |
A3XX_VBIF_PERF_CNT_EN_CNT1 |
A3XX_VBIF_PERF_CNT_EN_PWRCNT0 |
A3XX_VBIF_PERF_CNT_EN_PWRCNT1 |
A3XX_VBIF_PERF_CNT_EN_PWRCNT2);
return samp;
}
static uint64_t
count_samples(const struct fd_rb_samp_ctrs *start,
const struct fd_rb_samp_ctrs *end)
{
uint64_t n = 0;
unsigned i;
/* not quite sure what all of these are, possibly different
* counters for each MRT render target:
*/
for (i = 0; i < 16; i += 4)
n += end->ctr[i] - start->ctr[i];
return n;
}
static void
occlusion_counter_accumulate_result(struct fd_context *ctx,
const void *start, const void *end,
union pipe_query_result *result)
{
uint64_t n = count_samples(start, end);
result->u64 += n;
}
static void
occlusion_predicate_accumulate_result(struct fd_context *ctx,
const void *start, const void *end,
union pipe_query_result *result)
{
uint64_t n = count_samples(start, end);
result->b |= (n > 0);
}
static const struct fd_hw_sample_provider occlusion_counter = {
.query_type = PIPE_QUERY_OCCLUSION_COUNTER,
.active = FD_STAGE_DRAW, /* | FD_STAGE_CLEAR ??? */
.get_sample = occlusion_get_sample,
.accumulate_result = occlusion_counter_accumulate_result,
};
static const struct fd_hw_sample_provider occlusion_predicate = {
.query_type = PIPE_QUERY_OCCLUSION_PREDICATE,
.active = FD_STAGE_DRAW, /* | FD_STAGE_CLEAR ??? */
.get_sample = occlusion_get_sample,
.accumulate_result = occlusion_predicate_accumulate_result,
};
void fd3_query_context_init(struct pipe_context *pctx)
{
fd_hw_query_register_provider(pctx, &occlusion_counter);
fd_hw_query_register_provider(pctx, &occlusion_predicate);
}
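The two occlusion providers share one sampling path and differ only in how a start/end pair is folded into the result. A standalone sketch of that accumulation follows, with hypothetical names and a struct that mirrors `fd_rb_samp_ctrs`; the stride of 4 is taken from `count_samples()` above, and what the skipped counters hold is not known here.

```c
#include <stdbool.h>
#include <stdint.h>

struct samp_ctrs {
	uint64_t ctr[16]; /* same layout as fd_rb_samp_ctrs */
};

/* Sum the per-counter deltas, visiting every fourth counter as
 * count_samples() does. */
static uint64_t delta_samples(const struct samp_ctrs *start,
		const struct samp_ctrs *end)
{
	uint64_t n = 0;

	for (unsigned i = 0; i < 16; i += 4)
		n += end->ctr[i] - start->ctr[i];
	return n;
}

/* OCCLUSION_COUNTER: results from successive sample periods add up. */
static void accumulate_counter(uint64_t *result, uint64_t n)
{
	*result += n;
}

/* OCCLUSION_PREDICATE: true once any period saw a passing sample. */
static void accumulate_predicate(bool *result, uint64_t n)
{
	*result |= (n > 0);
}
```

Since a query can span several sample periods, both accumulators are written so that calling them repeatedly is safe: the counter keeps a running sum and the predicate never resets to false.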


@@ -0,0 +1,36 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#ifndef FD3_QUERY_H_
#define FD3_QUERY_H_
#include "pipe/p_context.h"
void fd3_query_context_init(struct pipe_context *pctx);
#endif /* FD3_QUERY_H_ */


@@ -40,6 +40,7 @@ fd3_rasterizer_state_create(struct pipe_context *pctx,
const struct pipe_rasterizer_state *cso)
{
struct fd3_rasterizer_stateobj *so;
float psize_min, psize_max;
so = CALLOC_STRUCT(fd3_rasterizer_stateobj);
if (!so)
@@ -47,19 +48,28 @@ fd3_rasterizer_state_create(struct pipe_context *pctx,
so->base = *cso;
if (cso->point_size_per_vertex) {
psize_min = util_get_min_point_size(cso);
psize_max = 8192;
} else {
/* Force the point size to be as if the vertex output was disabled. */
psize_min = cso->point_size;
psize_max = cso->point_size;
}
/*
if (cso->line_stipple_enable) {
??? TODO line stipple
}
TODO cso->half_pixel_center
TODO cso->point_size
TODO psize_min/psize_max
if (cso->multisample)
TODO
*/
so->gras_cl_clip_cntl = A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER; /* ??? */
so->gras_su_point_minmax = 0xffc00010; /* ??? */
so->gras_su_point_size = 0x00000008; /* ??? */
so->gras_su_point_minmax =
A3XX_GRAS_SU_POINT_MINMAX_MIN(psize_min/2) |
A3XX_GRAS_SU_POINT_MINMAX_MAX(psize_max/2);
so->gras_su_point_size = A3XX_GRAS_SU_POINT_SIZE(cso->point_size/2);
so->gras_su_poly_offset_scale =
A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL(cso->offset_scale);
so->gras_su_poly_offset_offset =
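The point-size range selection added in this hunk can be sketched on its own. In the version below, `pick_psize_range` and `struct psize_range` are hypothetical, and `per_vertex_min` stands in for whatever `util_get_min_point_size(cso)` returns; the upper bound of 8192 is the constant used in the diff.

```c
#include <stdbool.h>

struct psize_range {
	float min, max;
};

/* With per-vertex point size, allow the shader's value within a range;
 * otherwise pin both ends to the fixed state value, so the result is
 * the same as if the vertex output were disabled. */
static struct psize_range pick_psize_range(bool per_vertex,
		float per_vertex_min, float point_size)
{
	struct psize_range r;

	if (per_vertex) {
		r.min = per_vertex_min;
		r.max = 8192.0f;
	} else {
		r.min = point_size;
		r.max = point_size;
	}
	return r;
}
```

Collapsing the range to a single value is what makes the fixed-size case work without a separate code path: any size written by the shader is clamped back to `point_size`.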


@@ -30,6 +30,7 @@
#include "util/u_string.h"
#include "util/u_memory.h"
#include "util/u_inlines.h"
#include "util/u_format.h"
#include "fd3_texture.h"
#include "fd3_util.h"
@@ -99,6 +100,9 @@ fd3_sampler_state_create(struct pipe_context *pctx,
A3XX_TEX_SAMP_0_WRAP_T(tex_clamp(cso->wrap_t)) |
A3XX_TEX_SAMP_0_WRAP_R(tex_clamp(cso->wrap_r));
if (cso->compare_mode)
so->texsamp0 |= A3XX_TEX_SAMP_0_COMPARE_FUNC(cso->compare_func); /* maps 1:1 */
if (cso->min_mip_filter != PIPE_TEX_MIPFILTER_NONE) {
so->texsamp1 =
A3XX_TEX_SAMP_1_MIN_LOD(cso->min_lod) |
@@ -158,6 +162,10 @@ fd3_sampler_view_create(struct pipe_context *pctx, struct pipe_resource *prsc,
A3XX_TEX_CONST_0_MIPLVLS(miplevels) |
fd3_tex_swiz(cso->format, cso->swizzle_r, cso->swizzle_g,
cso->swizzle_b, cso->swizzle_a);
if (util_format_is_srgb(cso->format))
so->texconst0 |= A3XX_TEX_CONST_0_SRGB;
so->texconst1 =
A3XX_TEX_CONST_1_FETCHSIZE(fd3_pipe2fetchsize(cso->format)) |
A3XX_TEX_CONST_1_WIDTH(prsc->width0) |


@@ -235,6 +235,10 @@ fd3_pipe2tex(enum pipe_format format)
case PIPE_FORMAT_B8G8R8X8_UNORM:
case PIPE_FORMAT_R8G8B8A8_UNORM:
case PIPE_FORMAT_R8G8B8X8_UNORM:
case PIPE_FORMAT_B8G8R8A8_SRGB:
case PIPE_FORMAT_B8G8R8X8_SRGB:
case PIPE_FORMAT_R8G8B8A8_SRGB:
case PIPE_FORMAT_R8G8B8X8_SRGB:
return TFMT_NORM_UINT_8_8_8_8;
case PIPE_FORMAT_Z24X8_UNORM:
@@ -275,6 +279,12 @@ fd3_pipe2fetchsize(enum pipe_format format)
case PIPE_FORMAT_B8G8R8A8_UNORM:
case PIPE_FORMAT_B8G8R8X8_UNORM:
case PIPE_FORMAT_R8G8B8A8_UNORM:
case PIPE_FORMAT_R8G8B8X8_UNORM:
case PIPE_FORMAT_B8G8R8A8_SRGB:
case PIPE_FORMAT_B8G8R8X8_SRGB:
case PIPE_FORMAT_R8G8B8A8_SRGB:
case PIPE_FORMAT_R8G8B8X8_SRGB:
case PIPE_FORMAT_Z24X8_UNORM:
case PIPE_FORMAT_Z24_UNORM_S8_UINT:
return TFETCH_4_BYTE;
@@ -379,14 +389,14 @@ fd3_tex_swiz(enum pipe_format format, unsigned swizzle_r, unsigned swizzle_g,
{
const struct util_format_description *desc =
util_format_description(format);
uint8_t swiz[] = {
unsigned char swiz[4] = {
swizzle_r, swizzle_g, swizzle_b, swizzle_a,
PIPE_SWIZZLE_ZERO, PIPE_SWIZZLE_ONE,
PIPE_SWIZZLE_ONE, PIPE_SWIZZLE_ONE,
};
}, rswiz[4];
return A3XX_TEX_CONST_0_SWIZ_X(tex_swiz(swiz[desc->swizzle[0]])) |
A3XX_TEX_CONST_0_SWIZ_Y(tex_swiz(swiz[desc->swizzle[1]])) |
A3XX_TEX_CONST_0_SWIZ_Z(tex_swiz(swiz[desc->swizzle[2]])) |
A3XX_TEX_CONST_0_SWIZ_W(tex_swiz(swiz[desc->swizzle[3]]));
util_format_compose_swizzles(desc->swizzle, swiz, rswiz);
return A3XX_TEX_CONST_0_SWIZ_X(tex_swiz(rswiz[0])) |
A3XX_TEX_CONST_0_SWIZ_Y(tex_swiz(rswiz[1])) |
A3XX_TEX_CONST_0_SWIZ_Z(tex_swiz(rswiz[2])) |
A3XX_TEX_CONST_0_SWIZ_W(tex_swiz(rswiz[3]));
}
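The switch to `util_format_compose_swizzles()` changes the order in which the format swizzle and the sampler-view swizzle are applied. The sketch below shows the composition as I understand the helper to behave; the `SWZ_*` names and `compose_swizzles` are hypothetical stand-ins, so treat it as an illustration rather than the library's implementation.

```c
/* Hypothetical component selectors: four channels plus constants 0 and 1. */
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_0, SWZ_1 };

/* For each output channel, the view swizzle picks a channel, and that
 * choice is then routed through the format swizzle. Constant selectors
 * (0 and 1) pass through untouched. */
static void compose_swizzles(const unsigned char fmt[4],
		const unsigned char view[4], unsigned char out[4])
{
	for (unsigned i = 0; i < 4; i++)
		out[i] = (view[i] <= SWZ_W) ? fmt[view[i]] : view[i];
}
```

For a BGRA-style format swizzle `{Z, Y, X, W}` and a view swizzle `{X, X, X, 1}` (red replicated, alpha forced to one), the composed result is `{Z, Z, Z, 1}`. The removed code indexed the view swizzle by the format swizzle instead, which gives a different answer whenever neither swizzle is the identity.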


@@ -10,11 +10,11 @@ git clone https://github.com/freedreno/envytools.git
The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2013-11-30 14:47:15)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2013-03-31 16:51:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32840 bytes, from 2014-01-05 14:44:21)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 9009 bytes, from 2014-01-11 16:56:35)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 12362 bytes, from 2014-01-07 14:47:36)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 56545 bytes, from 2014-02-26 16:32:11)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 8344 bytes, from 2013-11-30 14:49:47)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32580 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10186 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14477 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 57831 bytes, from 2014-05-19 21:02:34)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 26293 bytes, from 2014-05-16 11:51:57)
Copyright (C) 2013-2014 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -116,6 +116,39 @@ enum adreno_rb_depth_format {
DEPTHX_24_8 = 1,
};
enum adreno_rb_copy_control_mode {
RB_COPY_RESOLVE = 1,
RB_COPY_CLEAR = 2,
RB_COPY_DEPTH_STENCIL = 5,
};
enum a3xx_render_mode {
RB_RENDERING_PASS = 0,
RB_TILING_PASS = 1,
RB_RESOLVE_PASS = 2,
RB_COMPUTE_PASS = 3,
};
enum a3xx_msaa_samples {
MSAA_ONE = 0,
MSAA_TWO = 1,
MSAA_FOUR = 2,
};
enum a3xx_threadmode {
MULTI = 0,
SINGLE = 1,
};
enum a3xx_instrbuffermode {
BUFFER = 1,
};
enum a3xx_threadsize {
TWO_QUADS = 0,
FOUR_QUADS = 1,
};
#define REG_AXXX_CP_RB_BASE 0x000001c0
#define REG_AXXX_CP_RB_CNTL 0x000001c1


@@ -10,11 +10,11 @@ git clone https://github.com/freedreno/envytools.git
The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2013-11-30 14:47:15)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2013-03-31 16:51:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32840 bytes, from 2014-01-05 14:44:21)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 9009 bytes, from 2014-01-11 16:56:35)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 12362 bytes, from 2014-01-07 14:47:36)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 56545 bytes, from 2014-02-26 16:32:11)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 8344 bytes, from 2013-11-30 14:49:47)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32580 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10186 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14477 bytes, from 2014-05-16 11:51:57)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 57831 bytes, from 2014-05-19 21:02:34)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 26293 bytes, from 2014-05-16 11:51:57)
Copyright (C) 2013-2014 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -164,6 +164,11 @@ enum adreno_pm4_type3_packets {
CP_SET_BIN = 76,
CP_TEST_TWO_MEMS = 113,
CP_WAIT_FOR_ME = 19,
CP_SET_DRAW_STATE = 67,
CP_DRAW_INDX_OFFSET = 56,
CP_DRAW_INDIRECT = 40,
CP_DRAW_INDX_INDIRECT = 41,
CP_DRAW_AUTO = 36,
IN_IB_PREFETCH_END = 23,
IN_SUBBLK_PREFETCH = 31,
IN_INSTR_PREFETCH = 32,
@@ -351,6 +356,93 @@ static inline uint32_t CP_DRAW_INDX_2_2_NUM_INDICES(uint32_t val)
return ((val) << CP_DRAW_INDX_2_2_NUM_INDICES__SHIFT) & CP_DRAW_INDX_2_2_NUM_INDICES__MASK;
}
#define REG_CP_DRAW_INDX_OFFSET_0 0x00000000
#define CP_DRAW_INDX_OFFSET_0_PRIM_TYPE__MASK 0x0000003f
#define CP_DRAW_INDX_OFFSET_0_PRIM_TYPE__SHIFT 0
static inline uint32_t CP_DRAW_INDX_OFFSET_0_PRIM_TYPE(enum pc_di_primtype val)
{
return ((val) << CP_DRAW_INDX_OFFSET_0_PRIM_TYPE__SHIFT) & CP_DRAW_INDX_OFFSET_0_PRIM_TYPE__MASK;
}
#define CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT__MASK 0x000000c0
#define CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT__SHIFT 6
static inline uint32_t CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT(enum pc_di_src_sel val)
{
return ((val) << CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT__SHIFT) & CP_DRAW_INDX_OFFSET_0_SOURCE_SELECT__MASK;
}
#define CP_DRAW_INDX_OFFSET_0_VIS_CULL__MASK 0x00000700
#define CP_DRAW_INDX_OFFSET_0_VIS_CULL__SHIFT 8
static inline uint32_t CP_DRAW_INDX_OFFSET_0_VIS_CULL(enum pc_di_vis_cull_mode val)
{
return ((val) << CP_DRAW_INDX_OFFSET_0_VIS_CULL__SHIFT) & CP_DRAW_INDX_OFFSET_0_VIS_CULL__MASK;
}
#define CP_DRAW_INDX_OFFSET_0_INDEX_SIZE__MASK 0x00000800
#define CP_DRAW_INDX_OFFSET_0_INDEX_SIZE__SHIFT 11
static inline uint32_t CP_DRAW_INDX_OFFSET_0_INDEX_SIZE(enum pc_di_index_size val)
{
return ((val) << CP_DRAW_INDX_OFFSET_0_INDEX_SIZE__SHIFT) & CP_DRAW_INDX_OFFSET_0_INDEX_SIZE__MASK;
}
#define CP_DRAW_INDX_OFFSET_0_NOT_EOP 0x00001000
#define CP_DRAW_INDX_OFFSET_0_SMALL_INDEX 0x00002000
#define CP_DRAW_INDX_OFFSET_0_PRE_DRAW_INITIATOR_ENABLE 0x00004000
#define CP_DRAW_INDX_OFFSET_0_NUM_INDICES__MASK 0xffff0000
#define CP_DRAW_INDX_OFFSET_0_NUM_INDICES__SHIFT 16
static inline uint32_t CP_DRAW_INDX_OFFSET_0_NUM_INDICES(uint32_t val)
{
return ((val) << CP_DRAW_INDX_OFFSET_0_NUM_INDICES__SHIFT) & CP_DRAW_INDX_OFFSET_0_NUM_INDICES__MASK;
}
#define REG_CP_DRAW_INDX_OFFSET_1 0x00000001
#define REG_CP_DRAW_INDX_OFFSET_2 0x00000002
#define CP_DRAW_INDX_OFFSET_2_NUM_INDICES__MASK 0xffffffff
#define CP_DRAW_INDX_OFFSET_2_NUM_INDICES__SHIFT 0
static inline uint32_t CP_DRAW_INDX_OFFSET_2_NUM_INDICES(uint32_t val)
{
return ((val) << CP_DRAW_INDX_OFFSET_2_NUM_INDICES__SHIFT) & CP_DRAW_INDX_OFFSET_2_NUM_INDICES__MASK;
}
#define REG_CP_DRAW_INDX_OFFSET_2 0x00000002
#define CP_DRAW_INDX_OFFSET_2_INDX_BASE__MASK 0xffffffff
#define CP_DRAW_INDX_OFFSET_2_INDX_BASE__SHIFT 0
static inline uint32_t CP_DRAW_INDX_OFFSET_2_INDX_BASE(uint32_t val)
{
return ((val) << CP_DRAW_INDX_OFFSET_2_INDX_BASE__SHIFT) & CP_DRAW_INDX_OFFSET_2_INDX_BASE__MASK;
}
#define REG_CP_DRAW_INDX_OFFSET_2 0x00000002
#define CP_DRAW_INDX_OFFSET_2_INDX_SIZE__MASK 0xffffffff
#define CP_DRAW_INDX_OFFSET_2_INDX_SIZE__SHIFT 0
static inline uint32_t CP_DRAW_INDX_OFFSET_2_INDX_SIZE(uint32_t val)
{
return ((val) << CP_DRAW_INDX_OFFSET_2_INDX_SIZE__SHIFT) & CP_DRAW_INDX_OFFSET_2_INDX_SIZE__MASK;
}
#define REG_CP_SET_DRAW_STATE_0 0x00000000
#define CP_SET_DRAW_STATE_0_COUNT__MASK 0x0000ffff
#define CP_SET_DRAW_STATE_0_COUNT__SHIFT 0
static inline uint32_t CP_SET_DRAW_STATE_0_COUNT(uint32_t val)
{
return ((val) << CP_SET_DRAW_STATE_0_COUNT__SHIFT) & CP_SET_DRAW_STATE_0_COUNT__MASK;
}
#define CP_SET_DRAW_STATE_0_DIRTY 0x00010000
#define CP_SET_DRAW_STATE_0_DISABLE 0x00020000
#define CP_SET_DRAW_STATE_0_DISABLE_ALL_GROUPS 0x00040000
#define CP_SET_DRAW_STATE_0_LOAD_IMMED 0x00080000
#define CP_SET_DRAW_STATE_0_GROUP_ID__MASK 0x1f000000
#define CP_SET_DRAW_STATE_0_GROUP_ID__SHIFT 24
static inline uint32_t CP_SET_DRAW_STATE_0_GROUP_ID(uint32_t val)
{
return ((val) << CP_SET_DRAW_STATE_0_GROUP_ID__SHIFT) & CP_SET_DRAW_STATE_0_GROUP_ID__MASK;
}
#define REG_CP_SET_DRAW_STATE_1 0x00000001
#define CP_SET_DRAW_STATE_1_ADDR__MASK 0xffffffff
#define CP_SET_DRAW_STATE_1_ADDR__SHIFT 0
static inline uint32_t CP_SET_DRAW_STATE_1_ADDR(uint32_t val)
{
return ((val) << CP_SET_DRAW_STATE_1_ADDR__SHIFT) & CP_SET_DRAW_STATE_1_ADDR__MASK;
}
#define REG_CP_SET_BIN_0 0x00000000
#define REG_CP_SET_BIN_1 0x00000001
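
The generated helpers in this header all follow the same shift-then-mask pattern. As a rough standalone illustration, the sketch below packs and unpacks one dword using the field layout of CP_SET_DRAW_STATE_0 shown above. The `EX_`/`ex_` names are hypothetical and exist only for this example; they are not part of the header.

```c
#include <stdint.h>

/* Field layout copied from CP_SET_DRAW_STATE_0 above; the EX_ names are
 * hypothetical and only used in this sketch. */
#define EX_COUNT__MASK      0x0000ffffu
#define EX_COUNT__SHIFT     0
#define EX_DIRTY            0x00010000u
#define EX_GROUP_ID__MASK   0x1f000000u
#define EX_GROUP_ID__SHIFT  24

/* Shift the value into position, then mask off anything that overflows
 * the field so it cannot corrupt neighbouring bits. */
static inline uint32_t ex_count(uint32_t val)
{
	return (val << EX_COUNT__SHIFT) & EX_COUNT__MASK;
}

static inline uint32_t ex_group_id(uint32_t val)
{
	return (val << EX_GROUP_ID__SHIFT) & EX_GROUP_ID__MASK;
}

/* Build one dword by OR-ing the packed fields and flag bits together. */
static inline uint32_t ex_pack_draw_state(uint32_t count, uint32_t group, int dirty)
{
	return ex_count(count) | ex_group_id(group) | (dirty ? EX_DIRTY : 0u);
}

/* Inverse of ex_group_id(): mask first, then shift back down. */
static inline uint32_t ex_unpack_group_id(uint32_t dword)
{
	return (dword & EX_GROUP_ID__MASK) >> EX_GROUP_ID__SHIFT;
}
```

Because every helper masks its result, an out-of-range value is silently truncated rather than spilling into the next field; callers that care about range errors would have to check before packing.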

@@ -34,6 +34,7 @@
#include "freedreno_state.h"
#include "freedreno_gmem.h"
#include "freedreno_query.h"
#include "freedreno_query_hw.h"
#include "freedreno_util.h"
static struct fd_ringbuffer *next_rb(struct fd_context *ctx)
@@ -145,6 +146,7 @@ fd_context_destroy(struct pipe_context *pctx)
DBG("");
fd_prog_fini(pctx);
fd_hw_query_fini(pctx);
util_slab_destroy(&ctx->transfer_pool);
@@ -221,6 +223,7 @@ fd_context_init(struct fd_context *ctx, struct pipe_screen *pscreen,
fd_query_context_init(pctx);
fd_texture_init(pctx);
fd_state_init(pctx);
fd_hw_query_init(pctx);
ctx->blitter = util_blitter_create(pctx);
if (!ctx->blitter)

@@ -33,6 +33,7 @@
#include "pipe/p_context.h"
#include "indices/u_primconvert.h"
#include "util/u_blitter.h"
#include "util/u_double_list.h"
#include "util/u_slab.h"
#include "util/u_string.h"
@@ -82,16 +83,80 @@ struct fd_vertex_stateobj {
unsigned num_elements;
};
/* Bitmask of stages in rendering that a particular query is
* active. Queries will be automatically started/stopped (generating
* additional fd_hw_sample_period's) on entrance/exit from stages that
* are applicable to the query.
*
* NOTE: set the stage to NULL at end of IB to ensure no query is still
* active. Things aren't going to work out the way you want if a query
* is active across IB's (or between tile IB and draw IB)
*/
enum fd_render_stage {
FD_STAGE_NULL = 0x00,
FD_STAGE_DRAW = 0x01,
FD_STAGE_CLEAR = 0x02,
/* TODO before queries which include MEM2GMEM or GMEM2MEM will
* work we will need to call fd_hw_query_prepare() from somewhere
* appropriate so that queries in the tiling IB get backed with
* memory to write results to.
*/
FD_STAGE_MEM2GMEM = 0x04,
FD_STAGE_GMEM2MEM = 0x08,
/* used for driver internal draws (ie. util_blitter_blit()): */
FD_STAGE_BLIT = 0x10,
};
#define MAX_HW_SAMPLE_PROVIDERS 4
struct fd_hw_sample_provider;
struct fd_hw_sample;
struct fd_context {
struct pipe_context base;
struct fd_device *dev;
struct fd_screen *screen;
struct blitter_context *blitter;
struct primconvert_context *primconvert;
/* slab for pipe_transfer allocations: */
struct util_slab_mempool transfer_pool;
/* slabs for fd_hw_sample and fd_hw_sample_period allocations: */
struct util_slab_mempool sample_pool;
struct util_slab_mempool sample_period_pool;
/* next sample offset.. incremented for each sample in the batch/
* submit, reset to zero on next submit.
*/
uint32_t next_sample_offset;
/* sample-providers for hw queries: */
const struct fd_hw_sample_provider *sample_providers[MAX_HW_SAMPLE_PROVIDERS];
/* cached samples (in case multiple queries need to reference
* the same sample snapshot)
*/
struct fd_hw_sample *sample_cache[MAX_HW_SAMPLE_PROVIDERS];
/* tracking for current stage, to know when to start/stop
* any active queries:
*/
enum fd_render_stage stage;
/* list of active queries: */
struct list_head active_queries;
/* list of queries that are not active, but were active in the
* current submit:
*/
struct list_head current_queries;
/* current query result bo and tile stride: */
struct fd_bo *query_bo;
uint32_t query_tile_stride;
/* table with PIPE_PRIM_MAX entries mapping PIPE_PRIM_x to
* DI_PT_x value to use for draw initiator. There are some
* slight differences between generation:
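
The fd_render_stage values above are single bits, so a provider can describe the stages it cares about as an OR of them, and a stage change reduces to comparing two mask tests. The sketch below is a simplified, self-contained restatement of that idea (it mirrors is_active() and the loop body of fd_hw_query_set_stage() from this change). The `ex_` names and the `ex_action` enum are hypothetical.

```c
#include <stdbool.h>

/* Same bit values as enum fd_render_stage above; names are hypothetical. */
enum ex_stage {
	EX_STAGE_NULL     = 0x00,
	EX_STAGE_DRAW     = 0x01,
	EX_STAGE_CLEAR    = 0x02,
	EX_STAGE_MEM2GMEM = 0x04,
	EX_STAGE_GMEM2MEM = 0x08,
	EX_STAGE_BLIT     = 0x10,
};

enum ex_action { EX_ACTION_NONE, EX_ACTION_RESUME, EX_ACTION_PAUSE };

/* A query is active in a stage if the stage bit is set in its mask.
 * EX_STAGE_NULL is zero, so nothing is ever active in it. */
static bool ex_is_active(unsigned active_mask, enum ex_stage stage)
{
	return (active_mask & (unsigned)stage) != 0;
}

/* Decide what has to happen to one query when the stage changes. */
static enum ex_action
ex_stage_change(unsigned active_mask, enum ex_stage old_stage,
		enum ex_stage new_stage)
{
	bool was_active = ex_is_active(active_mask, old_stage);
	bool now_active = ex_is_active(active_mask, new_stage);

	if (now_active && !was_active)
		return EX_ACTION_RESUME;   /* take a start sample */
	if (was_active && !now_active)
		return EX_ACTION_PAUSE;    /* take an end sample */
	return EX_ACTION_NONE;
}
```

With a mask of DRAW | CLEAR, moving from DRAW to CLEAR needs no new sample, while moving to GMEM2MEM or NULL closes the current sample period.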

@@ -36,6 +36,7 @@
#include "freedreno_context.h"
#include "freedreno_state.h"
#include "freedreno_resource.h"
#include "freedreno_query_hw.h"
#include "freedreno_util.h"
@@ -70,7 +71,7 @@ fd_draw_emit(struct fd_context *ctx, struct fd_ringbuffer *ring,
idx_bo = fd_resource(idx->buffer)->bo;
idx_type = size2indextype(idx->index_size);
idx_size = idx->index_size * info->count;
idx_offset = idx->offset;
idx_offset = idx->offset + (info->start * idx->index_size);
src_sel = DI_SRC_SEL_DMA;
} else {
idx_bo = NULL;
@@ -156,6 +157,7 @@ fd_draw_vbo(struct pipe_context *pctx, const struct pipe_draw_info *info)
/* and any buffers used, need to be resolved: */
ctx->resolve |= buffers;
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_DRAW);
ctx->draw(ctx, info);
}
@@ -188,6 +190,8 @@ fd_clear(struct pipe_context *pctx, unsigned buffers,
util_format_short_name(pipe_surface_format(pfb->cbufs[0])),
util_format_short_name(pipe_surface_format(pfb->zsbuf)));
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_CLEAR);
ctx->clear(ctx, buffers, color, depth, stencil);
ctx->dirty |= FD_DIRTY_ZSA |
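
The fd_draw_emit() hunk above appears to replace the plain `idx->offset` with an offset that also accounts for `info->start`. As a small worked example of that arithmetic, with hypothetical helper names and plain integers standing in for the pipe structs:

```c
#include <stdint.h>

/* Byte offset of the first index to fetch: the index buffer's base offset
 * plus `start` indices of `index_size` bytes each. */
static uint32_t ex_index_offset(uint32_t buf_offset, uint32_t start,
				uint32_t index_size)
{
	return buf_offset + start * index_size;
}

/* Number of bytes covered by `count` indices. */
static uint32_t ex_index_bytes(uint32_t count, uint32_t index_size)
{
	return count * index_size;
}
```

For 16-bit indices (index_size 2), a buffer offset of 64 and a start of 10, the first fetched index sits at byte 84; without the start term the draw would read from byte 64 instead.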

@@ -35,6 +35,7 @@
#include "freedreno_gmem.h"
#include "freedreno_context.h"
#include "freedreno_resource.h"
#include "freedreno_query_hw.h"
#include "freedreno_util.h"
/*
@@ -273,17 +274,24 @@ render_tiles(struct fd_context *ctx)
ctx->emit_tile_prep(ctx, tile);
if (ctx->restore)
if (ctx->restore) {
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_MEM2GMEM);
ctx->emit_tile_mem2gmem(ctx, tile);
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_NULL);
}
ctx->emit_tile_renderprep(ctx, tile);
fd_hw_query_prepare_tile(ctx, i, ctx->ring);
/* emit IB to drawcmds: */
OUT_IB(ctx->ring, ctx->draw_start, ctx->draw_end);
fd_reset_wfi(ctx);
/* emit gmem2mem to transfer tile back to system memory: */
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_GMEM2MEM);
ctx->emit_tile_gmem2mem(ctx, tile);
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_NULL);
}
}
@@ -292,6 +300,8 @@ render_sysmem(struct fd_context *ctx)
{
ctx->emit_sysmem_prep(ctx);
fd_hw_query_prepare_tile(ctx, 0, ctx->ring);
/* emit IB to drawcmds: */
OUT_IB(ctx->ring, ctx->draw_start, ctx->draw_end);
fd_reset_wfi(ctx);
@@ -314,6 +324,11 @@ fd_gmem_render_tiles(struct pipe_context *pctx)
}
}
/* close out the draw cmds by making sure any active queries are
* paused:
*/
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_NULL);
/* mark the end of the clear/draw cmds before emitting per-tile cmds: */
fd_ringmarker_mark(ctx->draw_end);
fd_ringmarker_mark(ctx->binning_end);
@@ -326,6 +341,7 @@ fd_gmem_render_tiles(struct pipe_context *pctx)
DBG("rendering sysmem (%s/%s)",
util_format_short_name(pipe_surface_format(pfb->cbufs[0])),
util_format_short_name(pipe_surface_format(pfb->zsbuf)));
fd_hw_query_prepare(ctx, 1);
render_sysmem(ctx);
ctx->stats.batch_sysmem++;
} else {
@@ -334,6 +350,7 @@ fd_gmem_render_tiles(struct pipe_context *pctx)
DBG("rendering %dx%d tiles (%s/%s)", gmem->nbins_x, gmem->nbins_y,
util_format_short_name(pipe_surface_format(pfb->cbufs[0])),
util_format_short_name(pipe_surface_format(pfb->zsbuf)));
fd_hw_query_prepare(ctx, gmem->nbins_x * gmem->nbins_y);
render_tiles(ctx);
ctx->stats.batch_gmem++;
}
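
The gmem changes above call fd_hw_query_prepare() with the tile count (1 for sysmem, nbins_x * nbins_y for GMEM) and fd_hw_query_prepare_tile() once per tile. Going by the code later in this change, the result buffer is laid out as one block of `tile_stride` bytes per tile, with each sample at a fixed offset inside the block. A minimal sketch of that addressing, using a hypothetical `ex_layout` struct:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the per-batch query buffer layout. */
struct ex_layout {
	uint32_t tile_stride;  /* bytes of sample storage per tile */
	uint32_t num_tiles;    /* 1 for sysmem, nbins_x * nbins_y for GMEM */
};

/* Total buffer size needed for the batch. */
static size_t ex_buffer_size(const struct ex_layout *l)
{
	return (size_t)l->tile_stride * l->num_tiles;
}

/* Byte position of one sample for one tile, matching the arithmetic in
 * sampptr(): tile_stride * n + offset. */
static size_t ex_sample_pos(const struct ex_layout *l, uint32_t tile,
			    uint32_t sample_offset)
{
	return (size_t)l->tile_stride * tile + sample_offset;
}
```

Deferring allocation until submit time means `tile_stride` already reflects every sample requested in the batch, so a single buffer can serve all queries.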

@@ -1,7 +1,7 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2012 Rob Clark <robclark@freedesktop.org>
* Copyright (C) 2013 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
@@ -27,63 +27,27 @@
*/
#include "pipe/p_state.h"
#include "util/u_string.h"
#include "util/u_memory.h"
#include "util/u_inlines.h"
#include "os/os_time.h"
#include "freedreno_query.h"
#include "freedreno_query_sw.h"
#include "freedreno_query_hw.h"
#include "freedreno_context.h"
#include "freedreno_util.h"
#define FD_QUERY_DRAW_CALLS (PIPE_QUERY_DRIVER_SPECIFIC + 0)
#define FD_QUERY_BATCH_TOTAL (PIPE_QUERY_DRIVER_SPECIFIC + 1) /* total # of batches (submits) */
#define FD_QUERY_BATCH_SYSMEM (PIPE_QUERY_DRIVER_SPECIFIC + 2) /* batches using system memory (GMEM bypass) */
#define FD_QUERY_BATCH_GMEM (PIPE_QUERY_DRIVER_SPECIFIC + 3) /* batches using GMEM */
#define FD_QUERY_BATCH_RESTORE (PIPE_QUERY_DRIVER_SPECIFIC + 4) /* batches requiring GMEM restore */
/* Currently just simple cpu query's supported.. probably need
* to refactor this a bit when I'm eventually ready to add gpu
* queries:
/*
* Pipe Query interface:
*/
struct fd_query {
int type;
/* storage for the collected data */
union pipe_query_result data;
bool active;
uint64_t begin_value, end_value;
uint64_t begin_time, end_time;
};
static inline struct fd_query *
fd_query(struct pipe_query *pq)
{
return (struct fd_query *)pq;
}
static struct pipe_query *
fd_create_query(struct pipe_context *pctx, unsigned query_type)
{
struct fd_context *ctx = fd_context(pctx);
struct fd_query *q;
switch (query_type) {
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_PRIMITIVES_EMITTED:
case FD_QUERY_DRAW_CALLS:
case FD_QUERY_BATCH_TOTAL:
case FD_QUERY_BATCH_SYSMEM:
case FD_QUERY_BATCH_GMEM:
case FD_QUERY_BATCH_RESTORE:
break;
default:
return NULL;
}
q = CALLOC_STRUCT(fd_query);
q = fd_sw_create_query(ctx, query_type);
if (!q)
return NULL;
q->type = query_type;
q = fd_hw_create_query(ctx, query_type);
return (struct pipe_query *) q;
}
@@ -92,64 +56,21 @@ static void
fd_destroy_query(struct pipe_context *pctx, struct pipe_query *pq)
{
struct fd_query *q = fd_query(pq);
free(q);
}
static uint64_t
read_counter(struct pipe_context *pctx, int type)
{
struct fd_context *ctx = fd_context(pctx);
switch (type) {
case PIPE_QUERY_PRIMITIVES_GENERATED:
/* for now same thing as _PRIMITIVES_EMITTED */
case PIPE_QUERY_PRIMITIVES_EMITTED:
return ctx->stats.prims_emitted;
case FD_QUERY_DRAW_CALLS:
return ctx->stats.draw_calls;
case FD_QUERY_BATCH_TOTAL:
return ctx->stats.batch_total;
case FD_QUERY_BATCH_SYSMEM:
return ctx->stats.batch_sysmem;
case FD_QUERY_BATCH_GMEM:
return ctx->stats.batch_gmem;
case FD_QUERY_BATCH_RESTORE:
return ctx->stats.batch_restore;
}
return 0;
}
static bool
is_rate_query(struct fd_query *q)
{
switch (q->type) {
case FD_QUERY_BATCH_TOTAL:
case FD_QUERY_BATCH_SYSMEM:
case FD_QUERY_BATCH_GMEM:
case FD_QUERY_BATCH_RESTORE:
return true;
default:
return false;
}
q->funcs->destroy_query(fd_context(pctx), q);
}
static void
fd_begin_query(struct pipe_context *pctx, struct pipe_query *pq)
{
struct fd_query *q = fd_query(pq);
q->active = true;
q->begin_value = read_counter(pctx, q->type);
if (is_rate_query(q))
q->begin_time = os_time_get();
q->funcs->begin_query(fd_context(pctx), q);
}
static void
fd_end_query(struct pipe_context *pctx, struct pipe_query *pq)
{
struct fd_query *q = fd_query(pq);
q->active = false;
q->end_value = read_counter(pctx, q->type);
if (is_rate_query(q))
q->end_time = os_time_get();
q->funcs->end_query(fd_context(pctx), q);
}
static boolean
@@ -157,21 +78,7 @@ fd_get_query_result(struct pipe_context *pctx, struct pipe_query *pq,
boolean wait, union pipe_query_result *result)
{
struct fd_query *q = fd_query(pq);
if (q->active)
return false;
util_query_clear_result(result, q->type);
result->u64 = q->end_value - q->begin_value;
if (is_rate_query(q)) {
double fps = (result->u64 * 1000000) /
(double)(q->end_time - q->begin_time);
result->u64 = (uint64_t)fps;
}
return true;
return q->funcs->get_query_result(fd_context(pctx), q, wait, result);
}
static int
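
The rate-query arithmetic in the fd_get_query_result() body above (counter delta times 1000000, divided by elapsed microseconds) can be checked in isolation. The helper below is a hypothetical restatement. The zero-elapsed guard is an addition made for this sketch and is not in the code shown.

```c
#include <stdint.h>

/* Convert a counter delta over a time window (in microseconds) into a
 * per-second rate, truncated to an integer. */
static uint64_t ex_rate_per_second(uint64_t begin_value, uint64_t end_value,
				   int64_t begin_us, int64_t end_us)
{
	int64_t elapsed_us = end_us - begin_us;
	double rate;

	/* Assumption for this sketch: report 0 rather than divide by zero. */
	if (elapsed_us <= 0)
		return 0;

	rate = ((double)(end_value - begin_value) * 1000000.0) /
		(double)elapsed_us;
	return (uint64_t)rate;
}
```

For example, 30 batches over 500000 microseconds works out to 60 per second.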

@@ -1,7 +1,7 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2012 Rob Clark <robclark@freedesktop.org>
* Copyright (C) 2013 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
@@ -31,6 +31,37 @@
#include "pipe/p_context.h"
struct fd_context;
struct fd_query;
struct fd_query_funcs {
void (*destroy_query)(struct fd_context *ctx,
struct fd_query *q);
void (*begin_query)(struct fd_context *ctx, struct fd_query *q);
void (*end_query)(struct fd_context *ctx, struct fd_query *q);
boolean (*get_query_result)(struct fd_context *ctx,
struct fd_query *q, boolean wait,
union pipe_query_result *result);
};
struct fd_query {
const struct fd_query_funcs *funcs;
bool active;
int type;
};
static inline struct fd_query *
fd_query(struct pipe_query *pq)
{
return (struct fd_query *)pq;
}
#define FD_QUERY_DRAW_CALLS (PIPE_QUERY_DRIVER_SPECIFIC + 0)
#define FD_QUERY_BATCH_TOTAL (PIPE_QUERY_DRIVER_SPECIFIC + 1) /* total # of batches (submits) */
#define FD_QUERY_BATCH_SYSMEM (PIPE_QUERY_DRIVER_SPECIFIC + 2) /* batches using system memory (GMEM bypass) */
#define FD_QUERY_BATCH_GMEM (PIPE_QUERY_DRIVER_SPECIFIC + 3) /* batches using GMEM */
#define FD_QUERY_BATCH_RESTORE (PIPE_QUERY_DRIVER_SPECIFIC + 4) /* batches requiring GMEM restore */
void fd_query_screen_init(struct pipe_screen *pscreen);
void fd_query_context_init(struct pipe_context *pctx);
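
The struct fd_query_funcs table above lets the generic pipe entry points dispatch to either a software or a hardware backend through `q->funcs`. The sketch below shows the same pattern in a self-contained form: a base struct carrying a function table, one backend that embeds the base as its first member, and thin generic wrappers. All `ex_` names are hypothetical, and the backend is a toy counter, not the driver's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

struct ex_query;

/* Per-backend function table, analogous in shape to fd_query_funcs. */
struct ex_query_funcs {
	void (*begin)(struct ex_query *q);
	void (*end)(struct ex_query *q);
	bool (*get_result)(struct ex_query *q, uint64_t *result);
};

struct ex_query {
	const struct ex_query_funcs *funcs;
	bool active;
};

/* Toy backend: base struct first, so a base pointer can be cast back. */
struct ex_counter_query {
	struct ex_query base;
	const uint64_t *counter;
	uint64_t begin_value, end_value;
};

static void ex_counter_begin(struct ex_query *q)
{
	struct ex_counter_query *cq = (struct ex_counter_query *)q;
	q->active = true;
	cq->begin_value = *cq->counter;
}

static void ex_counter_end(struct ex_query *q)
{
	struct ex_counter_query *cq = (struct ex_counter_query *)q;
	q->active = false;
	cq->end_value = *cq->counter;
}

static bool ex_counter_get_result(struct ex_query *q, uint64_t *result)
{
	struct ex_counter_query *cq = (struct ex_counter_query *)q;
	if (q->active)
		return false;   /* no result while the query is still running */
	*result = cq->end_value - cq->begin_value;
	return true;
}

static const struct ex_query_funcs ex_counter_funcs = {
	.begin = ex_counter_begin,
	.end = ex_counter_end,
	.get_result = ex_counter_get_result,
};

static void ex_counter_init(struct ex_counter_query *cq, const uint64_t *counter)
{
	cq->base.funcs = &ex_counter_funcs;
	cq->base.active = false;
	cq->counter = counter;
	cq->begin_value = cq->end_value = 0;
}

/* Generic entry points: no knowledge of the backend type needed. */
static void ex_begin_query(struct ex_query *q) { q->funcs->begin(q); }
static void ex_end_query(struct ex_query *q)   { q->funcs->end(q); }
static bool ex_get_query_result(struct ex_query *q, uint64_t *result)
{
	return q->funcs->get_result(q, result);
}
```

Adding a second backend then only requires another function table and constructor; the generic wrappers stay unchanged.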

@@ -0,0 +1,465 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#include "pipe/p_state.h"
#include "util/u_memory.h"
#include "util/u_inlines.h"
#include "freedreno_query_hw.h"
#include "freedreno_context.h"
#include "freedreno_util.h"
struct fd_hw_sample_period {
struct fd_hw_sample *start, *end;
struct list_head list;
};
/* maps query_type to sample provider idx: */
static int pidx(unsigned query_type)
{
switch (query_type) {
case PIPE_QUERY_OCCLUSION_COUNTER:
return 0;
case PIPE_QUERY_OCCLUSION_PREDICATE:
return 1;
default:
return -1;
}
}
static struct fd_hw_sample *
get_sample(struct fd_context *ctx, struct fd_ringbuffer *ring,
unsigned query_type)
{
struct fd_hw_sample *samp = NULL;
int idx = pidx(query_type);
if (!ctx->sample_cache[idx]) {
ctx->sample_cache[idx] =
ctx->sample_providers[idx]->get_sample(ctx, ring);
}
fd_hw_sample_reference(ctx, &samp, ctx->sample_cache[idx]);
return samp;
}
static void
clear_sample_cache(struct fd_context *ctx)
{
int i;
for (i = 0; i < ARRAY_SIZE(ctx->sample_cache); i++)
fd_hw_sample_reference(ctx, &ctx->sample_cache[i], NULL);
}
static bool
is_active(struct fd_hw_query *hq, enum fd_render_stage stage)
{
return !!(hq->provider->active & stage);
}
static void
resume_query(struct fd_context *ctx, struct fd_hw_query *hq,
struct fd_ringbuffer *ring)
{
assert(!hq->period);
hq->period = util_slab_alloc(&ctx->sample_period_pool);
list_inithead(&hq->period->list);
hq->period->start = get_sample(ctx, ring, hq->base.type);
/* NOTE: util_slab_alloc() does not zero out the buffer: */
hq->period->end = NULL;
}
static void
pause_query(struct fd_context *ctx, struct fd_hw_query *hq,
struct fd_ringbuffer *ring)
{
assert(hq->period && !hq->period->end);
hq->period->end = get_sample(ctx, ring, hq->base.type);
list_addtail(&hq->period->list, &hq->current_periods);
hq->period = NULL;
}
static void
destroy_periods(struct fd_context *ctx, struct list_head *list)
{
struct fd_hw_sample_period *period, *s;
LIST_FOR_EACH_ENTRY_SAFE(period, s, list, list) {
fd_hw_sample_reference(ctx, &period->start, NULL);
fd_hw_sample_reference(ctx, &period->end, NULL);
list_del(&period->list);
util_slab_free(&ctx->sample_period_pool, period);
}
}
static void
fd_hw_destroy_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_hw_query *hq = fd_hw_query(q);
destroy_periods(ctx, &hq->periods);
destroy_periods(ctx, &hq->current_periods);
list_del(&hq->list);
free(hq);
}
static void
fd_hw_begin_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_hw_query *hq = fd_hw_query(q);
if (q->active)
return;
/* begin_query() should clear previous results: */
destroy_periods(ctx, &hq->periods);
if (is_active(hq, ctx->stage))
resume_query(ctx, hq, ctx->ring);
q->active = true;
/* add to active list: */
list_del(&hq->list);
list_addtail(&hq->list, &ctx->active_queries);
}
static void
fd_hw_end_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_hw_query *hq = fd_hw_query(q);
if (!q->active)
return;
if (is_active(hq, ctx->stage))
pause_query(ctx, hq, ctx->ring);
q->active = false;
/* move to current list: */
list_del(&hq->list);
list_addtail(&hq->list, &ctx->current_queries);
}
/* helper to get ptr to specified sample: */
static void * sampptr(struct fd_hw_sample *samp, uint32_t n, void *ptr)
{
return ((char *)ptr) + (samp->tile_stride * n) + samp->offset;
}
static boolean
fd_hw_get_query_result(struct fd_context *ctx, struct fd_query *q,
boolean wait, union pipe_query_result *result)
{
struct fd_hw_query *hq = fd_hw_query(q);
const struct fd_hw_sample_provider *p = hq->provider;
struct fd_hw_sample_period *period;
if (q->active)
return false;
/* if the app tries to read back the query result before the
* batch is submitted, that forces us to flush so that there
* are actually results to wait for:
*/
if (!LIST_IS_EMPTY(&hq->list)) {
DBG("reading query result forces flush!");
ctx->needs_flush = true;
fd_context_render(&ctx->base);
}
util_query_clear_result(result, q->type);
if (LIST_IS_EMPTY(&hq->periods))
return true;
assert(LIST_IS_EMPTY(&hq->list));
assert(LIST_IS_EMPTY(&hq->current_periods));
assert(!hq->period);
if (LIST_IS_EMPTY(&hq->periods))
return true;
/* if !wait, then check the last sample (the one most likely to
* not be ready yet) and bail if it is not ready:
*/
if (!wait) {
int ret;
period = LIST_ENTRY(struct fd_hw_sample_period,
hq->periods.prev, list);
ret = fd_bo_cpu_prep(period->end->bo, ctx->screen->pipe,
DRM_FREEDRENO_PREP_READ | DRM_FREEDRENO_PREP_NOSYNC);
if (ret)
return false;
fd_bo_cpu_fini(period->end->bo);
}
/* sum the result across all sample periods: */
LIST_FOR_EACH_ENTRY(period, &hq->periods, list) {
struct fd_hw_sample *start = period->start;
struct fd_hw_sample *end = period->end;
unsigned i;
/* start and end samples should be from same batch: */
assert(start->bo == end->bo);
assert(start->num_tiles == end->num_tiles);
for (i = 0; i < start->num_tiles; i++) {
void *ptr;
fd_bo_cpu_prep(start->bo, ctx->screen->pipe,
DRM_FREEDRENO_PREP_READ);
ptr = fd_bo_map(start->bo);
p->accumulate_result(ctx, sampptr(period->start, i, ptr),
sampptr(period->end, i, ptr), result);
fd_bo_cpu_fini(start->bo);
}
}
return true;
}
static const struct fd_query_funcs hw_query_funcs = {
.destroy_query = fd_hw_destroy_query,
.begin_query = fd_hw_begin_query,
.end_query = fd_hw_end_query,
.get_query_result = fd_hw_get_query_result,
};
struct fd_query *
fd_hw_create_query(struct fd_context *ctx, unsigned query_type)
{
struct fd_hw_query *hq;
struct fd_query *q;
int idx = pidx(query_type);
if ((idx < 0) || !ctx->sample_providers[idx])
return NULL;
hq = CALLOC_STRUCT(fd_hw_query);
if (!hq)
return NULL;
hq->provider = ctx->sample_providers[idx];
list_inithead(&hq->periods);
list_inithead(&hq->current_periods);
list_inithead(&hq->list);
q = &hq->base;
q->funcs = &hw_query_funcs;
q->type = query_type;
return q;
}
struct fd_hw_sample *
fd_hw_sample_init(struct fd_context *ctx, uint32_t size)
{
struct fd_hw_sample *samp = util_slab_alloc(&ctx->sample_pool);
pipe_reference_init(&samp->reference, 1);
samp->size = size;
samp->offset = ctx->next_sample_offset;
/* NOTE: util_slab_alloc() does not zero out the buffer: */
samp->bo = NULL;
samp->num_tiles = 0;
samp->tile_stride = 0;
ctx->next_sample_offset += size;
return samp;
}
void
__fd_hw_sample_destroy(struct fd_context *ctx, struct fd_hw_sample *samp)
{
if (samp->bo)
fd_bo_del(samp->bo);
util_slab_free(&ctx->sample_pool, samp);
}
static void
prepare_sample(struct fd_hw_sample *samp, struct fd_bo *bo,
uint32_t num_tiles, uint32_t tile_stride)
{
if (samp->bo) {
assert(samp->bo == bo);
assert(samp->num_tiles == num_tiles);
assert(samp->tile_stride == tile_stride);
return;
}
samp->bo = bo;
samp->num_tiles = num_tiles;
samp->tile_stride = tile_stride;
}
static void
prepare_query(struct fd_hw_query *hq, struct fd_bo *bo,
uint32_t num_tiles, uint32_t tile_stride)
{
struct fd_hw_sample_period *period, *s;
/* prepare all the samples in the query: */
LIST_FOR_EACH_ENTRY_SAFE(period, s, &hq->current_periods, list) {
prepare_sample(period->start, bo, num_tiles, tile_stride);
prepare_sample(period->end, bo, num_tiles, tile_stride);
/* move from current_periods list to periods list: */
list_del(&period->list);
list_addtail(&period->list, &hq->periods);
}
}
static void
prepare_queries(struct fd_context *ctx, struct fd_bo *bo,
uint32_t num_tiles, uint32_t tile_stride,
struct list_head *list, bool remove)
{
struct fd_hw_query *hq, *s;
LIST_FOR_EACH_ENTRY_SAFE(hq, s, list, list) {
prepare_query(hq, bo, num_tiles, tile_stride);
if (remove)
list_delinit(&hq->list);
}
}
/* called from gmem code once total storage requirements are known (ie.
* number of samples times number of tiles)
*/
void
fd_hw_query_prepare(struct fd_context *ctx, uint32_t num_tiles)
{
uint32_t tile_stride = ctx->next_sample_offset;
struct fd_bo *bo;
if (ctx->query_bo)
fd_bo_del(ctx->query_bo);
if (tile_stride > 0) {
bo = fd_bo_new(ctx->dev, tile_stride * num_tiles,
DRM_FREEDRENO_GEM_CACHE_WCOMBINE |
DRM_FREEDRENO_GEM_TYPE_KMEM);
} else {
bo = NULL;
}
ctx->query_bo = bo;
ctx->query_tile_stride = tile_stride;
prepare_queries(ctx, bo, num_tiles, tile_stride,
&ctx->active_queries, false);
prepare_queries(ctx, bo, num_tiles, tile_stride,
&ctx->current_queries, true);
/* reset things for next batch: */
ctx->next_sample_offset = 0;
}
void
fd_hw_query_prepare_tile(struct fd_context *ctx, uint32_t n,
struct fd_ringbuffer *ring)
{
uint32_t tile_stride = ctx->query_tile_stride;
uint32_t offset = tile_stride * n;
/* bail if no queries: */
if (tile_stride == 0)
return;
fd_wfi(ctx, ring);
OUT_PKT0 (ring, HW_QUERY_BASE_REG, 1);
OUT_RELOCW(ring, ctx->query_bo, offset, 0, 0);
}
void
fd_hw_query_set_stage(struct fd_context *ctx, struct fd_ringbuffer *ring,
enum fd_render_stage stage)
{
/* special case: internal blits (like mipmap level generation)
* go through normal draw path (via util_blitter_blit()).. but
* we need to ignore the FD_STAGE_DRAW which will be set, so we
* don't enable queries which should be paused during internal
* blits:
*/
if ((ctx->stage == FD_STAGE_BLIT) &&
(stage != FD_STAGE_NULL))
return;
if (stage != ctx->stage) {
struct fd_hw_query *hq;
LIST_FOR_EACH_ENTRY(hq, &ctx->active_queries, list) {
bool was_active = is_active(hq, ctx->stage);
bool now_active = is_active(hq, stage);
if (now_active && !was_active)
resume_query(ctx, hq, ring);
else if (was_active && !now_active)
pause_query(ctx, hq, ring);
}
}
clear_sample_cache(ctx);
ctx->stage = stage;
}
void
fd_hw_query_register_provider(struct pipe_context *pctx,
const struct fd_hw_sample_provider *provider)
{
struct fd_context *ctx = fd_context(pctx);
int idx = pidx(provider->query_type);
assert((0 <= idx) && (idx < MAX_HW_SAMPLE_PROVIDERS));
assert(!ctx->sample_providers[idx]);
ctx->sample_providers[idx] = provider;
}
void
fd_hw_query_init(struct pipe_context *pctx)
{
struct fd_context *ctx = fd_context(pctx);
util_slab_create(&ctx->sample_pool, sizeof(struct fd_hw_sample),
16, UTIL_SLAB_SINGLETHREADED);
util_slab_create(&ctx->sample_period_pool, sizeof(struct fd_hw_sample_period),
16, UTIL_SLAB_SINGLETHREADED);
list_inithead(&ctx->active_queries);
list_inithead(&ctx->current_queries);
}
void
fd_hw_query_fini(struct pipe_context *pctx)
{
struct fd_context *ctx = fd_context(pctx);
util_slab_destroy(&ctx->sample_pool);
util_slab_destroy(&ctx->sample_period_pool);
}
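
fd_hw_get_query_result() above sums a provider-defined delta over every sample period and every tile. The sketch below reproduces just that double loop over a plain byte buffer. It assumes each sample is a single 32-bit counter and that the delta is `end - start`; in the driver both are left to the provider's accumulate_result() callback. The `ex_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One sample period: byte offsets of the start and end samples within a
 * tile's block of the result buffer. */
struct ex_period {
	uint32_t start_offset;
	uint32_t end_offset;
};

/* Sum (end - start) over all periods and all tiles. */
static uint64_t
ex_sum_periods(const uint8_t *buf, uint32_t tile_stride, uint32_t num_tiles,
	       const struct ex_period *periods, size_t num_periods)
{
	uint64_t total = 0;
	size_t p;
	uint32_t t;

	assert(buf != NULL);

	for (p = 0; p < num_periods; p++) {
		for (t = 0; t < num_tiles; t++) {
			const uint8_t *tile = buf + (size_t)tile_stride * t;
			uint32_t start, end;

			/* memcpy avoids unaligned or aliasing-unsafe loads */
			memcpy(&start, tile + periods[p].start_offset, sizeof(start));
			memcpy(&end, tile + periods[p].end_offset, sizeof(end));

			total += (uint32_t)(end - start);
		}
	}
	return total;
}
```

Because each tile writes its own copy of every sample, the per-tile deltas have to be added together to get the result for the whole render target.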

@@ -0,0 +1,164 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#ifndef FREEDRENO_QUERY_HW_H_
#define FREEDRENO_QUERY_HW_H_
#include "util/u_double_list.h"
#include "freedreno_query.h"
#include "freedreno_context.h"
/*
* HW Queries:
*
* See: https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
*
* Hardware queries will be specific to gpu generation, but they need
* some common infrastructure for triggering start/stop samples at
* various points (for example, to exclude mem2gmem/gmem2mem or clear)
* as well as per tile tracking.
*
* NOTE: in at least some cases hw writes sample values to memory addr
* specified in some register. So we don't really have the option to
* just sample the same counter multiple times for multiple different
* queries with the same query_type. So we cache per sample provider
* the most recent sample since the last draw. This way multiple
* sample periods for multiple queries can reference the same sample.
*
* fd_hw_sample_provider:
* - one per query type, registered/implemented by gpu generation
* specific code
* - can construct fd_hw_samples on demand
* - most recent sample (since last draw) cached so multiple
* different queries can ref the same sample
*
* fd_hw_sample:
* - abstracts one snapshot of counter value(s) across N tiles
* - backing object not allocated until submit time when number
* of samples and number of tiles is known
*
* fd_hw_sample_period:
* - consists of start and stop sample
* - a query accumulates a list of sample periods
* - the query result is the sum of the sample periods
*/
struct fd_hw_sample_provider {
unsigned query_type;
/* stages applicable to the query type: */
enum fd_render_stage active;
/* when a new sample is required, emit appropriate cmdstream
* and return a sample object:
*/
struct fd_hw_sample *(*get_sample)(struct fd_context *ctx,
struct fd_ringbuffer *ring);
/* accumulate the results from specified sample period: */
void (*accumulate_result)(struct fd_context *ctx,
const void *start, const void *end,
union pipe_query_result *result);
};
struct fd_hw_sample {
struct pipe_reference reference; /* keep this first */
/* offset and size of the sample are known at the time the
* sample is constructed.
*/
uint32_t size;
uint32_t offset;
/* backing object, offset/stride/etc are determined not when
* the sample is constructed, but when the batch is submitted.
* This way we can defer allocation until total # of requested
* samples, and total # of tiles, is known.
*/
struct fd_bo *bo;
uint32_t num_tiles;
uint32_t tile_stride;
};
struct fd_hw_sample_period;
struct fd_hw_query {
struct fd_query base;
const struct fd_hw_sample_provider *provider;
/* list of fd_hw_sample_period in previous submits: */
struct list_head periods;
/* list of fd_hw_sample_period's in current submit: */
struct list_head current_periods;
/* if active and not paused, the current sample period (not
* yet added to current_periods):
*/
struct fd_hw_sample_period *period;
struct list_head list; /* list-node in ctx->active_queries */
};
static inline struct fd_hw_query *
fd_hw_query(struct fd_query *q)
{
return (struct fd_hw_query *)q;
}
struct fd_query * fd_hw_create_query(struct fd_context *ctx, unsigned query_type);
/* helper for sample providers: */
struct fd_hw_sample * fd_hw_sample_init(struct fd_context *ctx, uint32_t size);
/* don't call directly, use fd_hw_sample_reference() */
void __fd_hw_sample_destroy(struct fd_context *ctx, struct fd_hw_sample *samp);
void fd_hw_query_prepare(struct fd_context *ctx, uint32_t num_tiles);
void fd_hw_query_prepare_tile(struct fd_context *ctx, uint32_t n,
struct fd_ringbuffer *ring);
void fd_hw_query_set_stage(struct fd_context *ctx,
struct fd_ringbuffer *ring, enum fd_render_stage stage);
void fd_hw_query_register_provider(struct pipe_context *pctx,
const struct fd_hw_sample_provider *provider);
void fd_hw_query_init(struct pipe_context *pctx);
void fd_hw_query_fini(struct pipe_context *pctx);
static inline void
fd_hw_sample_reference(struct fd_context *ctx,
struct fd_hw_sample **ptr, struct fd_hw_sample *samp)
{
struct fd_hw_sample *old_samp = *ptr;
if (pipe_reference(&(*ptr)->reference, &samp->reference))
__fd_hw_sample_destroy(ctx, old_samp);
if (ptr)
*ptr = samp;
}
#endif /* FREEDRENO_QUERY_HW_H_ */
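
The sample cache and the sample periods share fd_hw_sample objects through fd_hw_sample_reference(), which follows the usual "assign a reference and drop the old one" pattern. Below is a minimal self-contained sketch of that pattern. It uses a plain int counter rather than pipe_reference, so it is not thread safe, and the `ex_` names plus the `ex_destroyed` counter are hypothetical and exist only to make the behaviour observable.

```c
#include <stdlib.h>

struct ex_sample {
	int refcount;
};

/* Counts destroyed samples so the sketch can be checked. */
static int ex_destroyed;

/* New samples start with one reference, owned by the caller. */
static struct ex_sample *ex_sample_new(void)
{
	struct ex_sample *samp = calloc(1, sizeof(*samp));
	if (samp)
		samp->refcount = 1;
	return samp;
}

/* Point *ptr at samp (which may be NULL), taking a reference on the new
 * object and dropping the one held on the old object, if any. */
static void ex_sample_reference(struct ex_sample **ptr, struct ex_sample *samp)
{
	struct ex_sample *old = *ptr;

	if (old == samp)
		return;

	if (samp)
		samp->refcount++;
	*ptr = samp;

	if (old && --old->refcount == 0) {
		free(old);
		ex_destroyed++;
	}
}
```

Passing NULL as the new sample releases a reference, which is how clear_sample_cache() and destroy_periods() drop theirs; the object goes away only when the last holder lets go.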

@@ -0,0 +1,165 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#include "pipe/p_state.h"
#include "util/u_string.h"
#include "util/u_memory.h"
#include "util/u_inlines.h"
#include "os/os_time.h"
#include "freedreno_query_sw.h"
#include "freedreno_context.h"
#include "freedreno_util.h"
/*
* SW Queries:
*
* In the core, we have some support for basic sw counters
*/
static void
fd_sw_destroy_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_sw_query *sq = fd_sw_query(q);
free(sq);
}
static uint64_t
read_counter(struct fd_context *ctx, int type)
{
switch (type) {
case PIPE_QUERY_PRIMITIVES_GENERATED:
/* for now same thing as _PRIMITIVES_EMITTED */
case PIPE_QUERY_PRIMITIVES_EMITTED:
return ctx->stats.prims_emitted;
case FD_QUERY_DRAW_CALLS:
return ctx->stats.draw_calls;
case FD_QUERY_BATCH_TOTAL:
return ctx->stats.batch_total;
case FD_QUERY_BATCH_SYSMEM:
return ctx->stats.batch_sysmem;
case FD_QUERY_BATCH_GMEM:
return ctx->stats.batch_gmem;
case FD_QUERY_BATCH_RESTORE:
return ctx->stats.batch_restore;
}
return 0;
}
static bool
is_rate_query(struct fd_query *q)
{
switch (q->type) {
case FD_QUERY_BATCH_TOTAL:
case FD_QUERY_BATCH_SYSMEM:
case FD_QUERY_BATCH_GMEM:
case FD_QUERY_BATCH_RESTORE:
return true;
default:
return false;
}
}
static void
fd_sw_begin_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_sw_query *sq = fd_sw_query(q);
q->active = true;
sq->begin_value = read_counter(ctx, q->type);
if (is_rate_query(q))
sq->begin_time = os_time_get();
}
static void
fd_sw_end_query(struct fd_context *ctx, struct fd_query *q)
{
struct fd_sw_query *sq = fd_sw_query(q);
q->active = false;
sq->end_value = read_counter(ctx, q->type);
if (is_rate_query(q))
sq->end_time = os_time_get();
}
static boolean
fd_sw_get_query_result(struct fd_context *ctx, struct fd_query *q,
boolean wait, union pipe_query_result *result)
{
struct fd_sw_query *sq = fd_sw_query(q);
if (q->active)
return false;
util_query_clear_result(result, q->type);
result->u64 = sq->end_value - sq->begin_value;
if (is_rate_query(q)) {
double fps = (result->u64 * 1000000) /
(double)(sq->end_time - sq->begin_time);
result->u64 = (uint64_t)fps;
}
return true;
}
static const struct fd_query_funcs sw_query_funcs = {
.destroy_query = fd_sw_destroy_query,
.begin_query = fd_sw_begin_query,
.end_query = fd_sw_end_query,
.get_query_result = fd_sw_get_query_result,
};
struct fd_query *
fd_sw_create_query(struct fd_context *ctx, unsigned query_type)
{
struct fd_sw_query *sq;
struct fd_query *q;
switch (query_type) {
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_PRIMITIVES_EMITTED:
case FD_QUERY_DRAW_CALLS:
case FD_QUERY_BATCH_TOTAL:
case FD_QUERY_BATCH_SYSMEM:
case FD_QUERY_BATCH_GMEM:
case FD_QUERY_BATCH_RESTORE:
break;
default:
return NULL;
}
sq = CALLOC_STRUCT(fd_sw_query);
if (!sq)
return NULL;
q = &sq->base;
q->funcs = &sw_query_funcs;
q->type = query_type;
return q;
}
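
The rate computation in fd_sw_get_query_result() above can be sketched in isolation. This is only an illustration: rate_per_second() is a hypothetical helper, not part of the driver, and it assumes the timestamps are in microseconds (which is what the 1000000 scale factor in the patch suggests). The zero-interval guard is also an addition of this sketch; the patch itself divides unconditionally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the driver): turn a counter delta measured
 * over a time window into events per second.  Timestamps are assumed to
 * be in microseconds. */
static uint64_t
rate_per_second(uint64_t begin_value, uint64_t end_value,
                uint64_t begin_us, uint64_t end_us)
{
    uint64_t delta, elapsed_us;

    assert(end_us >= begin_us);

    delta = end_value - begin_value;
    elapsed_us = end_us - begin_us;

    /* Assumption of this sketch: report 0 for an empty interval instead
     * of dividing by zero. */
    if (elapsed_us == 0)
        return 0;

    /* Same scaling as the patch: delta * 1000000 / elapsed microseconds. */
    return (uint64_t)((double)delta * 1000000.0 / (double)elapsed_us);
}
```

For example, 60 batches counted over half a second (500000 microseconds) would be reported as 120 per second.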


@@ -0,0 +1,55 @@
/* -*- mode: C; c-file-style: "k&r"; tab-width: 4; indent-tabs-mode: t; -*- */
/*
* Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#ifndef FREEDRENO_QUERY_SW_H_
#define FREEDRENO_QUERY_SW_H_
#include "freedreno_query.h"
/*
* SW Queries:
*
* In the core, we have some support for basic sw counters
*/
struct fd_sw_query {
struct fd_query base;
uint64_t begin_value, end_value;
uint64_t begin_time, end_time;
};
static inline struct fd_sw_query *
fd_sw_query(struct fd_query *q)
{
return (struct fd_sw_query *)q;
}
struct fd_query * fd_sw_create_query(struct fd_context *ctx,
unsigned query_type);
#endif /* FREEDRENO_QUERY_SW_H_ */


@@ -36,6 +36,7 @@
#include "freedreno_screen.h"
#include "freedreno_surface.h"
#include "freedreno_context.h"
#include "freedreno_query_hw.h"
#include "freedreno_util.h"
#include <errno.h>
@@ -401,7 +402,9 @@ render_blit(struct pipe_context *pctx, struct pipe_blit_info *info)
util_blitter_save_fragment_sampler_views(ctx->blitter,
ctx->fragtex.num_textures, ctx->fragtex.textures);
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_BLIT);
util_blitter_blit(ctx->blitter, info);
fd_hw_query_set_stage(ctx, ctx->ring, FD_STAGE_NULL);
return true;
}


@@ -143,6 +143,8 @@ tables for things that differ if the delta is not too much..
static int
fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
{
struct fd_screen *screen = fd_screen(pscreen);
/* this is probably not totally correct.. but it's a start: */
switch (param) {
/* Supported features (boolean caps). */
@@ -161,8 +163,6 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
case PIPE_CAP_SM3:
case PIPE_CAP_SEAMLESS_CUBE_MAP:
case PIPE_CAP_PRIMITIVE_RESTART:
case PIPE_CAP_CONDITIONAL_RENDER:
case PIPE_CAP_TEXTURE_BARRIER:
case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
case PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION:
@@ -180,6 +180,8 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_SHADER_STENCIL_EXPORT:
case PIPE_CAP_TGSI_TEXCOORD:
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_CONDITIONAL_RENDER:
case PIPE_CAP_PRIMITIVE_RESTART:
return 0;
case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
@@ -229,17 +231,18 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS:
return MAX_MIP_LEVELS;
case PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS:
return 9192;
return 0; /* TODO: a3xx+ should support (required in gles3) */
/* Render targets. */
case PIPE_CAP_MAX_RENDER_TARGETS:
return 1;
/* Timer queries. */
/* Queries. */
case PIPE_CAP_QUERY_TIME_ELAPSED:
case PIPE_CAP_OCCLUSION_QUERY:
case PIPE_CAP_QUERY_TIMESTAMP:
return 0;
case PIPE_CAP_OCCLUSION_QUERY:
return (screen->gpu_id >= 300) ? 1: 0;
case PIPE_CAP_MIN_TEXTURE_GATHER_OFFSET:
case PIPE_CAP_MIN_TEXEL_OFFSET:
@@ -252,7 +255,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_ENDIANNESS:
return PIPE_ENDIAN_LITTLE;
case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
return 64;
default:
@@ -315,7 +318,7 @@ fd_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH:
return 8; /* XXX */
case PIPE_SHADER_CAP_MAX_INPUTS:
return 32;
return 16;
case PIPE_SHADER_CAP_MAX_TEMPS:
return 64; /* Max native temporaries. */
case PIPE_SHADER_CAP_MAX_ADDRS:


@@ -223,11 +223,18 @@ OUT_IB(struct fd_ringbuffer *ring, struct fd_ringmarker *start,
emit_marker(ring, 6);
}
/* CP_SCRATCH_REG4 is used to hold base address for query results: */
#define HW_QUERY_BASE_REG REG_AXXX_CP_SCRATCH_REG4
static inline void
emit_marker(struct fd_ringbuffer *ring, int scratch_idx)
{
extern unsigned marker_cnt;
OUT_PKT0(ring, REG_AXXX_CP_SCRATCH_REG0 + scratch_idx, 1);
unsigned reg = REG_AXXX_CP_SCRATCH_REG0 + scratch_idx;
assert(reg != HW_QUERY_BASE_REG);
if (reg == HW_QUERY_BASE_REG)
return;
OUT_PKT0(ring, reg, 1);
OUT_RING(ring, ++marker_cnt);
}


@@ -312,9 +312,15 @@ lp_rast_shade_tile(struct lp_rasterizer_task *task,
/* color buffer */
for (i = 0; i < scene->fb.nr_cbufs; i++){
stride[i] = scene->cbufs[i].stride;
color[i] = lp_rast_get_unswizzled_color_block_pointer(task, i, tile_x + x,
tile_y + y, inputs->layer);
if (scene->fb.cbufs[i]) {
stride[i] = scene->cbufs[i].stride;
color[i] = lp_rast_get_unswizzled_color_block_pointer(task, i, tile_x + x,
tile_y + y, inputs->layer);
}
else {
stride[i] = 0;
color[i] = NULL;
}
}
/* depth buffer */


@@ -633,7 +633,7 @@ CodeEmitterGK110::emitISAD(const Instruction *i)
{
assert(i->dType == TYPE_S32 || i->dType == TYPE_U32);
emitForm_21(i, 0x1fc, 0xb74);
emitForm_21(i, 0x1f4, 0xb74);
if (i->dType == TYPE_S32)
code[1] |= 1 << 19;
@@ -915,6 +915,9 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i)
modNegAbsF32_3b(i, 1);
}
FTZ_(3a);
if (i->dType == TYPE_F32)
code[1] |= 1 << 23;
}
if (i->sType == TYPE_S32)
code[1] |= 1 << 19;
@@ -949,7 +952,7 @@ CodeEmitterGK110::emitSLCT(const CmpInstruction *i)
FTZ_(32);
emitCondCode(cc, 0x33, 0xf);
} else {
emitForm_21(i, 0x1a4, 0xb20);
emitForm_21(i, 0x1a0, 0xb20);
emitCondCode(cc, 0x34, 0x7);
}
}
@@ -964,7 +967,7 @@ void CodeEmitterGK110::emitSELP(const Instruction *i)
void CodeEmitterGK110::emitTEXBAR(const Instruction *i)
{
code[0] = 0x00000002 | (i->subOp << 23);
code[0] = 0x0000003e | (i->subOp << 23);
code[1] = 0x77000000;
emitPredicate(i);
@@ -1201,7 +1204,7 @@ CodeEmitterGK110::emitFlow(const Instruction *i)
case OP_PRECONT: code[1] = 0x15800000; mask = 2; break;
case OP_PRERET: code[1] = 0x13800000; mask = 2; break;
case OP_QUADON: code[1] = 0x1b000000; mask = 0; break;
case OP_QUADON: code[1] = 0x1b800000; mask = 0; break;
case OP_QUADPOP: code[1] = 0x1c000000; mask = 0; break;
case OP_BRKPT: code[1] = 0x00000000; mask = 0; break;
default:
@@ -1323,7 +1326,8 @@ CodeEmitterGK110::emitOUT(const Instruction *i)
void
CodeEmitterGK110::emitInterpMode(const Instruction *i)
{
code[1] |= i->ipa << 21; // TODO: INTERP_SAMPLEID
code[1] |= (i->ipa & 0x3) << 21; // TODO: INTERP_SAMPLEID
code[1] |= (i->ipa & 0xc) << (19 - 2);
}
void
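
The bit placement in the new emitInterpMode() can be checked with a small standalone sketch. pack_interp_mode() is a hypothetical name used only here; the sketch shows nothing beyond the shift arithmetic from the hunk above, and it makes no claim about what the individual hardware fields mean.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the packing done in the patch:
 * the low two bits of ipa land at bits 21..22 of the instruction word,
 * the next two bits land at bits 19..20. */
static uint32_t
pack_interp_mode(uint32_t ipa)
{
    uint32_t bits = 0;

    bits |= (ipa & 0x3) << 21;        /* ipa bits 0..1 -> word bits 21..22 */
    bits |= (ipa & 0xc) << (19 - 2);  /* ipa bits 2..3 -> word bits 19..20 */
    return bits;
}
```

With the old code (ipa << 21), an ipa value of 0x4 would have set bit 23; with the split above it sets bit 19 instead.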


@@ -2199,7 +2199,6 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
case TGSI_OPCODE_IMUL_HI:
case TGSI_OPCODE_UMUL_HI:
case TGSI_OPCODE_OR:
case TGSI_OPCODE_POW:
case TGSI_OPCODE_SHL:
case TGSI_OPCODE_ISHR:
case TGSI_OPCODE_USHR:
@@ -2254,6 +2253,11 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
mkOp1(OP_MOV, TYPE_U32, dst0[c], fetchSrc(0, c));
break;
case TGSI_OPCODE_POW:
val0 = mkOp2v(op, TYPE_F32, getScratch(), fetchSrc(0, 0), fetchSrc(1, 0));
FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
mkOp1(OP_MOV, TYPE_F32, dst0[c], val0);
break;
case TGSI_OPCODE_EX2:
case TGSI_OPCODE_LG2:
val0 = mkOp1(op, TYPE_F32, getScratch(), fetchSrc(0, 0))->getDef(0);
@@ -2453,7 +2457,12 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
break;
case TGSI_OPCODE_KILL_IF:
val0 = new_LValue(func, FILE_PREDICATE);
mask = 0;
for (c = 0; c < 4; ++c) {
const int s = tgsi.getSrc(0).getSwizzle(c);
if (mask & (1 << s))
continue;
mask |= 1 << s;
mkCmp(OP_SET, CC_LT, TYPE_F32, val0, TYPE_F32, fetchSrc(0, c), zero);
mkOp(OP_DISCARD, TYPE_NONE, NULL)->setPredicate(CC_P, val0);
}
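
The mask logic added to the KILL_IF case skips channels whose swizzle selects a source component that was already tested. A minimal sketch of just that de-duplication, with count_unique_swizzle_sources() as a hypothetical helper that counts how many compare/discard pairs the loop would emit:

```c
#include <assert.h>

/* Hypothetical helper: given the source component (0..3) selected by each
 * of the four channels, count how many distinct components are visited
 * when already-seen components are skipped, as in the patched loop. */
static int
count_unique_swizzle_sources(const int swizzle[4])
{
    unsigned mask = 0;
    int tests = 0;
    int c;

    for (c = 0; c < 4; ++c) {
        const int s = swizzle[c];

        assert(s >= 0 && s < 4);
        if (mask & (1u << s))
            continue;        /* this component was already tested */
        mask |= 1u << s;
        ++tests;
    }
    return tests;
}
```

A fully replicated swizzle such as .xxxx then needs one test instead of four, while .xyzw still needs all four.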


@@ -37,18 +37,25 @@ namespace nv50_ir {
// ah*bl 00
//
// fffe0001 + fffe0001
//
// Note that this sort of splitting doesn't work for signed values, so we
// compute the sign on those manually and then perform an unsigned multiply.
static bool
expandIntegerMUL(BuildUtil *bld, Instruction *mul)
{
const bool highResult = mul->subOp == NV50_IR_SUBOP_MUL_HIGH;
DataType fTy = mul->sType; // full type
DataType hTy;
DataType fTy; // full type
switch (mul->sType) {
case TYPE_S32: fTy = TYPE_U32; break;
case TYPE_S64: fTy = TYPE_U64; break;
default: fTy = mul->sType; break;
}
DataType hTy; // half type
switch (fTy) {
case TYPE_S32: hTy = TYPE_S16; break;
case TYPE_U32: hTy = TYPE_U16; break;
case TYPE_U64: hTy = TYPE_U32; break;
case TYPE_S64: hTy = TYPE_S32; break;
default:
return false;
}
@@ -59,15 +66,25 @@ expandIntegerMUL(BuildUtil *bld, Instruction *mul)
bld->setPosition(mul, true);
Value *s[2];
Value *a[2], *b[2];
Value *c[2];
Value *t[4];
for (int j = 0; j < 4; ++j)
t[j] = bld->getSSA(fullSize);
s[0] = mul->getSrc(0);
s[1] = mul->getSrc(1);
if (isSignedType(mul->sType)) {
s[0] = bld->getSSA(fullSize);
s[1] = bld->getSSA(fullSize);
bld->mkOp1(OP_ABS, mul->sType, s[0], mul->getSrc(0));
bld->mkOp1(OP_ABS, mul->sType, s[1], mul->getSrc(1));
}
// split sources into halves
i[0] = bld->mkSplit(a, halfSize, mul->getSrc(0));
i[1] = bld->mkSplit(b, halfSize, mul->getSrc(1));
i[0] = bld->mkSplit(a, halfSize, s[0]);
i[1] = bld->mkSplit(b, halfSize, s[1]);
i[2] = bld->mkOp2(OP_MUL, fTy, t[0], a[0], b[1]);
i[3] = bld->mkOp3(OP_MAD, fTy, t[1], a[1], b[0], t[0]);
@@ -75,23 +92,76 @@ expandIntegerMUL(BuildUtil *bld, Instruction *mul)
i[4] = bld->mkOp3(OP_MAD, fTy, t[3], a[0], b[0], t[2]);
if (highResult) {
Value *r[3];
Value *c[2];
Value *r[5];
Value *imm = bld->loadImm(NULL, 1 << (halfSize * 8));
c[0] = bld->getSSA(1, FILE_FLAGS);
c[1] = bld->getSSA(1, FILE_FLAGS);
for (int j = 0; j < 3; ++j)
for (int j = 0; j < 5; ++j)
r[j] = bld->getSSA(fullSize);
i[8] = bld->mkOp2(OP_SHR, fTy, r[0], t[1], bld->mkImm(halfSize * 8));
i[6] = bld->mkOp2(OP_ADD, fTy, r[1], r[0], imm);
bld->mkOp2(OP_UNION, TYPE_U32, r[2], r[1], r[0]);
i[5] = bld->mkOp3(OP_MAD, fTy, mul->getDef(0), a[1], b[1], r[2]);
bld->mkMov(r[3], r[0])->setPredicate(CC_NC, c[0]);
bld->mkOp2(OP_UNION, TYPE_U32, r[2], r[1], r[3]);
i[5] = bld->mkOp3(OP_MAD, fTy, r[4], a[1], b[1], r[2]);
// set carry defs / sources
i[3]->setFlagsDef(1, c[0]);
i[4]->setFlagsDef(0, c[1]); // actual result not required, just the carry
// The actual result is required in the negative case, but ignored for
// unsigned. For some reason the compiler ends up dropping the whole
// instruction if the destination is unused, even though the flags are used.
if (isSignedType(mul->sType))
i[4]->setFlagsDef(1, c[1]);
else
i[4]->setFlagsDef(0, c[1]);
i[6]->setPredicate(CC_C, c[0]);
i[5]->setFlagsSrc(3, c[1]);
if (isSignedType(mul->sType)) {
Value *cc[2];
Value *rr[7];
Value *one = bld->getSSA(fullSize);
bld->loadImm(one, 1);
for (int j = 0; j < 7; j++)
rr[j] = bld->getSSA(fullSize);
// NOTE: this logic uses predicates because splitting basic blocks is
// ~impossible during the SSA phase. The RA relies on a correlation
// between edge order and phi node sources.
// Set the sign of the result based on the inputs
bld->mkOp2(OP_XOR, fTy, NULL, mul->getSrc(0), mul->getSrc(1))
->setFlagsDef(0, (cc[0] = bld->getSSA(1, FILE_FLAGS)));
// 1s complement of 64-bit value
bld->mkOp1(OP_NOT, fTy, rr[0], r[4])
->setPredicate(CC_S, cc[0]);
bld->mkOp1(OP_NOT, fTy, rr[1], t[3])
->setPredicate(CC_S, cc[0]);
// add to low 32-bits, keep track of the carry
Instruction *n = bld->mkOp2(OP_ADD, fTy, NULL, rr[1], one);
n->setPredicate(CC_S, cc[0]);
n->setFlagsDef(0, (cc[1] = bld->getSSA(1, FILE_FLAGS)));
// If there was a carry, add 1 to the upper 32 bits
// XXX: These get executed even if they shouldn't be
bld->mkOp2(OP_ADD, fTy, rr[2], rr[0], one)
->setPredicate(CC_C, cc[1]);
bld->mkMov(rr[3], rr[0])
->setPredicate(CC_NC, cc[1]);
bld->mkOp2(OP_UNION, fTy, rr[4], rr[2], rr[3]);
// Merge the results from the negative and non-negative paths
bld->mkMov(rr[5], rr[4])
->setPredicate(CC_S, cc[0]);
bld->mkMov(rr[6], r[4])
->setPredicate(CC_NS, cc[0]);
bld->mkOp2(OP_UNION, mul->sType, mul->getDef(0), rr[5], rr[6]);
} else {
bld->mkMov(mul->getDef(0), r[4]);
}
} else {
bld->mkMov(mul->getDef(0), t[3]);
}
@@ -591,6 +661,10 @@ void NV50LoweringPreSSA::loadTexMsInfo(uint32_t off, Value **ms,
Value *tmp = new_LValue(func, FILE_GPR);
uint8_t b = prog->driver->io.resInfoCBSlot;
off += prog->driver->io.suInfoBase;
if (prog->getType() > Program::TYPE_VERTEX)
off += 16 * 2 * 4;
if (prog->getType() > Program::TYPE_GEOMETRY)
off += 16 * 2 * 4;
*ms_x = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
FILE_MEMORY_CONST, b, TYPE_U32, off + 0), NULL);
*ms_y = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
@@ -1205,8 +1279,11 @@ NV50LoweringPreSSA::checkPredicate(Instruction *insn)
Value *pred = insn->getPredicate();
Value *cdst;
if (!pred || pred->reg.file == FILE_FLAGS)
// FILE_PREDICATE will simply be changed to FLAGS on conversion to SSA
if (!pred ||
pred->reg.file == FILE_FLAGS || pred->reg.file == FILE_PREDICATE)
return;
cdst = bld.getSSA(1, FILE_FLAGS);
bld.mkCmp(OP_SET, CC_NEU, insn->dType, cdst, insn->dType, bld.loadImm(NULL, 0), pred);
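
The approach described in the expandIntegerMUL() comments (take absolute values, do an unsigned multiply from 16-bit halves, then negate the 64-bit result if the signs differ) can be sketched in plain C. This is a scalar illustration under assumptions, not the IR the pass generates: umul32_wide() and mul_high_s32() are hypothetical names, and the final unsigned-to-signed cast assumes the usual two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 64 bit unsigned multiply built from
 * 16-bit halves, using only 32-bit arithmetic and explicit carries. */
static void
umul32_wide(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xffff, ah = a >> 16;
    uint32_t bl = b & 0xffff, bh = b >> 16;
    uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;

    uint32_t mid = lh + hl;            /* may wrap around */
    uint32_t mid_carry = mid < lh;     /* wrap is worth 1 << 16 in hi */
    uint32_t low = ll + (mid << 16);
    uint32_t low_carry = low < ll;

    *lo = low;
    *hi = hh + (mid >> 16) + (mid_carry << 16) + low_carry;
}

/* Hypothetical helper: high 32 bits of a signed 32x32 multiply. */
static int32_t
mul_high_s32(int32_t a, int32_t b)
{
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t hi, lo;

    umul32_wide(ua, ub, &hi, &lo);

    if ((a ^ b) < 0) {
        /* Two's complement negate of the 64-bit value: invert both
         * halves, add 1 to the low half, carry into the high half. */
        hi = ~hi;
        lo = ~lo + 1;
        if (lo == 0)
            hi += 1;
    }
    return (int32_t)hi;
}
```

The carry out of the low-half increment is the same carry the patch tracks through the flags register before conditionally adding 1 to the upper half.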


@@ -187,7 +187,8 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
return;
}
if (insn->op == OP_SET)
if (insn->op == OP_SET || insn->op == OP_SET_AND ||
insn->op == OP_SET_OR || insn->op == OP_SET_XOR)
insn->asCmp()->setCond = reverseCondCode(insn->asCmp()->setCond);
else
if (insn->op == OP_SLCT)
@@ -424,7 +425,17 @@ ConstantFolding::expr(Instruction *i,
case TYPE_F32: res.data.f32 = a->data.f32 * b->data.f32; break;
case TYPE_F64: res.data.f64 = a->data.f64 * b->data.f64; break;
case TYPE_S32:
case TYPE_U32: res.data.u32 = a->data.u32 * b->data.u32; break;
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH) {
res.data.s32 = ((int64_t)a->data.s32 * b->data.s32) >> 32;
break;
}
/* fallthrough */
case TYPE_U32:
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH) {
res.data.u32 = ((uint64_t)a->data.u32 * b->data.u32) >> 32;
break;
}
res.data.u32 = a->data.u32 * b->data.u32; break;
default:
return;
}
@@ -550,8 +561,9 @@ ConstantFolding::expr(Instruction *i,
if (i->src(0).getImmediate(src0))
expr(i, src0, *i->getSrc(1)->asImm());
} else {
i->op = OP_MOV;
i->op = i->saturate ? OP_SAT : OP_MOV; /* SAT handled by unary() */
}
i->subOp = 0;
}
void
@@ -601,6 +613,7 @@ ConstantFolding::unary(Instruction *i, const ImmediateValue &imm)
switch (i->op) {
case OP_NEG: res.data.f32 = -imm.reg.data.f32; break;
case OP_ABS: res.data.f32 = fabsf(imm.reg.data.f32); break;
case OP_SAT: res.data.f32 = CLAMP(imm.reg.data.f32, 0.0f, 1.0f); break;
case OP_RCP: res.data.f32 = 1.0f / imm.reg.data.f32; break;
case OP_RSQ: res.data.f32 = 1.0f / sqrtf(imm.reg.data.f32); break;
case OP_LG2: res.data.f32 = log2f(imm.reg.data.f32); break;
@@ -690,12 +703,41 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
{
const int t = !s;
const operation op = i->op;
Instruction *newi = i;
switch (i->op) {
case OP_MUL:
if (i->dType == TYPE_F32)
tryCollapseChainedMULs(i, s, imm0);
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH) {
assert(!isFloatType(i->sType));
if (imm0.isInteger(1) && i->dType == TYPE_S32) {
bld.setPosition(i, false);
// Need to set to the sign value, which is a compare.
newi = bld.mkCmp(OP_SET, CC_LT, TYPE_S32, i->getDef(0),
TYPE_S32, i->getSrc(t), bld.mkImm(0));
delete_Instruction(prog, i);
} else if (imm0.isInteger(0) || imm0.isInteger(1)) {
// The high bits can't be set in this case (either mul by 0 or
// unsigned by 1)
i->op = OP_MOV;
i->subOp = 0;
i->setSrc(0, new_ImmediateValue(prog, 0u));
i->src(0).mod = Modifier(0);
i->setSrc(1, NULL);
} else if (!imm0.isNegative() && imm0.isPow2()) {
// Translate into a shift
imm0.applyLog2();
i->op = OP_SHR;
i->subOp = 0;
imm0.reg.data.u32 = 32 - imm0.reg.data.u32;
i->setSrc(0, i->getSrc(t));
i->src(0).mod = i->src(t).mod;
i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32));
i->src(1).mod = 0;
}
} else
if (imm0.isInteger(0)) {
i->op = OP_MOV;
i->setSrc(0, new_ImmediateValue(prog, 0u));
@@ -786,7 +828,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
else
tA = tB;
tB = s ? bld.getSSA() : i->getDef(0);
bld.mkOp2(OP_ADD, TYPE_U32, tB, mul->getDef(0), tA);
newi = bld.mkOp2(OP_ADD, TYPE_U32, tB, mul->getDef(0), tA);
if (s)
bld.mkOp2(OP_SHR, TYPE_U32, i->getDef(0), tB, bld.mkImm(s));
@@ -818,7 +860,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
tA = bld.getSSA();
bld.mkCmp(OP_SET, CC_LT, TYPE_S32, tA, TYPE_S32, i->getSrc(0), bld.mkImm(0));
tD = (d < 0) ? bld.getSSA() : i->getDef(0)->asLValue();
bld.mkOp2(OP_SUB, TYPE_U32, tD, tB, tA);
newi = bld.mkOp2(OP_SUB, TYPE_U32, tD, tB, tA);
if (d < 0)
bld.mkOp1(OP_NEG, TYPE_S32, i->getDef(0), tB);
@@ -882,6 +924,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
case OP_ABS:
case OP_NEG:
case OP_SAT:
case OP_LG2:
case OP_RCP:
case OP_SQRT:
@@ -896,7 +939,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
default:
return;
}
if (i->op != op)
if (newi->op != op)
foldCount++;
}
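
The MUL_HIGH folding rules added above can be sanity-checked with a small scalar sketch. The helper names are hypothetical. The signed variant mirrors the expression in the patch and, like it, assumes that right-shifting a negative 64-bit value is an arithmetic shift (true for the common compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of an unsigned 32x32 multiply, as in the TYPE_U32 case. */
static uint32_t
umul_high32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of a signed 32x32 multiply, as in the TYPE_S32 case.
 * Assumes an arithmetic right shift for negative values. */
static int32_t
smul_high32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* The shift rewrite: for b == 1 << log2_b, the high word of a * b is
 * simply a >> (32 - log2_b). */
static uint32_t
umul_high32_pow2(uint32_t a, unsigned log2_b)
{
    assert(log2_b >= 1 && log2_b <= 31);
    return a >> (32 - log2_b);
}
```

This also shows why a signed multiply by 1 folds to a compare against zero: the high word is all ones when the other operand is negative and zero otherwise.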


@@ -998,7 +998,9 @@ GCRA::doCoalesce(ArrayList& insns, unsigned int mask)
case OP_TXQ:
case OP_TXD:
case OP_TXG:
case OP_TXLQ:
case OP_TEXCSAA:
case OP_TEXPREP:
if (!(mask & JOIN_MASK_TEX))
break;
for (c = 0; insn->srcExists(c) && c != insn->predSrc; ++c)


@@ -331,6 +331,8 @@ TargetNV50::insnCanLoad(const Instruction *i, int s,
return false;
if (sf == FILE_IMMEDIATE)
return false;
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH && sf == FILE_MEMORY_CONST)
return false;
ldSize = 2;
} else {
ldSize = typeSizeof(ld->dType);


@@ -122,12 +122,9 @@ nv50_destroy(struct pipe_context *pipe)
{
struct nv50_context *nv50 = nv50_context(pipe);
if (nv50_context_screen(nv50)->cur_ctx == nv50) {
nv50->base.pushbuf->kick_notify = NULL;
if (nv50_context_screen(nv50)->cur_ctx == nv50)
nv50_context_screen(nv50)->cur_ctx = NULL;
nouveau_pushbuf_bufctx(nv50->base.pushbuf, NULL);
}
/* need to flush before destroying the bufctx */
nouveau_pushbuf_bufctx(nv50->base.pushbuf, NULL);
nouveau_pushbuf_kick(nv50->base.pushbuf, nv50->base.pushbuf->channel);
nv50_context_unreference_resources(nv50);


@@ -78,16 +78,16 @@
/* 8 user clip planes, at 4 32-bit floats each */
#define NV50_CB_AUX_UCP_OFFSET 0x0000
#define NV50_CB_AUX_UCP_SIZE (8 * 4 * 4)
/* 256 textures, each with ms_x, ms_y u32 pairs */
/* 16 textures * 3 shaders, each with ms_x, ms_y u32 pairs */
#define NV50_CB_AUX_TEX_MS_OFFSET 0x0080
#define NV50_CB_AUX_TEX_MS_SIZE (256 * 2 * 4)
#define NV50_CB_AUX_TEX_MS_SIZE (16 * 3 * 2 * 4)
/* For each MS level (4), 8 sets of 32-bit integer pairs sample offsets */
#define NV50_CB_AUX_MS_OFFSET 0x880
#define NV50_CB_AUX_MS_OFFSET 0x200
#define NV50_CB_AUX_MS_SIZE (4 * 8 * 4 * 2)
/* Sample position pairs for the current output MS level */
#define NV50_CB_AUX_SAMPLE_OFFSET 0x980
#define NV50_CB_AUX_SAMPLE_OFFSET 0x300
#define NV50_CB_AUX_SAMPLE_OFFSET_SIZE (4 * 8 * 2)
/* next spot: 0x9c0 */
/* next spot: 0x340 */
/* 4 32-bit floats for the vertex runout, put at the end */
#define NV50_CB_AUX_RUNOUT_OFFSET (NV50_CB_AUX_SIZE - 0x10)
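
The new auxiliary constant buffer layout can be checked arithmetically: each region should start where the previous one ends. The sketch below copies the new offsets and sizes from the hunk above under shortened local names, and adds a hypothetical tex_ms_offset() helper. It assumes each texture occupies one (ms_x, ms_y) pair of 32-bit words and that stages are indexed 0 to 2; both are readings of the patch, not something the diff states outright.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the new side of the diff, under local names. */
#define CB_AUX_UCP_OFFSET         0x0000
#define CB_AUX_UCP_SIZE           (8 * 4 * 4)
#define CB_AUX_TEX_MS_OFFSET      0x0080
#define CB_AUX_TEX_MS_SIZE        (16 * 3 * 2 * 4)
#define CB_AUX_MS_OFFSET          0x200
#define CB_AUX_MS_SIZE            (4 * 8 * 4 * 2)
#define CB_AUX_SAMPLE_OFFSET      0x300
#define CB_AUX_SAMPLE_OFFSET_SIZE (4 * 8 * 2)

/* Each region begins exactly where the previous one ends. */
_Static_assert(CB_AUX_UCP_OFFSET + CB_AUX_UCP_SIZE == CB_AUX_TEX_MS_OFFSET,
               "TEX_MS follows UCP");
_Static_assert(CB_AUX_TEX_MS_OFFSET + CB_AUX_TEX_MS_SIZE == CB_AUX_MS_OFFSET,
               "MS follows TEX_MS");
_Static_assert(CB_AUX_MS_OFFSET + CB_AUX_MS_SIZE == CB_AUX_SAMPLE_OFFSET,
               "SAMPLE follows MS");
_Static_assert(CB_AUX_SAMPLE_OFFSET + CB_AUX_SAMPLE_OFFSET_SIZE == 0x340,
               "next free spot");

/* Hypothetical helper: byte offset of the ms_x word for texture `tex`
 * of shader stage `stage` (assumed 0..2), 16 textures per stage. */
static uint32_t
tex_ms_offset(unsigned stage, unsigned tex)
{
    assert(stage < 3 && tex < 16);
    return CB_AUX_TEX_MS_OFFSET + 16 * stage * 2 * 4 + tex * 2 * 4;
}
```

The per-stage stride of 16 * 2 * 4 bytes is the same increment that loadTexMsInfo() and nv50_validate_tic() apply in the other hunks of this change.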


@@ -332,7 +332,7 @@ nv50_render_condition(struct pipe_context *pipe,
nv50->cond_cond = condition;
nv50->cond_mode = mode;
PUSH_SPACE(push, 6);
PUSH_SPACE(push, 9);
if (!pq) {
BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
@@ -351,6 +351,10 @@ nv50_render_condition(struct pipe_context *pipe,
PUSH_DATAh(push, q->bo->offset + q->offset);
PUSH_DATA (push, q->bo->offset + q->offset);
PUSH_DATA (push, NV50_3D_COND_MODE_RES_NON_ZERO);
BEGIN_NV04(push, NV50_2D(COND_ADDRESS_HIGH), 2);
PUSH_DATAh(push, q->bo->offset + q->offset);
PUSH_DATA (push, q->bo->offset + q->offset);
}
void


@@ -397,6 +397,8 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
PUSH_DATA (push, 0);
BEGIN_NV04(push, SUBC_2D(0x0888), 1);
PUSH_DATA (push, 1);
BEGIN_NV04(push, NV50_2D(COND_MODE), 1);
PUSH_DATA (push, NV50_2D_COND_MODE_ALWAYS);
BEGIN_NV04(push, SUBC_3D(NV01_SUBCHAN_OBJECT), 1);
PUSH_DATA (push, screen->tesla->handle);


@@ -400,6 +400,10 @@ nv50_switch_pipe_context(struct nv50_context *ctx_to)
ctx_to->viewports_dirty = ~0;
ctx_to->scissors_dirty = ~0;
ctx_to->constbuf_dirty[0] =
ctx_to->constbuf_dirty[1] =
ctx_to->constbuf_dirty[2] = (1 << NV50_MAX_PIPE_CONSTBUFS) - 1;
if (!ctx_to->vertex)
ctx_to->dirty &= ~(NV50_NEW_VERTEX | NV50_NEW_ARRAYS);


@@ -288,6 +288,14 @@ nv50_clear_render_target(struct pipe_context *pipe,
PUSH_REFN(push, bo, mt->base.domain | NOUVEAU_BO_WR);
BEGIN_NV04(push, NV50_3D(SCREEN_SCISSOR_HORIZ), 2);
PUSH_DATA (push, ( width << 16) | dstx);
PUSH_DATA (push, (height << 16) | dsty);
BEGIN_NV04(push, NV50_3D(SCISSOR_HORIZ(0)), 2);
PUSH_DATA (push, 8192 << 16);
PUSH_DATA (push, 8192 << 16);
nv50->scissors_dirty |= 1;
BEGIN_NV04(push, NV50_3D(RT_CONTROL), 1);
PUSH_DATA (push, 1);
BEGIN_NV04(push, NV50_3D(RT_ADDRESS_HIGH(0)), 5);
@@ -325,7 +333,7 @@ nv50_clear_render_target(struct pipe_context *pipe,
(z << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
}
nv50->dirty |= NV50_NEW_FRAMEBUFFER;
nv50->dirty |= NV50_NEW_FRAMEBUFFER | NV50_NEW_SCISSOR;
}
static void
@@ -364,6 +372,14 @@ nv50_clear_depth_stencil(struct pipe_context *pipe,
PUSH_REFN(push, bo, mt->base.domain | NOUVEAU_BO_WR);
BEGIN_NV04(push, NV50_3D(SCREEN_SCISSOR_HORIZ), 2);
PUSH_DATA (push, ( width << 16) | dstx);
PUSH_DATA (push, (height << 16) | dsty);
BEGIN_NV04(push, NV50_3D(SCISSOR_HORIZ(0)), 2);
PUSH_DATA (push, 8192 << 16);
PUSH_DATA (push, 8192 << 16);
nv50->scissors_dirty |= 1;
BEGIN_NV04(push, NV50_3D(ZETA_ADDRESS_HIGH), 5);
PUSH_DATAh(push, bo->offset + sf->offset);
PUSH_DATA (push, bo->offset + sf->offset);
@@ -390,7 +406,7 @@ nv50_clear_depth_stencil(struct pipe_context *pipe,
(z << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
}
nv50->dirty |= NV50_NEW_FRAMEBUFFER;
nv50->dirty |= NV50_NEW_FRAMEBUFFER | NV50_NEW_SCISSOR;
}
void
@@ -611,6 +627,7 @@ struct nv50_blitctx
uint8_t mode;
uint16_t color_mask;
uint8_t filter;
uint8_t render_condition_enable;
enum pipe_texture_target target;
struct {
struct pipe_framebuffer_state fb;
@@ -697,6 +714,12 @@ nv50_blitter_make_fp(struct pipe_context *pipe,
tc = ureg_DECL_fs_input(
ureg, TGSI_SEMANTIC_GENERIC, 0, TGSI_INTERPOLATE_LINEAR);
if (ptarg == PIPE_TEXTURE_1D_ARRAY) {
/* Adjust coordinates. Depth is in z, but TEX expects it to be in y. */
tc = ureg_swizzle(tc, TGSI_SWIZZLE_X, TGSI_SWIZZLE_Z,
TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z);
}
data = ureg_DECL_temporary(ureg);
if (tex_s) {
@@ -933,7 +956,7 @@ nv50_blitctx_prepare_state(struct nv50_blitctx *blit)
{
struct nouveau_pushbuf *push = blit->nv50->base.pushbuf;
if (blit->nv50->cond_query) {
if (blit->nv50->cond_query && !blit->render_condition_enable) {
BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
}
@@ -1071,7 +1094,7 @@ nv50_blitctx_post_blit(struct nv50_blitctx *blit)
nv50->samplers[2][0] = blit->saved.sampler[0];
nv50->samplers[2][1] = blit->saved.sampler[1];
if (nv50->cond_query)
if (nv50->cond_query && !blit->render_condition_enable)
nv50->base.pipe.render_condition(&nv50->base.pipe, nv50->cond_query,
nv50->cond_cond, nv50->cond_mode);
@@ -1105,6 +1128,7 @@ nv50_blit_3d(struct nv50_context *nv50, const struct pipe_blit_info *info)
blit->mode = nv50_blit_select_mode(info);
blit->color_mask = nv50_blit_derive_color_mask(info);
blit->filter = nv50_blit_get_filter(info);
blit->render_condition_enable = info->render_condition_enable;
nv50_blit_select_fp(blit, info);
nv50_blitctx_pre_blit(blit);
@@ -1134,6 +1158,12 @@ nv50_blit_3d(struct nv50_context *nv50, const struct pipe_blit_info *info)
y0 *= (float)(1 << nv50_miptree(src)->ms_y);
y1 *= (float)(1 << nv50_miptree(src)->ms_y);
/* XXX: multiply by 6 for cube arrays ? */
dz = (float)info->src.box.depth / (float)info->dst.box.depth;
z = (float)info->src.box.z;
if (nv50_miptree(src)->layout_3d)
z += 0.5f * dz;
if (src->last_level > 0) {
/* If there are mip maps, GPU always assumes normalized coordinates. */
const unsigned l = info->src.level;
@@ -1143,14 +1173,12 @@ nv50_blit_3d(struct nv50_context *nv50, const struct pipe_blit_info *info)
x1 /= fh;
y0 /= fv;
y1 /= fv;
if (nv50_miptree(src)->layout_3d) {
z /= u_minify(src->depth0, l);
dz /= u_minify(src->depth0, l);
}
}
/* XXX: multiply by 6 for cube arrays ? */
dz = (float)info->src.box.depth / (float)info->dst.box.depth;
z = (float)info->src.box.z;
if (nv50_miptree(src)->layout_3d)
z += 0.5f * dz;
BEGIN_NV04(push, NV50_3D(VIEWPORT_TRANSFORM_EN), 1);
PUSH_DATA (push, 0);
BEGIN_NV04(push, NV50_3D(VIEW_VOLUME_CLIP_CTRL), 1);
@@ -1262,6 +1290,11 @@ nv50_blit_eng2d(struct nv50_context *nv50, const struct pipe_blit_info *info)
PUSH_DATA (push, 1); /* enable */
}
if (nv50->cond_query && info->render_condition_enable) {
BEGIN_NV04(push, NV50_2D(COND_MODE), 1);
PUSH_DATA (push, NV50_2D_COND_MODE_RES_NON_ZERO);
}
if (mask != 0xffffffff) {
BEGIN_NV04(push, NV50_2D(ROP), 1);
PUSH_DATA (push, 0xca); /* DPSDxax */
@@ -1384,6 +1417,10 @@ nv50_blit_eng2d(struct nv50_context *nv50, const struct pipe_blit_info *info)
BEGIN_NV04(push, NV50_2D(OPERATION), 1);
PUSH_DATA (push, NV50_2D_OPERATION_SRCCOPY);
}
if (nv50->cond_query && info->render_condition_enable) {
BEGIN_NV04(push, NV50_2D(COND_MODE), 1);
PUSH_DATA (push, NV50_2D_COND_MODE_ALWAYS);
}
}
static void


@@ -286,7 +286,7 @@ nv50_validate_tic(struct nv50_context *nv50, int s)
}
if (nv50->num_textures[s]) {
BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
PUSH_DATA (push, (NV50_CB_AUX_TEX_MS_OFFSET << (8 - 2)) | NV50_CB_AUX);
PUSH_DATA (push, ((NV50_CB_AUX_TEX_MS_OFFSET + 16 * s * 2 * 4) << (8 - 2)) | NV50_CB_AUX);
BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nv50->num_textures[s] * 2);
for (i = 0; i < nv50->num_textures[s]; i++) {
struct nv50_tic_entry *tic = nv50_tic_entry(nv50->textures[s][i]);


@@ -123,11 +123,12 @@ nvc0_destroy(struct pipe_context *pipe)
{
struct nvc0_context *nvc0 = nvc0_context(pipe);
if (nvc0->screen->cur_ctx == nvc0) {
nvc0->base.pushbuf->kick_notify = NULL;
if (nvc0->screen->cur_ctx == nvc0)
nvc0->screen->cur_ctx = NULL;
nouveau_pushbuf_bufctx(nvc0->base.pushbuf, NULL);
}
/* Unset bufctx, we don't want to revalidate any resources after the flush.
* Other contexts will always set their bufctx again on action calls.
*/
nouveau_pushbuf_bufctx(nvc0->base.pushbuf, NULL);
nouveau_pushbuf_kick(nvc0->base.pushbuf, nvc0->base.pushbuf->channel);
nvc0_context_unreference_resources(nvc0);


@@ -133,17 +133,12 @@ static int
nvc0_fp_assign_output_slots(struct nv50_ir_prog_info *info)
{
unsigned count = info->prop.fp.numColourResults * 4;
unsigned i, c, ci;
unsigned i, c;
for (i = 0, ci = 0; i < info->numOutputs; ++i) {
if (info->out[i].sn == TGSI_SEMANTIC_COLOR) {
for (i = 0; i < info->numOutputs; ++i)
if (info->out[i].sn == TGSI_SEMANTIC_COLOR)
for (c = 0; c < 4; ++c)
info->out[i].slot[c] = ci * 4 + c;
ci++;
}
}
assert(ci == info->prop.fp.numColourResults);
info->out[i].slot[c] = info->out[i].si * 4 + c;
if (info->io.sampleMask < PIPE_MAX_SHADER_OUTPUTS)
info->out[info->io.sampleMask].slot[0] = count++;


@@ -585,12 +585,15 @@ nvc0_render_condition(struct pipe_context *pipe,
if (wait)
nvc0_query_fifo_wait(push, pq);
PUSH_SPACE(push, 4);
PUSH_SPACE(push, 7);
PUSH_REFN (push, q->bo, NOUVEAU_BO_GART | NOUVEAU_BO_RD);
BEGIN_NVC0(push, NVC0_3D(COND_ADDRESS_HIGH), 3);
PUSH_DATAh(push, q->bo->offset + q->offset);
PUSH_DATA (push, q->bo->offset + q->offset);
PUSH_DATA (push, cond);
BEGIN_NVC0(push, NVC0_2D(COND_ADDRESS_HIGH), 2);
PUSH_DATAh(push, q->bo->offset + q->offset);
PUSH_DATA (push, q->bo->offset + q->offset);
}
void


@@ -171,7 +171,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
return 0;
case PIPE_CAP_COMPUTE:
return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
return (class_3d == NVE4_3D_CLASS) ? 1 : 0;
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
return 1;
case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
@@ -211,7 +211,7 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
case PIPE_SHADER_FRAGMENT:
break;
case PIPE_SHADER_COMPUTE:
if (class_3d < NVE4_3D_CLASS)
if (class_3d != NVE4_3D_CLASS)
return 0;
break;
default:
@@ -514,9 +514,10 @@ nvc0_screen_init_compute(struct nvc0_screen *screen)
return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
return 0;
case 0xe0:
return nve4_screen_compute_setup(screen, screen->base.pushbuf);
case 0xf0:
case 0x100:
return nve4_screen_compute_setup(screen, screen->base.pushbuf);
return 0;
default:
return -1;
}
@@ -676,6 +677,8 @@ nvc0_screen_create(struct nouveau_device *dev)
PUSH_DATA (push, 0x3f);
BEGIN_NVC0(push, SUBC_2D(0x0888), 1);
PUSH_DATA (push, 1);
BEGIN_NVC0(push, NVC0_2D(COND_MODE), 1);
PUSH_DATA (push, NVC0_2D_COND_MODE_ALWAYS);
BEGIN_NVC0(push, SUBC_2D(NVC0_GRAPH_NOTIFY_ADDRESS_HIGH), 2);
PUSH_DATAh(push, screen->fence.bo->offset + 16);

@@ -531,6 +531,7 @@ nvc0_switch_pipe_context(struct nvc0_context *ctx_to)
for (s = 0; s < 5; ++s) {
ctx_to->samplers_dirty[s] = ~0;
ctx_to->textures_dirty[s] = ~0;
ctx_to->constbuf_dirty[s] = (1 << NVC0_MAX_PIPE_CONSTBUFS) - 1;
}
if (!ctx_to->vertex)

@@ -503,6 +503,7 @@ struct nvc0_blitctx
uint8_t mode;
uint16_t color_mask;
uint8_t filter;
uint8_t render_condition_enable;
enum pipe_texture_target target;
struct {
struct pipe_framebuffer_state fb;
@@ -542,9 +543,22 @@ nvc0_blitter_make_vp(struct nvc0_blitter *blit)
0x03f01c46, 0x0a7e0080, /* export b96 o[0x80] $r0:$r1:$r2 */
0x00001de7, 0x80000000, /* exit */
};
static const uint32_t code_gk110[] =
{
0x00000000, 0x08000000, /* sched */
0x401ffc12, 0x7ec7fc00, /* ld b64 $r4d a[0x80] 0x0 0x0 */
0x481ffc02, 0x7ecbfc00, /* ld b96 $r0t a[0x90] 0x0 0x0 */
0x381ffc12, 0x7f07fc00, /* st b64 a[0x70] $r4d 0x0 0x0 */
0x401ffc02, 0x7f0bfc00, /* st b96 a[0x80] $r0t 0x0 0x0 */
0x001c003c, 0x18000000, /* exit */
};
blit->vp.type = PIPE_SHADER_VERTEX;
blit->vp.translated = TRUE;
if (blit->screen->base.class_3d >= NVF0_3D_CLASS) {
blit->vp.code = (uint32_t *)code_gk110; /* const_cast */
blit->vp.code_size = sizeof(code_gk110);
} else
if (blit->screen->base.class_3d >= NVE4_3D_CLASS) {
blit->vp.code = (uint32_t *)code_nve4; /* const_cast */
blit->vp.code_size = sizeof(code_nve4);
@@ -691,7 +705,7 @@ nvc0_blitctx_prepare_state(struct nvc0_blitctx *blit)
/* TODO: maybe make this a MACRO (if we need more logic) ? */
if (blit->nvc0->cond_query)
if (blit->nvc0->cond_query && !blit->render_condition_enable)
IMMED_NVC0(push, NVC0_3D(COND_MODE), NVC0_3D_COND_MODE_ALWAYS);
/* blend state */
@@ -833,7 +847,7 @@ nvc0_blitctx_post_blit(struct nvc0_blitctx *blit)
nvc0->textures_dirty[4] |= 3;
nvc0->samplers_dirty[4] |= 3;
if (nvc0->cond_query)
if (nvc0->cond_query && !blit->render_condition_enable)
nvc0->base.pipe.render_condition(&nvc0->base.pipe, nvc0->cond_query,
nvc0->cond_cond, nvc0->cond_mode);
@@ -868,6 +882,7 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
blit->mode = nv50_blit_select_mode(info);
blit->color_mask = nv50_blit_derive_color_mask(info);
blit->filter = nv50_blit_get_filter(info);
blit->render_condition_enable = info->render_condition_enable;
nvc0_blit_select_fp(blit, info);
nvc0_blitctx_pre_blit(blit);
@@ -894,6 +909,11 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
y0 *= (float)(1 << nv50_miptree(src)->ms_y);
y1 *= (float)(1 << nv50_miptree(src)->ms_y);
dz = (float)info->src.box.depth / (float)info->dst.box.depth;
z = (float)info->src.box.z;
if (nv50_miptree(src)->layout_3d)
z += 0.5f * dz;
if (src->last_level > 0) {
/* If there are mip maps, GPU always assumes normalized coordinates. */
const unsigned l = info->src.level;
@@ -903,13 +923,12 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
x1 /= fh;
y0 /= fv;
y1 /= fv;
if (nv50_miptree(src)->layout_3d) {
z /= u_minify(src->depth0, l);
dz /= u_minify(src->depth0, l);
}
}
dz = (float)info->src.box.depth / (float)info->dst.box.depth;
z = (float)info->src.box.z;
if (nv50_miptree(src)->layout_3d)
z += 0.5f * dz;
IMMED_NVC0(push, NVC0_3D(VIEWPORT_TRANSFORM_EN), 0);
IMMED_NVC0(push, NVC0_3D(VIEW_VOLUME_CLIP_CTRL), 0x2 |
NVC0_3D_VIEW_VOLUME_CLIP_CTRL_DEPTH_RANGE_0_1);
@@ -1030,6 +1049,9 @@ nvc0_blit_eng2d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
PUSH_DATA (push, 1); /* enable */
}
if (nvc0->cond_query && info->render_condition_enable)
IMMED_NVC0(push, NVC0_2D(COND_MODE), NVC0_2D_COND_MODE_RES_NON_ZERO);
if (mask != 0xffffffff) {
IMMED_NVC0(push, NVC0_2D(ROP), 0xca); /* DPSDxax */
IMMED_NVC0(push, NVC0_2D(PATTERN_COLOR_FORMAT),
@@ -1154,6 +1176,8 @@ nvc0_blit_eng2d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
IMMED_NVC0(push, NVC0_2D(CLIP_ENABLE), 0);
if (mask != 0xffffffff)
IMMED_NVC0(push, NVC0_2D(OPERATION), NVC0_2D_OPERATION_SRCCOPY);
if (nvc0->cond_query && info->render_condition_enable)
IMMED_NVC0(push, NVC0_2D(COND_MODE), NVC0_2D_COND_MODE_ALWAYS);
}
static void

@@ -789,7 +789,8 @@ static bool do_hardware_msaa_resolve(struct pipe_context *ctx,
info->src.box.width == dst_width &&
info->src.box.height == dst_height &&
info->src.box.depth == 1 &&
dst->surface.level[info->dst.level].mode >= RADEON_SURF_MODE_1D) {
dst->surface.level[info->dst.level].mode >= RADEON_SURF_MODE_1D &&
(!dst->cmask.size || !dst->dirty_level_mask) /* dst cannot be fast-cleared */) {
r600_blitter_begin(ctx, R600_COLOR_RESOLVE);
util_blitter_custom_resolve_color(rctx->blitter,
info->dst.resource, info->dst.level,

@@ -829,15 +829,6 @@ static INLINE uint32_t S_FIXED(float value, uint32_t frac_bits)
}
#define ALIGN_DIVUP(x, y) (((x) + (y) - 1) / (y))
static inline unsigned r600_tex_aniso_filter(unsigned filter)
{
if (filter <= 1) return 0;
if (filter <= 2) return 1;
if (filter <= 4) return 2;
if (filter <= 8) return 3;
/* else */ return 4;
}
/* 12.4 fixed-point */
static INLINE unsigned r600_pack_float_12p4(float x)
{

@@ -489,6 +489,15 @@ r600_resource_reference(struct r600_resource **ptr, struct r600_resource *res)
(struct pipe_resource *)res);
}
static inline unsigned r600_tex_aniso_filter(unsigned filter)
{
if (filter <= 1) return 0;
if (filter <= 2) return 1;
if (filter <= 4) return 2;
if (filter <= 8) return 3;
/* else */ return 4;
}
#define R600_ERR(fmt, args...) \
fprintf(stderr, "EE %s:%d %s - "fmt, __FILE__, __LINE__, __func__, ##args)
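
The helper moved in the hunk above turns a maximum-anisotropy value into a small level code. As a rough sketch of why that matters for the `si_create_sampler_state` hunk later in this diff, the following compares the thresholds with the masking expression `(state->max_anisotropy & 0x7)` that the hunk replaces. The `example_*` names are hypothetical and exist only for this illustration; what the hardware field means is an assumption based on the thresholds alone.

```c
/* Illustrative only: same thresholds as r600_tex_aniso_filter() above. */
static unsigned example_aniso_level(unsigned max_anisotropy)
{
    if (max_anisotropy <= 1) return 0;
    if (max_anisotropy <= 2) return 1;
    if (max_anisotropy <= 4) return 2;
    if (max_anisotropy <= 8) return 3;
    return 4;
}

/* The masking expression used before the change: keeps only the low 3 bits. */
static unsigned example_masked_level(unsigned max_anisotropy)
{
    return max_anisotropy & 0x7;
}
```

With these definitions, a request for 8x or 16x anisotropy masks down to 0, while the threshold helper yields 3 and 4 respectively, which is one plausible reading of why the sampler state switches to the helper.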

@@ -1235,6 +1235,9 @@ void evergreen_do_fast_color_clear(struct r600_common_context *rctx,
{
int i;
if (rctx->current_render_cond)
return;
for (i = 0; i < fb->nr_cbufs; i++) {
struct r600_texture *tex;
unsigned clear_bit = PIPE_CLEAR_COLOR0 << i;

@@ -689,8 +689,10 @@ static bool do_hardware_msaa_resolve(struct pipe_context *ctx,
info->src.box.height == dst_height &&
info->src.box.depth == 1 &&
dst->surface.level[info->dst.level].mode >= RADEON_SURF_MODE_1D &&
!(dst->surface.flags & RADEON_SURF_SCANOUT)) {
!(dst->surface.flags & RADEON_SURF_SCANOUT) &&
(!dst->cmask.size || !dst->dirty_level_mask) /* dst cannot be fast-cleared */) {
si_blitter_begin(ctx, SI_COLOR_RESOLVE);
util_blitter_custom_resolve_color(sctx->blitter,
info->dst.resource, info->dst.level,
info->dst.box.z,

@@ -152,7 +152,7 @@ static void si_update_descriptors(struct si_context *sctx,
7 + /* copy */
(4 + desc->element_dw_size) * util_bitcount(desc->dirty_mask) + /* update */
4; /* pointer update */
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
if (desc->shader_userdata_reg >= R_00B130_SPI_SHADER_USER_DATA_VS_0 &&
desc->shader_userdata_reg < R_00B230_SPI_SHADER_USER_DATA_GS_0)
desc->atom.num_dw += 4; /* second pointer update */
@@ -177,7 +177,7 @@ static void si_emit_shader_pointer(struct si_context *sctx,
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
if (desc->shader_userdata_reg >= R_00B130_SPI_SHADER_USER_DATA_VS_0 &&
desc->shader_userdata_reg < R_00B230_SPI_SHADER_USER_DATA_GS_0) {
radeon_emit(cs, PKT3(PKT3_SET_SH_REG, 2, 0));

@@ -224,7 +224,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return 4;
case PIPE_CAP_GLSL_FEATURE_LEVEL:
return HAVE_LLVM >= 0x0305 ? 330 : 140;
return (LLVM_SUPPORTS_GEOM_SHADERS) ? 330 : 140;
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
return MIN2(sscreen->b.info.vram_size, 0xFFFFFFFF);
@@ -308,7 +308,7 @@ static int si_get_shader_param(struct pipe_screen* pscreen, unsigned shader, enu
case PIPE_SHADER_VERTEX:
break;
case PIPE_SHADER_GEOMETRY:
#if HAVE_LLVM < 0x0305
#if !(LLVM_SUPPORTS_GEOM_SHADERS)
return 0;
#endif
break;

@@ -39,6 +39,10 @@
#define SI_MAX_DRAW_CS_DWORDS 18
#define LLVM_SUPPORTS_GEOM_SHADERS \
((HAVE_LLVM >= 0x0305) || \
(HAVE_LLVM == 0x0304 && LLVM_VERSION_PATCH >= 1))
struct si_pipe_compute;
struct si_screen {
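
The `LLVM_SUPPORTS_GEOM_SHADERS` macro above replaces the plain `HAVE_LLVM >= 0x0305` checks throughout the radeonsi hunks. A minimal sketch of the same predicate as a function, assuming `HAVE_LLVM` packs major and minor version as `0x0MNN` and `LLVM_VERSION_PATCH` is the patch level; the function name is hypothetical:

```c
/* Hypothetical function form of LLVM_SUPPORTS_GEOM_SHADERS.
 * have_llvm:     packed major/minor, e.g. 0x0304 for LLVM 3.4
 * version_patch: patch level, e.g. 1 for LLVM 3.4.1
 */
static int example_llvm_supports_geom_shaders(unsigned have_llvm,
                                              unsigned version_patch)
{
    return have_llvm >= 0x0305 ||
           (have_llvm == 0x0304 && version_patch >= 1);
}
```

Under these assumptions, LLVM 3.4.1 and anything from 3.5 onward enable the geometry-shader paths, while 3.4.0 and older releases do not.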

@@ -2173,7 +2173,7 @@ static void *si_create_fs_state(struct pipe_context *ctx,
return si_create_shader_state(ctx, state, PIPE_SHADER_FRAGMENT);
}
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
static void *si_create_gs_state(struct pipe_context *ctx,
const struct pipe_shader_state *state)
@@ -2203,7 +2203,7 @@ static void si_bind_vs_shader(struct pipe_context *ctx, void *state)
sctx->vs_shader = sel;
}
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
static void si_bind_gs_shader(struct pipe_context *ctx, void *state)
{
@@ -2271,7 +2271,7 @@ static void si_delete_vs_shader(struct pipe_context *ctx, void *state)
si_delete_shader_selector(ctx, sel);
}
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
static void si_delete_gs_shader(struct pipe_context *ctx, void *state)
{
@@ -2599,16 +2599,15 @@ static void *si_create_sampler_state(struct pipe_context *ctx,
rstate->val[0] = (S_008F30_CLAMP_X(si_tex_wrap(state->wrap_s)) |
S_008F30_CLAMP_Y(si_tex_wrap(state->wrap_t)) |
S_008F30_CLAMP_Z(si_tex_wrap(state->wrap_r)) |
(state->max_anisotropy & 0x7) << 9 | /* XXX */
r600_tex_aniso_filter(state->max_anisotropy) << 9 |
S_008F30_DEPTH_COMPARE_FUNC(si_tex_compare(state->compare_func)) |
S_008F30_FORCE_UNNORMALIZED(!state->normalized_coords) |
aniso_flag_offset << 16 | /* XXX */
S_008F30_DISABLE_CUBE_WRAP(!state->seamless_cube_map));
rstate->val[1] = (S_008F34_MIN_LOD(S_FIXED(CLAMP(state->min_lod, 0, 15), 8)) |
S_008F34_MAX_LOD(S_FIXED(CLAMP(state->max_lod, 0, 15), 8)));
rstate->val[2] = (S_008F38_LOD_BIAS(S_FIXED(CLAMP(state->lod_bias, -16, 16), 8)) |
S_008F38_XY_MAG_FILTER(si_tex_filter(state->mag_img_filter)) |
S_008F38_XY_MIN_FILTER(si_tex_filter(state->min_img_filter)) |
S_008F38_XY_MAG_FILTER(si_tex_filter(state->mag_img_filter) | aniso_flag_offset) |
S_008F38_XY_MIN_FILTER(si_tex_filter(state->min_img_filter) | aniso_flag_offset) |
S_008F38_MIP_FILTER(si_tex_mipfilter(state->min_mip_filter)));
rstate->val[3] = S_008F3C_BORDER_COLOR_TYPE(border_color_type);
@@ -2767,7 +2766,7 @@ static void si_bind_vs_sampler_states(struct pipe_context *ctx, unsigned count,
si_set_sampler_states(sctx, pm4, count, states,
&sctx->samplers[PIPE_SHADER_VERTEX],
R_00B130_SPI_SHADER_USER_DATA_VS_0);
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
si_set_sampler_states(sctx, pm4, count, states,
&sctx->samplers[PIPE_SHADER_VERTEX],
R_00B330_SPI_SHADER_USER_DATA_ES_0);
@@ -2999,7 +2998,7 @@ void si_init_state_functions(struct si_context *sctx)
sctx->b.b.bind_fs_state = si_bind_ps_shader;
sctx->b.b.delete_vs_state = si_delete_vs_shader;
sctx->b.b.delete_fs_state = si_delete_ps_shader;
#if HAVE_LLVM >= 0x0305
#if LLVM_SUPPORTS_GEOM_SHADERS
sctx->b.b.create_gs_state = si_create_gs_state;
sctx->b.b.bind_gs_state = si_bind_gs_shader;
sctx->b.b.delete_gs_state = si_delete_gs_shader;

@@ -591,6 +591,9 @@ struct pipe_blit_info
boolean scissor_enable;
struct pipe_scissor_state scissor;
boolean render_condition_enable; /**< whether to leave current render
condition enabled */
};

@@ -42,8 +42,11 @@ namespace {
device::device(clover::platform &platform, pipe_loader_device *ldev) :
platform(platform), ldev(ldev) {
pipe = pipe_loader_create_screen(ldev, PIPE_SEARCH_DIR);
if (!pipe || !pipe->get_param(pipe, PIPE_CAP_COMPUTE))
if (!pipe || !pipe->get_param(pipe, PIPE_CAP_COMPUTE)) {
if (pipe)
pipe->destroy(pipe);
throw error(CL_INVALID_DEVICE);
}
}
device::~device() {

@@ -223,7 +223,7 @@ XA_EXPORT int
xa_copy_prepare(struct xa_context *ctx,
struct xa_surface *dst, struct xa_surface *src)
{
if (src == dst || ctx->srf != NULL)
if (src == dst)
return -XA_ERR_INVAL;
if (src->tex->format != dst->tex->format) {

@@ -48,7 +48,7 @@ AM_LDFLAGS = \
-module \
-no-undefined \
-avoid-version \
-Wl,--version-script=$(top_srcdir)/src/gallium/targets/egl-static/egl.sym
-Wl,--version-script=$(top_srcdir)/src/gallium/targets/egl-static/egl.sym \
$(GC_SECTIONS) \
$(LD_NO_UNDEFINED)

@@ -3652,11 +3652,15 @@ ast_declarator_list::hir(exec_list *instructions,
* instruction stream.
*/
exec_list initializer_instructions;
/* Examine var name here since var may get deleted in the next call */
bool var_is_gl_id = (strncmp(var->name, "gl_", 3) == 0);
ir_variable *earlier =
get_variable_being_redeclared(var, decl->get_location(), state,
false /* allow_all_redeclarations */);
if (earlier != NULL) {
if (strncmp(var->name, "gl_", 3) == 0 &&
if (var_is_gl_id &&
earlier->data.how_declared == ir_var_declared_in_block) {
_mesa_glsl_error(&loc, state,
"`%s' has already been redeclared using "

@@ -62,11 +62,9 @@
#include "program/prog_instruction.h"
#include <limits>
#define f(x) join(x)
#define join(x) x ## f
#define M_PIf f(M_PI)
#define M_PI_2f f(M_PI_2)
#define M_PI_4f f(M_PI_4)
#define M_PIf ((float) M_PI)
#define M_PI_2f ((float) M_PI_2)
#define M_PI_4f ((float) M_PI_4)
using namespace ir_builder;
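
The removed `f(x)`/`join(x)` macros build a float constant by pasting an `f` suffix onto whatever `M_PI` expands to. That only produces a valid token when `M_PI` is a bare numeric literal. The cast form does not depend on how the constant is spelled. A small sketch of the cast approach, using a locally defined `EXAMPLE_PI` rather than the platform's `M_PI` (the names are assumptions for illustration; the diff itself does not state the motivation):

```c
/* Stand-in for M_PI so the sketch does not rely on platform headers. */
#define EXAMPLE_PI 3.14159265358979323846

/* Cast form, as in the new M_PIf definition: works for any double expression. */
#define EXAMPLE_PIf ((float) EXAMPLE_PI)

/* Returns pi as a single-precision value. */
static float example_pi_float(void)
{
    return EXAMPLE_PIf;
}
```

The expression `EXAMPLE_PIf` has type `float`, so it can be used wherever the old suffix-pasted literal was expected.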

@@ -1319,6 +1319,13 @@ layout_qualifier_id:
if (match_layout_qualifier("location", $1, state) == 0) {
$$.flags.q.explicit_location = 1;
if ($$.flags.q.attribute == 1 &&
state->ARB_explicit_attrib_location_warn) {
_mesa_glsl_warning(& @1, state,
"GL_ARB_explicit_attrib_location layout "
"identifier `%s' used", $1);
}
if ($3 >= 0) {
$$.location = $3;
} else {
@@ -1426,10 +1433,6 @@ layout_qualifier_id:
_mesa_glsl_error(& @1, state, "unrecognized layout identifier "
"`%s'", $1);
YYERROR;
} else if (state->ARB_explicit_attrib_location_warn) {
_mesa_glsl_warning(& @1, state,
"GL_ARB_explicit_attrib_location layout "
"identifier `%s' used", $1);
}
}
| interface_block_layout_qualifier

@@ -1092,11 +1092,11 @@ bool
populate_consumer_input_sets(void *mem_ctx, exec_list *ir,
hash_table *consumer_inputs,
hash_table *consumer_interface_inputs,
ir_variable *consumer_inputs_with_locations[MAX_VARYING])
ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX])
{
memset(consumer_inputs_with_locations,
0,
sizeof(consumer_inputs_with_locations[0]) * MAX_VARYING);
sizeof(consumer_inputs_with_locations[0]) * VARYING_SLOT_MAX);
foreach_list(node, ir) {
ir_variable *const input_var = ((ir_instruction *) node)->as_variable();
@@ -1152,7 +1152,7 @@ get_matching_input(void *mem_ctx,
const ir_variable *output_var,
hash_table *consumer_inputs,
hash_table *consumer_interface_inputs,
ir_variable *consumer_inputs_with_locations[MAX_VARYING])
ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX])
{
ir_variable *input_var;
@@ -1277,7 +1277,7 @@ assign_varying_locations(struct gl_context *ctx,
= hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
hash_table *consumer_interface_inputs
= hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
ir_variable *consumer_inputs_with_locations[MAX_VARYING] = {
ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX] = {
NULL,
};

@@ -39,14 +39,14 @@ bool
populate_consumer_input_sets(void *mem_ctx, exec_list *ir,
hash_table *consumer_inputs,
hash_table *consumer_interface_inputs,
ir_variable *consumer_inputs_with_locations[MAX_VARYING]);
ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX]);
ir_variable *
get_matching_input(void *mem_ctx,
const ir_variable *output_var,
hash_table *consumer_inputs,
hash_table *consumer_interface_inputs,
ir_variable *consumer_inputs_with_locations[MAX_VARYING]);
ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX]);
}
class link_varyings : public ::testing::Test {
@@ -70,7 +70,7 @@ public:
hash_table *consumer_interface_inputs;
const glsl_type *simple_interface;
ir_variable *junk[MAX_VARYING];
ir_variable *junk[VARYING_SLOT_MAX];
};
link_varyings::link_varyings()
@@ -197,9 +197,8 @@ TEST_F(link_varyings, gl_ClipDistance)
consumer_interface_inputs,
junk));
EXPECT_EQ((void *) clipdistance,
hash_table_find(consumer_inputs, "gl_ClipDistance"));
EXPECT_EQ(1u, num_elements(consumer_inputs));
EXPECT_EQ(clipdistance, junk[VARYING_SLOT_CLIP_DIST0]);
EXPECT_TRUE(is_empty(consumer_inputs));
EXPECT_TRUE(is_empty(consumer_interface_inputs));
}

@@ -73,11 +73,15 @@ apple_visual_create_pfobj(CGLPixelFormatObj * pfobj, const struct glx_config * m
GLint vsref = 0;
CGLError error = 0;
/* Request an OpenGL 3.2 profile if one is available */
if(apple_cgl.version_major > 1 || (apple_cgl.version_major == 1 && apple_cgl.version_minor >= 3)) {
attr[numattr++] = kCGLPFAOpenGLProfile;
attr[numattr++] = kCGLOGLPVersion_3_2_Core;
}
/* Request an OpenGL 3.2 profile if one is available and supported */
attr[numattr++] = kCGLPFAOpenGLProfile;
attr[numattr++] = kCGLOGLPVersion_3_2_Core;
/* Test for kCGLPFAOpenGLProfile support at runtime and roll it out if not supported */
attr[numattr] = 0;
error = apple_cgl.choose_pixel_format(attr, pfobj, &vsref);
if (error == kCGLBadAttribute)
numattr -= 2;
if (offscreen) {
apple_glx_diagnostic

@@ -249,6 +249,10 @@ glx_display_free(struct glx_display *priv)
if (priv->dri2Display)
(*priv->dri2Display->destroyDisplay) (priv->dri2Display);
priv->dri2Display = NULL;
if (priv->dri3Display)
(*priv->dri3Display->destroyDisplay) (priv->dri3Display);
priv->dri3Display = NULL;
#endif
free((char *) priv);

@@ -141,10 +141,10 @@ fake_queryString(__DRIscreen *screen, int attribute, const char **val)
}
static const __DRI2rendererQueryExtension rendererQueryExt = {
.base = { __DRI2_RENDERER_QUERY, 1 },
{ __DRI2_RENDERER_QUERY, 1 },
.queryInteger = fake_queryInteger,
.queryString = fake_queryString
fake_queryInteger,
fake_queryString
};
void dri2_query_renderer_string_test::SetUp()

@@ -113,7 +113,7 @@ __glapi_gentable_set_remaining_noop(struct _glapi_table *disp) {
struct _glapi_table *
_glapi_create_table_from_handle(void *handle, const char *symbol_prefix) {
struct _glapi_table *disp = calloc(1, sizeof(struct _glapi_table));
struct _glapi_table *disp = calloc(1, _glapi_get_dispatch_table_size() * sizeof(_glapi_proc));
char symboln[512];
if(!disp)
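
The allocation change above sizes the dispatch table from an entry count reported at run time instead of from `sizeof(struct _glapi_table)`. A minimal sketch of that size computation, with a hypothetical `example_proc` pointer type standing in for `_glapi_proc` and the entry count passed in as a parameter (presumably what `_glapi_get_dispatch_table_size()` supplies in the real code):

```c
#include <stddef.h>

/* Stand-in for _glapi_proc: a generic function pointer slot. */
typedef void (*example_proc)(void);

/* Bytes needed for a table of entry_count function-pointer slots. */
static size_t example_dispatch_table_bytes(size_t entry_count)
{
    return entry_count * sizeof(example_proc);
}
```

Sizing by entry count keeps the allocation in step with however many slots the dispatch layer actually reports, instead of with the compile-time struct layout.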

@@ -87,6 +87,63 @@
/* those link to libglapi.a should provide the entry points */
#define _GLAPI_SKIP_PROTO_ENTRY_POINTS
#endif
/* These prototypes are necessary because GLES1 library builds will create
* dispatch functions for them. We can't directly include GLES/gl.h because
* it would conflict with the previously-included GL/gl.h. Since GLES1 ABI is not
* expected to ever add more functions, the path of least resistance is to
* just duplicate the prototypes for the functions that aren't already in
* desktop OpenGL.
*/
#include <GLES/glplatform.h>
GL_API void GL_APIENTRY glClearDepthf (GLclampf depth);
GL_API void GL_APIENTRY glClipPlanef (GLenum plane, const GLfloat *equation);
GL_API void GL_APIENTRY glFrustumf (GLfloat left, GLfloat right, GLfloat bottom, GLfloat top, GLfloat zNear, GLfloat zFar);
GL_API void GL_APIENTRY glGetClipPlanef (GLenum pname, GLfloat eqn[4]);
GL_API void GL_APIENTRY glOrthof (GLfloat left, GLfloat right, GLfloat bottom, GLfloat top, GLfloat zNear, GLfloat zFar);
GL_API void GL_APIENTRY glAlphaFuncx (GLenum func, GLclampx ref);
GL_API void GL_APIENTRY glClearColorx (GLclampx red, GLclampx green, GLclampx blue, GLclampx alpha);
GL_API void GL_APIENTRY glClearDepthx (GLclampx depth);
GL_API void GL_APIENTRY glClipPlanex (GLenum plane, const GLfixed *equation);
GL_API void GL_APIENTRY glColor4x (GLfixed red, GLfixed green, GLfixed blue, GLfixed alpha);
GL_API void GL_APIENTRY glDepthRangex (GLclampx zNear, GLclampx zFar);
GL_API void GL_APIENTRY glFogx (GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glFogxv (GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glFrustumx (GLfixed left, GLfixed right, GLfixed bottom, GLfixed top, GLfixed zNear, GLfixed zFar);
GL_API void GL_APIENTRY glGetClipPlanex (GLenum pname, GLfixed eqn[4]);
GL_API void GL_APIENTRY glGetFixedv (GLenum pname, GLfixed *params);
GL_API void GL_APIENTRY glGetLightxv (GLenum light, GLenum pname, GLfixed *params);
GL_API void GL_APIENTRY glGetMaterialxv (GLenum face, GLenum pname, GLfixed *params);
GL_API void GL_APIENTRY glGetTexEnvxv (GLenum env, GLenum pname, GLfixed *params);
GL_API void GL_APIENTRY glGetTexParameterxv (GLenum target, GLenum pname, GLfixed *params);
GL_API void GL_APIENTRY glLightModelx (GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glLightModelxv (GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glLightx (GLenum light, GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glLightxv (GLenum light, GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glLineWidthx (GLfixed width);
GL_API void GL_APIENTRY glLoadMatrixx (const GLfixed *m);
GL_API void GL_APIENTRY glMaterialx (GLenum face, GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glMaterialxv (GLenum face, GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glMultMatrixx (const GLfixed *m);
GL_API void GL_APIENTRY glMultiTexCoord4x (GLenum target, GLfixed s, GLfixed t, GLfixed r, GLfixed q);
GL_API void GL_APIENTRY glNormal3x (GLfixed nx, GLfixed ny, GLfixed nz);
GL_API void GL_APIENTRY glOrthox (GLfixed left, GLfixed right, GLfixed bottom, GLfixed top, GLfixed zNear, GLfixed zFar);
GL_API void GL_APIENTRY glPointParameterx (GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glPointParameterxv (GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glPointSizex (GLfixed size);
GL_API void GL_APIENTRY glPolygonOffsetx (GLfixed factor, GLfixed units);
GL_API void GL_APIENTRY glRotatex (GLfixed angle, GLfixed x, GLfixed y, GLfixed z);
GL_API void GL_APIENTRY glSampleCoveragex (GLclampx value, GLboolean invert);
GL_API void GL_APIENTRY glScalex (GLfixed x, GLfixed y, GLfixed z);
GL_API void GL_APIENTRY glTexEnvx (GLenum target, GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glTexEnvxv (GLenum target, GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glTexParameterx (GLenum target, GLenum pname, GLfixed param);
GL_API void GL_APIENTRY glTexParameterxv (GLenum target, GLenum pname, const GLfixed *params);
GL_API void GL_APIENTRY glTranslatex (GLfixed x, GLfixed y, GLfixed z);
GL_API void GL_APIENTRY glPointSizePointerOES (GLenum type, GLsizei stride, const GLvoid *pointer);
#include "glapi/glapitemp.h"
#endif /* USE_X86_ASM */

@@ -86,6 +86,9 @@
/** Return offset in bytes of the field within a vertex struct */
#define OFFSET(FIELD) ((void *) offsetof(struct vertex, FIELD))
static void
meta_clear(struct gl_context *ctx, GLbitfield buffers, bool glsl);
static struct blit_shader *
choose_blit_shader(GLenum target, struct blit_shader_table *table);
@@ -201,6 +204,31 @@ _mesa_meta_link_program_with_debug(struct gl_context *ctx, GLuint program)
return 0;
}
void
_mesa_meta_compile_and_link_program(struct gl_context *ctx,
const char *vs_source,
const char *fs_source,
const char *name,
GLuint *program)
{
GLuint vs = _mesa_meta_compile_shader_with_debug(ctx, GL_VERTEX_SHADER,
vs_source);
GLuint fs = _mesa_meta_compile_shader_with_debug(ctx, GL_FRAGMENT_SHADER,
fs_source);
*program = _mesa_CreateProgram();
_mesa_AttachShader(*program, fs);
_mesa_DeleteShader(fs);
_mesa_AttachShader(*program, vs);
_mesa_DeleteShader(vs);
_mesa_BindAttribLocation(*program, 0, "position");
_mesa_BindAttribLocation(*program, 1, "texcoords");
_mesa_meta_link_program_with_debug(ctx, *program);
_mesa_ObjectLabel(GL_PROGRAM, *program, -1, name);
_mesa_UseProgram(*program);
}
/**
* Generate a generic shader to blit from a texture to a framebuffer
*
@@ -214,12 +242,25 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx,
GLenum target,
struct blit_shader_table *table)
{
const char *vs_source;
char *fs_source;
GLuint vs, fs;
char *vs_source, *fs_source;
void *const mem_ctx = ralloc_context(NULL);
struct blit_shader *shader = choose_blit_shader(target, table);
char *name;
const char *vs_input, *vs_output, *fs_input, *vs_preprocess, *fs_preprocess;
if (ctx->Const.GLSLVersion < 130) {
vs_preprocess = "";
vs_input = "attribute";
vs_output = "varying";
fs_preprocess = "#extension GL_EXT_texture_array : enable";
fs_input = "varying";
} else {
vs_preprocess = "#version 130";
vs_input = "in";
vs_output = "out";
fs_preprocess = "#version 130";
fs_input = "in";
shader->func = "texture";
}
assert(shader != NULL);
@@ -228,73 +269,36 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx,
return;
}
if (ctx->Const.GLSLVersion < 130) {
vs_source =
"attribute vec2 position;\n"
"attribute vec4 textureCoords;\n"
"varying vec4 texCoords;\n"
"void main()\n"
"{\n"
" texCoords = textureCoords;\n"
" gl_Position = vec4(position, 0.0, 1.0);\n"
"}\n";
vs_source = ralloc_asprintf(mem_ctx,
"%s\n"
"%s vec2 position;\n"
"%s vec4 textureCoords;\n"
"%s vec4 texCoords;\n"
"void main()\n"
"{\n"
" texCoords = textureCoords;\n"
" gl_Position = vec4(position, 0.0, 1.0);\n"
"}\n",
vs_preprocess, vs_input, vs_input, vs_output);
fs_source = ralloc_asprintf(mem_ctx,
"#extension GL_EXT_texture_array : enable\n"
"#extension GL_ARB_texture_cube_map_array: enable\n"
"uniform %s texSampler;\n"
"varying vec4 texCoords;\n"
"void main()\n"
"{\n"
" gl_FragColor = %s(texSampler, %s);\n"
" gl_FragDepth = gl_FragColor.x;\n"
"}\n",
shader->type,
shader->func, shader->texcoords);
}
else {
vs_source = ralloc_asprintf(mem_ctx,
"#version 130\n"
"in vec2 position;\n"
"in vec4 textureCoords;\n"
"out vec4 texCoords;\n"
"void main()\n"
"{\n"
" texCoords = textureCoords;\n"
" gl_Position = vec4(position, 0.0, 1.0);\n"
"}\n");
fs_source = ralloc_asprintf(mem_ctx,
"#version 130\n"
"#extension GL_ARB_texture_cube_map_array: enable\n"
"uniform %s texSampler;\n"
"in vec4 texCoords;\n"
"out vec4 out_color;\n"
"\n"
"void main()\n"
"{\n"
" out_color = texture(texSampler, %s);\n"
" gl_FragDepth = out_color.x;\n"
"}\n",
shader->type,
shader->texcoords);
}
fs_source = ralloc_asprintf(mem_ctx,
"%s\n"
"#extension GL_ARB_texture_cube_map_array: enable\n"
"uniform %s texSampler;\n"
"%s vec4 texCoords;\n"
"void main()\n"
"{\n"
" gl_FragColor = %s(texSampler, %s);\n"
" gl_FragDepth = gl_FragColor.x;\n"
"}\n",
fs_preprocess, shader->type, fs_input,
shader->func, shader->texcoords);
vs = _mesa_meta_compile_shader_with_debug(ctx, GL_VERTEX_SHADER, vs_source);
fs = _mesa_meta_compile_shader_with_debug(ctx, GL_FRAGMENT_SHADER, fs_source);
shader->shader_prog = _mesa_CreateProgram();
_mesa_AttachShader(shader->shader_prog, fs);
_mesa_DeleteShader(fs);
_mesa_AttachShader(shader->shader_prog, vs);
_mesa_DeleteShader(vs);
_mesa_BindAttribLocation(shader->shader_prog, 0, "position");
_mesa_BindAttribLocation(shader->shader_prog, 1, "texcoords");
_mesa_meta_link_program_with_debug(ctx, shader->shader_prog);
name = ralloc_asprintf(mem_ctx, "%s blit", shader->type);
_mesa_ObjectLabel(GL_PROGRAM, shader->shader_prog, -1, name);
_mesa_meta_compile_and_link_program(ctx, vs_source, fs_source,
ralloc_asprintf(mem_ctx, "%s blit",
shader->type),
&shader->shader_prog);
ralloc_free(mem_ctx);
_mesa_UseProgram(shader->shader_prog);
}
/**
@@ -389,6 +393,24 @@ _mesa_meta_init(struct gl_context *ctx)
ctx->Meta = CALLOC_STRUCT(gl_meta_state);
}
static GLenum
gl_buffer_index_to_drawbuffers_enum(gl_buffer_index bufindex)
{
assert(bufindex < BUFFER_COUNT);
if (bufindex >= BUFFER_COLOR0)
return GL_COLOR_ATTACHMENT0 + bufindex - BUFFER_COLOR0;
else if (bufindex == BUFFER_FRONT_LEFT)
return GL_FRONT_LEFT;
else if (bufindex == BUFFER_FRONT_RIGHT)
return GL_FRONT_RIGHT;
else if (bufindex == BUFFER_BACK_LEFT)
return GL_BACK_LEFT;
else if (bufindex == BUFFER_BACK_RIGHT)
return GL_BACK_RIGHT;
return GL_NONE;
}
/**
* Free context meta-op state.
@@ -775,6 +797,23 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
_mesa_set_framebuffer_srgb(ctx, GL_FALSE);
}
if (state & MESA_META_DRAW_BUFFERS) {
int buf, real_color_buffers = 0;
memset(save->ColorDrawBuffers, 0, sizeof(save->ColorDrawBuffers));
for (buf = 0; buf < MAX_DRAW_BUFFERS; buf++) {
int buf_index = ctx->DrawBuffer->_ColorDrawBufferIndexes[buf];
if (buf_index == -1)
continue;
save->ColorDrawBuffers[buf] =
gl_buffer_index_to_drawbuffers_enum(buf_index);
if (++real_color_buffers >= ctx->DrawBuffer->_NumColorDrawBuffers)
break;
}
}
/* misc */
{
save->Lighting = ctx->Light.Enabled;
@@ -1173,6 +1212,10 @@ _mesa_meta_end(struct gl_context *ctx)
ctx->CurrentRenderbuffer->Name != save->RenderbufferName)
_mesa_BindRenderbuffer(GL_RENDERBUFFER, save->RenderbufferName);
if (state & MESA_META_DRAW_BUFFERS) {
_mesa_DrawBuffers(MAX_DRAW_BUFFERS, save->ColorDrawBuffers);
}
ctx->Meta->SaveStackDepth--;
ctx->API = save->API;
@@ -1459,100 +1502,13 @@ _mesa_meta_setup_ff_tnl_for_blit(GLuint *VAO, GLuint *VBO,
void
_mesa_meta_Clear(struct gl_context *ctx, GLbitfield buffers)
{
struct clear_state *clear = &ctx->Meta->Clear;
struct vertex verts[4];
/* save all state but scissor, pixel pack/unpack */
GLbitfield metaSave = (MESA_META_ALL -
MESA_META_SCISSOR -
MESA_META_PIXEL_STORE -
MESA_META_CONDITIONAL_RENDER -
MESA_META_FRAMEBUFFER_SRGB);
const GLuint stencilMax = (1 << ctx->DrawBuffer->Visual.stencilBits) - 1;
meta_clear(ctx, buffers, false);
}
if (buffers & BUFFER_BITS_COLOR) {
/* if clearing color buffers, don't save/restore colormask */
metaSave -= MESA_META_COLOR_MASK;
}
_mesa_meta_begin(ctx, metaSave);
_mesa_meta_setup_vertex_objects(&clear->VAO, &clear->VBO, false, 3, 0, 4);
/* GL_COLOR_BUFFER_BIT */
if (buffers & BUFFER_BITS_COLOR) {
/* leave colormask, glDrawBuffer state as-is */
/* Clears never have the color clamped. */
if (ctx->Extensions.ARB_color_buffer_float)
_mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
}
else {
ASSERT(metaSave & MESA_META_COLOR_MASK);
_mesa_ColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
}
/* GL_DEPTH_BUFFER_BIT */
if (buffers & BUFFER_BIT_DEPTH) {
_mesa_set_enable(ctx, GL_DEPTH_TEST, GL_TRUE);
_mesa_DepthFunc(GL_ALWAYS);
_mesa_DepthMask(GL_TRUE);
}
else {
assert(!ctx->Depth.Test);
}
/* GL_STENCIL_BUFFER_BIT */
if (buffers & BUFFER_BIT_STENCIL) {
_mesa_set_enable(ctx, GL_STENCIL_TEST, GL_TRUE);
_mesa_StencilOpSeparate(GL_FRONT_AND_BACK,
GL_REPLACE, GL_REPLACE, GL_REPLACE);
_mesa_StencilFuncSeparate(GL_FRONT_AND_BACK, GL_ALWAYS,
ctx->Stencil.Clear & stencilMax,
ctx->Stencil.WriteMask[0]);
}
else {
assert(!ctx->Stencil.Enabled);
}
/* vertex positions/colors */
{
const GLfloat x0 = (GLfloat) ctx->DrawBuffer->_Xmin;
const GLfloat y0 = (GLfloat) ctx->DrawBuffer->_Ymin;
const GLfloat x1 = (GLfloat) ctx->DrawBuffer->_Xmax;
const GLfloat y1 = (GLfloat) ctx->DrawBuffer->_Ymax;
const GLfloat z = invert_z(ctx->Depth.Clear);
GLuint i;
verts[0].x = x0;
verts[0].y = y0;
verts[0].z = z;
verts[1].x = x1;
verts[1].y = y0;
verts[1].z = z;
verts[2].x = x1;
verts[2].y = y1;
verts[2].z = z;
verts[3].x = x0;
verts[3].y = y1;
verts[3].z = z;
/* vertex colors */
for (i = 0; i < 4; i++) {
verts[i].r = ctx->Color.ClearColor.f[0];
verts[i].g = ctx->Color.ClearColor.f[1];
verts[i].b = ctx->Color.ClearColor.f[2];
verts[i].a = ctx->Color.ClearColor.f[3];
}
/* upload new vertex data */
_mesa_BufferData(GL_ARRAY_BUFFER_ARB, sizeof(verts), verts,
GL_DYNAMIC_DRAW_ARB);
}
/* draw quad */
_mesa_DrawArrays(GL_TRIANGLE_FAN, 0, 4);
_mesa_meta_end(ctx);
void
_mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
{
meta_clear(ctx, buffers, true);
}
static void
@@ -1699,22 +1655,61 @@ meta_glsl_clear_cleanup(struct clear_state *clear)
}
}
/**
* Given a bitfield of BUFFER_BIT_x draw buffers, call glDrawBuffers to
* set GL to only draw to those buffers.
*
* Since the bitfield has no associated order, the assignment of draw buffer
* indices to color attachment indices is rather arbitrary.
*/
static void
drawbuffers_from_bitfield(GLbitfield bits)
{
GLenum enums[MAX_DRAW_BUFFERS];
int i = 0;
int n;
/* This function is only legal for color buffer bitfields. */
assert((bits & ~BUFFER_BITS_COLOR) == 0);
/* Make sure we don't overflow any arrays. */
assert(_mesa_bitcount(bits) <= MAX_DRAW_BUFFERS);
enums[0] = GL_NONE;
if (bits & BUFFER_BIT_FRONT_LEFT)
enums[i++] = GL_FRONT_LEFT;
if (bits & BUFFER_BIT_FRONT_RIGHT)
enums[i++] = GL_FRONT_RIGHT;
if (bits & BUFFER_BIT_BACK_LEFT)
enums[i++] = GL_BACK_LEFT;
if (bits & BUFFER_BIT_BACK_RIGHT)
enums[i++] = GL_BACK_RIGHT;
for (n = 0; n < MAX_COLOR_ATTACHMENTS; n++) {
if (bits & (1 << (BUFFER_COLOR0 + n)))
enums[i++] = GL_COLOR_ATTACHMENT0 + n;
}
_mesa_DrawBuffers(i, enums);
}
/**
* Meta implementation of ctx->Driver.Clear() in terms of polygon rendering.
*/
void
_mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
static void
meta_clear(struct gl_context *ctx, GLbitfield buffers, bool glsl)
{
struct clear_state *clear = &ctx->Meta->Clear;
GLbitfield metaSave;
const GLuint stencilMax = (1 << ctx->DrawBuffer->Visual.stencilBits) - 1;
struct gl_framebuffer *fb = ctx->DrawBuffer;
const float x0 = ((float)fb->_Xmin / fb->Width) * 2.0f - 1.0f;
const float y0 = ((float)fb->_Ymin / fb->Height) * 2.0f - 1.0f;
const float x1 = ((float)fb->_Xmax / fb->Width) * 2.0f - 1.0f;
const float y1 = ((float)fb->_Ymax / fb->Height) * 2.0f - 1.0f;
const float z = -invert_z(ctx->Depth.Clear);
float x0, y0, x1, y1, z;
struct vertex verts[4];
int i;
metaSave = (MESA_META_ALPHA_TEST |
MESA_META_BLEND |
@@ -1729,7 +1724,18 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
MESA_META_MULTISAMPLE |
MESA_META_OCCLUSION_QUERY);
if (!(buffers & BUFFER_BITS_COLOR)) {
if (!glsl) {
metaSave |= MESA_META_FOG |
MESA_META_PIXEL_TRANSFER |
MESA_META_TRANSFORM |
MESA_META_TEXTURE |
MESA_META_CLAMP_VERTEX_COLOR |
MESA_META_SELECT_FEEDBACK;
}
if (buffers & BUFFER_BITS_COLOR) {
metaSave |= MESA_META_DRAW_BUFFERS;
} else {
/* We'll use colormask to disable color writes. Otherwise,
* respect color mask
*/
@@ -1738,13 +1744,30 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
_mesa_meta_begin(ctx, metaSave);
meta_glsl_clear_init(ctx, clear);
if (glsl) {
meta_glsl_clear_init(ctx, clear);
x0 = ((float) fb->_Xmin / fb->Width) * 2.0f - 1.0f;
y0 = ((float) fb->_Ymin / fb->Height) * 2.0f - 1.0f;
x1 = ((float) fb->_Xmax / fb->Width) * 2.0f - 1.0f;
y1 = ((float) fb->_Ymax / fb->Height) * 2.0f - 1.0f;
z = -invert_z(ctx->Depth.Clear);
} else {
_mesa_meta_setup_vertex_objects(&clear->VAO, &clear->VBO, false, 3, 0, 4);
x0 = (float) fb->_Xmin;
y0 = (float) fb->_Ymin;
x1 = (float) fb->_Xmax;
y1 = (float) fb->_Ymax;
z = invert_z(ctx->Depth.Clear);
}
if (fb->_IntegerColor) {
assert(glsl);
_mesa_UseProgram(clear->IntegerShaderProg);
_mesa_Uniform4iv(clear->IntegerColorLocation, 1,
ctx->Color.ClearColor.i);
} else {
} else if (glsl) {
_mesa_UseProgram(clear->ShaderProg);
_mesa_Uniform4fv(clear->ColorLocation, 1,
ctx->Color.ClearColor.f);
@@ -1752,7 +1775,10 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
/* GL_COLOR_BUFFER_BIT */
if (buffers & BUFFER_BITS_COLOR) {
/* leave colormask, glDrawBuffer state as-is */
/* Only draw to the buffers we were asked to clear. */
drawbuffers_from_bitfield(buffers & BUFFER_BITS_COLOR);
/* leave colormask state as-is */
/* Clears never have the color clamped. */
if (ctx->Extensions.ARB_color_buffer_float)
@@ -1800,6 +1826,15 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
verts[3].y = y1;
verts[3].z = z;
if (!glsl) {
for (i = 0; i < 4; i++) {
verts[i].r = ctx->Color.ClearColor.f[0];
verts[i].g = ctx->Color.ClearColor.f[1];
verts[i].b = ctx->Color.ClearColor.f[2];
verts[i].a = ctx->Color.ClearColor.f[3];
}
}
/* upload new vertex data */
_mesa_BufferData(GL_ARRAY_BUFFER_ARB, sizeof(verts), verts,
GL_DYNAMIC_DRAW_ARB);
@@ -1807,6 +1842,7 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
/* draw quad(s) */
if (fb->MaxNumLayers > 0) {
unsigned layer;
assert(glsl);
for (layer = 0; layer < fb->MaxNumLayers; layer++) {
if (fb->_IntegerColor)
_mesa_Uniform1i(clear->IntegerLayerLocation, layer);
@@ -2774,7 +2810,7 @@ copytexsubimage_using_blit_framebuffer(struct gl_context *ctx, GLuint dims,
_mesa_unlock_texture(ctx, texObj);
_mesa_meta_begin(ctx, MESA_META_ALL);
_mesa_meta_begin(ctx, MESA_META_ALL & ~MESA_META_DRAW_BUFFERS);
_mesa_GenFramebuffers(1, &fbo);
_mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
@@ -2812,13 +2848,13 @@ copytexsubimage_using_blit_framebuffer(struct gl_context *ctx, GLuint dims,
* are too strict for CopyTexImage. We know meta will be fine with format
* changes.
*/
_mesa_meta_BlitFramebuffer(ctx, x, y,
x + width, y + height,
xoffset, yoffset,
xoffset + width, yoffset + height,
mask, GL_NEAREST);
mask = _mesa_meta_BlitFramebuffer(ctx, x, y,
x + width, y + height,
xoffset, yoffset,
xoffset + width, yoffset + height,
mask, GL_NEAREST);
ctx->Meta->Blit.no_ctsi_fallback = false;
success = true;
success = mask == 0x0;
out:
_mesa_lock_texture(ctx, texObj);
@@ -2996,7 +3032,8 @@ decompress_texture_image(struct gl_context *ctx,
break;
}
_mesa_meta_begin(ctx, MESA_META_ALL & ~MESA_META_PIXEL_STORE);
_mesa_meta_begin(ctx, MESA_META_ALL & ~(MESA_META_PIXEL_STORE |
MESA_META_DRAW_BUFFERS));
samplerSave = ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler ?
ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler->Name : 0;
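The bitfield-to-enum-list mapping performed by drawbuffers_from_bitfield() in the hunks above can be tried outside Mesa. The following is a minimal standalone sketch: every DEMO_* name and value is a made-up placeholder, not one of Mesa's BUFFER_BIT_* flags or a real GL enum, and the function returns the count instead of calling _mesa_DrawBuffers().

```c
#include <assert.h>

/* Hypothetical stand-ins; NOT Mesa's BUFFER_BIT_* flags or GL enum values. */
#define DEMO_MAX_DRAW_BUFFERS      8
#define DEMO_MAX_COLOR_ATTACHMENTS 8
#define DEMO_BIT_FRONT_LEFT        (1u << 0)
#define DEMO_BIT_FRONT_RIGHT       (1u << 1)
#define DEMO_BIT_BACK_LEFT         (1u << 2)
#define DEMO_BIT_BACK_RIGHT        (1u << 3)
#define DEMO_COLOR0_SHIFT          4
#define DEMO_NONE                  0u
#define DEMO_FRONT_LEFT            100u
#define DEMO_FRONT_RIGHT           101u
#define DEMO_BACK_LEFT             102u
#define DEMO_BACK_RIGHT            103u
#define DEMO_COLOR_ATTACHMENT0     200u

/* Count the set bits so the caller can be checked against the array size. */
static int
demo_bitcount(unsigned bits)
{
   int count = 0;
   for (; bits; bits &= bits - 1)
      count++;
   return count;
}

/* Fill enums[] with one entry per set bit, in a fixed order; return the count. */
static int
demo_enums_from_bitfield(unsigned bits, unsigned enums[DEMO_MAX_DRAW_BUFFERS])
{
   int i = 0;
   int n;

   /* Make sure we don't overflow the output array. */
   assert(demo_bitcount(bits) <= DEMO_MAX_DRAW_BUFFERS);

   /* An empty bitfield leaves a single "none" entry and a count of zero. */
   enums[0] = DEMO_NONE;

   if (bits & DEMO_BIT_FRONT_LEFT)
      enums[i++] = DEMO_FRONT_LEFT;
   if (bits & DEMO_BIT_FRONT_RIGHT)
      enums[i++] = DEMO_FRONT_RIGHT;
   if (bits & DEMO_BIT_BACK_LEFT)
      enums[i++] = DEMO_BACK_LEFT;
   if (bits & DEMO_BIT_BACK_RIGHT)
      enums[i++] = DEMO_BACK_RIGHT;

   for (n = 0; n < DEMO_MAX_COLOR_ATTACHMENTS; n++) {
      if (bits & (1u << (DEMO_COLOR0_SHIFT + n)))
         enums[i++] = DEMO_COLOR_ATTACHMENT0 + (unsigned) n;
   }

   return i;
}
```

Because the bitfield carries no ordering, the output order is simply the order of the tests in the function, which is why the original comment calls the index assignment "rather arbitrary".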


@@ -58,6 +58,7 @@
#define MESA_META_MULTISAMPLE 0x100000
#define MESA_META_FRAMEBUFFER_SRGB 0x200000
#define MESA_META_OCCLUSION_QUERY 0x400000
#define MESA_META_DRAW_BUFFERS 0x800000
/**\}*/
/**
@@ -180,6 +181,9 @@ struct save_state
GLboolean TransformFeedbackNeedsResume;
GLuint DrawBufferName, ReadBufferName, RenderbufferName;
/** MESA_META_DRAW_BUFFERS */
GLenum ColorDrawBuffers[MAX_DRAW_BUFFERS];
};
/**
@@ -263,6 +267,13 @@ struct blit_state
bool no_ctsi_fallback;
};
struct fb_tex_blit_state
{
GLint baseLevelSave, maxLevelSave;
GLuint sampler, samplerSave, stencilSamplingSave;
GLuint tempTex;
};
/**
* State for glClear()
@@ -392,11 +403,39 @@ extern GLboolean
_mesa_meta_in_progress(struct gl_context *ctx);
extern void
_mesa_meta_fb_tex_blit_begin(const struct gl_context *ctx,
struct fb_tex_blit_state *blit);
extern void
_mesa_meta_fb_tex_blit_end(struct gl_context *ctx, GLenum target,
struct fb_tex_blit_state *blit);
extern GLboolean
_mesa_meta_bind_rb_as_tex_image(struct gl_context *ctx,
struct gl_renderbuffer *rb,
GLuint *tex,
struct gl_texture_object **texObj,
GLenum *target);
GLuint
_mesa_meta_setup_sampler(struct gl_context *ctx,
const struct gl_texture_object *texObj,
GLenum target, GLenum filter, GLuint srcLevel);
extern GLbitfield
_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter);
extern void
_mesa_meta_and_swrast_BlitFramebuffer(struct gl_context *ctx,
GLint srcX0, GLint srcY0,
GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0,
GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter);
extern void
_mesa_meta_Clear(struct gl_context *ctx, GLbitfield buffers);
@@ -451,6 +490,13 @@ _mesa_meta_compile_shader_with_debug(struct gl_context *ctx, GLenum target,
GLuint
_mesa_meta_link_program_with_debug(struct gl_context *ctx, GLuint program);
void
_mesa_meta_compile_and_link_program(struct gl_context *ctx,
const char *vs_source,
const char *fs_source,
const char *name,
GLuint *program);
GLboolean
_mesa_meta_alloc_texture(struct temp_texture *tex,
GLsizei width, GLsizei height, GLenum intFormat);


@@ -62,7 +62,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
{
const char *vs_source;
char *fs_source;
GLuint vs, fs;
void *mem_ctx;
enum blit_msaa_shader shader_index;
bool dst_is_msaa = false;
@@ -274,7 +273,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
samples);
} else {
ralloc_asprintf_append(&sample_resolve,
" out_color = sample_%d_0 / %f;\n",
" gl_FragColor = sample_%d_0 / %f;\n",
samples, (float)samples);
}
}
@@ -314,21 +313,10 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
sample_resolve);
}
vs = _mesa_meta_compile_shader_with_debug(ctx, GL_VERTEX_SHADER, vs_source);
fs = _mesa_meta_compile_shader_with_debug(ctx, GL_FRAGMENT_SHADER, fs_source);
_mesa_meta_compile_and_link_program(ctx, vs_source, fs_source, name,
&blit->msaa_shaders[shader_index]);
blit->msaa_shaders[shader_index] = _mesa_CreateProgram();
_mesa_AttachShader(blit->msaa_shaders[shader_index], fs);
_mesa_DeleteShader(fs);
_mesa_AttachShader(blit->msaa_shaders[shader_index], vs);
_mesa_DeleteShader(vs);
_mesa_BindAttribLocation(blit->msaa_shaders[shader_index], 0, "position");
_mesa_BindAttribLocation(blit->msaa_shaders[shader_index], 1, "texcoords");
_mesa_meta_link_program_with_debug(ctx, blit->msaa_shaders[shader_index]);
_mesa_ObjectLabel(GL_PROGRAM, blit->msaa_shaders[shader_index], -1, name);
ralloc_free(mem_ctx);
_mesa_UseProgram(blit->msaa_shaders[shader_index]);
}
static void
@@ -340,7 +328,10 @@ setup_glsl_blit_framebuffer(struct gl_context *ctx,
/* target = GL_TEXTURE_RECTANGLE is not supported in GLES 3.0 */
assert(_mesa_is_desktop_gl(ctx) || target == GL_TEXTURE_2D);
_mesa_meta_setup_vertex_objects(&blit->VAO, &blit->VBO, true, 2, 2, 0);
unsigned texcoord_size = 2 + (src_rb->Depth > 1 ? 1 : 0);
_mesa_meta_setup_vertex_objects(&blit->VAO, &blit->VBO, true,
2, texcoord_size, 0);
if (target == GL_TEXTURE_2D_MULTISAMPLE ||
target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY) {
@@ -368,19 +359,14 @@ blitframebuffer_texture(struct gl_context *ctx,
const struct gl_renderbuffer_attachment *readAtt =
&readFb->Attachment[att_index];
struct blit_state *blit = &ctx->Meta->Blit;
struct fb_tex_blit_state fb_tex_blit;
const GLint dstX = MIN2(dstX0, dstX1);
const GLint dstY = MIN2(dstY0, dstY1);
const GLint dstW = abs(dstX1 - dstX0);
const GLint dstH = abs(dstY1 - dstY0);
struct gl_texture_object *texObj;
GLuint srcLevel;
GLint baseLevelSave;
GLint maxLevelSave;
GLenum target;
GLuint sampler, samplerSave =
ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler ?
ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler->Name : 0;
GLuint tempTex = 0;
struct gl_renderbuffer *rb = readAtt->Renderbuffer;
struct temp_texture *meta_temp_texture;
@@ -392,6 +378,8 @@ blitframebuffer_texture(struct gl_context *ctx,
filter = GL_LINEAR;
}
_mesa_meta_fb_tex_blit_begin(ctx, &fb_tex_blit);
if (readAtt->Texture &&
(readAtt->Texture->Target == GL_TEXTURE_2D ||
readAtt->Texture->Target == GL_TEXTURE_RECTANGLE ||
@@ -404,38 +392,16 @@ blitframebuffer_texture(struct gl_context *ctx,
texObj = readAtt->Texture;
target = texObj->Target;
} else if (!readAtt->Texture && ctx->Driver.BindRenderbufferTexImage) {
/* Otherwise, we need the driver to be able to bind a renderbuffer as
* a texture image.
*/
struct gl_texture_image *texImage;
if (rb->NumSamples > 1)
target = GL_TEXTURE_2D_MULTISAMPLE;
else
target = GL_TEXTURE_2D;
_mesa_GenTextures(1, &tempTex);
_mesa_BindTexture(target, tempTex);
srcLevel = 0;
texObj = _mesa_lookup_texture(ctx, tempTex);
texImage = _mesa_get_tex_image(ctx, texObj, target, srcLevel);
if (!ctx->Driver.BindRenderbufferTexImage(ctx, rb, texImage)) {
_mesa_DeleteTextures(1, &tempTex);
if (!_mesa_meta_bind_rb_as_tex_image(ctx, rb, &fb_tex_blit.tempTex,
&texObj, &target))
return false;
} else {
if (ctx->Driver.FinishRenderTexture &&
!rb->NeedsFinishRenderTexture) {
rb->NeedsFinishRenderTexture = true;
ctx->Driver.FinishRenderTexture(ctx, rb);
}
if (_mesa_is_winsys_fbo(readFb)) {
GLint temp = srcY0;
srcY0 = rb->Height - srcY1;
srcY1 = rb->Height - temp;
flipY = -flipY;
}
srcLevel = 0;
if (_mesa_is_winsys_fbo(readFb)) {
GLint temp = srcY0;
srcY0 = rb->Height - srcY1;
srcY1 = rb->Height - temp;
flipY = -flipY;
}
} else {
GLenum tex_base_format;
@@ -476,8 +442,9 @@ blitframebuffer_texture(struct gl_context *ctx,
srcY1 = srcH;
}
baseLevelSave = texObj->BaseLevel;
maxLevelSave = texObj->MaxLevel;
fb_tex_blit.baseLevelSave = texObj->BaseLevel;
fb_tex_blit.maxLevelSave = texObj->MaxLevel;
fb_tex_blit.stencilSamplingSave = texObj->StencilSampling;
if (glsl_version) {
setup_glsl_blit_framebuffer(ctx, blit, rb, target);
@@ -488,25 +455,14 @@ blitframebuffer_texture(struct gl_context *ctx,
2);
}
_mesa_GenSamplers(1, &sampler);
_mesa_BindSampler(ctx->Texture.CurrentUnit, sampler);
/*
printf("Blit from texture!\n");
printf(" srcAtt %p dstAtt %p\n", readAtt, drawAtt);
printf(" srcTex %p dstText %p\n", texObj, drawAtt->Texture);
*/
/* Prepare src texture state */
_mesa_BindTexture(target, texObj->Name);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, filter);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, filter);
if (target != GL_TEXTURE_RECTANGLE_ARB) {
_mesa_TexParameteri(target, GL_TEXTURE_BASE_LEVEL, srcLevel);
_mesa_TexParameteri(target, GL_TEXTURE_MAX_LEVEL, srcLevel);
}
_mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
fb_tex_blit.sampler = _mesa_meta_setup_sampler(ctx, texObj, target, filter,
srcLevel);
/* Always do our blits with no net sRGB decode or encode.
*
@@ -527,11 +483,12 @@ blitframebuffer_texture(struct gl_context *ctx,
if (ctx->Extensions.EXT_texture_sRGB_decode) {
if (_mesa_get_format_color_encoding(rb->Format) == GL_SRGB &&
ctx->DrawBuffer->Visual.sRGBCapable) {
_mesa_SamplerParameteri(sampler, GL_TEXTURE_SRGB_DECODE_EXT,
GL_DECODE_EXT);
_mesa_SamplerParameteri(fb_tex_blit.sampler,
GL_TEXTURE_SRGB_DECODE_EXT, GL_DECODE_EXT);
_mesa_set_framebuffer_srgb(ctx, GL_TRUE);
} else {
_mesa_SamplerParameteri(sampler, GL_TEXTURE_SRGB_DECODE_EXT,
_mesa_SamplerParameteri(fb_tex_blit.sampler,
GL_TEXTURE_SRGB_DECODE_EXT,
GL_SKIP_DECODE_EXT);
/* set_framebuffer_srgb was set by _mesa_meta_begin(). */
}
@@ -580,12 +537,16 @@ blitframebuffer_texture(struct gl_context *ctx,
verts[0].tex[0] = s0;
verts[0].tex[1] = t0;
verts[0].tex[2] = readAtt->Zoffset;
verts[1].tex[0] = s1;
verts[1].tex[1] = t0;
verts[1].tex[2] = readAtt->Zoffset;
verts[2].tex[0] = s1;
verts[2].tex[1] = t1;
verts[2].tex[2] = readAtt->Zoffset;
verts[3].tex[0] = s0;
verts[3].tex[1] = t1;
verts[3].tex[2] = readAtt->Zoffset;
_mesa_BufferSubData(GL_ARRAY_BUFFER_ARB, 0, sizeof(verts), verts);
}
@@ -598,28 +559,110 @@ blitframebuffer_texture(struct gl_context *ctx,
_mesa_DepthFunc(GL_ALWAYS);
_mesa_DrawArrays(GL_TRIANGLE_FAN, 0, 4);
_mesa_meta_fb_tex_blit_end(ctx, target, &fb_tex_blit);
return true;
}
void
_mesa_meta_fb_tex_blit_begin(const struct gl_context *ctx,
struct fb_tex_blit_state *blit)
{
blit->samplerSave =
ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler ?
ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler->Name : 0;
blit->tempTex = 0;
}
void
_mesa_meta_fb_tex_blit_end(struct gl_context *ctx, GLenum target,
struct fb_tex_blit_state *blit)
{
/* Restore texture object state, the texture binding will
* be restored by _mesa_meta_end().
*/
if (target != GL_TEXTURE_RECTANGLE_ARB) {
_mesa_TexParameteri(target, GL_TEXTURE_BASE_LEVEL, baseLevelSave);
_mesa_TexParameteri(target, GL_TEXTURE_MAX_LEVEL, maxLevelSave);
_mesa_TexParameteri(target, GL_TEXTURE_BASE_LEVEL, blit->baseLevelSave);
_mesa_TexParameteri(target, GL_TEXTURE_MAX_LEVEL, blit->maxLevelSave);
if (ctx->Extensions.ARB_stencil_texturing) {
const struct gl_texture_object *texObj =
_mesa_get_current_tex_object(ctx, target);
if (texObj->StencilSampling != blit->stencilSamplingSave)
_mesa_TexParameteri(target, GL_DEPTH_STENCIL_TEXTURE_MODE,
blit->stencilSamplingSave ?
GL_STENCIL_INDEX : GL_DEPTH_COMPONENT);
}
}
_mesa_BindSampler(ctx->Texture.CurrentUnit, samplerSave);
_mesa_DeleteSamplers(1, &sampler);
if (tempTex)
_mesa_DeleteTextures(1, &tempTex);
_mesa_BindSampler(ctx->Texture.CurrentUnit, blit->samplerSave);
_mesa_DeleteSamplers(1, &blit->sampler);
if (blit->tempTex)
_mesa_DeleteTextures(1, &blit->tempTex);
}
GLboolean
_mesa_meta_bind_rb_as_tex_image(struct gl_context *ctx,
struct gl_renderbuffer *rb,
GLuint *tex,
struct gl_texture_object **texObj,
GLenum *target)
{
struct gl_texture_image *texImage;
if (rb->NumSamples > 1)
*target = GL_TEXTURE_2D_MULTISAMPLE;
else
*target = GL_TEXTURE_2D;
_mesa_GenTextures(1, tex);
_mesa_BindTexture(*target, *tex);
*texObj = _mesa_lookup_texture(ctx, *tex);
texImage = _mesa_get_tex_image(ctx, *texObj, *target, 0);
if (!ctx->Driver.BindRenderbufferTexImage(ctx, rb, texImage)) {
_mesa_DeleteTextures(1, tex);
return false;
}
if (ctx->Driver.FinishRenderTexture && !rb->NeedsFinishRenderTexture) {
rb->NeedsFinishRenderTexture = true;
ctx->Driver.FinishRenderTexture(ctx, rb);
}
return true;
}
GLuint
_mesa_meta_setup_sampler(struct gl_context *ctx,
const struct gl_texture_object *texObj,
GLenum target, GLenum filter, GLuint srcLevel)
{
GLuint sampler;
_mesa_GenSamplers(1, &sampler);
_mesa_BindSampler(ctx->Texture.CurrentUnit, sampler);
/* Prepare src texture state */
_mesa_BindTexture(target, texObj->Name);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, filter);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, filter);
if (target != GL_TEXTURE_RECTANGLE_ARB) {
_mesa_TexParameteri(target, GL_TEXTURE_BASE_LEVEL, srcLevel);
_mesa_TexParameteri(target, GL_TEXTURE_MAX_LEVEL, srcLevel);
}
_mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
_mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
return sampler;
}
/**
* Meta implementation of ctx->Driver.BlitFramebuffer() in terms
* of texture mapping and polygon rendering.
*/
void
GLbitfield
_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
@@ -644,7 +687,7 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
/* Multisample texture blit support requires texture multisample. */
if (ctx->ReadBuffer->Visual.samples > 0 &&
!ctx->Extensions.ARB_texture_multisample) {
goto fallback;
return mask;
}
/* Clip a copy of the blit coordinates. If these differ from the input
@@ -653,13 +696,13 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
if (!_mesa_clip_blit(ctx, &clip.srcX0, &clip.srcY0, &clip.srcX1, &clip.srcY1,
&clip.dstX0, &clip.dstY0, &clip.dstX1, &clip.dstY1)) {
/* clipped/scissored everything away */
return;
return 0;
}
/* Only scissor affects blit, but we're going to set a custom scissor if
* necessary anyway, so save/clear state.
*/
_mesa_meta_begin(ctx, MESA_META_ALL);
_mesa_meta_begin(ctx, MESA_META_ALL & ~MESA_META_DRAW_BUFFERS);
/* If the clipping earlier changed the destination rect at all, then
* enable the scissor to clip to it.
@@ -680,10 +723,6 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
filter, dstFlipX, dstFlipY,
use_glsl_version, false)) {
mask &= ~GL_COLOR_BUFFER_BIT;
if (mask == 0x0) {
_mesa_meta_end(ctx);
return;
}
}
}
@@ -693,10 +732,6 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
filter, dstFlipX, dstFlipY,
use_glsl_version, true)) {
mask &= ~GL_DEPTH_BUFFER_BIT;
if (mask == 0x0) {
_mesa_meta_end(ctx);
return;
}
}
}
@@ -706,11 +741,7 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
_mesa_meta_end(ctx);
fallback:
if (mask) {
_swrast_BlitFramebuffer(ctx, srcX0, srcY0, srcX1, srcY1,
dstX0, dstY0, dstX1, dstY1, mask, filter);
}
return mask;
}
void
@@ -728,3 +759,24 @@ _mesa_meta_glsl_blit_cleanup(struct blit_state *blit)
_mesa_DeleteTextures(1, &blit->depthTex.TexObj);
blit->depthTex.TexObj = 0;
}
void
_mesa_meta_and_swrast_BlitFramebuffer(struct gl_context *ctx,
GLint srcX0, GLint srcY0,
GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0,
GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter)
{
mask = _mesa_meta_BlitFramebuffer(ctx,
srcX0, srcY0, srcX1, srcY1,
dstX0, dstY0, dstX1, dstY1,
mask, filter);
if (mask == 0x0)
return;
_swrast_BlitFramebuffer(ctx,
srcX0, srcY0, srcX1, srcY1,
dstX0, dstY0, dstX1, dstY1,
mask, filter);
}
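The change above makes _mesa_meta_BlitFramebuffer() return the buffer bits it could not handle, and moves the software fallback into a separate wrapper. A minimal sketch of that "return the leftover mask" pattern follows. All demo_* names are hypothetical, and the fast path here simply assumes color and depth always succeed and stencil never does.

```c
#include <assert.h>

#define DEMO_COLOR_BIT   0x1u
#define DEMO_DEPTH_BIT   0x2u
#define DEMO_STENCIL_BIT 0x4u

/* Records which bits reached the slow path, so the behaviour is observable. */
static unsigned demo_slow_bits;

/* Hypothetical fast path: clears each bit it handled and returns the
 * remaining, unhandled bits to the caller. */
static unsigned
demo_fast_blit(unsigned mask)
{
   if (mask & DEMO_COLOR_BIT)
      mask &= ~DEMO_COLOR_BIT;
   if (mask & DEMO_DEPTH_BIT)
      mask &= ~DEMO_DEPTH_BIT;
   return mask;
}

/* Hypothetical slow path standing in for a software rasterizer fallback. */
static void
demo_slow_blit(unsigned mask)
{
   demo_slow_bits |= mask;
}

/* Wrapper: try the fast path first, fall back only for what is left. */
static void
demo_fast_then_slow_blit(unsigned mask)
{
   mask = demo_fast_blit(mask);
   if (mask == 0x0)
      return;

   demo_slow_blit(mask);
}
```

Returning the mask lets each caller choose its own fallback; the CopyTexSubImage path in this diff, for example, treats a non-zero result as failure rather than calling swrast.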


@@ -182,7 +182,7 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum target,
faceTarget = target;
}
_mesa_meta_begin(ctx, MESA_META_ALL);
_mesa_meta_begin(ctx, MESA_META_ALL & ~MESA_META_DRAW_BUFFERS);
/* Choose between glsl version and fixed function version of
* GenerateMipmap function.


@@ -507,7 +507,7 @@ intelInitContext(struct intel_context *intel,
_mesa_meta_init(ctx);
intel->hw_stencil = mesaVis->stencilBits && mesaVis->depthBits == 24;
intel->hw_stencil = mesaVis && mesaVis->stencilBits && mesaVis->depthBits == 24;
intel->hw_stipple = 1;
intel->RenderIndex = ~0;


@@ -741,10 +741,10 @@ intel_blit_framebuffer(struct gl_context *ctx,
return;
_mesa_meta_BlitFramebuffer(ctx,
srcX0, srcY0, srcX1, srcY1,
dstX0, dstY0, dstX1, dstY1,
mask, filter);
_mesa_meta_and_swrast_BlitFramebuffer(ctx,
srcX0, srcY0, srcX1, srcY1,
dstX0, dstY0, dstX1, dstY1,
mask, filter);
}
/**


@@ -76,6 +76,8 @@ i965_FILES = \
brw_lower_texture_gradients.cpp \
brw_lower_unnormalized_offset.cpp \
brw_meta_updownsample.c \
brw_meta_stencil_blit.c \
brw_meta_util.c \
brw_misc_state.c \
brw_object_purgeable.c \
brw_performance_monitor.c \


@@ -31,87 +31,10 @@
#include "brw_context.h"
#include "brw_blorp_blit_eu.h"
#include "brw_state.h"
#include "brw_meta_util.h"
#define FILE_DEBUG_FLAG DEBUG_BLORP
/**
* Helper function for handling mirror image blits.
*
* If coord0 > coord1, swap them and invert the "mirror" boolean.
*/
static inline void
fixup_mirroring(bool &mirror, GLfloat &coord0, GLfloat &coord1)
{
if (coord0 > coord1) {
mirror = !mirror;
GLfloat tmp = coord0;
coord0 = coord1;
coord1 = tmp;
}
}
/**
* Adjust {src,dst}_x{0,1} to account for clipping and scissoring of
* destination coordinates.
*
* Return true if there is still blitting to do, false if all pixels got
* rejected by the clip and/or scissor.
*
* For clarity, the nomenclature of this function assumes we are clipping and
* scissoring the X coordinate; the exact same logic applies for Y
* coordinates.
*
* Note: this function may also be used to account for clipping of source
* coordinates, by swapping the roles of src and dst.
*/
static inline bool
clip_or_scissor(bool mirror, GLfloat &src_x0, GLfloat &src_x1, GLfloat &dst_x0,
GLfloat &dst_x1, GLfloat fb_xmin, GLfloat fb_xmax)
{
float scale = (float) (src_x1 - src_x0) / (dst_x1 - dst_x0);
/* If we are going to scissor everything away, stop. */
if (!(fb_xmin < fb_xmax &&
dst_x0 < fb_xmax &&
fb_xmin < dst_x1 &&
dst_x0 < dst_x1)) {
return false;
}
/* Clip the destination rectangle, and keep track of how many pixels we
* clipped off of the left and right sides of it.
*/
GLint pixels_clipped_left = 0;
GLint pixels_clipped_right = 0;
if (dst_x0 < fb_xmin) {
pixels_clipped_left = fb_xmin - dst_x0;
dst_x0 = fb_xmin;
}
if (fb_xmax < dst_x1) {
pixels_clipped_right = dst_x1 - fb_xmax;
dst_x1 = fb_xmax;
}
/* If we are mirrored, then before applying pixels_clipped_{left,right} to
* the source coordinates, we need to flip them to account for the
* mirroring.
*/
if (mirror) {
GLint tmp = pixels_clipped_left;
pixels_clipped_left = pixels_clipped_right;
pixels_clipped_right = tmp;
}
/* Adjust the source rectangle to remove the pixels corresponding to those
* that were clipped/scissored out of the destination rectangle.
*/
src_x0 += pixels_clipped_left * scale;
src_x1 -= pixels_clipped_right * scale;
return true;
}
static struct intel_mipmap_tree *
find_miptree(GLbitfield buffer_bit, struct intel_renderbuffer *irb)
{
@@ -244,47 +167,12 @@ try_blorp_blit(struct brw_context *brw,
const struct gl_framebuffer *read_fb = ctx->ReadBuffer;
const struct gl_framebuffer *draw_fb = ctx->DrawBuffer;
/* Detect if the blit needs to be mirrored */
bool mirror_x = false, mirror_y = false;
fixup_mirroring(mirror_x, srcX0, srcX1);
fixup_mirroring(mirror_x, dstX0, dstX1);
fixup_mirroring(mirror_y, srcY0, srcY1);
fixup_mirroring(mirror_y, dstY0, dstY1);
/* If the destination rectangle needs to be clipped or scissored, do so.
*/
if (!(clip_or_scissor(mirror_x, srcX0, srcX1, dstX0, dstX1,
draw_fb->_Xmin, draw_fb->_Xmax) &&
clip_or_scissor(mirror_y, srcY0, srcY1, dstY0, dstY1,
draw_fb->_Ymin, draw_fb->_Ymax))) {
/* Everything got clipped/scissored away, so the blit was successful. */
bool mirror_x, mirror_y;
if (brw_meta_mirror_clip_and_scissor(ctx,
&srcX0, &srcY0, &srcX1, &srcY1,
&dstX0, &dstY0, &dstX1, &dstY1,
&mirror_x, &mirror_y))
return true;
}
/* If the source rectangle needs to be clipped or scissored, do so. */
if (!(clip_or_scissor(mirror_x, dstX0, dstX1, srcX0, srcX1,
0, read_fb->Width) &&
clip_or_scissor(mirror_y, dstY0, dstY1, srcY0, srcY1,
0, read_fb->Height))) {
/* Everything got clipped/scissored away, so the blit was successful. */
return true;
}
/* Account for the fact that in the system framebuffer, the origin is at
* the lower left.
*/
if (_mesa_is_winsys_fbo(read_fb)) {
GLint tmp = read_fb->Height - srcY0;
srcY0 = read_fb->Height - srcY1;
srcY1 = tmp;
mirror_y = !mirror_y;
}
if (_mesa_is_winsys_fbo(draw_fb)) {
GLint tmp = draw_fb->Height - dstY0;
dstY0 = draw_fb->Height - dstY1;
dstY1 = tmp;
mirror_y = !mirror_y;
}
/* Find buffers */
struct intel_renderbuffer *src_irb;
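The clip_or_scissor() helper removed from this file (its logic now sits behind brw_meta_mirror_clip_and_scissor()) clips one axis of the destination and shrinks the source by the matching amount. The sketch below restates that one-axis logic in plain C with pointers instead of C++ references. It is an illustration under assumptions, not the moved code: the name demo_clip_1d is hypothetical, the clipped amounts are kept as floats rather than GLint, and the scale is computed after the early-out so the division is always well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Clip [dst0, dst1) against [fb_min, fb_max) and adjust [src0, src1) to match.
 * Returns false if nothing is left to blit. */
static bool
demo_clip_1d(bool mirror, float *src0, float *src1,
             float *dst0, float *dst1, float fb_min, float fb_max)
{
   float scale;
   float clipped_lo = 0.0f;
   float clipped_hi = 0.0f;

   /* If everything would be clipped away, stop. */
   if (!(fb_min < fb_max && *dst0 < fb_max &&
         fb_min < *dst1 && *dst0 < *dst1))
      return false;

   /* Source units per destination unit. */
   scale = (*src1 - *src0) / (*dst1 - *dst0);

   /* Clip the destination and remember how much was cut from each side. */
   if (*dst0 < fb_min) {
      clipped_lo = fb_min - *dst0;
      *dst0 = fb_min;
   }
   if (fb_max < *dst1) {
      clipped_hi = *dst1 - fb_max;
      *dst1 = fb_max;
   }

   /* A mirrored blit maps the low destination edge to the high source edge. */
   if (mirror) {
      float tmp = clipped_lo;
      clipped_lo = clipped_hi;
      clipped_hi = tmp;
   }

   /* Drop the source texels that correspond to the clipped destination. */
   *src0 += clipped_lo * scale;
   *src1 -= clipped_hi * scale;
   return true;
}
```

As the original comment notes, the same routine handles source clipping when the roles of src and dst are swapped in the call.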


@@ -1475,9 +1475,23 @@ GLboolean brwCreateContext(gl_api api,
/*======================================================================
* brw_misc_state.c
*/
GLuint brw_get_rb_for_slice(struct brw_context *brw,
struct intel_mipmap_tree *mt,
unsigned level, unsigned layer, bool flat);
void brw_meta_updownsample(struct brw_context *brw,
struct intel_mipmap_tree *src,
struct intel_mipmap_tree *dst);
void brw_meta_fbo_stencil_blit(struct brw_context *brw,
GLfloat srcX0, GLfloat srcY0,
GLfloat srcX1, GLfloat srcY1,
GLfloat dstX0, GLfloat dstY0,
GLfloat dstX1, GLfloat dstY1);
void brw_meta_stencil_updownsample(struct brw_context *brw,
struct intel_mipmap_tree *src,
struct intel_mipmap_tree *dst);
/*======================================================================
* brw_misc_state.c
*/


@@ -606,6 +606,7 @@
#define BRW_TEXCOORDMODE_CUBE 3
#define BRW_TEXCOORDMODE_CLAMP_BORDER 4
#define BRW_TEXCOORDMODE_MIRROR_ONCE 5
#define GEN8_TEXCOORDMODE_HALF_BORDER 6
#define BRW_THREAD_PRIORITY_NORMAL 0
#define BRW_THREAD_PRIORITY_HIGH 1
@@ -1694,7 +1695,7 @@ enum brw_message_target {
/* GEN7/DW1: */
# define GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT 12
/* GEN7/DW2: */
# define HSW_SF_LINE_STIPPLE_ENABLE 14
# define HSW_SF_LINE_STIPPLE_ENABLE (1 << 14)
# define GEN8_SF_SMOOTH_POINT_ENABLE (1 << 13)
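The HSW_SF_LINE_STIPPLE_ENABLE fix above changes a bit index (14) into a bit mask (1 << 14). The sketch below shows why that matters, assuming the define is OR-ed into a command dword in the same way as its (1 << n) neighbours. The DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_AS_INDEX 14u          /* pre-fix form: a bit position */
#define DEMO_FLAG_AS_MASK  (1u << 14)   /* post-fix form: a bit mask */

/* OR a flag into a dword, as state-packet setup code typically does. */
static uint32_t
demo_enable(uint32_t dword, uint32_t flag)
{
   return dword | flag;
}
```

With the index form, the OR sets bits 1 to 3 (the value 14 is 0b1110) and leaves bit 14 clear, so the intended enable is never set and three unrelated bits are.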


@@ -192,33 +192,44 @@ static const struct brw_device_info brw_device_info_hsw_gt3 = {
},
};
/* Thread counts and URB limits are placeholders, and may not be accurate. */
#define GEN8_FEATURES \
.gen = 8, \
.has_hiz_and_separate_stencil = true, \
.must_use_separate_stencil = true, \
.has_llc = true, \
.has_pln = true, \
.max_vs_threads = 280, \
.max_gs_threads = 256, \
.max_wm_threads = 408, \
.urb = { \
.size = 128, \
.min_vs_entries = 64, \
.max_vs_entries = 1664, \
.max_gs_entries = 640, \
}
.max_vs_threads = 504, \
.max_gs_threads = 504, \
.max_wm_threads = 384 \
static const struct brw_device_info brw_device_info_bdw_gt1 = {
GEN8_FEATURES, .gt = 1,
.urb = {
.size = 192,
.min_vs_entries = 64,
.max_vs_entries = 2560,
.max_gs_entries = 960,
}
};
static const struct brw_device_info brw_device_info_bdw_gt2 = {
GEN8_FEATURES, .gt = 2,
.urb = {
.size = 384,
.min_vs_entries = 64,
.max_vs_entries = 2560,
.max_gs_entries = 960,
}
};
static const struct brw_device_info brw_device_info_bdw_gt3 = {
GEN8_FEATURES, .gt = 3,
.urb = {
.size = 384,
.min_vs_entries = 64,
.max_vs_entries = 2560,
.max_gs_entries = 960,
}
};
/* Thread counts and URB limits are placeholders, and may not be accurate.


@@ -77,21 +77,40 @@ is_coalesce_candidate(const fs_inst *inst, const int *virtual_grf_sizes)
static bool
can_coalesce_vars(brw::fs_live_variables *live_intervals,
const exec_list *instructions, const fs_inst *inst, int ip,
const exec_list *instructions, const fs_inst *inst,
int var_to, int var_from)
{
if (!live_intervals->vars_interfere(var_from, var_to))
return true;
assert(ip >= live_intervals->start[var_to]);
/* We know that the live ranges of A (var_from) and B (var_to)
* interfere because of the ->vars_interfere() call above. If the end
* of B's live range is after the end of A's range, then we know two
* things:
* - the start of B's live range must be in A's live range (since we
* already know the two ranges interfere, this is the only remaining
* possibility)
* - the interference isn't of the form we're looking for (where B is
* entirely inside A)
*/
if (live_intervals->end[var_to] > live_intervals->end[var_from])
return false;
fs_inst *scan_inst;
for (scan_inst = (fs_inst *)inst->next;
!scan_inst->is_tail_sentinel() && ip <= live_intervals->end[var_to];
scan_inst = (fs_inst *)scan_inst->next, ip++) {
if (scan_inst->opcode == BRW_OPCODE_WHILE)
int scan_ip = -1;
foreach_list(n, instructions) {
fs_inst *scan_inst = (fs_inst *)n;
scan_ip++;
if (scan_inst->is_control_flow())
return false;
if (scan_ip <= live_intervals->start[var_to])
continue;
if (scan_ip > live_intervals->end[var_to])
break;
if (scan_inst->dst.equals(inst->dst) ||
scan_inst->dst.equals(inst->src[0]))
return false;
@@ -114,11 +133,9 @@ fs_visitor::register_coalesce()
fs_inst *mov[MAX_SAMPLER_MESSAGE_SIZE];
int var_to[MAX_SAMPLER_MESSAGE_SIZE];
int var_from[MAX_SAMPLER_MESSAGE_SIZE];
int ip = -1;
foreach_list(node, &this->instructions) {
fs_inst *inst = (fs_inst *)node;
ip++;
if (!is_coalesce_candidate(inst, virtual_grf_sizes))
continue;
@@ -157,7 +174,7 @@ fs_visitor::register_coalesce()
var_to[i] = live_intervals->var_from_vgrf[reg_to] + reg_to_offset[i];
var_from[i] = live_intervals->var_from_vgrf[reg_from] + i;
if (!can_coalesce_vars(live_intervals, &instructions, inst, ip,
if (!can_coalesce_vars(live_intervals, &instructions, inst,
var_to[i], var_from[i])) {
can_coalesce = false;
reg_from = -1;
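The reworked can_coalesce_vars() above first rejects the case where B (var_to) outlives A (var_from), then scans instructions inside B's live range for writes to either variable. The sketch below is a simplified model of that check over a flat instruction array. The demo_* types, the inclusive-index interference test, and the final "return true" are assumptions made for illustration; the real function works on fs_inst lists and brw::fs_live_variables, and the hunk shown here is cut off before its end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical live range, in instruction indices. */
struct demo_range { int start, end; };

/* Hypothetical instruction: which variable it writes, and whether it is
 * control flow. */
struct demo_inst { int dst; bool control_flow; };

/* Assumed interference test: the ranges overlap by more than a shared
 * endpoint. */
static bool
demo_ranges_interfere(struct demo_range a, struct demo_range b)
{
   return !(a.end <= b.start || b.end <= a.start);
}

static bool
demo_can_coalesce(const struct demo_inst *insts, int count,
                  struct demo_range from, struct demo_range to,
                  int var_from, int var_to)
{
   int ip;

   if (!demo_ranges_interfere(from, to))
      return true;

   /* B must lie entirely inside A; otherwise give up. */
   if (to.end > from.end)
      return false;

   for (ip = 0; ip < count; ip++) {
      /* Any control flow up to the end of B's range defeats the analysis. */
      if (insts[ip].control_flow)
         return false;

      if (ip <= to.start)
         continue;
      if (ip > to.end)
         break;

      /* A write to either variable inside B's range means they differ. */
      if (insts[ip].dst == var_to || insts[ip].dst == var_from)
         return false;
   }

   return true;
}
```

Scanning from the start of the program with its own counter is what lets the new version drop the ip parameter that the old signature required.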


@@ -221,15 +221,18 @@ fs_visitor::emit_lrp(const fs_reg &dst, const fs_reg &x, const fs_reg &y,
!y.is_valid_3src() ||
!a.is_valid_3src()) {
/* We can't use the LRP instruction. Emit x*(1-a) + y*a. */
fs_reg y_times_a = fs_reg(this, glsl_type::float_type);
fs_reg one_minus_a = fs_reg(this, glsl_type::float_type);
fs_reg x_times_one_minus_a = fs_reg(this, glsl_type::float_type);
emit(MUL(y_times_a, y, a));
fs_reg negative_a = a;
negative_a.negate = !a.negate;
emit(ADD(one_minus_a, negative_a, fs_reg(1.0f)));
fs_inst *mul = emit(MUL(reg_null_f, y, a));
mul->writes_accumulator = true;
emit(MAC(dst, x, one_minus_a));
emit(MUL(x_times_one_minus_a, x, one_minus_a));
emit(ADD(dst, x_times_one_minus_a, y_times_a));
} else {
/* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), so
* we need to reorder the operands.
@@ -1480,15 +1483,28 @@ fs_visitor::rescale_texcoord(ir_texture *ir, fs_reg coordinate,
return coordinate;
}
scale_x = fs_reg(UNIFORM, uniforms);
scale_y = fs_reg(UNIFORM, uniforms + 1);
GLuint index = _mesa_add_state_reference(params,
(gl_state_index *)tokens);
stage_prog_data->param[uniforms++] =
&prog->Parameters->ParameterValues[index][0].f;
stage_prog_data->param[uniforms++] =
&prog->Parameters->ParameterValues[index][1].f;
/* Try to find existing copies of the texrect scale uniforms. */
for (unsigned i = 0; i < uniforms; i++) {
if (stage_prog_data->param[i] ==
&prog->Parameters->ParameterValues[index][0].f) {
scale_x = fs_reg(UNIFORM, i);
scale_y = fs_reg(UNIFORM, i + 1);
break;
}
}
/* If we didn't already set them up, do so now. */
if (scale_x.file == BAD_FILE) {
scale_x = fs_reg(UNIFORM, uniforms);
scale_y = fs_reg(UNIFORM, uniforms + 1);
stage_prog_data->param[uniforms++] =
&prog->Parameters->ParameterValues[index][0].f;
stage_prog_data->param[uniforms++] =
&prog->Parameters->ParameterValues[index][1].f;
}
}
/* The 965 requires the EU to do the normalization of GL rectangle
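For the emit_lrp() hunk earlier in this file, both instruction sequences in the diff compute the same fallback value, x*(1-a) + y*a, when the LRP instruction cannot be used. The scalar sketch below mirrors the MUL/ADD form step by step. The function name demo_lrp_fallback is hypothetical, and the code models only the arithmetic, not register allocation or the accumulator.

```c
#include <assert.h>

/* Scalar model of the LRP fallback: x*(1-a) + y*a. */
static float
demo_lrp_fallback(float x, float y, float a)
{
   float y_times_a = y * a;                      /* MUL(y_times_a, y, a) */
   float one_minus_a = -a + 1.0f;                /* ADD(one_minus_a, -a, 1.0) */
   float x_times_one_minus_a = x * one_minus_a;  /* MUL(..., x, one_minus_a) */

   return x_times_one_minus_a + y_times_a;       /* ADD(dst, ..., y_times_a) */
}
```

At a = 0 the result is x and at a = 1 it is y, which matches GLSL's mix(x, y, a).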


@@ -0,0 +1,525 @@
/*
* Copyright © 2014 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
/**
* @file brw_meta_stencil_blit.c
*
* Implements upsampling, downsampling and scaling of stencil miptrees. The
* logic originally comes from brw_blorp_blit.c.
* Implementation creates a temporary draw framebuffer object and attaches the
* destination stencil buffer attachment as color attachment. Source attachment
* is in turn treated as a stencil texture and the glsl program used for the
* blitting samples it using stencil-indexing.
*
* Unfortunately as the data port does not support interleaved msaa-surfaces
* (stencil is always IMS), the glsl program needs to handle the writing of
* individual samples manually. Surface is configured as if it were single
* sampled (with adjusted dimensions) and the glsl program extracts the
* sample indices from the input coordinates for correct texturing.
*
* Target surface is also configured as Y-tiled instead of W-tiled in order
* to support generations 6-7. Later hardware supports W-tiled as render target
* and the logic here could be simplified for those.
*/
#include "brw_context.h"
#include "intel_batchbuffer.h"
#include "intel_fbo.h"
#include "main/blit.h"
#include "main/buffers.h"
#include "main/fbobject.h"
#include "main/uniforms.h"
#include "main/texparam.h"
#include "main/texobj.h"
#include "main/viewport.h"
#include "main/enable.h"
#include "main/blend.h"
#include "main/varray.h"
#include "main/shaderapi.h"
#include "glsl/ralloc.h"
#include "drivers/common/meta.h"
#include "brw_meta_util.h"
#define FILE_DEBUG_FLAG DEBUG_FBO
struct blit_dims {
int src_x0, src_y0, src_x1, src_y1;
int dst_x0, dst_y0, dst_x1, dst_y1;
bool mirror_x, mirror_y;
};
static const char *vs_source =
"#version 130\n"
"in vec2 position;\n"
"out vec2 tex_coords;\n"
"void main()\n"
"{\n"
" tex_coords = (position + 1.0) / 2.0;\n"
" gl_Position = vec4(position, 0.0, 1.0);\n"
"}\n";
static const struct sampler_and_fetch {
const char *sampler;
const char *fetch;
} samplers[] = {
{ "uniform usampler2D texSampler;\n",
" out_color = texelFetch(texSampler, txl_coords, 0)" },
{ "#extension GL_ARB_texture_multisample : enable\n"
"uniform usampler2DMS texSampler;\n",
" out_color = texelFetch(texSampler, txl_coords, sample_index)" }
};
/**
* Translating Y-tiled to W-tiled:
*
* X' = (X & ~0b1011) >> 1 | (Y & 0b1) << 2 | X & 0b1
* Y' = (Y & ~0b1) << 1 | (X & 0b1000) >> 2 | (X & 0b10) >> 1
*/
static const char *fs_tmpl =
"#version 130\n"
"%s"
"uniform float src_x_scale;\n"
"uniform float src_y_scale;\n"
"uniform float src_x_off;\n" /* Top right coordinates of the source */
"uniform float src_y_off;\n" /* rectangle in W-tiled space. */
"uniform float dst_x_off;\n" /* Top right coordinates of the target */
"uniform float dst_y_off;\n" /* rectangle in Y-tiled space. */
"uniform float draw_rect_w;\n" /* This is the unnormalized size of the */
"uniform float draw_rect_h;\n" /* drawing rectangle in Y-tiled space. */
"uniform int dst_x0;\n" /* This is the bounding rectangle in the W-tiled */
"uniform int dst_x1;\n" /* space that will be used to skip pixels lying */
"uniform int dst_y0;\n" /* outside. In some cases the Y-tiled rectangle */
"uniform int dst_y1;\n" /* is larger. */
"uniform int dst_num_samples;\n"
"in vec2 tex_coords;\n"
"ivec2 txl_coords;\n"
"int sample_index;\n"
"out uvec4 out_color;\n"
"\n"
"void get_unorm_target_coords()\n"
"{\n"
" txl_coords.x = int(tex_coords.x * draw_rect_w + dst_x_off);\n"
" txl_coords.y = int(tex_coords.y * draw_rect_h + dst_y_off);\n"
"}\n"
"\n"
"void translate_dst_to_src()\n"
"{\n"
" txl_coords.x = int(float(txl_coords.x) * src_x_scale + src_x_off);\n"
" txl_coords.y = int(float(txl_coords.y) * src_y_scale + src_y_off);\n"
"}\n"
"\n"
"void translate_y_to_w_tiling()\n"
"{\n"
" int X = txl_coords.x;\n"
" int Y = txl_coords.y;\n"
" txl_coords.x = (X & int(0xfff4)) >> 1;\n"
" txl_coords.x |= ((Y & int(0x1)) << 2);\n"
" txl_coords.x |= (X & int(0x1));\n"
" txl_coords.y = (Y & int(0xfffe)) << 1;\n"
" txl_coords.y |= ((X & int(0x8)) >> 2);\n"
" txl_coords.y |= ((X & int(0x2)) >> 1);\n"
"}\n"
"\n"
"void decode_msaa()\n"
"{\n"
" int X = txl_coords.x;\n"
" int Y = txl_coords.y;\n"
" switch (dst_num_samples) {\n"
" case 0:\n"
" sample_index = 0;\n"
" break;\n"
" case 2:\n"
" txl_coords.x = ((X & int(0xfffc)) >> 1) | (X & int(0x1));\n"
" sample_index = (X & 0x2) >> 1;\n"
" break;\n"
" case 4:\n"
" txl_coords.x = ((X & int(0xfffc)) >> 1) | (X & int(0x1));\n"
" txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n"
" sample_index = (Y & 0x2) | ((X & 0x2) >> 1);\n"
" break;\n"
" case 8:\n"
" txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n"
" txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n"
" sample_index = (X & 0x4) | (Y & 0x2) | ((X & 0x2) >> 1);\n"
" }\n"
"}\n"
"\n"
"void discard_outside_bounding_rect()\n"
"{\n"
" int X = txl_coords.x;\n"
" int Y = txl_coords.y;\n"
" if (X >= dst_x1 || X < dst_x0 || Y >= dst_y1 || Y < dst_y0)\n"
" discard;\n"
"}\n"
"\n"
"void main()\n"
"{\n"
" get_unorm_target_coords();\n"
" translate_y_to_w_tiling();\n"
" decode_msaa();"
" discard_outside_bounding_rect();\n"
" translate_dst_to_src();\n"
" %s;\n"
"}\n";
/**
* Setup uniforms telling the coordinates of the destination rectangle in the
* native w-tiled space. These are needed to ignore pixels that lie outside.
* The destination is drawn as Y-tiled and in some cases the Y-tiled drawing
* rectangle is larger than the original (for example 1x4 w-tiled requires
* 16x2 y-tiled).
*/
static void
setup_bounding_rect(GLuint prog, const struct blit_dims *dims)
{
_mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_x0"), dims->dst_x0);
_mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_x1"), dims->dst_x1);
_mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_y0"), dims->dst_y0);
_mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_y1"), dims->dst_y1);
}
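The Y-tiled to W-tiled translation documented above the fragment shader template can be checked on the CPU. The following is a minimal sketch, not part of the patch: `struct tile_coord` and `y_to_w_tiled()` are hypothetical names, and the masks are written as `~0xb` and `~0x1` rather than the 16-bit constants the shader uses.

```c
#include <assert.h>

/* Hypothetical helper type, used only for this illustration. */
struct tile_coord {
   int x, y;
};

/* Reinterpret a Y-tiled pixel position as a W-tiled one, following
 *   X' = (X & ~0b1011) >> 1 | (Y & 0b1) << 2 | X & 0b1
 *   Y' = (Y & ~0b1) << 1 | (X & 0b1000) >> 2 | (X & 0b10) >> 1
 */
struct tile_coord
y_to_w_tiled(int x, int y)
{
   struct tile_coord w;

   assert(x >= 0 && y >= 0);

   w.x = ((x & ~0xb) >> 1) | ((y & 0x1) << 2) | (x & 0x1);
   w.y = ((y & ~0x1) << 1) | ((x & 0x8) >> 2) | ((x & 0x2) >> 1);
   return w;
}
```

Under these formulas a 16x2 block of Y-tiled positions maps one-to-one onto an 8x4 block of W-tiled positions, which is why the Y-tiled drawing rectangle can be larger than the W-tiled rectangle it covers.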
/**
* Setup uniforms telling the destination width, height and the offset. These
* are needed to unnormalize the input coordinates and to correctly translate
* between destination and source that may have differing offsets.
*/
static void
setup_drawing_rect(GLuint prog, const struct blit_dims *dims)
{
_mesa_Uniform1f(_mesa_GetUniformLocation(prog, "draw_rect_w"),
dims->dst_x1 - dims->dst_x0);
_mesa_Uniform1f(_mesa_GetUniformLocation(prog, "draw_rect_h"),
dims->dst_y1 - dims->dst_y0);
_mesa_Uniform1f(_mesa_GetUniformLocation(prog, "dst_x_off"), dims->dst_x0);
_mesa_Uniform1f(_mesa_GetUniformLocation(prog, "dst_y_off"), dims->dst_y0);
}
/**
* When not mirroring a coordinate (say, X), we need:
* src_x - src_x0 = (dst_x - dst_x0 + 0.5) * scale
* Therefore:
* src_x = src_x0 + (dst_x - dst_x0 + 0.5) * scale
*
* The program uses "round toward zero" to convert the transformed floating
* point coordinates to integer coordinates, whereas the behaviour we actually
* want is "round to nearest", so 0.5 provides the necessary correction.
*
* When mirroring X we need:
* src_x - src_x0 = (dst_x1 - dst_x - 0.5) * scale
* Therefore:
* src_x = src_x0 + (dst_x1 - dst_x - 0.5) * scale
*/
static void
setup_coord_coeff(GLuint prog, GLuint multiplier, GLuint offset,
int src_0, int src_1, int dst_0, int dst_1, bool mirror)
{
const float scale = ((float)(src_1 - src_0)) / (dst_1 - dst_0);
if (mirror) {
_mesa_Uniform1f(multiplier, -scale);
_mesa_Uniform1f(offset, src_0 + (dst_1 - 0.5) * scale);
} else {
_mesa_Uniform1f(multiplier, scale);
_mesa_Uniform1f(offset, src_0 + (-dst_0 + 0.5) * scale);
}
}
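As a worked illustration of the multiplier and offset computed by setup_coord_coeff(), the sketch below reproduces the same arithmetic without any GL calls. `struct coord_coeff`, `compute_coord_coeff()` and `map_dst_to_src()` are hypothetical names introduced for this example; the cast to `int` stands in for the shader's `int()` conversion, which also truncates toward zero.

```c
#include <assert.h>

/* Hypothetical container for the two uniforms of one axis. */
struct coord_coeff {
   float multiplier;
   float offset;
};

/* Same arithmetic as setup_coord_coeff(), minus the uniform uploads. */
struct coord_coeff
compute_coord_coeff(int src_0, int src_1, int dst_0, int dst_1, int mirror)
{
   struct coord_coeff c;

   assert(dst_1 != dst_0);
   const float scale = (float)(src_1 - src_0) / (float)(dst_1 - dst_0);

   if (mirror) {
      c.multiplier = -scale;
      c.offset = src_0 + (dst_1 - 0.5f) * scale;
   } else {
      c.multiplier = scale;
      c.offset = src_0 + (-dst_0 + 0.5f) * scale;
   }
   return c;
}

/* Apply the coefficients the way translate_dst_to_src() does. */
int
map_dst_to_src(struct coord_coeff c, int dst)
{
   return (int)((float)dst * c.multiplier + c.offset);
}
```

For a source range 0..4 blitted onto a destination range 0..8, the scale is 0.5 and the unmirrored offset is 0.25, so destination pixels 0 and 1 both read source pixel 0. With mirroring the offset becomes 3.75 and destination pixel 0 reads source pixel 3.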
/**
* Setup uniforms providing relation between source and destination surfaces.
* Destination coordinates are in Y-tiling layout while texelFetch() expects
* W-tiled coordinates. Once the destination coordinates are re-interpreted by
* the program into the original W-tiled layout, the program needs to know the
* offset and scaling factors between the destination and source.
* Note that these are calculated in the original W-tiled space before the
* destination rectangle is adjusted for possible msaa and Y-tiling.
*/
static void
setup_coord_transform(GLuint prog, const struct blit_dims *dims)
{
setup_coord_coeff(prog,
_mesa_GetUniformLocation(prog, "src_x_scale"),
_mesa_GetUniformLocation(prog, "src_x_off"),
dims->src_x0, dims->src_x1, dims->dst_x0, dims->dst_x1,
dims->mirror_x);
setup_coord_coeff(prog,
_mesa_GetUniformLocation(prog, "src_y_scale"),
_mesa_GetUniformLocation(prog, "src_y_off"),
dims->src_y0, dims->src_y1, dims->dst_y0, dims->dst_y1,
dims->mirror_y);
}
static GLuint
setup_program(struct gl_context *ctx, bool msaa_tex)
{
struct blit_state *blit = &ctx->Meta->Blit;
static GLuint prog_cache[] = { 0, 0 };
const char *fs_source;
const struct sampler_and_fetch *sampler = &samplers[msaa_tex];
_mesa_meta_setup_vertex_objects(&blit->VAO, &blit->VBO, true, 2, 2, 0);
if (prog_cache[msaa_tex]) {
_mesa_UseProgram(prog_cache[msaa_tex]);
return prog_cache[msaa_tex];
}
fs_source = ralloc_asprintf(NULL, fs_tmpl, sampler->sampler,
sampler->fetch);
_mesa_meta_compile_and_link_program(ctx, vs_source, fs_source,
"i965 stencil blit",
&prog_cache[msaa_tex]);
ralloc_free(fs_source);
return prog_cache[msaa_tex];
}
/**
* Samples in stencil buffer are interleaved, and unfortunately the data port
* does not support it as render target. Therefore the surface is set up as
* single sampled and the program handles the interleaving.
* In case of single sampled stencil, the render buffer is adjusted with
* twice the base level height in order for the program to be able to write
* any mip-level. (Used to set the drawing rectangle for the hw).
*/
static void
adjust_msaa(struct blit_dims *dims, int num_samples)
{
if (num_samples == 2) {
dims->dst_x0 *= 2;
dims->dst_x1 *= 2;
} else if (num_samples) {
const int x_num_samples = num_samples / 2;
dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0 * x_num_samples, num_samples);
dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0 * 2, 4);
dims->dst_x1 = ALIGN(dims->dst_x1 * x_num_samples, num_samples);
dims->dst_y1 = ALIGN(dims->dst_y1 * 2, 4);
}
}
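The interleaving that adjust_msaa() compensates for is undone per pixel by decode_msaa() in the fragment shader. A minimal CPU-side sketch of the 4x case follows; `struct msaa_coord` and `decode_msaa_4x()` are hypothetical names, and the masks are written as `~0x3` instead of the shader's 16-bit constants.

```c
#include <assert.h>

/* Hypothetical result type: pixel position plus sample index. */
struct msaa_coord {
   int x, y, sample;
};

/* Split an interleaved 4x position into pixel coordinates and a sample
 * index, mirroring "case 4" of decode_msaa() in the shader template.
 */
struct msaa_coord
decode_msaa_4x(int x, int y)
{
   struct msaa_coord c;

   assert(x >= 0 && y >= 0);

   c.x = ((x & ~0x3) >> 1) | (x & 0x1);
   c.y = ((y & ~0x3) >> 1) | (y & 0x1);
   c.sample = (y & 0x2) | ((x & 0x2) >> 1);
   return c;
}
```

Bit 1 of each interleaved coordinate selects the sample, so positions (0,0), (2,0), (0,2) and (2,2) all decode to pixel (0,0) with sample indices 0 through 3.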
/**
* Stencil is mapped as Y-tiled render target and the dimensions need to be
* adjusted in order for the Y-tiled rectangle to cover the entire linear
* memory space of the original W-tiled rectangle.
*/
static void
adjust_tiling(struct blit_dims *dims, int num_samples)
{
const unsigned x_align = 8, y_align = num_samples > 2 ? 8 : 4;
dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0, x_align) * 2;
dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0, y_align) / 2;
dims->dst_x1 = ALIGN(dims->dst_x1, x_align) * 2;
dims->dst_y1 = ALIGN(dims->dst_y1, y_align) / 2;
}
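The "1x4 w-tiled requires 16x2 y-tiled" case mentioned earlier falls out of the arithmetic in adjust_tiling(). The sketch below repeats that arithmetic on a plain rectangle. `struct rect`, `w_rect_to_y_rect()` and the two macros are hypothetical stand-ins for this example; the macros assume power-of-two alignments and are not the driver's own ALIGN and ROUND_DOWN_TO definitions.

```c
#include <assert.h>

/* Local stand-ins, valid for power-of-two alignments only. */
#define ALIGN_POT(v, a)      (((v) + (a) - 1) & ~((a) - 1))
#define ROUND_DOWN_POT(v, a) ((v) & ~((a) - 1))

/* Hypothetical rectangle type: [x0, x1) x [y0, y1). */
struct rect {
   int x0, y0, x1, y1;
};

/* Grow a W-tiled rectangle to alignment boundaries, then express it in
 * Y-tiled dimensions: twice the width, half the height.
 */
struct rect
w_rect_to_y_rect(struct rect w, int num_samples)
{
   const int x_align = 8, y_align = num_samples > 2 ? 8 : 4;
   struct rect y;

   assert(w.x0 >= 0 && w.y0 >= 0 && w.x0 <= w.x1 && w.y0 <= w.y1);

   y.x0 = ROUND_DOWN_POT(w.x0, x_align) * 2;
   y.y0 = ROUND_DOWN_POT(w.y0, y_align) / 2;
   y.x1 = ALIGN_POT(w.x1, x_align) * 2;
   y.y1 = ALIGN_POT(w.y1, y_align) / 2;
   return y;
}
```

A single-sampled W-tiled rectangle from (0,0) to (1,4) becomes a Y-tiled rectangle from (0,0) to (16,2). The pixels that fall outside the original rectangle are the ones the shader discards using the bounding-rectangle uniforms.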
/**
* When stencil is mapped as Y-tiled render target the mip-level offsets
* calculated for the Y-tiling do not always match the offsets in W-tiling.
* Therefore the sampling engine cannot be used for individual mip-level
* access but the program needs to do it internally. This can be achieved
* by shifting the coordinates of the blit rectangle here.
*/
static void
adjust_mip_level(const struct intel_mipmap_tree *mt,
unsigned level, unsigned layer, struct blit_dims *dims)
{
unsigned x_offset;
unsigned y_offset;
intel_miptree_get_image_offset(mt, level, layer, &x_offset, &y_offset);
dims->dst_x0 += x_offset;
dims->dst_y0 += y_offset;
dims->dst_x1 += x_offset;
dims->dst_y1 += y_offset;
}
static void
prepare_vertex_data(void)
{
static const struct vertex verts[] = {
{ .x = -1.0f, .y = -1.0f },
{ .x = 1.0f, .y = -1.0f },
{ .x = 1.0f, .y = 1.0f },
{ .x = -1.0f, .y = 1.0f } };
_mesa_BufferSubData(GL_ARRAY_BUFFER_ARB, 0, sizeof(verts), verts);
}
static void
set_read_rb_tex_image(struct gl_context *ctx, struct fb_tex_blit_state *blit,
GLenum *target)
{
const struct gl_renderbuffer_attachment *att =
&ctx->ReadBuffer->Attachment[BUFFER_STENCIL];
struct gl_renderbuffer *rb = att->Renderbuffer;
struct gl_texture_object *tex_obj;
unsigned level = 0;
/* If the renderbuffer is already backed by a tex image, use it. */
if (att->Texture) {
tex_obj = att->Texture;
*target = tex_obj->Target;
level = att->TextureLevel;
} else {
_mesa_meta_bind_rb_as_tex_image(ctx, rb, &blit->tempTex, &tex_obj,
target);
}
blit->baseLevelSave = tex_obj->BaseLevel;
blit->maxLevelSave = tex_obj->MaxLevel;
blit->stencilSamplingSave = tex_obj->StencilSampling;
blit->sampler = _mesa_meta_setup_sampler(ctx, tex_obj, *target,
GL_NEAREST, level);
}
static void
brw_meta_stencil_blit(struct brw_context *brw,
struct intel_mipmap_tree *dst_mt,
unsigned dst_level, unsigned dst_layer,
const struct blit_dims *orig_dims)
{
struct gl_context *ctx = &brw->ctx;
struct blit_dims dims = *orig_dims;
struct fb_tex_blit_state blit;
GLuint prog, fbo, rbo;
GLenum target;
_mesa_meta_fb_tex_blit_begin(ctx, &blit);
_mesa_GenFramebuffers(1, &fbo);
/* Force the surface to be configured for level zero. */
rbo = brw_get_rb_for_slice(brw, dst_mt, 0, dst_layer, true);
adjust_msaa(&dims, dst_mt->num_samples);
adjust_tiling(&dims, dst_mt->num_samples);
_mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
_mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
GL_RENDERBUFFER, rbo);
_mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
ctx->DrawBuffer->_Status = GL_FRAMEBUFFER_COMPLETE;
set_read_rb_tex_image(ctx, &blit, &target);
_mesa_TexParameteri(target, GL_DEPTH_STENCIL_TEXTURE_MODE,
GL_STENCIL_INDEX);
prog = setup_program(ctx, target != GL_TEXTURE_2D);
setup_bounding_rect(prog, orig_dims);
setup_drawing_rect(prog, &dims);
setup_coord_transform(prog, orig_dims);
_mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_num_samples"),
dst_mt->num_samples);
prepare_vertex_data();
_mesa_set_viewport(ctx, 0, dims.dst_x0, dims.dst_y0,
dims.dst_x1 - dims.dst_x0, dims.dst_y1 - dims.dst_y0);
_mesa_ColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
_mesa_set_enable(ctx, GL_DEPTH_TEST, false);
_mesa_DrawArrays(GL_TRIANGLE_FAN, 0, 4);
_mesa_meta_fb_tex_blit_end(ctx, target, &blit);
_mesa_meta_end(ctx);
_mesa_DeleteRenderbuffers(1, &rbo);
_mesa_DeleteFramebuffers(1, &fbo);
}
void
brw_meta_fbo_stencil_blit(struct brw_context *brw,
GLfloat src_x0, GLfloat src_y0,
GLfloat src_x1, GLfloat src_y1,
GLfloat dst_x0, GLfloat dst_y0,
GLfloat dst_x1, GLfloat dst_y1)
{
struct gl_context *ctx = &brw->ctx;
struct gl_renderbuffer *draw_fb =
ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Renderbuffer;
const struct intel_renderbuffer *dst_irb = intel_renderbuffer(draw_fb);
struct intel_mipmap_tree *dst_mt = dst_irb->mt;
if (!dst_mt)
return;
if (dst_mt->stencil_mt)
dst_mt = dst_mt->stencil_mt;
bool mirror_x, mirror_y;
if (brw_meta_mirror_clip_and_scissor(ctx,
&src_x0, &src_y0, &src_x1, &src_y1,
&dst_x0, &dst_y0, &dst_x1, &dst_y1,
&mirror_x, &mirror_y))
return;
struct blit_dims dims = { .src_x0 = src_x0, .src_y0 = src_y0,
.src_x1 = src_x1, .src_y1 = src_y1,
.dst_x0 = dst_x0, .dst_y0 = dst_y0,
.dst_x1 = dst_x1, .dst_y1 = dst_y1,
.mirror_x = mirror_x, .mirror_y = mirror_y };
adjust_mip_level(dst_mt, dst_irb->mt_level, dst_irb->mt_layer, &dims);
intel_batchbuffer_emit_mi_flush(brw);
_mesa_meta_begin(ctx, MESA_META_ALL);
brw_meta_stencil_blit(brw,
dst_mt, dst_irb->mt_level, dst_irb->mt_layer, &dims);
intel_batchbuffer_emit_mi_flush(brw);
}
void
brw_meta_stencil_updownsample(struct brw_context *brw,
struct intel_mipmap_tree *src,
struct intel_mipmap_tree *dst)
{
struct gl_context *ctx = &brw->ctx;
struct blit_dims dims = {
.src_x0 = 0, .src_y0 = 0,
.src_x1 = src->logical_width0, .src_y1 = src->logical_height0,
.dst_x0 = 0, .dst_y0 = 0,
.dst_x1 = dst->logical_width0, .dst_y1 = dst->logical_height0,
.mirror_x = 0, .mirror_y = 0 };
GLuint fbo, rbo;
if (dst->stencil_mt)
dst = dst->stencil_mt;
intel_batchbuffer_emit_mi_flush(brw);
_mesa_meta_begin(ctx, MESA_META_ALL);
_mesa_GenFramebuffers(1, &fbo);
rbo = brw_get_rb_for_slice(brw, src, 0, 0, false);
_mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
_mesa_FramebufferRenderbuffer(GL_READ_FRAMEBUFFER, GL_STENCIL_ATTACHMENT,
GL_RENDERBUFFER, rbo);
brw_meta_stencil_blit(brw, dst, 0, 0, &dims);
intel_batchbuffer_emit_mi_flush(brw);
_mesa_DeleteRenderbuffers(1, &rbo);
_mesa_DeleteFramebuffers(1, &fbo);
}


@@ -27,6 +27,7 @@
#include "main/blit.h"
#include "main/buffers.h"
#include "main/enums.h"
#include "main/fbobject.h"
#include "drivers/common/meta.h"
@@ -44,8 +45,10 @@
*
* Clobbers the current renderbuffer binding (ctx->CurrentRenderbuffer).
*/
static GLuint
brw_get_rb_for_first_slice(struct brw_context *brw, struct intel_mipmap_tree *mt)
GLuint
brw_get_rb_for_slice(struct brw_context *brw,
struct intel_mipmap_tree *mt,
unsigned level, unsigned layer, bool flat)
{
struct gl_context *ctx = &brw->ctx;
GLuint rbo;
@@ -62,11 +65,27 @@ brw_get_rb_for_first_slice(struct brw_context *brw, struct intel_mipmap_tree *mt
irb = intel_renderbuffer(rb);
rb->Format = mt->format;
rb->_BaseFormat = _mesa_base_fbo_format(ctx, mt->format);
rb->_BaseFormat = _mesa_get_format_base_format(mt->format);
rb->NumSamples = mt->num_samples;
rb->Width = mt->logical_width0;
rb->Height = mt->logical_height0;
/* Program takes care of msaa and mip-level access manually for stencil.
* The surface is also treated as Y-tiled instead of as W-tiled calling for
* twice the width and half the height in dimensions.
*/
if (flat) {
const unsigned halign_stencil = 8;
rb->NumSamples = 0;
rb->Width = ALIGN(mt->total_width, halign_stencil) * 2;
rb->Height = (mt->total_height / mt->physical_depth0) / 2;
irb->mt_level = 0;
} else {
rb->NumSamples = mt->num_samples;
rb->Width = mt->logical_width0;
rb->Height = mt->logical_height0;
irb->mt_level = level;
}
irb->mt_layer = layer;
intel_miptree_reference(&irb->mt, mt);
@@ -101,8 +120,8 @@ brw_meta_updownsample(struct brw_context *brw,
_mesa_meta_begin(ctx, MESA_META_ALL);
_mesa_GenFramebuffers(2, fbos);
src_rbo = brw_get_rb_for_first_slice(brw, src_mt);
dst_rbo = brw_get_rb_for_first_slice(brw, dst_mt);
src_rbo = brw_get_rb_for_slice(brw, src_mt, 0, 0, false);
dst_rbo = brw_get_rb_for_slice(brw, dst_mt, 0, 0, false);
src_fbo = fbos[0];
dst_fbo = fbos[1];

Some files were not shown because too many files have changed in this diff