Compare commits


61 Commits

Author SHA1 Message Date
Emil Velikov
5a616125ac docs: Update 11.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-15 14:49:25 +00:00
Emil Velikov
a8b2698494 Update version to 11.1.0(final)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-14 12:20:18 +00:00
Francisco Jerez
7753691f1a i965: Resolve color and flush for all active shader images in intel_update_state().
Fixes arb_shader_image_load_store/execution/load-from-cleared-image.shader_test.

Couldn't reproduce any significant FPS regression in CPU-bound
benchmarks from the Finnish benchmarking system on either VLV or BSW
after 30 runs at a 95% confidence level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92849
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 595c818071)
2015-12-12 19:39:03 +00:00
Dave Airlie
ce914d941d radeonsi: handle loading doubles as geometry shader inputs.
This adds the double code to the geometry shader input handling.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e307cfa7d9)
2015-12-12 19:39:03 +00:00
Dave Airlie
300f807649 radeonsi: handle doubles in lds load path.
This handles loading doubles from LDS properly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8c9e40ac22)
2015-12-12 19:39:03 +00:00
Dave Airlie
61a275b789 r600: handle geometry dynamic input array index
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.

We need to load the AR register with the value from the index reg

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cce3864046)
2015-12-12 19:39:03 +00:00
Dave Airlie
0f3892ed9d r600g: fix geom shader input indirect indexing.
This fixes:
gs-input-array-vec4-index-rd

The others run out of gprs unfortunately.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 38542921c7)
2015-12-12 19:39:03 +00:00
Dave Airlie
3d942ee4e5 r600/shader: add utility functions to do single slot arithmetic
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.

It handles CAYMAN MULLO differences as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0696ebc899)
2015-12-12 19:39:03 +00:00
Dave Airlie
efdf841238 r600/shader: split address get out to a function.
This will be used in the tess shaders.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4d64459a92)
2015-12-12 19:39:02 +00:00
Dave Airlie
5913a8c9ec r600g: fix outputting to non-0 buffers for stream 0.
This fixes:
arb_transform_feedback3-ext_interleaved_two_bufs_gs
arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
transform-feedback-builtins

If we are only emitting one ring, then emit all output
buffers on it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e97ac006d7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Conflicts:
	src/gallium/drivers/r600/r600_shader.c
2015-12-12 19:39:02 +00:00
Ilia Mirkin
3c9e76fc24 nv50/ir: fix cutoff for using r63 vs r127 when replacing zero
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:

total bytes used in shared programs   : 44154976 -> 44139880 (-0.03%)

Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f920f8eb02)
2015-12-12 19:39:02 +00:00
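The shader-db figure in the nv50/ir commit above can be checked with a small C helper. The function name `percent_change` is hypothetical and only illustrates the arithmetic: the totals differ by 15096 bytes, which is about 0.034% of the original size.

```c
#include <assert.h>

/* Hypothetical helper: relative change, in percent, between two byte totals. */
static double percent_change(long before, long after)
{
    assert(before > 0); /* avoid dividing by zero */
    return (double)(after - before) * 100.0 / (double)before;
}
```

With the totals from the commit message, `percent_change(44154976L, 44139880L)` is roughly -0.0342, which rounds to the quoted -0.03%.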
Matt Turner
67b1e7b947 glsl: Relax qualifier ordering restriction in ES 3.1.
... and allow the "binding" qualifier in ES 3.1 as well.

GLSL ES 3.1 incorporates only a few features from the extension
ARB_shading_language_420pack: the relaxed qualifier ordering
requirements and the binding qualifier.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit eca846e7ae)
2015-12-12 19:39:02 +00:00
Matt Turner
0586c5844f glsl: Use has_420pack().
These features would not have been enabled with #version 420 otherwise.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 79da7220db)
2015-12-12 19:39:02 +00:00
Matt Turner
7d226ee279 glsl: Allow binding of image variables with 420pack.
This interaction was missed in the addition of ARB_shader_image_load_store.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93266
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit c200e606f7)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
36ff210d0e i965/nir: Remove unused indirect handling
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it.  This commit gets rid of the dead indirect handling code.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 22c273de2b)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
017f4755fd i965/state: Get rid of dword_pitch arguments to buffer functions
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit abb569ca18)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
61cb4db868 i965/vec4: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92909
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 05bdc21f84)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
34785fb7b9 i965/fs: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 13ad8d03f2)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
22d6bf5078 i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge.  On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s.  We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e3e70698c3)
2015-12-12 19:39:02 +00:00
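The UBO commits above move the pull-constant paths from vec4 offsets to byte offsets. Assuming 32-bit components, one vec4 occupies 16 bytes, so the two units convert as in this sketch. The helper names are hypothetical and are not i965 functions.

```c
#include <assert.h>
#include <stdint.h>

/* A vec4 of four 32-bit components occupies 4 * 4 = 16 bytes. */
#define VEC4_SIZE_BYTES 16u

/* Hypothetical helper: vec4 index -> byte offset. */
static uint32_t vec4_index_to_byte_offset(uint32_t vec4_index)
{
    return vec4_index * VEC4_SIZE_BYTES;
}

/* Hypothetical helper: split a dword-aligned byte offset into a vec4 index
 * and the 32-bit component (0..3) inside that vec4. */
static uint32_t byte_offset_to_vec4_index(uint32_t byte_offset, uint32_t *component)
{
    assert(byte_offset % 4u == 0);
    *component = (byte_offset % VEC4_SIZE_BYTES) / 4u;
    return byte_offset / VEC4_SIZE_BYTES;
}
```

For example, vec4 3 starts at byte 48, and byte offset 52 is component 1 (`.y`) of vec4 3. Working in bytes lets a single representation cover both whole-vec4 and per-component addressing.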
Nicolai Hähnle
9908d19699 radeonsi: last_gfx_fence is a winsys fence
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit d5a5dbd71f)
2015-12-12 19:39:02 +00:00
Ilia Mirkin
a500109aad gk110/ir: fix imad sat/hi flag emission for immediate args
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1d708aacb7)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
0e78a67709 gk104/ir: sampler doesn't matter for txf
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 63b850403c)
2015-12-12 19:39:01 +00:00
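The gk104/ir txf commit above can be read as follows. A simplified model, which is an assumption and not the nouveau code, is that a texture instruction falls back to an indirect handle whenever its texture and sampler indices differ. Because TXF left the sampler at 0, any fetch from texture 1 or higher took the indirect path. Ignoring the sampler for TXF avoids that. The enum and function below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced opcode set for this sketch. */
enum tex_op { TEX_OP_TEX, TEX_OP_TXF };

/* Simplified model: a direct handle is possible only when texture and
 * sampler indices match. Texel fetches never use a sampler, so for TXF
 * the (unset) sampler index is ignored. Indirect texture indices are
 * not modeled here. */
static bool needs_indirect_handle(enum tex_op op, unsigned tex, unsigned sampler)
{
    if (op == TEX_OP_TXF)
        return false;
    return tex != sampler;
}
```

Under this model, `needs_indirect_handle(TEX_OP_TXF, 2, 0)` is false, while an ordinary sample from texture 2 with sampler 0 still needs the indirect path.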
Marek Olšák
4bb16d712a radeonsi: disable DCC on Stoney
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 32f05fadbb)
2015-12-12 19:39:01 +00:00
Christian König
950e9886d0 st/va: disable MPEG4 by default v2
The workarounds are too hacky to enable them by default
and otherwise MPEG4 doesn't work reliably.

v2: add docs/envvars.html, CC stable and fix typos

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Cc: "11.1.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a2c5200a4b)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
dff89432d8 gk110/ir: fix imul hi emission with limm arg
The elemental demo hits this case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit db072d2086)
2015-12-12 19:39:01 +00:00
Timothy Arceri
499d409a20 mesa: move pipeline input/output validation inside _mesa_validate_program_pipeline()
This allows validation to be done on rendering calls also.

Fixes 3 dEQP-GLES31.functional.separate tests.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4dd096d741)
2015-12-12 19:39:01 +00:00
Timothy Arceri
a16f5195ef glsl: don't generate extra errors in ValidateProgramPipeline
From Section 11.1.3.11 (Validation) of the GLES 3.1 spec:

   "An INVALID_OPERATION error is generated by any command that transfers
   vertices to the GL or launches compute work if the current set
   of active program objects cannot be executed, for reasons including:"

It then goes on to list the rules we validate in the
_mesa_validate_program_pipeline() function.

For ValidateProgramPipeline the only mention of generating an error is:

   "An INVALID_OPERATION error is generated if pipeline is not a name returned
   from a previous call to GenProgramPipelines or if such a name has
   since been deleted by DeleteProgramPipelines,"

Which we handle separately.

This fixes:
ES31-CTS.sepshaderobjs.PipelineApi

No regressions on the dEQP 3.1 tests.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit c3ec12ec3c)
Nominated-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
f65b790089 glsl: re-validate program pipeline after sampler change
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
https://bugs.freedesktop.org/show_bug.cgi?id=93180
(cherry picked from commit da1a01361b)
2015-12-12 19:39:01 +00:00
Gregory Hainaut
aa19234943 glsl: don't sort varying in separate shader mode
This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.

It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.

V2: (by Timothy Arceri)
* reworked comment slightly

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 2ab9cd0c4d)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
66f216d8ce glsl: don't dead code remove SSO varyings marked as active
GL_ARB_separate_shader_objects allows matching variables or block
interfaces by name. Input varyings can't be removed because doing so
would affect location assignment.

This fixes bug 79783 and probably issues in any other application that
uses the GL_ARB_separate_shader_objects extension.

V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=79783
(cherry picked from commit 8117f46f49)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
4d34038ae5 glsl: add always_active_io attribute to ir_variable
The value will be set in separate-shader programs when an input/output
must remain active, e.g. when dead-code removal isn't allowed because
it would create an interface location/name-matching mismatch.

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code into the linker file

v4:
* Fix variable name in assert

v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 618612f867)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
781a68555d glsl: copy how_declared when lowering interface blocks
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 76c09c1792)
2015-12-12 19:39:01 +00:00
Marek Olšák
e0b11bcc87 radeonsi: fix occlusion queries on Fiji
Tested.

(cherry picked from commit bfc14796b0)
2015-12-12 19:39:01 +00:00
Matt Turner
359679cb33 i965: Pass brw_context pointer, not gl_context pointer.
Fixes a warning introduced by commit dcadd855.

(cherry picked from commit f1b7fefd4e)
2015-12-12 19:39:00 +00:00
Marta Lofstedt
fcf6091521 gles2: Update gl2ext.h to revision: 32120
This is needed to be able to implement the accepted OES
extensions.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 1d5b88e33b)
2015-12-12 19:38:39 +00:00
Emil Velikov
aa5082b135 Revert "cherry-ignore: ignore unneeded header update"
This reverts commit 79f3aaca4f.

The commit (header update) was not needed for the 11.0 branch, but it
is needed for this one (11.1).
2015-12-12 19:38:39 +00:00
Eric Anholt
1df00e17d3 vc4: When doing algebraic optimization into a MOV, use the right MOV.
If there were src unpacks, changing to the integer MOV instead of float
(for example) would change the unpack operation.

(cherry picked from commit e3efc4b023)
2015-12-11 17:04:11 -08:00
Eric Anholt
ad3df9d168 vc4: Fix handling of src packs in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.

(cherry picked from commit 2591beef89)
2015-12-11 17:04:08 -08:00
Eric Anholt
e4cf550501 vc4: Add missing progress note in opt_algebraic.
(cherry picked from commit b70a2f4d81)
2015-12-11 17:04:00 -08:00
Eric Anholt
ecf2885d7f vc4: Fix handling of sample_mask output.
I apparently broke this in a late refactor, in such a way that I decided
its tests were some of those interminable ones that I should just
blacklist from my testing.  As a result, the refactors related to it were
totally wrong.

(cherry picked from commit 53b2523c6e)
2015-12-11 17:03:51 -08:00
Eric Anholt
fc59ca4064 vc4: Enable MSAA.
We still have several failures in the newly enabled tests in simulation:
sRGB downsampling is done as if it was just linear, stencil blits are not
supported on MSAA either, and derivatives are still not supported
(breaking some MSAA simulation shaders).  So, other than sRGB downsampling
quality, things seem to be in good shape.

(cherry picked from commit f61ceeb3fd)
2015-12-11 17:03:44 -08:00
Eric Anholt
396fbdc721 vc4: Add support for mapping of MSAA resources.
The pipe_transfer_map API requires that we do an implicit
downsample/upsample and return a mapping of that.

(cherry picked from commit fc4a1bfb88)
2015-12-11 17:03:40 -08:00
Eric Anholt
50ac2100df vc4: Add support for texel fetches from MSAA resources.
This is the core of ARB_texture_multisample.  Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves.  We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.

(cherry picked from commit 6b4dfd53ae)
2015-12-11 17:03:36 -08:00
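The vc4 texel-fetch commit above lists three conditions under which a multisample resolve can be done with just the RCL: the blit is 1:1, unflipped, and aligned to tile boundaries. The predicate below is a sketch of those conditions only. The struct, the function names, and the modeling of "1:1" as identical source and destination rectangles are assumptions, and the tile size is a parameter rather than a claim about the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blit rectangle: origin plus a positive size, in pixels. */
struct blit_rect {
    int x, y;
    int width, height;
};

static bool is_tile_aligned(int v, int tile)
{
    return v % tile == 0;
}

/* Returns true when the blit is unflipped, 1:1 (modeled here as identical
 * source and destination rectangles) and aligned to tile boundaries. */
static bool can_resolve_with_rcl(struct blit_rect src, struct blit_rect dst,
                                 bool flip_x, bool flip_y, int tile)
{
    assert(tile > 0);
    if (flip_x || flip_y)
        return false; /* flipped */
    if (src.x != dst.x || src.y != dst.y ||
        src.width != dst.width || src.height != dst.height)
        return false; /* not 1:1 */
    return is_tile_aligned(dst.x, tile) && is_tile_aligned(dst.y, tile) &&
           is_tile_aligned(dst.width, tile) && is_tile_aligned(dst.height, tile);
}
```

Anything failing these checks would go through the gallium blitter instead, which is why exposing texel fetches from MSAA resources matters.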
Eric Anholt
08cf0f8529 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVERAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.

(cherry picked from commit a97b40dca4)
2015-12-11 17:03:31 -08:00
Eric Anholt
ba51596b1d vc4: Add a workaround for HW-2905, and an additional failure I saw with MSAA.
I only stumbled on this while experimenting due to reading about HW-2905.
I don't know if the EZ disable in the Z-clear is actually necessary, but
go with it for now.

(cherry picked from commit edc3305de7)
2015-12-11 17:03:03 -08:00
Eric Anholt
3d13bb8851 vc4: Add support for drawing in MSAA.
(cherry picked from commit edfd4d853a)
2015-12-11 17:03:03 -08:00
Eric Anholt
3bf2c6b96a vc4: Add kernel RCL support for MSAA rendering.
(cherry picked from commit e7c8ad0a6c)
2015-12-11 17:03:03 -08:00
Eric Anholt
5ab1bb4bec vc4: Rename color_ms_write to color_write.
I was thinking this was the only MSAA resolve thing, so it should be noted
separately, but actually load/store general also do MSAA resolve.

(cherry picked from commit 568d3a8e32)
2015-12-11 17:03:03 -08:00
Eric Anholt
c5ca18ec2f vc4: Allow RCL blits to the edge of the surface.
The recent unaligned fix successfully prevented RCL blits that weren't
aligned inside of the surface, but we also want to be able to do RCL blits
for the whole surface when the width or height of the surface aren't
aligned (we don't care what renders inside of the padding).

(cherry picked from commit bf92017ace)
2015-12-11 17:03:03 -08:00
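The relaxed rule in the "Allow RCL blits to the edge of the surface" commit above can be sketched as follows: the blit must start on a tile boundary, and each far edge must either land on a tile boundary or coincide with the surface edge, since whatever renders into the padding beyond the surface does not matter. The function names are hypothetical and the tile size is a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* An edge is acceptable if it is tile aligned or equals the surface size. */
static bool rcl_edge_ok(int edge, int surface_size, int tile)
{
    return edge % tile == 0 || edge == surface_size;
}

/* Hypothetical extent check for a blit covering [x, x+width) x [y, y+height). */
static bool rcl_blit_extent_ok(int x, int y, int width, int height,
                               int surface_width, int surface_height, int tile)
{
    assert(tile > 0);
    return x % tile == 0 && y % tile == 0 &&
           rcl_edge_ok(x + width, surface_width, tile) &&
           rcl_edge_ok(y + height, surface_height, tile);
}
```

For a 100x70 surface with 64-pixel tiles, a full-surface blit is accepted even though neither dimension is aligned, while a 50-pixel-wide blit from 0,0 is rejected.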
Eric Anholt
f6cca7a0c9 vc4: Fix check for tile RCL blits with mismatched y.
This was a typo in 3a508a0d94 that didn't
show up in testcases at that moment.

(cherry picked from commit 2792d118f1)
2015-12-11 17:03:03 -08:00
Eric Anholt
ae649bf1ad vc4: Fix compiler warning from size_t change.
I missed this when bringing over the kernel changes.

(cherry picked from commit 1529f138ff)
2015-12-11 17:03:03 -08:00
Eric Anholt
132303cfe4 vc4: Fix accidental scissoring when scissor is disabled.
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.

Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.

(cherry picked from commit a4eff86f4a)
2015-12-11 17:03:03 -08:00
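The accidental-scissoring fix above comes down to choosing the clip rectangle from the rasterizer state rather than from whatever scissor bounds were stored last. The sketch below shows that choice. The struct and function are hypothetical and do not mirror the vc4 data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clip rectangle in pixels: [minx, maxx) x [miny, maxy). */
struct clip_rect { int minx, miny, maxx, maxy; };

/* The stored scissor may hold stale bounds from the last time a scissor was
 * set up, so it is only used while scissoring is actually enabled. */
static struct clip_rect effective_clip(bool scissor_enabled,
                                       struct clip_rect stored_scissor,
                                       int fb_width, int fb_height)
{
    assert(fb_width > 0 && fb_height > 0);
    struct clip_rect full = { 0, 0, fb_width, fb_height };
    if (!scissor_enabled)
        return full;

    /* Clamp the enabled scissor to the framebuffer bounds. */
    struct clip_rect r = stored_scissor;
    if (r.minx < 0) r.minx = 0;
    if (r.miny < 0) r.miny = 0;
    if (r.maxx > fb_width) r.maxx = fb_width;
    if (r.maxy > fb_height) r.maxy = fb_height;
    return r;
}
```

With scissoring disabled, a stale 10..20 scissor is ignored and the whole 640x480 framebuffer is drawable.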
Eric Anholt
9df2431194 vc4: Disable RCL blitting when scissors are enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it.  If you're doing a scissored blit, you're
probably a testcase.

Fixes piglit's fbo-scissor-blit fbo

(cherry picked from commit d16d666776)
2015-12-11 17:03:03 -08:00
Eric Anholt
dd409e2a41 vc4: Bring over cleanups from submitting to the kernel.
(cherry picked from commit 0afe83078d)
2015-12-11 17:03:03 -08:00
Eric Anholt
38c770ec29 vc4: Add debug dumping of MSAA surfaces.
(cherry picked from commit a69ac4e89c)
2015-12-11 17:03:03 -08:00
Eric Anholt
d8450616d9 vc4: Add support for laying out MSAA resources.
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format.  Since they're full resolution buffers, we have to
align their size to full tiles.

(cherry picked from commit 3c3b1184eb)
2015-12-11 17:03:02 -08:00
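The MSAA layout commit above aligns resource sizes to full tiles. Rounding a dimension up to a tile multiple is a standard align-up operation, sketched below. The 32-pixel tile used in the usage note is only an illustrative value.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of tile (tile must be non-zero). */
static uint32_t align_up(uint32_t size, uint32_t tile)
{
    assert(tile > 0);
    return (size + tile - 1) / tile * tile;
}
```

With a 32-pixel tile, a 100-pixel-wide surface is laid out as 128 pixels wide, and a 64-pixel dimension is left unchanged.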
Eric Anholt
c9fe9e4b42 vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.

(cherry picked from commit 74c4b3b80c)
2015-12-11 17:03:02 -08:00
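The sample-mask commit above ANDs the shader's output with the coverage mask from the fragment payload, so a shader can clear samples but never enable uncovered ones. A minimal illustration follows. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set by the shader only survive where the payload reports coverage. */
static uint32_t effective_sample_mask(uint32_t shader_mask, uint32_t payload_mask)
{
    return shader_mask & payload_mask;
}
```

For example, a shader writing `0xF` for a fragment whose payload covers samples 0 and 2 (`0x5`) ends up with `0x5`.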
Eric Anholt
693e938321 vc4: Fix up tile alignment checks for blitting using just an RCL.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile.  We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.

(cherry picked from commit 3a508a0d94)
2015-12-11 17:03:02 -08:00
Eric Anholt
7a0661839b vc4: Add support for loading sample mask.
(cherry picked from commit a664233042)
2015-12-11 17:03:02 -08:00
Eric Anholt
4c234d183b vc4: Use nir_channel() to simplify all of our nir_swizzle() cases.
(cherry picked from commit 4cff16bc3a)
2015-12-11 17:03:02 -08:00
Eric Anholt
b37189523e vc4: Fix point size lookup.
I think I may have regressed this in the NIR conversion.  TGSI-to-NIR is
putting the PSIZ in the .x channel, not .w, so we were grabbing some
garbage for point size, which ended up meaning just not drawing points.

Fixes glean pointAtten and pointsprite.

(cherry picked from commit 81544f231a)
2015-12-11 16:57:39 -08:00
68 changed files with 2947 additions and 456 deletions

@@ -1 +1 @@
-11.1.0-rc3
+11.1.0

@@ -1,2 +0,0 @@
-# The introduced definitions are not used/implemented by mesa
-1d5b88e33b07bc26d612720e6cb197a6917ba75f gles2: Update gl2ext.h to revision: 32120

@@ -238,6 +238,12 @@ for details.
</ul>
<h3>VA-API state tracker environment variables</h3>
<ul>
<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.
</ul>
<p>
Other Gallium drivers have their own environment variables. These may change
frequently so the source code should be consulted for details.

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
-<h1>Mesa 11.1.0 Release Notes / TBD</h1>
+<h1>Mesa 11.1.0 Release Notes / 15 December 2015</h1>
<p>
Mesa 11.1.0 is a new development release.
@@ -84,11 +84,196 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2>
TBD.
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=28130">Bug 28130</a> - vbo: premature flushing breaks GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=49779">Bug 49779</a> - Extra line segments in GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors &quot;work&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80821">Bug 80821</a> - When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81174">Bug 81174</a> - Gallium: GL_LINE_LOOP broken with more than 512 points</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83508">Bug 83508</a> - [UBO] Assertion for array of blocks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86469">Bug 86469</a> - Unreal Engine demo doesn't run</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89014">Bug 89014</a> - PIPE_QUERY_GPU_FINISHED is not acting as expected on SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90175">Bug 90175</a> - [hsw bisected][PATCH] atomic counters doesn't work for a binding point different to zero</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90631">Bug 90631</a> - Compilation failure for fragment shader with many branches on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91785">Bug 91785</a> - make check DispatchSanity_test.GLES31 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image &amp; DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91898">Bug 91898</a> - src/util/mesa-sha1.c:250:25: fatal error: openssl/sha.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail with kernel upgrade</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91930">Bug 91930</a> - Program with GtkGLArea widget does not redraw</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91985">Bug 91985</a> - [regression, bisected] FTBFS with commit f9caabe8f1: R600_UCP_CONST_BUFFER is undefined</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92033">Bug 92033</a> - [SNB,regression,dEQP,bisected] functional.shaders.random tests regressed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92052">Bug 92052</a> - nir/nir_builder.h:79: error: expected primary-expression before . token</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92054">Bug 92054</a> - make check gbm-symbols-check regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92066">Bug 92066</a> - [ILK,G45,regression] New assertion on BRW_MAX_MRF breaks ilk and g45</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92122">Bug 92122</a> - [bisected, cts] Regression with Assault Android Cactus</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92124">Bug 92124</a> - shader_query.cpp:841:34: error: strndup was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92183">Bug 92183</a> - linker.cpp:3187:46: error: strtok_r was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92221">Bug 92221</a> - Unintended code changes in _mesa_base_tex_format commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92621">Bug 92621</a> - [G965 ILK G45] Regression: 24 piglit regressions in glsl-1.10</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92639">Bug 92639</a> - [Regression bisected] Ogles1conform mustpass.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92641">Bug 92641</a> - [SKL BSW] [Regression] Ogles1conform userclip.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92645">Bug 92645</a> - kodi vdpau interop fails since mesa,meta: move gl_texture_object::TargetIndex initializations</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92705">Bug 92705</a> - [clover] fail to build with llvm-svn/clang-svn 3.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - &quot;LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main&quot; when starting race in stuntrally</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92744">Bug 92744</a> - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92770">Bug 92770</a> - [SNB, regression, dEQP] deqp-gles3.functional.shaders.discard.dynamic_loop_texture</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92824">Bug 92824</a> - [regression, bisected] `make check` dispatch-sanity broken by GL_EXT_buffer_storage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92859">Bug 92859</a> - [regression, bisected] validate_intrinsic_instr: Assertion triggered</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92985">Bug 92985</a> - Mac OS X build error &quot;ar: no archive members specified&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93015">Bug 93015</a> - Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of pipe_virgl_create_screen follows non-static declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
TBD.
<li>MPEG4 decoding has been disabled by default in the VAAPI driver</li>
</div>
</body>


@@ -575,8 +575,8 @@ CodeEmitterGK110::emitIMUL(const Instruction *i)
if (isLIMM(i->src(1), TYPE_S32)) {
emitForm_L(i, 0x280, 2, Modifier(0));
assert(i->subOp != NV50_IR_SUBOP_MUL_HIGH);
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
code[1] |= 1 << 24;
if (i->sType == TYPE_S32)
code[1] |= 3 << 25;
} else {
@@ -695,14 +695,9 @@ CodeEmitterGK110::emitIMAD(const Instruction *i)
if (i->sType == TYPE_S32)
code[1] |= (1 << 19) | (1 << 24);
if (code[0] & 0x1) {
assert(!i->subOp);
SAT_(39);
} else {
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
code[1] |= 1 << 25;
SAT_(35);
}
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
code[1] |= 1 << 25;
SAT_(35);
}
void

@@ -202,7 +202,8 @@ NV50LegalizePostRA::visit(Function *fn)
Program *prog = fn->getProgram();
r63 = new_LValue(fn, FILE_GPR);
if (prog->maxGPR < 63)
// GPR units on nv50 are in half-regs
if (prog->maxGPR < 126)
r63->reg.data.id = 63;
else
r63->reg.data.id = 127;

@@ -686,7 +686,7 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
i->tex.s = 0x1f;
i->setIndirectR(hnd);
i->setIndirectS(NULL);
} else if (i->tex.r == i->tex.s) {
} else if (i->tex.r == i->tex.s || i->op == OP_TXF) {
i->tex.r += prog->driver->io.texBindBase / 4;
i->tex.s = 0; // only a single cX[] value possible here
} else {

@@ -598,6 +598,106 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
return 0;
}
/* execute a single slot ALU calculation */
static int single_alu_op2(struct r600_shader_ctx *ctx, int op,
int dst_sel, int dst_chan,
int src0_sel, unsigned src0_chan_val,
int src1_sel, unsigned src1_chan_val)
{
struct r600_bytecode_alu alu;
int r, i;
if (ctx->bc->chip_class == CAYMAN && op == ALU_OP2_MULLO_INT) {
for (i = 0; i < 4; i++) {
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = i;
alu.dst.write = i == dst_chan;
alu.last = (i == 3);
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
}
return 0;
}
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = dst_chan;
alu.dst.write = 1;
alu.last = 1;
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
return 0;
}
/* execute a single slot ALU calculation */
static int single_alu_op3(struct r600_shader_ctx *ctx, int op,
int dst_sel, int dst_chan,
int src0_sel, unsigned src0_chan_val,
int src1_sel, unsigned src1_chan_val,
int src2_sel, unsigned src2_chan_val)
{
struct r600_bytecode_alu alu;
int r;
/* validate this for other ops */
assert(op == ALU_OP3_MULADD_UINT24);
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.src[2].sel = src2_sel;
if (src2_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[2].value = src2_chan_val;
else
alu.src[2].chan = src2_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = dst_chan;
alu.is_op3 = 1;
alu.last = 1;
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
return 0;
}
static inline int get_address_file_reg(struct r600_shader_ctx *ctx, int index)
{
return index > 0 ? ctx->bc->index_reg[index - 1] : ctx->bc->ar_reg;
}
static int vs_add_primid_output(struct r600_shader_ctx *ctx, int prim_id_sid)
{
int i;
@@ -1129,6 +1229,7 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
unsigned vtx_id = src->Dimension.Index;
int offset_reg = vtx_id / 3;
int offset_chan = vtx_id % 3;
int t2 = 0;
/* offsets of per-vertex data in ESGS ring are passed to GS in R0.x, R0.y,
* R0.w, R1.x, R1.y, R1.z (it seems R0.z is used for PrimitiveID) */
@@ -1136,13 +1237,24 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
if (offset_reg == 0 && offset_chan == 2)
offset_chan = 3;
if (src->Dimension.Indirect || src->Register.Indirect)
t2 = r600_get_temp(ctx);
if (src->Dimension.Indirect) {
int treg[3];
int t2;
struct r600_bytecode_alu alu;
int r, i;
/* you have got to be shitting me -
unsigned addr_reg;
addr_reg = get_address_file_reg(ctx, src->DimIndirect.Index);
if (src->DimIndirect.Index > 0) {
r = single_alu_op2(ctx, ALU_OP1_MOV,
ctx->bc->ar_reg, 0,
addr_reg, 0,
0, 0);
if (r)
return r;
}
/*
we have to put the R0.x/y/w into Rt.x Rt+1.x Rt+2.x then index reg from Rt.
at least this is what fglrx seems to do. */
for (i = 0; i < 3; i++) {
@@ -1150,7 +1262,6 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
}
r600_add_gpr_array(ctx->shader, treg[0], 3, 0x0F);
t2 = r600_get_temp(ctx);
for (i = 0; i < 3; i++) {
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = ALU_OP1_MOV;
@@ -1175,8 +1286,33 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
if (r)
return r;
offset_reg = t2;
offset_chan = 0;
}
if (src->Register.Indirect) {
int addr_reg;
unsigned first = ctx->info.input_array_first[src->Indirect.ArrayID];
addr_reg = get_address_file_reg(ctx, src->Indirect.Index);
/* pull the value from index_reg */
r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
t2, 1,
addr_reg, 0,
V_SQ_ALU_SRC_LITERAL, first);
if (r)
return r;
r = single_alu_op3(ctx, ALU_OP3_MULADD_UINT24,
t2, 0,
t2, 1,
V_SQ_ALU_SRC_LITERAL, 4,
offset_reg, offset_chan);
if (r)
return r;
offset_reg = t2;
offset_chan = 0;
index = src->Register.Index - first;
}
memset(&vtx, 0, sizeof(vtx));
vtx.buffer_id = R600_GS_RING_CONST_BUFFER;
@@ -1222,6 +1358,7 @@ static int tgsi_split_gs_inputs(struct r600_shader_ctx *ctx)
fetch_gs_input(ctx, src, treg);
ctx->src[i].sel = treg;
ctx->src[i].rel = 0;
}
}
return 0;
@@ -1498,7 +1635,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
*last_exp_pos = NULL, *last_exp_param = NULL;
int i, j, next_clip_pos = 61, next_param = 0;
int ring;
bool only_ring_0 = true;
cshader = calloc(1, sizeof(struct r600_pipe_shader));
if (!cshader)
return 0;
@@ -1570,6 +1707,8 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
for (i = 0; i < so->num_outputs; i++) {
if (so->output[i].stream == ring) {
enabled = true;
if (ring > 0)
only_ring_0 = false;
break;
}
}
@@ -1604,7 +1743,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
cf_jump = ctx.bc->cf_last;
if (enabled)
emit_streamout(&ctx, so, ring, &cshader->shader.ring_item_sizes[ring]);
emit_streamout(&ctx, so, only_ring_0 ? -1 : ring, &cshader->shader.ring_item_sizes[ring]);
cshader->shader.ring_item_sizes[ring] = ocnt * 16;
}
@@ -7185,7 +7324,7 @@ static int tgsi_eg_arl(struct r600_shader_ctx *ctx)
struct r600_bytecode_alu alu;
int r;
int i, lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
unsigned reg = inst->Dst[0].Register.Index > 0 ? ctx->bc->index_reg[inst->Dst[0].Register.Index - 1] : ctx->bc->ar_reg;
unsigned reg = get_address_file_reg(ctx, inst->Dst[0].Register.Index);
assert(inst->Dst[0].Register.Index < 3);
memset(&alu, 0, sizeof(struct r600_bytecode_alu));

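The r600 `fetch_gs_input()` change above emits two ALU instructions for an indirectly indexed geometry-shader input. The following CPU-side model of that arithmetic may make it easier to follow. It is a sketch under stated assumptions. `muladd_uint24()` assumes that MULADD_UINT24 multiplies the low 24 bits of its first two operands and adds the third. `gs_indirect_addr()` and `struct gs_fetch_addr` are hypothetical names. Only the three formulas come from the diff: `t2.y = addr + first`, `t2.x = t2.y * 4 + offset`, and `index = Register.Index - first`. The sketch makes no claim about the units of the offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of ALU_OP3_MULADD_UINT24: multiply the low 24 bits
 * of a and b, add c, keep the low 32 bits of the result. */
static uint32_t muladd_uint24(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & 0xffffffu) * (b & 0xffffffu) + c;
}

/* Hypothetical result type: what fetch_gs_input() leaves behind. */
struct gs_fetch_addr {
    uint32_t offset; /* value written to t2.x and used as the fetch base */
    int index;       /* constant register index used by the fetch */
};

/* Hypothetical model of the Register.Indirect branch in the diff. */
static struct gs_fetch_addr gs_indirect_addr(uint32_t addr_reg_value,
                                             uint32_t array_first,
                                             uint32_t vertex_offset,
                                             int register_index)
{
    struct gs_fetch_addr r;

    /* Assumption: the constant index lies inside the indexed array. */
    assert(register_index >= (int)array_first);

    /* ALU_OP2_ADD_INT: t2.y = addr_reg + first */
    uint32_t t2_y = addr_reg_value + array_first;

    /* ALU_OP3_MULADD_UINT24: t2.x = t2.y * 4 + offset_reg.offset_chan */
    r.offset = muladd_uint24(t2_y, 4, vertex_offset);

    /* 'first' is now part of the dynamic offset, so the constant index
     * is made relative to the start of the array. */
    r.index = register_index - (int)array_first;
    return r;
}
```

For example, `gs_indirect_addr(2, 3, 100, 5)` yields an offset of `(2 + 3) * 4 + 100` and a constant index of `2`.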
@@ -239,8 +239,8 @@ bool r600_common_context_init(struct r600_common_context *rctx,
rctx->family = rscreen->family;
rctx->chip_class = rscreen->chip_class;
if (rscreen->family == CHIP_HAWAII)
rctx->max_db = 16;
if (rscreen->chip_class >= CIK)
rctx->max_db = MAX2(8, rscreen->info.r600_num_backends);
else if (rscreen->chip_class >= EVERGREEN)
rctx->max_db = 8;
else

@@ -489,6 +489,10 @@ static void vi_texture_alloc_dcc_separate(struct r600_common_screen *rscreen,
if (rscreen->debug_flags & DBG_NO_DCC)
return;
/* TODO: DCC is broken on Stoney */
if (rscreen->family == CHIP_STONEY)
return;
rtex->dcc_buffer = (struct r600_resource *)
r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM,
PIPE_USAGE_DEFAULT, rtex->surface.dcc_size, rtex->surface.dcc_alignment);

@@ -632,7 +632,7 @@ void si_check_vm_faults(struct si_context *sctx)
/* Use conservative timeout 800ms, after which we won't wait any
* longer and assume the GPU is hung.
*/
screen->fence_finish(screen, sctx->last_gfx_fence, 800*1000*1000);
sctx->b.ws->fence_wait(sctx->b.ws, sctx->last_gfx_fence, 800*1000*1000);
if (!si_vm_fault_occured(sctx, &addr))
return;

@@ -594,6 +594,14 @@ static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
lp_build_const_int32(gallivm, swizzle));
value = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
if (type == TGSI_TYPE_DOUBLE) {
LLVMValueRef value2;
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
lp_build_const_int32(gallivm, swizzle + 1));
value2 = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
return radeon_llvm_emit_fetch_double(bld_base, value, value2);
}
return LLVMBuildBitCast(gallivm->builder, value,
tgsi2llvmtype(bld_base, type), "");
}
@@ -733,6 +741,7 @@ static LLVMValueRef fetch_input_gs(
unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
unsigned semantic_index = info->input_semantic_index[reg->Register.Index];
unsigned param;
LLVMValueRef value;
if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
return get_primitive_id(bld_base, swizzle);
@@ -774,11 +783,22 @@ static LLVMValueRef fetch_input_gs(
args[7] = uint->zero; /* SLC */
args[8] = uint->zero; /* TFE */
value = lp_build_intrinsic(gallivm->builder,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
if (type == TGSI_TYPE_DOUBLE) {
LLVMValueRef value2;
args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle + 1) * 256);
value2 = lp_build_intrinsic(gallivm->builder,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
return radeon_llvm_emit_fetch_double(bld_base,
value, value2);
}
return LLVMBuildBitCast(gallivm->builder,
lp_build_intrinsic(gallivm->builder,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute),
value,
tgsi2llvmtype(bld_base, type), "");
}

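The radeonsi hunks above load a double as two consecutive 32-bit dwords: `value` at `swizzle` and `value2` at `swizzle + 1`. Both halves are then passed to `radeon_llvm_emit_fetch_double()`. The host-side sketch below performs the same recombination. It assumes that the first dword holds the low half of the 64-bit value, which is how we read that helper's build-a-vector-then-bitcast approach. `fetch_double()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64-bit");

/* Hypothetical helper: rebuild a double from two 32-bit loads, treating
 * the first load (lo) as the low 32 bits and the second (hi) as the high
 * 32 bits (assumed ordering). */
static double fetch_double(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;

    /* memcpy reinterprets the bits without breaking aliasing rules. */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, the IEEE-754 double `1.5` has the bit pattern `0x3FF8000000000000`, so `fetch_double(0, 0x3FF80000)` returns `1.5`.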
@@ -21,6 +21,7 @@ C_SOURCES := \
vc4_job.c \
vc4_nir_lower_blend.c \
vc4_nir_lower_io.c \
vc4_nir_lower_txf_ms.c \
vc4_opt_algebraic.c \
vc4_opt_constant_folding.c \
vc4_opt_copy_propagation.c \

@@ -121,6 +121,11 @@ enum vc4_packet {
#define VC4_PACKET_TILE_COORDINATES_SIZE 3
#define VC4_PACKET_GEM_HANDLES_SIZE 9
/* Number of multisamples supported. */
#define VC4_MAX_SAMPLES 4
/* Size of a full resolution color or Z tile buffer load/store. */
#define VC4_TILE_BUFFER_SIZE (64 * 64 * 4)
#define VC4_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))
/* Using the GNU statement expression extension */
#define VC4_SET_FIELD(value, field) \
@@ -151,6 +156,16 @@ enum vc4_packet {
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
/** @{
*
* low bits of VC4_PACKET_STORE_FULL_RES_TILE_BUFFER and
* VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER.
*/
#define VC4_LOADSTORE_FULL_RES_EOF (1 << 3)
#define VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL (1 << 2)
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
/** @{
*
* byte 2 of VC4_PACKET_STORE_TILE_BUFFER_GENERAL and

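`VC4_MASK(high, low)` above builds a mask that covers bits `low` through `high`, inclusive. The small sketch below shows how such a mask pairs with a shift to pack and unpack a field. `EXAMPLE_FIELD_*` and the two helpers are hypothetical. The packing logic only approximates `VC4_SET_FIELD`, because that macro's full definition is cut off in this view. The macro shifts a plain `int`, so it is only well-defined for fields that stay clear of the top bits of an `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the diff: mask of bits low..high inclusive. */
#define VC4_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))

/* Hypothetical field occupying bits 3..1 (not a real VC4 field). */
#define EXAMPLE_FIELD_SHIFT 1
#define EXAMPLE_FIELD_MASK VC4_MASK(3, 1)

/* Shift a value into the field and check that it fits. */
static uint32_t set_example_field(uint32_t value)
{
    uint32_t fieldval = value << EXAMPLE_FIELD_SHIFT;

    assert((fieldval & ~(uint32_t)EXAMPLE_FIELD_MASK) == 0);
    return fieldval & EXAMPLE_FIELD_MASK;
}

/* Extract the field from a packed word. */
static uint32_t get_example_field(uint32_t word)
{
    return (word & EXAMPLE_FIELD_MASK) >> EXAMPLE_FIELD_SHIFT;
}
```

With these definitions, `VC4_MASK(3, 1)` is `0xe`, and `set_example_field(5)` packs to `0xa`.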
@@ -36,9 +36,11 @@
struct vc4_rcl_setup {
struct drm_gem_cma_object *color_read;
struct drm_gem_cma_object *color_ms_write;
struct drm_gem_cma_object *color_write;
struct drm_gem_cma_object *zs_read;
struct drm_gem_cma_object *zs_write;
struct drm_gem_cma_object *msaa_color_write;
struct drm_gem_cma_object *msaa_zs_write;
struct drm_gem_cma_object *rcl;
u32 next_offset;
@@ -62,7 +64,6 @@ static inline void rcl_u32(struct vc4_rcl_setup *setup, u32 val)
setup->next_offset += 4;
}
/*
* Emits a no-op STORE_TILE_BUFFER_GENERAL.
*
@@ -81,6 +82,22 @@ static void vc4_store_before_load(struct vc4_rcl_setup *setup)
rcl_u32(setup, 0); /* no address, since we're in None mode */
}
/*
* Calculates the physical address of the start of a tile in a RCL surface.
*
* Unlike the other load/store packets,
* VC4_PACKET_LOAD/STORE_FULL_RES_TILE_BUFFER don't look at the tile
* coordinates packet, and instead just store to the address given.
*/
static uint32_t vc4_full_res_offset(struct vc4_exec_info *exec,
struct drm_gem_cma_object *bo,
struct drm_vc4_submit_rcl_surface *surf,
uint8_t x, uint8_t y)
{
return bo->paddr + surf->offset + VC4_TILE_BUFFER_SIZE *
(DIV_ROUND_UP(exec->args->width, 32) * y + x);
}
/*
* Emits a PACKET_TILE_COORDINATES if one isn't already pending.
*
@@ -108,22 +125,41 @@ static void emit_tile(struct vc4_exec_info *exec,
* may be outstanding at a time.
*/
if (setup->color_read) {
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->color_read.bits);
rcl_u32(setup,
setup->color_read->paddr + args->color_read.offset);
if (args->color_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->color_read,
&args->color_read, x, y) |
VC4_LOADSTORE_FULL_RES_DISABLE_ZS);
} else {
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->color_read.bits);
rcl_u32(setup, setup->color_read->paddr +
args->color_read.offset);
}
}
if (setup->zs_read) {
if (setup->color_read) {
/* Exec previous load. */
vc4_tile_coordinates(setup, x, y);
vc4_store_before_load(setup);
}
if (args->zs_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->zs_read,
&args->zs_read, x, y) |
VC4_LOADSTORE_FULL_RES_DISABLE_COLOR);
} else {
if (setup->color_read) {
/* Exec previous load. */
vc4_tile_coordinates(setup, x, y);
vc4_store_before_load(setup);
}
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->zs_read.bits);
rcl_u32(setup, setup->zs_read->paddr + args->zs_read.offset);
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->zs_read.bits);
rcl_u32(setup, setup->zs_read->paddr +
args->zs_read.offset);
}
}
/* Clipping depends on tile coordinates having been
@@ -144,20 +180,60 @@ static void emit_tile(struct vc4_exec_info *exec,
(y * exec->bin_tiles_x + x) * 32));
}
if (setup->msaa_color_write) {
bool last_tile_write = (!setup->msaa_zs_write &&
!setup->zs_write &&
!setup->color_write);
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_ZS;
if (!last_tile_write)
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
else if (last)
bits |= VC4_LOADSTORE_FULL_RES_EOF;
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->msaa_color_write,
&args->msaa_color_write, x, y) |
bits);
}
if (setup->msaa_zs_write) {
bool last_tile_write = (!setup->zs_write &&
!setup->color_write);
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_COLOR;
if (setup->msaa_color_write)
vc4_tile_coordinates(setup, x, y);
if (!last_tile_write)
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
else if (last)
bits |= VC4_LOADSTORE_FULL_RES_EOF;
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->msaa_zs_write,
&args->msaa_zs_write, x, y) |
bits);
}
if (setup->zs_write) {
bool last_tile_write = !setup->color_write;
if (setup->msaa_color_write || setup->msaa_zs_write)
vc4_tile_coordinates(setup, x, y);
rcl_u8(setup, VC4_PACKET_STORE_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->zs_write.bits |
(setup->color_ms_write ?
VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR : 0));
(last_tile_write ?
0 : VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR));
rcl_u32(setup,
(setup->zs_write->paddr + args->zs_write.offset) |
((last && !setup->color_ms_write) ?
((last && last_tile_write) ?
VC4_LOADSTORE_TILE_BUFFER_EOF : 0));
}
if (setup->color_ms_write) {
if (setup->zs_write) {
/* Reset after previous store */
if (setup->color_write) {
if (setup->msaa_color_write || setup->msaa_zs_write ||
setup->zs_write) {
vc4_tile_coordinates(setup, x, y);
}
@@ -192,14 +268,26 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
}
if (setup->color_read) {
loop_body_size += (VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE);
if (args->color_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
} else {
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
}
}
if (setup->zs_read) {
if (setup->color_read) {
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
if (args->zs_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
} else {
if (setup->color_read &&
!(args->color_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES)) {
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
}
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
}
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
}
if (has_bin) {
@@ -207,13 +295,23 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
loop_body_size += VC4_PACKET_BRANCH_TO_SUB_LIST_SIZE;
}
if (setup->msaa_color_write)
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
if (setup->msaa_zs_write)
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
if (setup->zs_write)
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
if (setup->color_ms_write) {
if (setup->zs_write)
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
if (setup->color_write)
loop_body_size += VC4_PACKET_STORE_MS_TILE_BUFFER_SIZE;
}
/* We need a VC4_PACKET_TILE_COORDINATES in between each store. */
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE *
((setup->msaa_color_write != NULL) +
(setup->msaa_zs_write != NULL) +
(setup->color_write != NULL) +
(setup->zs_write != NULL) - 1);
size += xtiles * ytiles * loop_body_size;
setup->rcl = drm_gem_cma_create(dev, size);
@@ -224,13 +322,12 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
rcl_u8(setup, VC4_PACKET_TILE_RENDERING_MODE_CONFIG);
rcl_u32(setup,
(setup->color_ms_write ?
(setup->color_ms_write->paddr +
args->color_ms_write.offset) :
(setup->color_write ? (setup->color_write->paddr +
args->color_write.offset) :
0));
rcl_u16(setup, args->width);
rcl_u16(setup, args->height);
rcl_u16(setup, args->color_ms_write.bits);
rcl_u16(setup, args->color_write.bits);
/* The tile buffer gets cleared when the previous tile is stored. If
* the clear values changed between frames, then the tile buffer has
@@ -255,6 +352,7 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
for (x = min_x_tile; x <= max_x_tile; x++) {
bool first = (x == min_x_tile && y == min_y_tile);
bool last = (x == max_x_tile && y == max_y_tile);
emit_tile(exec, setup, x, y, first, last);
}
}
@@ -266,6 +364,56 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
return 0;
}
static int vc4_full_res_bounds_check(struct vc4_exec_info *exec,
struct drm_gem_cma_object *obj,
struct drm_vc4_submit_rcl_surface *surf)
{
struct drm_vc4_submit_cl *args = exec->args;
u32 render_tiles_stride = DIV_ROUND_UP(exec->args->width, 32);
if (surf->offset > obj->base.size) {
DRM_ERROR("surface offset %d > BO size %zd\n",
surf->offset, obj->base.size);
return -EINVAL;
}
if ((obj->base.size - surf->offset) / VC4_TILE_BUFFER_SIZE <
render_tiles_stride * args->max_y_tile + args->max_x_tile) {
DRM_ERROR("MSAA tile %d, %d out of bounds "
"(bo size %zd, offset %d).\n",
args->max_x_tile, args->max_y_tile,
obj->base.size,
surf->offset);
return -EINVAL;
}
return 0;
}
static int vc4_rcl_msaa_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
{
if (surf->flags != 0 || surf->bits != 0) {
DRM_ERROR("MSAA surface had nonzero flags/bits\n");
return -EINVAL;
}
if (surf->hindex == ~0)
return 0;
*obj = vc4_use_bo(exec, surf->hindex);
if (!*obj)
return -EINVAL;
if (surf->offset & 0xf) {
DRM_ERROR("MSAA write must be 16b aligned.\n");
return -EINVAL;
}
return vc4_full_res_bounds_check(exec, *obj, surf);
}
static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
@@ -277,9 +425,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
uint8_t format = VC4_GET_FIELD(surf->bits,
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
int cpp;
int ret;
if (surf->pad != 0) {
DRM_ERROR("Padding unset\n");
if (surf->flags & ~VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
DRM_ERROR("Extra flags set\n");
return -EINVAL;
}
@@ -290,6 +439,25 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
if (!*obj)
return -EINVAL;
if (surf->flags & VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
if (surf == &exec->args->zs_write) {
DRM_ERROR("general zs write may not be a full-res.\n");
return -EINVAL;
}
if (surf->bits != 0) {
DRM_ERROR("load/store general bits set with "
"full res load/store.\n");
return -EINVAL;
}
ret = vc4_full_res_bounds_check(exec, *obj, surf);
if (ret)
return ret;
return 0;
}
if (surf->bits & ~(VC4_LOADSTORE_TILE_BUFFER_TILING_MASK |
VC4_LOADSTORE_TILE_BUFFER_BUFFER_MASK |
VC4_LOADSTORE_TILE_BUFFER_FORMAT_MASK)) {
@@ -341,9 +509,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
}
static int
vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
vc4_rcl_render_config_surface_setup(struct vc4_exec_info *exec,
struct vc4_rcl_setup *setup,
struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
{
uint8_t tiling = VC4_GET_FIELD(surf->bits,
VC4_RENDER_CONFIG_MEMORY_FORMAT);
@@ -351,13 +520,15 @@ vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec,
VC4_RENDER_CONFIG_FORMAT);
int cpp;
if (surf->pad != 0) {
DRM_ERROR("Padding unset\n");
if (surf->flags != 0) {
DRM_ERROR("No flags supported on render config.\n");
return -EINVAL;
}
if (surf->bits & ~(VC4_RENDER_CONFIG_MEMORY_FORMAT_MASK |
VC4_RENDER_CONFIG_FORMAT_MASK)) {
VC4_RENDER_CONFIG_FORMAT_MASK |
VC4_RENDER_CONFIG_MS_MODE_4X |
VC4_RENDER_CONFIG_DECIMATE_MODE_4X)) {
DRM_ERROR("Unknown bits in render config: 0x%04x\n",
surf->bits);
return -EINVAL;
@@ -414,18 +585,20 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
if (has_bin &&
(args->max_x_tile > exec->bin_tiles_x ||
args->max_y_tile > exec->bin_tiles_y)) {
DRM_ERROR("Render tiles (%d,%d) outside of bin config (%d,%d)\n",
DRM_ERROR("Render tiles (%d,%d) outside of bin config "
"(%d,%d)\n",
args->max_x_tile, args->max_y_tile,
exec->bin_tiles_x, exec->bin_tiles_y);
return -EINVAL;
}
ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read);
ret = vc4_rcl_render_config_surface_setup(exec, &setup,
&setup.color_write,
&args->color_write);
if (ret)
return ret;
ret = vc4_rcl_ms_surface_setup(exec, &setup.color_ms_write,
&args->color_ms_write);
ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read);
if (ret)
return ret;
@@ -437,10 +610,21 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
if (ret)
return ret;
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_color_write,
&args->msaa_color_write);
if (ret)
return ret;
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_zs_write,
&args->msaa_zs_write);
if (ret)
return ret;
/* We shouldn't even have the job submitted to us if there's no
* surface to write out.
*/
if (!setup.color_ms_write && !setup.zs_write) {
if (!setup.color_write && !setup.zs_write &&
!setup.msaa_color_write && !setup.msaa_zs_write) {
DRM_ERROR("RCL requires color or Z/S write\n");
return -EINVAL;
}

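The new full-resolution (MSAA) paths in the vc4 RCL code address each tile directly instead of going through tile-coordinate packets. The sketch below restates three pieces of that code as plain functions:

- the arithmetic from `vc4_full_res_offset()`;
- the comparison in `vc4_full_res_bounds_check()`;
- the way `emit_tile()` ORs the `VC4_LOADSTORE_FULL_RES_*` flag bits into the low bits of the tile address.

The function names are hypothetical. The 16-byte alignment assumption comes from the `surf->offset & 0xf` check in `vc4_rcl_msaa_surface_setup()`, and the sketch also assumes that the BO's physical address is at least that aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VC4_TILE_BUFFER_SIZE (64 * 64 * 4)            /* from the diff */
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)    /* from the diff */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Same formula as vc4_full_res_offset(): tiles are laid out row-major,
 * with DIV_ROUND_UP(width, 32) tiles per row. */
static uint32_t full_res_offset(uint32_t paddr, uint32_t surf_offset,
                                uint32_t width, uint32_t x, uint32_t y)
{
    return paddr + surf_offset + (uint32_t)VC4_TILE_BUFFER_SIZE *
           (DIV_ROUND_UP(width, 32) * y + x);
}

/* Same comparisons as vc4_full_res_bounds_check(); returns false where
 * the kernel code returns -EINVAL. */
static bool full_res_in_bounds(uint32_t bo_size, uint32_t surf_offset,
                               uint32_t width, uint32_t max_x_tile,
                               uint32_t max_y_tile)
{
    uint32_t stride = DIV_ROUND_UP(width, 32);

    if (surf_offset > bo_size)
        return false;
    return (bo_size - surf_offset) / VC4_TILE_BUFFER_SIZE >=
           stride * max_y_tile + max_x_tile;
}

/* Assumption: a 16-byte aligned address leaves its low 4 bits free for
 * the FULL_RES flag bits, which are ORed in as emit_tile() does. */
static uint32_t full_res_store_word(uint32_t tile_addr, uint32_t flags)
{
    assert((tile_addr & 0xfu) == 0 && (flags & ~0xfu) == 0);
    return tile_addr | flags;
}
```

For example, a 64-pixel-wide render has a stride of 2 tiles, so tile (1, 1) starts 3 tile buffers past the surface base.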
@@ -47,7 +47,6 @@
void *validated, \
void *untrusted
/** Return the width in pixels of a 64-byte microtile. */
static uint32_t
utile_width(int cpp)
@@ -191,7 +190,7 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
if (size + offset < size ||
size + offset > fbo->base.size) {
DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %d)\n",
DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %zd)\n",
width, height,
aligned_width, aligned_height,
size, offset, fbo->base.size);
@@ -201,7 +200,6 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
return true;
}
static int
validate_flush(VALIDATE_ARGS)
{
@@ -270,7 +268,7 @@ validate_indexed_prim_list(VALIDATE_ARGS)
if (offset > ib->base.size ||
(ib->base.size - offset) / index_size < length) {
DRM_ERROR("IB access overflow (%d + %d*%d > %d)\n",
DRM_ERROR("IB access overflow (%d + %d*%d > %zd)\n",
offset, length, index_size, ib->base.size);
return -EINVAL;
}
@@ -361,9 +359,8 @@ validate_tile_binning_config(VALIDATE_ARGS)
}
if (flags & (VC4_BIN_CONFIG_DB_NON_MS |
VC4_BIN_CONFIG_TILE_BUFFER_64BIT |
VC4_BIN_CONFIG_MS_MODE_4X)) {
DRM_ERROR("unsupported bining config flags 0x%02x\n", flags);
VC4_BIN_CONFIG_TILE_BUFFER_64BIT)) {
DRM_ERROR("unsupported binning config flags 0x%02x\n", flags);
return -EINVAL;
}
@@ -424,8 +421,8 @@ validate_gem_handles(VALIDATE_ARGS)
return 0;
}
#define VC4_DEFINE_PACKET(packet, name, func) \
[packet] = { packet ## _SIZE, name, func }
#define VC4_DEFINE_PACKET(packet, func) \
[packet] = { packet ## _SIZE, #packet, func }
static const struct cmd_info {
uint16_t len;
@@ -433,42 +430,42 @@ static const struct cmd_info {
int (*func)(struct vc4_exec_info *exec, void *validated,
void *untrusted);
} cmd_info[] = {
VC4_DEFINE_PACKET(VC4_PACKET_HALT, "halt", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_NOP, "nop", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, "flush", validate_flush),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, "flush all state", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING, "start tile binning", validate_start_tile_binning),
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE, "increment semaphore", validate_increment_semaphore),
VC4_DEFINE_PACKET(VC4_PACKET_HALT, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_NOP, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, validate_flush),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING,
validate_start_tile_binning),
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE,
validate_increment_semaphore),
VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE, "Indexed Primitive List", validate_indexed_prim_list),
VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE,
validate_indexed_prim_list),
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE,
validate_gl_array_primitive),
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE, "Vertex Array Primitives", validate_gl_array_primitive),
VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, NULL),
/* This is only used by clipped primitives (packets 48 and 49), which
* we don't support parsing yet.
*/
VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, "primitive list format", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, validate_gl_shader_state),
VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, "GL Shader State", validate_gl_shader_state),
/* We don't support validating NV shader states. */
VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, "configuration bits", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, "flat shade flags", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, "point size", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, "line width", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, "RHT X boundary", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, "Depth Offset", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, "Clip Window", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, "Viewport Offset", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, "Clipper XY Scaling", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, NULL),
/* Note: The docs say this was also 105, but it was 106 in the
* initial userland code drop.
*/
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, "Clipper Z Scale and Offset", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG, "tile binning configuration", validate_tile_binning_config),
VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG,
validate_tile_binning_config),
VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, "GEM handles", validate_gem_handles),
VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, validate_gem_handles),
};
int
@@ -500,11 +497,6 @@ vc4_validate_bin_cl(struct drm_device *dev,
return -EINVAL;
}
#if 0
DRM_INFO("0x%08x: packet %d (%s) size %d processing...\n",
src_offset, cmd, info->name, info->len);
#endif
if (src_offset + info->len > len) {
DRM_ERROR("0x%08x: packet %d (%s) length 0x%08x "
"exceeds bounds (0x%08x)\n",
@@ -519,8 +511,7 @@ vc4_validate_bin_cl(struct drm_device *dev,
if (info->func && info->func(exec,
dst_pkt + 1,
src_pkt + 1)) {
DRM_ERROR("0x%08x: packet %d (%s) failed to "
"validate\n",
DRM_ERROR("0x%08x: packet %d (%s) failed to validate\n",
src_offset, cmd, info->name);
return -EINVAL;
}
@@ -588,12 +579,14 @@ reloc_tex(struct vc4_exec_info *exec,
if (sample->is_direct) {
uint32_t remaining_size = tex->base.size - p0;
if (p0 > tex->base.size - 4) {
DRM_ERROR("UBO offset greater than UBO size\n");
goto fail;
}
if (p1 > remaining_size - 4) {
DRM_ERROR("UBO clamp would allow reads outside of UBO\n");
DRM_ERROR("UBO clamp would allow reads "
"outside of UBO\n");
goto fail;
}
*validated_p0 = tex->paddr + p0;
@@ -866,7 +859,7 @@ validate_gl_shader_rec(struct drm_device *dev,
if (vbo->base.size < offset ||
vbo->base.size - offset < attr_size) {
DRM_ERROR("BO offset overflow (%d + %d > %d)\n",
DRM_ERROR("BO offset overflow (%d + %d > %zd)\n",
offset, attr_size, vbo->base.size);
return -EINVAL;
}
@@ -875,7 +868,8 @@ validate_gl_shader_rec(struct drm_device *dev,
max_index = ((vbo->base.size - offset - attr_size) /
stride);
if (state->max_index > max_index) {
DRM_ERROR("primitives use index %d out of supplied %d\n",
DRM_ERROR("primitives use index %d out of "
"supplied %d\n",
state->max_index, max_index);
return -EINVAL;
}
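The vertex-attribute check in validate_gl_shader_rec() tests "vbo->base.size < offset || vbo->base.size - offset < attr_size" rather than "offset + attr_size > vbo->base.size". Because it subtracts from the trusted BO size instead of adding to the untrusted offset, a userspace offset near UINT32_MAX cannot wrap the sum and slip past the check. The UBO checks in reloc_tex() use the same subtract-from-the-size pattern. Below is a minimal sketch of that pattern, assuming 32-bit unsigned sizes. range_fits() and range_fits_naive() are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [offset, offset + len) lies inside a buffer
 * of `size` bytes. Subtracting from the trusted size instead of adding to
 * the untrusted offset means wraparound cannot defeat the check.
 */
static bool
range_fits(uint32_t size, uint32_t offset, uint32_t len)
{
        if (size < offset)
                return false;
        return size - offset >= len;
}

/* The naive form the validator avoids: offset + len can wrap to a small
 * value and pass the comparison.
 */
static bool
range_fits_naive(uint32_t size, uint32_t offset, uint32_t len)
{
        return offset + len <= size;
}
```

With size 64, offset 0xfffffff0 and len 0x20, the naive sum wraps to 0x10 and is accepted, while range_fits() rejects it.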

@@ -24,24 +24,16 @@
/**
* DOC: Shader validator for VC4.
*
* The VC4 has no IOMMU between it and system memory. So, a user with access
* to execute shaders could escalate privilege by overwriting system memory
* (using the VPM write address register in the general-purpose DMA mode) or
* reading system memory it shouldn't (reading it as a texture, or uniform
* data, or vertex data).
* The VC4 has no IOMMU between it and system memory, so a user with
* access to execute shaders could escalate privilege by overwriting
* system memory (using the VPM write address register in the
* general-purpose DMA mode) or reading system memory it shouldn't
* (reading it as a texture, or uniform data, or vertex data).
*
* This walks over a shader starting from some offset within a BO, ensuring
* that its accesses are appropriately bounded, and recording how many texture
* accesses are made and where so that we can do relocations for them in the
* This walks over a shader BO, ensuring that its accesses are
* appropriately bounded, and recording how many texture accesses are
* made and where so that we can do relocations for them in the
* uniform stream.
*
* The kernel API has shaders stored in user-mapped BOs. The BOs will be
* forcibly unmapped from the process before validation, and any cache of
* validated state will be flushed if the mapping is faulted back in.
*
* Storing the shaders in BOs means that the validation process will be slow
* due to uncached reads, but since shaders are long-lived and shader BOs are
* never actually modified, this shouldn't be a problem.
*/
#include "vc4_drv.h"
@@ -71,7 +63,6 @@ waddr_to_live_reg_index(uint32_t waddr, bool is_b)
else
return waddr;
} else if (waddr <= QPU_W_ACC3) {
return 64 + waddr - QPU_W_ACC0;
} else {
return ~0;
@@ -86,15 +77,14 @@ raddr_add_a_to_live_reg_index(uint64_t inst)
uint32_t raddr_a = QPU_GET_FIELD(inst, QPU_RADDR_A);
uint32_t raddr_b = QPU_GET_FIELD(inst, QPU_RADDR_B);
if (add_a == QPU_MUX_A) {
if (add_a == QPU_MUX_A)
return raddr_a;
} else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM) {
else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM)
return 32 + raddr_b;
} else if (add_a <= QPU_MUX_R3) {
else if (add_a <= QPU_MUX_R3)
return 64 + add_a;
} else {
else
return ~0;
}
}
static bool
@@ -112,9 +102,9 @@ is_tmu_write(uint32_t waddr)
}
static bool
record_validated_texture_sample(struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state,
int tmu)
record_texture_sample(struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state,
int tmu)
{
uint32_t s = validated_shader->num_texture_samples;
int i;
@@ -227,8 +217,8 @@ check_tmu_write(uint64_t inst,
validated_shader->uniforms_size += 4;
if (submit) {
if (!record_validated_texture_sample(validated_shader,
validation_state, tmu)) {
if (!record_texture_sample(validated_shader,
validation_state, tmu)) {
return false;
}
@@ -239,10 +229,10 @@ check_tmu_write(uint64_t inst,
}
static bool
check_register_write(uint64_t inst,
struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state,
bool is_mul)
check_reg_write(uint64_t inst,
struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state,
bool is_mul)
{
uint32_t waddr = (is_mul ?
QPU_GET_FIELD(inst, QPU_WADDR_MUL) :
@@ -298,7 +288,7 @@ check_register_write(uint64_t inst,
return true;
case QPU_W_TLB_STENCIL_SETUP:
return true;
return true;
}
return true;
@@ -361,7 +351,7 @@ track_live_clamps(uint64_t inst,
}
validation_state->live_max_clamp_regs[lri_add] = true;
} if (op_add == QPU_A_MIN) {
} else if (op_add == QPU_A_MIN) {
/* Track live clamps of a value clamped to a minimum of 0 and
* a maximum of some uniform's offset.
*/
@@ -393,8 +383,10 @@ check_instruction_writes(uint64_t inst,
return false;
}
ok = (check_register_write(inst, validated_shader, validation_state, false) &&
check_register_write(inst, validated_shader, validation_state, true));
ok = (check_reg_write(inst, validated_shader, validation_state,
false) &&
check_reg_write(inst, validated_shader, validation_state,
true));
track_live_clamps(inst, validated_shader, validation_state);
@@ -442,7 +434,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
shader = shader_obj->vaddr;
max_ip = shader_obj->base.size / sizeof(uint64_t);
validated_shader = kcalloc(sizeof(*validated_shader), 1, GFP_KERNEL);
validated_shader = kcalloc(1, sizeof(*validated_shader), GFP_KERNEL);
if (!validated_shader)
return NULL;
@@ -498,7 +490,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
if (ip == max_ip) {
DRM_ERROR("shader failed to terminate before "
"shader BO end at %d\n",
"shader BO end at %zd\n",
shader_obj->base.size);
goto fail;
}
@@ -514,6 +506,9 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
return validated_shader;
fail:
kfree(validated_shader);
if (validated_shader) {
kfree(validated_shader->texture_samples);
kfree(validated_shader);
}
return NULL;
}
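waddr_to_live_reg_index() and raddr_add_a_to_live_reg_index() fold the QPU's register namespaces into one flat index, so clamp tracking can use a single array. Regfile A reads map to raddr_a, regfile B reads map to 32 + raddr_b, accumulators r0 to r3 map to 64 and up, and anything untracked maps to ~0. The sketch below models the read side only. The mux enum is an assumption that mirrors the usual VC4 ordering (r0 to r5, then A, then B), and the small-immediate signal is reduced to a bool. Only physical register addresses 0 to 31 are accepted. live_reg_index() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ALU input mux encoding: accumulators first, then the two
 * physical register files.
 */
enum mux { MUX_R0, MUX_R1, MUX_R2, MUX_R3, MUX_R4, MUX_R5, MUX_A, MUX_B };

#define LIVE_REG_NONE (~0u)

/* Map an ALU input to a flat live-register index:
 *   0..31  regfile A (raddr_a)
 *   32..63 regfile B (raddr_b)
 *   64..67 accumulators r0..r3
 * r4, r5, and regfile B's read slot reused as a small immediate are not
 * tracked.
 */
static uint32_t
live_reg_index(enum mux m, uint32_t raddr_a, uint32_t raddr_b,
               bool small_imm)
{
        /* The sketch only handles physical register reads. */
        assert(raddr_a < 32 && raddr_b < 32);

        if (m == MUX_A)
                return raddr_a;
        else if (m == MUX_B && !small_imm)
                return 32 + raddr_b;
        else if (m <= MUX_R3)
                return 64 + m;
        else
                return LIVE_REG_NONE;
}
```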

@@ -41,24 +41,53 @@ vc4_get_blit_surface(struct pipe_context *pctx,
return pctx->create_surface(pctx, prsc, &tmpl);
}
static bool
is_tile_unaligned(unsigned size, unsigned tile_size)
{
return size & (tile_size - 1);
}
static bool
vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
{
struct vc4_context *vc4 = vc4_context(pctx);
bool old_msaa = vc4->msaa;
int old_tile_width = vc4->tile_width;
int old_tile_height = vc4->tile_height;
bool msaa = (info->src.resource->nr_samples ||
info->dst.resource->nr_samples);
int tile_width = msaa ? 32 : 64;
int tile_height = msaa ? 32 : 64;
if (util_format_is_depth_or_stencil(info->dst.resource->format))
return false;
if (info->scissor_enable)
return false;
if ((info->mask & PIPE_MASK_RGBA) == 0)
return false;
if (info->dst.box.x != 0 || info->dst.box.y != 0 ||
info->src.box.x != 0 || info->src.box.y != 0 ||
if (info->dst.box.x != info->src.box.x ||
info->dst.box.y != info->src.box.y ||
info->dst.box.width != info->src.box.width ||
info->dst.box.height != info->src.box.height) {
return false;
}
int dst_surface_width = u_minify(info->dst.resource->width0,
info->dst.level);
int dst_surface_height = u_minify(info->dst.resource->height0,
info->dst.level);
if (is_tile_unaligned(info->dst.box.x, tile_width) ||
is_tile_unaligned(info->dst.box.y, tile_height) ||
(is_tile_unaligned(info->dst.box.width, tile_width) &&
info->dst.box.x + info->dst.box.width != dst_surface_width) ||
(is_tile_unaligned(info->dst.box.height, tile_height) &&
info->dst.box.y + info->dst.box.height != dst_surface_height)) {
return false;
}
if (info->dst.resource->format != info->src.resource->format)
return false;
@@ -70,18 +99,32 @@ vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
vc4_get_blit_surface(pctx, info->src.resource, info->src.level);
pipe_surface_reference(&vc4->color_read, src_surf);
pipe_surface_reference(&vc4->color_write, dst_surf);
pipe_surface_reference(&vc4->color_write,
dst_surf->texture->nr_samples ? NULL : dst_surf);
pipe_surface_reference(&vc4->msaa_color_write,
dst_surf->texture->nr_samples ? dst_surf : NULL);
pipe_surface_reference(&vc4->zs_read, NULL);
pipe_surface_reference(&vc4->zs_write, NULL);
vc4->draw_min_x = 0;
vc4->draw_min_y = 0;
vc4->draw_max_x = dst_surf->width;
vc4->draw_max_y = dst_surf->height;
pipe_surface_reference(&vc4->msaa_zs_write, NULL);
vc4->draw_min_x = info->dst.box.x;
vc4->draw_min_y = info->dst.box.y;
vc4->draw_max_x = info->dst.box.x + info->dst.box.width;
vc4->draw_max_y = info->dst.box.y + info->dst.box.height;
vc4->draw_width = dst_surf->width;
vc4->draw_height = dst_surf->height;
vc4->tile_width = tile_width;
vc4->tile_height = tile_height;
vc4->msaa = msaa;
vc4->needs_flush = true;
vc4_job_submit(vc4);
vc4->msaa = old_msaa;
vc4->tile_width = old_tile_width;
vc4->tile_height = old_tile_height;
pipe_surface_reference(&dst_surf, NULL);
pipe_surface_reference(&src_surf, NULL);
@@ -131,14 +174,6 @@ vc4_blit(struct pipe_context *pctx, const struct pipe_blit_info *blit_info)
{
struct pipe_blit_info info = *blit_info;
if (info.src.resource->nr_samples > 1 &&
info.dst.resource->nr_samples <= 1 &&
!util_format_is_depth_or_stencil(info.src.resource->format) &&
!util_format_is_pure_integer(info.src.resource->format)) {
fprintf(stderr, "color resolve unimplemented\n");
return;
}
if (vc4_tile_blit(pctx, blit_info))
return;
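vc4_tile_blit() now accepts a destination box smaller than the surface. The box must start on a tile boundary and either span whole tiles or run to the right/bottom edge of the miplevel. Tiles are 64x64 normally and 32x32 when either resource is multisampled. is_tile_unaligned() relies on the tile size being a power of two, so size & (tile_size - 1) is the remainder. Below is a sketch of the same predicate. struct box and box_is_tile_blittable() are hypothetical simplifications of pipe_box and the checks in vc4_tile_blit(), and the separate source/destination box comparison is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant pipe_box fields. */
struct box {
        int x, y, width, height;
};

/* Nonzero remainder modulo a power-of-two tile size. */
static bool
is_tile_unaligned(unsigned size, unsigned tile_size)
{
        assert(tile_size && (tile_size & (tile_size - 1)) == 0);
        return size & (tile_size - 1);
}

/* True if dst can be blitted with whole-tile loads and stores. The origin
 * must be tile-aligned. A partial tile at the right or bottom is allowed
 * only when it is the last tile of the surface.
 */
static bool
box_is_tile_blittable(const struct box *dst, bool msaa,
                      int surface_width, int surface_height)
{
        int tile_width = msaa ? 32 : 64;
        int tile_height = msaa ? 32 : 64;

        if (is_tile_unaligned(dst->x, tile_width) ||
            is_tile_unaligned(dst->y, tile_height))
                return false;

        if (is_tile_unaligned(dst->width, tile_width) &&
            dst->x + dst->width != surface_width)
                return false;

        if (is_tile_unaligned(dst->height, tile_height) &&
            dst->y + dst->height != surface_height)
                return false;

        return true;
}
```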

@@ -67,8 +67,16 @@ vc4_flush(struct pipe_context *pctx)
cl_u8(&bcl, VC4_PACKET_FLUSH);
cl_end(&vc4->bcl, bcl);
vc4->msaa = false;
if (cbuf && (vc4->resolve & PIPE_CLEAR_COLOR0)) {
pipe_surface_reference(&vc4->color_write, cbuf);
pipe_surface_reference(&vc4->color_write,
cbuf->texture->nr_samples ? NULL : cbuf);
pipe_surface_reference(&vc4->msaa_color_write,
cbuf->texture->nr_samples ? cbuf : NULL);
if (cbuf->texture->nr_samples)
vc4->msaa = true;
if (!(vc4->cleared & PIPE_CLEAR_COLOR0)) {
pipe_surface_reference(&vc4->color_read, cbuf);
} else {
@@ -78,11 +86,21 @@ vc4_flush(struct pipe_context *pctx)
} else {
pipe_surface_reference(&vc4->color_write, NULL);
pipe_surface_reference(&vc4->color_read, NULL);
pipe_surface_reference(&vc4->msaa_color_write, NULL);
}
if (vc4->framebuffer.zsbuf &&
(vc4->resolve & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
pipe_surface_reference(&vc4->zs_write, zsbuf);
pipe_surface_reference(&vc4->zs_write,
zsbuf->texture->nr_samples ?
NULL : zsbuf);
pipe_surface_reference(&vc4->msaa_zs_write,
zsbuf->texture->nr_samples ?
zsbuf : NULL);
if (zsbuf->texture->nr_samples)
vc4->msaa = true;
if (!(vc4->cleared & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
pipe_surface_reference(&vc4->zs_read, zsbuf);
} else {
@@ -91,6 +109,7 @@ vc4_flush(struct pipe_context *pctx)
} else {
pipe_surface_reference(&vc4->zs_write, NULL);
pipe_surface_reference(&vc4->zs_read, NULL);
pipe_surface_reference(&vc4->msaa_zs_write, NULL);
}
vc4_job_submit(vc4);
@@ -245,6 +264,8 @@ vc4_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
vc4_debug |= saved_shaderdb_flag;
vc4->sample_mask = (1 << VC4_MAX_SAMPLES) - 1;
return &vc4->base;
fail:

@@ -206,6 +206,8 @@ struct vc4_context {
struct pipe_surface *color_write;
struct pipe_surface *zs_read;
struct pipe_surface *zs_write;
struct pipe_surface *msaa_color_write;
struct pipe_surface *msaa_zs_write;
/** @} */
/** @{
* Bounding box of the scissor across all queued drawing.
@@ -224,6 +226,15 @@ struct vc4_context {
uint32_t draw_width;
uint32_t draw_height;
/** @} */
/** @{ Tile information, depending on MSAA and float color buffer. */
uint32_t draw_tiles_x; /** @< Number of tiles wide for framebuffer. */
uint32_t draw_tiles_y; /** @< Number of tiles high for framebuffer. */
uint32_t tile_width; /** @< Width of a tile. */
uint32_t tile_height; /** @< Height of a tile. */
/** Whether the current rendering is in a 4X MSAA tile buffer. */
bool msaa;
/** @} */
struct util_slab_mempool transfer_pool;
struct blitter_context *blitter;
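The new draw_tiles_x and draw_tiles_y fields replace the hard-coded align(width, 64) / 64 in vc4_start_draw(), so the tile grid in VC4_PACKET_TILE_BINNING_MODE_CONFIG can follow the current tile size. Below is a sketch of the round-up division. It assumes only the 64-pixel (single-sample) and 32-pixel (4x MSAA) tile sizes used in vc4_tile_blit(). The header comment says float color buffers also affect tiling, which this sketch does not model. tiles_for() and compute_tile_grid() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tiles needed to cover `pixels`, rounding up so a partial tile at the
 * edge still gets a bin. Equivalent to align(pixels, tile) / tile.
 */
static uint32_t
tiles_for(uint32_t pixels, uint32_t tile)
{
        assert(tile != 0);
        return (pixels + tile - 1) / tile;
}

/* Hypothetical helper filling in the tile grid for a framebuffer. */
static void
compute_tile_grid(uint32_t fb_width, uint32_t fb_height, bool msaa,
                  uint32_t *tiles_x, uint32_t *tiles_y)
{
        uint32_t tile = msaa ? 32 : 64;

        *tiles_x = tiles_for(fb_width, tile);
        *tiles_y = tiles_for(fb_height, tile);
}
```

For a 1920x1080 framebuffer this gives a 30x17 grid single-sampled and a 60x34 grid with 4x MSAA. Both fit in the u8 fields of the binning packet.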

@@ -68,21 +68,17 @@ vc4_start_draw(struct vc4_context *vc4)
vc4_get_draw_cl_space(vc4);
uint32_t width = vc4->framebuffer.width;
uint32_t height = vc4->framebuffer.height;
uint32_t tilew = align(width, 64) / 64;
uint32_t tileh = align(height, 64) / 64;
struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
// Tile state data is 48 bytes per tile, I think it can be thrown away
// as soon as binning is finished.
cl_u8(&bcl, VC4_PACKET_TILE_BINNING_MODE_CONFIG);
cl_u32(&bcl, 0); /* tile alloc addr, filled by kernel */
cl_u32(&bcl, 0); /* tile alloc size, filled by kernel */
cl_u32(&bcl, 0); /* tile state addr, filled by kernel */
cl_u8(&bcl, tilew);
cl_u8(&bcl, tileh);
cl_u8(&bcl, 0); /* flags, filled by kernel. */
cl_u8(&bcl, vc4->draw_tiles_x);
cl_u8(&bcl, vc4->draw_tiles_y);
/* Other flags are filled by kernel. */
cl_u8(&bcl, vc4->msaa ? VC4_BIN_CONFIG_MS_MODE_4X : 0);
/* START_TILE_BINNING resets the statechange counters in the hardware,
* which are what is used when a primitive is binned to a tile to
@@ -102,8 +98,8 @@ vc4_start_draw(struct vc4_context *vc4)
vc4->needs_flush = true;
vc4->draw_calls_queued++;
vc4->draw_width = width;
vc4->draw_height = height;
vc4->draw_width = vc4->framebuffer.width;
vc4->draw_height = vc4->framebuffer.height;
cl_end(&vc4->bcl, bcl);
}

@@ -44,10 +44,13 @@ struct drm_vc4_submit_rcl_surface {
uint32_t hindex; /* Handle index, or ~0 if not present. */
uint32_t offset; /* Offset to start of buffer. */
/*
* Bits for either render config (color_ms_write) or load/store packet.
* Bits for either render config (color_write) or load/store packet.
* Bits should all be 0 for MSAA load/stores.
*/
uint16_t bits;
uint16_t pad;
#define VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES (1 << 0)
uint16_t flags;
};
/**
@@ -126,9 +129,11 @@ struct drm_vc4_submit_cl {
uint8_t max_x_tile;
uint8_t max_y_tile;
struct drm_vc4_submit_rcl_surface color_read;
struct drm_vc4_submit_rcl_surface color_ms_write;
struct drm_vc4_submit_rcl_surface color_write;
struct drm_vc4_submit_rcl_surface zs_read;
struct drm_vc4_submit_rcl_surface zs_write;
struct drm_vc4_submit_rcl_surface msaa_color_write;
struct drm_vc4_submit_rcl_surface msaa_zs_write;
uint32_t clear_color[2];
uint32_t clear_z;
uint8_t clear_s;

@@ -29,17 +29,35 @@ vc4_emit_state(struct pipe_context *pctx)
struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT)) {
if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT |
VC4_DIRTY_RASTERIZER)) {
float *vpscale = vc4->viewport.scale;
float *vptranslate = vc4->viewport.translate;
float vp_minx = -fabsf(vpscale[0]) + vptranslate[0];
float vp_maxx = fabsf(vpscale[0]) + vptranslate[0];
float vp_miny = -fabsf(vpscale[1]) + vptranslate[1];
float vp_maxy = fabsf(vpscale[1]) + vptranslate[1];
uint32_t minx = MAX2(vc4->scissor.minx, vp_minx);
uint32_t miny = MAX2(vc4->scissor.miny, vp_miny);
uint32_t maxx = MIN2(vc4->scissor.maxx, vp_maxx);
uint32_t maxy = MIN2(vc4->scissor.maxy, vp_maxy);
/* Clip to the scissor if it's enabled, but still clip to the
* drawable regardless since that controls where the binner
* tries to put things.
*
* Additionally, always clip the rendering to the viewport,
* since the hardware does guardband clipping, meaning
* primitives would rasterize outside of the view volume.
*/
uint32_t minx, miny, maxx, maxy;
if (!vc4->rasterizer->base.scissor) {
minx = MAX2(vp_minx, 0);
miny = MAX2(vp_miny, 0);
maxx = MIN2(vp_maxx, vc4->draw_width);
maxy = MIN2(vp_maxy, vc4->draw_height);
} else {
minx = MAX2(vp_minx, vc4->scissor.minx);
miny = MAX2(vp_miny, vc4->scissor.miny);
maxx = MIN2(vp_maxx, vc4->scissor.maxx);
maxy = MIN2(vp_maxy, vc4->scissor.maxy);
}
cl_u8(&bcl, VC4_PACKET_CLIP_WINDOW);
cl_u16(&bcl, minx);
@@ -54,6 +72,20 @@ vc4_emit_state(struct pipe_context *pctx)
}
if (vc4->dirty & (VC4_DIRTY_RASTERIZER | VC4_DIRTY_ZSA)) {
uint8_t ez_enable_mask_out = ~0;
/* HW-2905: If the RCL ends up doing a full-res load when
* multisampling, then early Z tracking may end up with values
* from the previous tile due to a HW bug. Disable it to
* avoid that.
*
* We should be able to skip this when the Z is cleared, but I
* was seeing bad rendering on glxgears -samples 4 even in
* that case.
*/
if (vc4->msaa)
ez_enable_mask_out &= ~VC4_CONFIG_BITS_EARLY_Z;
cl_u8(&bcl, VC4_PACKET_CONFIGURATION_BITS);
cl_u8(&bcl,
vc4->rasterizer->config_bits[0] |
@@ -62,8 +94,8 @@ vc4_emit_state(struct pipe_context *pctx)
vc4->rasterizer->config_bits[1] |
vc4->zsa->config_bits[1]);
cl_u8(&bcl,
vc4->rasterizer->config_bits[2] |
vc4->zsa->config_bits[2]);
(vc4->rasterizer->config_bits[2] |
vc4->zsa->config_bits[2]) & ez_enable_mask_out);
}
if (vc4->dirty & VC4_DIRTY_RASTERIZER) {
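vc4_emit_state() now always bounds the clip window by the viewport. It intersects the viewport with the scissor when scissoring is enabled and with the drawable otherwise, because the hardware's guardband clipping would let primitives rasterize outside the view volume. Below is a plain-C sketch of that intersection. struct rect, clip_min(), clip_max() and clip_window() are hypothetical. As a guard added for the sketch, clip_max() clamps a negative viewport edge to 0 before the float-to-unsigned conversion.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

struct rect {
        uint32_t minx, miny, maxx, maxy;
};

/* Lower edge: the larger of the viewport edge and the bound. */
static uint32_t
clip_min(float vp_edge, uint32_t bound)
{
        return vp_edge > (float)bound ? (uint32_t)vp_edge : bound;
}

/* Upper edge: the smaller of the viewport edge and the bound, never < 0. */
static uint32_t
clip_max(float vp_edge, uint32_t bound)
{
        if (vp_edge <= 0.0f)
                return 0;
        return vp_edge < (float)bound ? (uint32_t)vp_edge : bound;
}

/* Intersect the viewport (gallium-style scale/translate for x and y) with
 * the scissor if enabled, otherwise with the drawable.
 */
static struct rect
clip_window(const float scale[2], const float translate[2],
            bool scissor_enable, struct rect scissor,
            uint32_t draw_width, uint32_t draw_height)
{
        float vp_minx = -fabsf(scale[0]) + translate[0];
        float vp_maxx = fabsf(scale[0]) + translate[0];
        float vp_miny = -fabsf(scale[1]) + translate[1];
        float vp_maxy = fabsf(scale[1]) + translate[1];
        struct rect bound = { 0, 0, draw_width, draw_height };

        if (scissor_enable)
                bound = scissor;

        struct rect r = {
                clip_min(vp_minx, bound.minx),
                clip_min(vp_miny, bound.miny),
                clip_max(vp_maxx, bound.maxx),
                clip_max(vp_maxy, bound.maxy),
        };
        return r;
}
```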

@@ -89,31 +89,37 @@ vc4_submit_setup_rcl_surface(struct vc4_context *vc4,
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset;
if (is_depth) {
submit_surf->bits =
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS,
VC4_LOADSTORE_TILE_BUFFER_BUFFER);
if (psurf->texture->nr_samples == 0) {
if (is_depth) {
submit_surf->bits =
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS,
VC4_LOADSTORE_TILE_BUFFER_BUFFER);
} else {
submit_surf->bits =
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR,
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
}
submit_surf->bits |=
VC4_SET_FIELD(surf->tiling,
VC4_LOADSTORE_TILE_BUFFER_TILING);
} else {
submit_surf->bits =
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR,
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
assert(!is_write);
submit_surf->flags |= VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES;
}
submit_surf->bits |=
VC4_SET_FIELD(surf->tiling, VC4_LOADSTORE_TILE_BUFFER_TILING);
if (is_write)
rsc->writes++;
}
static void
vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4,
struct drm_vc4_submit_rcl_surface *submit_surf,
struct pipe_surface *psurf)
vc4_submit_setup_rcl_render_config_surface(struct vc4_context *vc4,
struct drm_vc4_submit_rcl_surface *submit_surf,
struct pipe_surface *psurf)
{
struct vc4_surface *surf = vc4_surface(psurf);
@@ -126,16 +132,38 @@ vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4,
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset;
submit_surf->bits =
VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ?
VC4_RENDER_CONFIG_FORMAT_BGR565 :
VC4_RENDER_CONFIG_FORMAT_RGBA8888,
VC4_RENDER_CONFIG_FORMAT) |
VC4_SET_FIELD(surf->tiling, VC4_RENDER_CONFIG_MEMORY_FORMAT);
if (psurf->texture->nr_samples == 0) {
submit_surf->bits =
VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ?
VC4_RENDER_CONFIG_FORMAT_BGR565 :
VC4_RENDER_CONFIG_FORMAT_RGBA8888,
VC4_RENDER_CONFIG_FORMAT) |
VC4_SET_FIELD(surf->tiling,
VC4_RENDER_CONFIG_MEMORY_FORMAT);
}
rsc->writes++;
}
static void
vc4_submit_setup_rcl_msaa_surface(struct vc4_context *vc4,
struct drm_vc4_submit_rcl_surface *submit_surf,
struct pipe_surface *psurf)
{
struct vc4_surface *surf = vc4_surface(psurf);
if (!surf) {
submit_surf->hindex = ~0;
return;
}
struct vc4_resource *rsc = vc4_resource(psurf->texture);
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset;
submit_surf->bits = 0;
rsc->writes++;
}
/**
* Submits the job to the kernel and then reinitializes it.
*/
@@ -150,18 +178,35 @@ vc4_job_submit(struct vc4_context *vc4)
struct drm_vc4_submit_cl submit;
memset(&submit, 0, sizeof(submit));
cl_ensure_space(&vc4->bo_handles, 4 * sizeof(uint32_t));
cl_ensure_space(&vc4->bo_pointers, 4 * sizeof(struct vc4_bo *));
cl_ensure_space(&vc4->bo_handles, 6 * sizeof(uint32_t));
cl_ensure_space(&vc4->bo_pointers, 6 * sizeof(struct vc4_bo *));
vc4_submit_setup_rcl_surface(vc4, &submit.color_read,
vc4->color_read, false, false);
vc4_submit_setup_ms_rcl_surface(vc4, &submit.color_ms_write,
vc4->color_write);
vc4_submit_setup_rcl_render_config_surface(vc4, &submit.color_write,
vc4->color_write);
vc4_submit_setup_rcl_surface(vc4, &submit.zs_read,
vc4->zs_read, true, false);
vc4_submit_setup_rcl_surface(vc4, &submit.zs_write,
vc4->zs_write, true, true);
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_color_write,
vc4->msaa_color_write);
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_zs_write,
vc4->msaa_zs_write);
if (vc4->msaa) {
/* This bit controls how many pixels the general
* (i.e. subsampled) loads/stores are iterating over
* (multisample loads replicate out to the other samples).
*/
submit.color_write.bits |= VC4_RENDER_CONFIG_MS_MODE_4X;
/* Controls whether color_write's
* VC4_PACKET_STORE_MS_TILE_BUFFER does 4x decimation
*/
submit.color_write.bits |= VC4_RENDER_CONFIG_DECIMATE_MODE_4X;
}
submit.bo_handles = (uintptr_t)vc4->bo_handles.base;
submit.bo_handle_count = cl_offset(&vc4->bo_handles) / 4;
submit.bin_cl = (uintptr_t)vc4->bcl.base;
@@ -173,10 +218,10 @@ vc4_job_submit(struct vc4_context *vc4)
submit.uniforms_size = cl_offset(&vc4->uniforms);
assert(vc4->draw_min_x != ~0 && vc4->draw_min_y != ~0);
submit.min_x_tile = vc4->draw_min_x / 64;
submit.min_y_tile = vc4->draw_min_y / 64;
submit.max_x_tile = (vc4->draw_max_x - 1) / 64;
submit.max_y_tile = (vc4->draw_max_y - 1) / 64;
submit.min_x_tile = vc4->draw_min_x / vc4->tile_width;
submit.min_y_tile = vc4->draw_min_y / vc4->tile_height;
submit.max_x_tile = (vc4->draw_max_x - 1) / vc4->tile_width;
submit.max_y_tile = (vc4->draw_max_y - 1) / vc4->tile_height;
submit.width = vc4->draw_width;
submit.height = vc4->draw_height;
if (vc4->cleared) {
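vc4_job_submit() converts the queued drawing bounds into the inclusive tile range passed to the kernel (min_x_tile through max_y_tile). It now divides by vc4->tile_width and vc4->tile_height instead of a fixed 64. The pixel bounds are half-open, which is why the maximum tile comes from draw_max - 1. Below is a sketch using a hypothetical pixels_to_tiles() helper.

```c
#include <assert.h>
#include <stdint.h>

struct tile_range {
        uint32_t min_x, min_y, max_x, max_y;   /* inclusive tile indices */
};

/* Convert a half-open pixel rectangle [min_x, max_x) x [min_y, max_y) into
 * the inclusive range of tiles it touches. The "- 1" keeps a box that ends
 * exactly on a tile edge from pulling in the next tile.
 */
static struct tile_range
pixels_to_tiles(uint32_t min_x, uint32_t min_y,
                uint32_t max_x, uint32_t max_y,
                uint32_t tile_width, uint32_t tile_height)
{
        assert(max_x > min_x && max_y > min_y);
        assert(tile_width && tile_height);

        struct tile_range r = {
                min_x / tile_width,
                min_y / tile_height,
                (max_x - 1) / tile_width,
                (max_y - 1) / tile_height,
        };
        return r;
}
```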

@@ -29,6 +29,10 @@
* from the tile buffer after having waited for the scoreboard (which is
* handled by vc4_qpu_emit.c), then do math using your output color and that
* destination value, and update the output color appropriately.
*
* Once this pass is done, the color write will either have one component (for
* single sample) with packed argb8888, or 4 components with the per-sample
* argb8888 result.
*/
/**
@@ -40,15 +44,23 @@
#include "glsl/nir/nir_builder.h"
#include "vc4_context.h"
static bool
blend_depends_on_dst_color(struct vc4_compile *c)
{
return (c->fs_key->blend.blend_enable ||
c->fs_key->blend.colormask != 0xf ||
c->fs_key->logicop_func != PIPE_LOGICOP_COPY);
}
/** Emits a load of the previous fragment color from the tile buffer. */
static nir_ssa_def *
vc4_nir_get_dst_color(nir_builder *b)
vc4_nir_get_dst_color(nir_builder *b, int sample)
{
nir_intrinsic_instr *load =
nir_intrinsic_instr_create(b->shader,
nir_intrinsic_load_input);
load->num_components = 1;
load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT;
load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT + sample;
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
nir_builder_instr_insert(b, &load->instr);
return &load->dest.ssa;
@@ -496,23 +508,26 @@ vc4_nir_swizzle_and_pack(struct vc4_compile *c, nir_builder *b,
}
static void
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
nir_intrinsic_instr *intr)
static nir_ssa_def *
vc4_nir_blend_pipeline(struct vc4_compile *c, nir_builder *b, nir_ssa_def *src,
int sample)
{
enum pipe_format color_format = c->fs_key->color_format;
const uint8_t *format_swiz = vc4_get_format_swizzle(color_format);
bool srgb = util_format_is_srgb(color_format);
/* Pull out the float src/dst color components. */
nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b);
nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b, sample);
nir_ssa_def *dst_vec4 = nir_unpack_unorm_4x8(b, packed_dst_color);
nir_ssa_def *src_color[4], *unpacked_dst_color[4];
for (unsigned i = 0; i < 4; i++) {
src_color[i] = nir_swizzle(b, intr->src[0].ssa, &i, 1, false);
unpacked_dst_color[i] = nir_swizzle(b, dst_vec4, &i, 1, false);
src_color[i] = nir_channel(b, src, i);
unpacked_dst_color[i] = nir_channel(b, dst_vec4, i);
}
if (c->fs_key->sample_alpha_to_one && c->fs_key->msaa)
src_color[3] = nir_imm_float(b, 1.0);
vc4_nir_emit_alpha_test_discard(c, b, src_color[3]);
nir_ssa_def *packed_color;
@@ -560,16 +575,100 @@ vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
colormask &= ~(0xff << (i * 8));
}
}
packed_color = nir_ior(b,
nir_iand(b, packed_color,
nir_imm_int(b, colormask)),
nir_iand(b, packed_dst_color,
nir_imm_int(b, ~colormask)));
/* Turn the old vec4 output into a store of the packed color. */
nir_instr_rewrite_src(&intr->instr, &intr->src[0],
nir_src_for_ssa(packed_color));
return nir_ior(b,
nir_iand(b, packed_color,
nir_imm_int(b, colormask)),
nir_iand(b, packed_dst_color,
nir_imm_int(b, ~colormask)));
}
static int
vc4_nir_next_output_driver_location(nir_shader *s)
{
int maxloc = -1;
nir_foreach_variable(var, &s->outputs)
maxloc = MAX2(maxloc, (int)var->data.driver_location);
return maxloc + 1;
}
static void
vc4_nir_store_sample_mask(struct vc4_compile *c, nir_builder *b,
nir_ssa_def *val)
{
nir_variable *sample_mask = nir_variable_create(c->s, nir_var_shader_out,
glsl_uint_type(),
"sample_mask");
sample_mask->data.driver_location =
vc4_nir_next_output_driver_location(c->s);
sample_mask->data.location = FRAG_RESULT_SAMPLE_MASK;
nir_intrinsic_instr *intr =
nir_intrinsic_instr_create(c->s, nir_intrinsic_store_output);
intr->num_components = 1;
intr->const_index[0] = sample_mask->data.driver_location;
intr->src[0] = nir_src_for_ssa(val);
nir_builder_instr_insert(b, &intr->instr);
}
static void
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
nir_intrinsic_instr *intr)
{
nir_ssa_def *frag_color = intr->src[0].ssa;
if (c->fs_key->sample_coverage) {
nir_intrinsic_instr *load =
nir_intrinsic_instr_create(b->shader,
nir_intrinsic_load_sample_mask_in);
load->num_components = 1;
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
nir_builder_instr_insert(b, &load->instr);
nir_ssa_def *bitmask = &load->dest.ssa;
vc4_nir_store_sample_mask(c, b, bitmask);
} else if (c->fs_key->sample_alpha_to_coverage) {
nir_ssa_def *a = nir_channel(b, frag_color, 3);
/* XXX: We should do a nice dither based on the fragment
* coordinate, instead.
*/
nir_ssa_def *num_samples = nir_imm_float(b, VC4_MAX_SAMPLES);
nir_ssa_def *num_bits = nir_f2i(b, nir_fmul(b, a, num_samples));
nir_ssa_def *bitmask = nir_isub(b,
nir_ishl(b,
nir_imm_int(b, 1),
num_bits),
nir_imm_int(b, 1));
vc4_nir_store_sample_mask(c, b, bitmask);
}
/* The TLB color read returns each sample in turn, so if our blending
* depends on the destination color, we're going to have to run the
* blending function separately for each destination sample value, and
* then output the per-sample color using TLB_COLOR_MS.
*/
nir_ssa_def *blend_output;
if (c->fs_key->msaa && blend_depends_on_dst_color(c)) {
c->msaa_per_sample_output = true;
nir_ssa_def *samples[4];
for (int i = 0; i < VC4_MAX_SAMPLES; i++)
samples[i] = vc4_nir_blend_pipeline(c, b, frag_color, i);
blend_output = nir_vec4(b,
samples[0], samples[1],
samples[2], samples[3]);
} else {
blend_output = vc4_nir_blend_pipeline(c, b, frag_color, 0);
}
nir_instr_rewrite_src(&intr->instr, &intr->src[0],
nir_src_for_ssa(blend_output));
intr->num_components = blend_output->num_components;
}
static bool
@@ -577,7 +676,7 @@ vc4_nir_lower_blend_block(nir_block *block, void *state)
{
struct vc4_compile *c = state;
nir_foreach_instr(block, instr) {
nir_foreach_instr_safe(block, instr) {
if (instr->type != nir_instr_type_intrinsic)
continue;
nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
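For sample_alpha_to_coverage, vc4_nir_lower_blend_instr() builds the sample mask as (1 << (int)(alpha * VC4_MAX_SAMPLES)) - 1, which enables the first N samples. The XXX comment notes that a dither based on the fragment coordinate would be better. Below is a scalar C model of the value the emitted NIR computes. It assumes VC4_MAX_SAMPLES is 4, which is consistent with the four-element samples[] array in the same hunk, and that alpha is already in [0, 1]. alpha_to_coverage_mask() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLES 4   /* assumed value of VC4_MAX_SAMPLES */

/* Scalar model of the alpha-to-coverage mask. alpha * samples is truncated
 * toward zero (like nir_f2i), and that many low bits are set. The covered
 * samples are therefore always the first N, not a dithered pattern.
 */
static uint32_t
alpha_to_coverage_mask(float alpha)
{
        assert(alpha >= 0.0f && alpha <= 1.0f);
        int num_bits = (int)(alpha * MAX_SAMPLES);
        return (1u << num_bits) - 1;
}
```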

@@ -84,7 +84,7 @@ vc4_nir_unpack_16u(nir_builder *b, nir_ssa_def *src, unsigned chan)
static nir_ssa_def *
vc4_nir_unpack_8f(nir_builder *b, nir_ssa_def *src, unsigned chan)
{
return nir_swizzle(b, nir_unpack_unorm_4x8(b, src), &chan, 1, false);
return nir_channel(b, nir_unpack_unorm_4x8(b, src), chan);
}
static nir_ssa_def *
@@ -226,7 +226,9 @@ vc4_nir_lower_fs_input(struct vc4_compile *c, nir_builder *b,
{
b->cursor = nir_before_instr(&intr->instr);
if (intr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) {
if (intr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT &&
intr->const_index[0] < (VC4_NIR_TLB_COLOR_READ_INPUT +
VC4_MAX_SAMPLES)) {
/* This doesn't need any lowering. */
return;
}
@@ -309,7 +311,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
/* Color output is lowered by vc4_nir_lower_blend(). */
if (c->stage == QSTAGE_FRAG &&
(output_var->data.location == FRAG_RESULT_COLOR ||
output_var->data.location == FRAG_RESULT_DATA0)) {
output_var->data.location == FRAG_RESULT_DATA0 ||
output_var->data.location == FRAG_RESULT_SAMPLE_MASK)) {
intr->const_index[0] *= 4;
return;
}
@@ -326,9 +329,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
intr_comp->const_index[0] = intr->const_index[0] * 4 + i;
assert(intr->src[0].is_ssa);
intr_comp->src[0] = nir_src_for_ssa(nir_swizzle(b,
intr->src[0].ssa,
&i, 1, false));
intr_comp->src[0] =
nir_src_for_ssa(nir_channel(b, intr->src[0].ssa, i));
nir_builder_instr_insert(b, &intr_comp->instr);
}

@@ -0,0 +1,172 @@
/*
* Copyright © 2015 Broadcom
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "vc4_qir.h"
#include "kernel/vc4_packet.h"
#include "tgsi/tgsi_info.h"
#include "glsl/nir/nir_builder.h"
/** @file vc4_nir_lower_txf_ms.c
* Walks the NIR generated by TGSI-to-NIR to lower its nir_texop_txf_ms
* coordinates to do the math necessary and use a plain nir_texop_txf instead.
*
* MSAA textures are laid out as 32x32-aligned blocks of RGBA8888 or Z24S8.
* We can't load them through the normal sampler path because of the lack of
* linear support in the hardware. So, we treat MSAA textures as a giant UBO
* and do the math in the shader.
*/
static void
vc4_nir_lower_txf_ms_instr(struct vc4_compile *c, nir_builder *b,
nir_tex_instr *txf_ms)
{
if (txf_ms->op != nir_texop_txf_ms)
return;
b->cursor = nir_before_instr(&txf_ms->instr);
nir_tex_instr *txf = nir_tex_instr_create(c->s, 1);
txf->op = nir_texop_txf;
txf->sampler = txf_ms->sampler;
txf->sampler_index = txf_ms->sampler_index;
txf->coord_components = txf_ms->coord_components;
txf->is_shadow = txf_ms->is_shadow;
txf->is_new_style_shadow = txf_ms->is_new_style_shadow;
nir_ssa_def *coord = NULL, *sample_index = NULL;
for (int i = 0; i < txf_ms->num_srcs; i++) {
assert(txf_ms->src[i].src.is_ssa);
switch (txf_ms->src[i].src_type) {
case nir_tex_src_coord:
coord = txf_ms->src[i].src.ssa;
break;
case nir_tex_src_ms_index:
sample_index = txf_ms->src[i].src.ssa;
break;
default:
unreachable("Unknown txf_ms src\n");
}
}
assert(coord);
assert(sample_index);
nir_ssa_def *x = nir_channel(b, coord, 0);
nir_ssa_def *y = nir_channel(b, coord, 1);
uint32_t tile_w = 32;
uint32_t tile_h = 32;
uint32_t tile_w_shift = 5;
uint32_t tile_h_shift = 5;
uint32_t tile_size = (tile_h * tile_w *
VC4_MAX_SAMPLES * sizeof(uint32_t));
unsigned unit = txf_ms->sampler_index;
uint32_t w = align(c->key->tex[unit].msaa_width, tile_w);
uint32_t w_tiles = w / tile_w;
nir_ssa_def *x_tile = nir_ushr(b, x, nir_imm_int(b, tile_w_shift));
nir_ssa_def *y_tile = nir_ushr(b, y, nir_imm_int(b, tile_h_shift));
nir_ssa_def *tile_addr = nir_iadd(b,
nir_imul(b, x_tile,
nir_imm_int(b, tile_size)),
nir_imul(b, y_tile,
nir_imm_int(b, (w_tiles *
tile_size))));
nir_ssa_def *x_subspan = nir_iand(b, x,
nir_imm_int(b, (tile_w - 1) & ~1));
nir_ssa_def *y_subspan = nir_iand(b, y,
nir_imm_int(b, (tile_h - 1) & ~1));
nir_ssa_def *subspan_addr = nir_iadd(b,
nir_imul(b, x_subspan,
nir_imm_int(b, 2 * VC4_MAX_SAMPLES * sizeof(uint32_t))),
nir_imul(b, y_subspan,
nir_imm_int(b,
tile_w *
VC4_MAX_SAMPLES *
sizeof(uint32_t))));
nir_ssa_def *pixel_addr = nir_ior(b,
nir_iand(b,
nir_ishl(b, x,
nir_imm_int(b, 2)),
nir_imm_int(b, (1 << 2))),
nir_iand(b,
nir_ishl(b, y,
nir_imm_int(b, 3)),
nir_imm_int(b, (1 << 3))));
nir_ssa_def *sample_addr = nir_ishl(b, sample_index, nir_imm_int(b, 4));
nir_ssa_def *addr = nir_iadd(b,
nir_ior(b, sample_addr, pixel_addr),
nir_iadd(b, subspan_addr, tile_addr));
txf->src[0].src_type = nir_tex_src_coord;
txf->src[0].src = nir_src_for_ssa(nir_vec2(b, addr, nir_imm_int(b, 0)));
nir_ssa_dest_init(&txf->instr, &txf->dest, 4, NULL);
nir_builder_instr_insert(b, &txf->instr);
nir_ssa_def_rewrite_uses(&txf_ms->dest.ssa,
nir_src_for_ssa(&txf->dest.ssa));
nir_instr_remove(&txf_ms->instr);
}
static bool
vc4_nir_lower_txf_ms_block(nir_block *block, void *arg)
{
struct vc4_compile *c = arg;
nir_function_impl *impl =
nir_cf_node_get_function(&block->cf_node);
nir_builder b;
nir_builder_init(&b, impl);
nir_foreach_instr_safe(block, instr) {
if (instr->type == nir_instr_type_tex) {
vc4_nir_lower_txf_ms_instr(c, &b,
nir_instr_as_tex(instr));
}
}
return true;
}
static bool
vc4_nir_lower_txf_ms_impl(struct vc4_compile *c, nir_function_impl *impl)
{
nir_foreach_block(impl, vc4_nir_lower_txf_ms_block, c);
nir_metadata_preserve(impl,
nir_metadata_block_index |
nir_metadata_dominance);
return true;
}
void
vc4_nir_lower_txf_ms(struct vc4_compile *c)
{
nir_foreach_overload(c->s, overload) {
if (overload->impl)
vc4_nir_lower_txf_ms_impl(c, overload->impl);
}
}
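The byte-offset math that vc4_nir_lower_txf_ms_instr() builds from NIR ALU ops can be restated as ordinary C, which makes the layout easier to check by hand. This is a host-side sketch only. It assumes 32x32 tiles, VC4_MAX_SAMPLES == 4 and 32-bit samples, as in the pass above. The function and macro names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring vc4_nir_lower_txf_ms_instr():
 * 32x32-pixel tiles, 4 samples per pixel, 4 bytes per sample. */
#define MSAA_TILE_W    32u
#define MSAA_TILE_H    32u
#define MSAA_SAMPLES   4u
#define MSAA_CPP       4u
#define MSAA_TILE_SIZE (MSAA_TILE_W * MSAA_TILE_H * MSAA_SAMPLES * MSAA_CPP)

/* Byte offset of one sample of texel (x, y) in a raw 4x MSAA tile-buffer
 * dump whose width is msaa_width pixels. */
static uint32_t
msaa_texel_offset(uint32_t msaa_width, uint32_t x, uint32_t y,
                  uint32_t sample)
{
        assert(sample < MSAA_SAMPLES);

        /* Whole 16 KiB tiles, stored in row-major order. */
        uint32_t w_tiles = (msaa_width + MSAA_TILE_W - 1) / MSAA_TILE_W;
        uint32_t tile_addr = (x / MSAA_TILE_W) * MSAA_TILE_SIZE +
                             (y / MSAA_TILE_H) * w_tiles * MSAA_TILE_SIZE;

        /* 2x2-pixel subspans inside the tile, also row-major. Each one holds
         * 4 pixels * 4 samples * 4 bytes = 64 bytes, so the even column
         * (x & 30) times 32 equals ((x % 32) / 2) * 64. */
        uint32_t subspan_addr =
                (x & (MSAA_TILE_W - 2)) * 2 * MSAA_SAMPLES * MSAA_CPP +
                (y & (MSAA_TILE_H - 2)) * MSAA_TILE_W * MSAA_SAMPLES * MSAA_CPP;

        /* Inside a subspan the sample index is outermost (16 bytes per
         * sample), and pixels are ordered (0,0), (1,0), (0,1), (1,1). */
        uint32_t pixel_addr = ((x & 1) << 2) | ((y & 1) << 3);
        uint32_t sample_addr = sample << 4;

        return (sample_addr | pixel_addr) + subspan_addr + tile_addr;
}
```

As a sanity check, the last sample of the bottom-right pixel of a tile lands at byte 16380. That is one 32-bit word short of the 16 KiB tile size, which is what a dense tile layout implies.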

@@ -94,7 +94,12 @@ static void
replace_with_mov(struct vc4_compile *c, struct qinst *inst, struct qreg arg)
{
dump_from(c, inst);
inst->op = QOP_MOV;
if (qir_is_mul(inst))
inst->op = QOP_MMOV;
else if (qir_is_float_input(inst))
inst->op = QOP_FMOV;
else
inst->op = QOP_MOV;
inst->src[0] = arg;
inst->src[1] = c->undef;
dump_to(c, inst);
@@ -181,6 +186,7 @@ qir_opt_algebraic(struct vc4_compile *c)
case QOP_SUB:
if (is_zero(c, inst->src[1])) {
replace_with_mov(c, inst, inst->src[0]);
progress = true;
}
break;

@@ -294,6 +294,76 @@ ntq_umul(struct vc4_compile *c, struct qreg src0, struct qreg src1)
qir_uniform_ui(c, 24)));
}
static struct qreg
ntq_scale_depth_texture(struct vc4_compile *c, struct qreg src)
{
struct qreg depthf = qir_ITOF(c, qir_SHR(c, src,
qir_uniform_ui(c, 8)));
return qir_FMUL(c, depthf, qir_uniform_f(c, 1.0f/0xffffff));
}
/**
* Emits a lowered TXF_MS from an MSAA texture.
*
* The addressing math has been lowered in NIR, and now we just need to read
* it like a UBO.
*/
static void
ntq_emit_txf(struct vc4_compile *c, nir_tex_instr *instr)
{
uint32_t tile_width = 32;
uint32_t tile_height = 32;
uint32_t tile_size = (tile_height * tile_width *
VC4_MAX_SAMPLES * sizeof(uint32_t));
unsigned unit = instr->sampler_index;
uint32_t w = align(c->key->tex[unit].msaa_width, tile_width);
uint32_t w_tiles = w / tile_width;
uint32_t h = align(c->key->tex[unit].msaa_height, tile_height);
uint32_t h_tiles = h / tile_height;
uint32_t size = w_tiles * h_tiles * tile_size;
struct qreg addr;
assert(instr->num_srcs == 1);
assert(instr->src[0].src_type == nir_tex_src_coord);
addr = ntq_get_src(c, instr->src[0].src, 0);
/* Perform the clamping required by kernel validation. */
addr = qir_MAX(c, addr, qir_uniform_ui(c, 0));
addr = qir_MIN(c, addr, qir_uniform_ui(c, size - 4));
qir_TEX_DIRECT(c, addr, qir_uniform(c, QUNIFORM_TEXTURE_MSAA_ADDR, unit));
struct qreg tex = qir_TEX_RESULT(c);
c->num_texture_samples++;
struct qreg texture_output[4];
enum pipe_format format = c->key->tex[unit].format;
if (util_format_is_depth_or_stencil(format)) {
struct qreg scaled = ntq_scale_depth_texture(c, tex);
for (int i = 0; i < 4; i++)
texture_output[i] = scaled;
} else {
struct qreg tex_result_unpacked[4];
for (int i = 0; i < 4; i++)
tex_result_unpacked[i] = qir_UNPACK_8_F(c, tex, i);
const uint8_t *format_swiz =
vc4_get_format_swizzle(c->key->tex[unit].format);
for (int i = 0; i < 4; i++) {
texture_output[i] =
get_swizzled_channel(c, tex_result_unpacked,
format_swiz[i]);
}
}
struct qreg *dest = ntq_get_dest(c, &instr->dest);
for (int i = 0; i < 4; i++) {
dest[i] = get_swizzled_channel(c, texture_output,
c->key->tex[unit].swizzle[i]);
}
}
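ntq_emit_txf() clamps the lowered byte offset to the padded size of the whole MSAA buffer before issuing the direct texture read. The comment above says this is the clamping required by kernel validation, so an out-of-range coordinate cannot read past the buffer. Below is a minimal sketch of that bound in plain C. It assumes the same 32x32 tile and 4-sample constants and treats qir_MAX/qir_MIN as signed integer min/max. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TXF_TILE_DIM    32u
#define TXF_MAX_SAMPLES 4u
#define TXF_TILE_BYTES  (TXF_TILE_DIM * TXF_TILE_DIM * TXF_MAX_SAMPLES * \
                         (uint32_t)sizeof(uint32_t))

/* Bytes covered by a w x h 4x MSAA surface padded to whole tiles, computed
 * the same way as `size` in ntq_emit_txf(). */
static uint32_t
msaa_buffer_size(uint32_t w, uint32_t h)
{
        uint32_t w_tiles = (w + TXF_TILE_DIM - 1) / TXF_TILE_DIM;
        uint32_t h_tiles = (h + TXF_TILE_DIM - 1) / TXF_TILE_DIM;
        return w_tiles * h_tiles * TXF_TILE_BYTES;
}

/* Host-side equivalent of the MAX(addr, 0) / MIN(addr, size - 4) pair:
 * keep the 4-byte read entirely inside the buffer. */
static uint32_t
clamp_msaa_offset(int32_t addr, uint32_t size)
{
        assert(size >= 4);
        if (addr < 0)
                return 0;
        if ((uint32_t)addr > size - 4)
                return size - 4;
        return (uint32_t)addr;
}
```

For example, a 100x50 surface pads to 4x2 tiles, giving a 131072-byte bound.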
static void
ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
{
@@ -301,6 +371,11 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
bool is_txb = false, is_txl = false, has_proj = false;
unsigned unit = instr->sampler_index;
if (instr->op == nir_texop_txf) {
ntq_emit_txf(c, instr);
return;
}
for (unsigned i = 0; i < instr->num_srcs; i++) {
switch (instr->src[i].src_type) {
case nir_tex_src_coord:
@@ -396,11 +471,7 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
struct qreg unpacked[4];
if (util_format_is_depth_or_stencil(format)) {
struct qreg depthf = qir_ITOF(c, qir_SHR(c, tex,
qir_uniform_ui(c, 8)));
struct qreg normalized = qir_FMUL(c, depthf,
qir_uniform_f(c, 1.0f/0xffffff));
struct qreg normalized = ntq_scale_depth_texture(c, tex);
struct qreg depth_output;
struct qreg one = qir_uniform_f(c, 1.0f);
@@ -1109,6 +1180,10 @@ emit_frag_end(struct vc4_compile *c)
}
}
if (c->output_sample_mask_index != -1) {
qir_MS_MASK(c, c->outputs[c->output_sample_mask_index]);
}
if (c->fs_key->depth_enabled) {
struct qreg z;
if (c->output_position_index != -1) {
@@ -1120,7 +1195,12 @@ emit_frag_end(struct vc4_compile *c)
qir_TLB_Z_WRITE(c, z);
}
qir_TLB_COLOR_WRITE(c, color);
if (!c->msaa_per_sample_output) {
qir_TLB_COLOR_WRITE(c, color);
} else {
for (int i = 0; i < VC4_MAX_SAMPLES; i++)
qir_TLB_COLOR_WRITE_MS(c, c->sample_colors[i]);
}
}
static void
@@ -1171,7 +1251,7 @@ emit_point_size_write(struct vc4_compile *c)
struct qreg point_size;
if (c->output_point_size_index != -1)
point_size = c->outputs[c->output_point_size_index + 3];
point_size = c->outputs[c->output_point_size_index];
else
point_size = qir_uniform_f(c, 1.0);
@@ -1359,6 +1439,9 @@ ntq_setup_outputs(struct vc4_compile *c)
case FRAG_RESULT_DEPTH:
c->output_position_index = loc;
break;
case FRAG_RESULT_SAMPLE_MASK:
c->output_sample_mask_index = loc;
break;
}
} else {
switch (var->data.location) {
@@ -1462,20 +1545,48 @@ ntq_emit_intrinsic(struct vc4_compile *c, nir_intrinsic_instr *instr)
instr->const_index[0]);
break;
case nir_intrinsic_load_sample_mask_in:
*dest = qir_uniform(c, QUNIFORM_SAMPLE_MASK, 0);
break;
case nir_intrinsic_load_input:
assert(instr->num_components == 1);
if (instr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) {
*dest = qir_TLB_COLOR_READ(c);
if (instr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT) {
/* Reads of the per-sample color need to be done in
* order.
*/
int sample_index = (instr->const_index[0] -
VC4_NIR_TLB_COLOR_READ_INPUT);
for (int i = 0; i <= sample_index; i++) {
if (c->color_reads[i].file == QFILE_NULL) {
c->color_reads[i] =
qir_TLB_COLOR_READ(c);
}
}
*dest = c->color_reads[sample_index];
} else {
*dest = c->inputs[instr->const_index[0]];
}
break;
case nir_intrinsic_store_output:
assert(instr->num_components == 1);
c->outputs[instr->const_index[0]] =
qir_MOV(c, ntq_get_src(c, instr->src[0], 0));
c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1);
/* MSAA color outputs are the only case where we have an
* output that's not lowered to being a store of a single 32
* bit value.
*/
if (c->stage == QSTAGE_FRAG && instr->num_components == 4) {
assert(instr->const_index[0] == c->output_color_index);
for (int i = 0; i < 4; i++) {
c->sample_colors[i] =
qir_MOV(c, ntq_get_src(c, instr->src[0],
i));
}
} else {
assert(instr->num_components == 1);
c->outputs[instr->const_index[0]] =
qir_MOV(c, ntq_get_src(c, instr->src[0], 0));
c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1);
}
break;
case nir_intrinsic_discard:
@@ -1672,6 +1783,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage,
nir_lower_clip_vs(c->s, c->key->ucp_enables);
vc4_nir_lower_io(c);
vc4_nir_lower_txf_ms(c);
nir_lower_idiv(c->s);
nir_lower_load_const_to_scalar(c->s);
@@ -1907,12 +2019,19 @@ vc4_setup_shared_key(struct vc4_context *vc4, struct vc4_key *key,
struct pipe_sampler_state *sampler_state =
texstate->samplers[i];
if (sampler) {
key->tex[i].format = sampler->format;
key->tex[i].swizzle[0] = sampler->swizzle_r;
key->tex[i].swizzle[1] = sampler->swizzle_g;
key->tex[i].swizzle[2] = sampler->swizzle_b;
key->tex[i].swizzle[3] = sampler->swizzle_a;
if (!sampler)
continue;
key->tex[i].format = sampler->format;
key->tex[i].swizzle[0] = sampler->swizzle_r;
key->tex[i].swizzle[1] = sampler->swizzle_g;
key->tex[i].swizzle[2] = sampler->swizzle_b;
key->tex[i].swizzle[3] = sampler->swizzle_a;
if (sampler->texture->nr_samples) {
key->tex[i].msaa_width = sampler->texture->width0;
key->tex[i].msaa_height = sampler->texture->height0;
} else if (sampler){
key->tex[i].compare_mode = sampler_state->compare_mode;
key->tex[i].compare_func = sampler_state->compare_func;
key->tex[i].wrap_s = sampler_state->wrap_s;
@@ -1952,6 +2071,11 @@ vc4_update_compiled_fs(struct vc4_context *vc4, uint8_t prim_mode)
} else {
key->logicop_func = PIPE_LOGICOP_COPY;
}
key->msaa = vc4->rasterizer->base.multisample;
key->sample_coverage = (vc4->rasterizer->base.multisample &&
vc4->sample_mask != (1 << VC4_MAX_SAMPLES) - 1);
key->sample_alpha_to_coverage = vc4->blend->alpha_to_coverage;
key->sample_alpha_to_one = vc4->blend->alpha_to_one;
if (vc4->framebuffer.cbufs[0])
key->color_format = vc4->framebuffer.cbufs[0]->format;

@@ -86,7 +86,9 @@ static const struct qir_op_info qir_op_info[] = {
[QOP_TLB_STENCIL_SETUP] = { "tlb_stencil_setup", 0, 1, true },
[QOP_TLB_Z_WRITE] = { "tlb_z", 0, 1, true },
[QOP_TLB_COLOR_WRITE] = { "tlb_color", 0, 1, true },
[QOP_TLB_COLOR_WRITE_MS] = { "tlb_color_ms", 0, 1, true },
[QOP_TLB_COLOR_READ] = { "tlb_color_read", 1, 0 },
[QOP_MS_MASK] = { "ms_mask", 0, 1, true },
[QOP_VARY_ADD_C] = { "vary_add_c", 1, 1 },
[QOP_FRAG_X] = { "frag_x", 1, 0 },
@@ -399,6 +401,7 @@ qir_compile_init(void)
c->output_position_index = -1;
c->output_color_index = -1;
c->output_point_size_index = -1;
c->output_sample_mask_index = -1;
c->def_ht = _mesa_hash_table_create(c, _mesa_hash_pointer,
_mesa_key_pointer_equal);
@@ -420,13 +423,19 @@ qir_remove_instruction(struct vc4_compile *c, struct qinst *qinst)
struct qreg
qir_follow_movs(struct vc4_compile *c, struct qreg reg)
{
int pack = reg.pack;
while (reg.file == QFILE_TEMP &&
c->defs[reg.index] &&
c->defs[reg.index]->op == QOP_MOV &&
!c->defs[reg.index]->dst.pack) {
(c->defs[reg.index]->op == QOP_MOV ||
c->defs[reg.index]->op == QOP_FMOV ||
c->defs[reg.index]->op == QOP_MMOV)&&
!c->defs[reg.index]->dst.pack &&
!c->defs[reg.index]->src[0].pack) {
reg = c->defs[reg.index]->src[0];
}
reg.pack = pack;
return reg;
}

@@ -38,6 +38,7 @@
#include "vc4_screen.h"
#include "vc4_qpu_defines.h"
#include "kernel/vc4_packet.h"
#include "pipe/p_state.h"
struct nir_builder;
@@ -121,7 +122,9 @@ enum qop {
QOP_TLB_STENCIL_SETUP,
QOP_TLB_Z_WRITE,
QOP_TLB_COLOR_WRITE,
QOP_TLB_COLOR_WRITE_MS,
QOP_TLB_COLOR_READ,
QOP_MS_MASK,
QOP_VARY_ADD_C,
QOP_FRAG_X,
@@ -230,6 +233,8 @@ enum quniform_contents {
/** A reference to a texture config parameter 2 cubemap stride uniform */
QUNIFORM_TEXTURE_CONFIG_P2,
QUNIFORM_TEXTURE_MSAA_ADDR,
QUNIFORM_UBO_ADDR,
QUNIFORM_TEXRECT_SCALE_X,
@@ -247,6 +252,7 @@ enum quniform_contents {
QUNIFORM_STENCIL,
QUNIFORM_ALPHA_REF,
QUNIFORM_SAMPLE_MASK,
};
struct vc4_varying_slot {
@@ -283,11 +289,18 @@ struct vc4_key {
struct vc4_uncompiled_shader *shader_state;
struct {
enum pipe_format format;
unsigned compare_mode:1;
unsigned compare_func:3;
unsigned wrap_s:3;
unsigned wrap_t:3;
uint8_t swizzle[4];
union {
struct {
unsigned compare_mode:1;
unsigned compare_func:3;
unsigned wrap_s:3;
unsigned wrap_t:3;
};
struct {
uint16_t msaa_width, msaa_height;
};
};
} tex[VC4_MAX_TEXTURE_SAMPLERS];
uint8_t ucp_enables;
};
@@ -304,6 +317,10 @@ struct vc4_fs_key {
bool alpha_test;
bool point_coord_upper_left;
bool light_twoside;
bool msaa;
bool sample_coverage;
bool sample_alpha_to_coverage;
bool sample_alpha_to_one;
uint8_t alpha_test_func;
uint8_t logicop_func;
uint32_t point_sprite_mask;
@@ -348,6 +365,9 @@ struct vc4_compile {
*/
struct qreg *inputs;
struct qreg *outputs;
bool msaa_per_sample_output;
struct qreg color_reads[VC4_MAX_SAMPLES];
struct qreg sample_colors[VC4_MAX_SAMPLES];
uint32_t inputs_array_size;
uint32_t outputs_array_size;
uint32_t uniforms_array_size;
@@ -396,6 +416,7 @@ struct vc4_compile {
uint32_t output_position_index;
uint32_t output_color_index;
uint32_t output_point_size_index;
uint32_t output_sample_mask_index;
struct qreg undef;
enum qstage stage;
@@ -418,6 +439,8 @@ struct vc4_compile {
*/
#define VC4_NIR_TLB_COLOR_READ_INPUT 2000000000
#define VC4_NIR_MS_MASK_OUTPUT 2000000000
/* Special offset for nir_load_uniform values to get a QUNIFORM_*
* state-dependent value.
*/
@@ -476,6 +499,7 @@ nir_ssa_def *vc4_nir_get_state_uniform(struct nir_builder *b,
enum quniform_contents contents);
nir_ssa_def *vc4_nir_get_swizzled_channel(struct nir_builder *b,
nir_ssa_def **srcs, int swiz);
void vc4_nir_lower_txf_ms(struct vc4_compile *c);
void qir_lower_uniforms(struct vc4_compile *c);
void qpu_schedule_instructions(struct vc4_compile *c);
@@ -616,9 +640,11 @@ QIR_ALU0(FRAG_REV_FLAG)
QIR_ALU0(TEX_RESULT)
QIR_ALU0(TLB_COLOR_READ)
QIR_NODST_1(TLB_COLOR_WRITE)
QIR_NODST_1(TLB_COLOR_WRITE_MS)
QIR_NODST_1(TLB_Z_WRITE)
QIR_NODST_1(TLB_DISCARD_SETUP)
QIR_NODST_1(TLB_STENCIL_SETUP)
QIR_NODST_1(MS_MASK)
static inline struct qreg
qir_UNPACK_8_F(struct vc4_compile *c, struct qreg src, int i)

@@ -116,6 +116,17 @@ qpu_tlbc()
return r;
}
static inline struct qpu_reg
qpu_tlbc_ms()
{
struct qpu_reg r = {
QPU_MUX_A,
QPU_W_TLB_COLOR_MS,
};
return r;
}
static inline struct qpu_reg qpu_r0(void) { return qpu_rn(0); }
static inline struct qpu_reg qpu_r1(void) { return qpu_rn(1); }
static inline struct qpu_reg qpu_r2(void) { return qpu_rn(2); }

@@ -387,6 +387,14 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
qpu_rb(QPU_R_MS_REV_FLAGS)));
break;
case QOP_MS_MASK:
src[1] = qpu_ra(QPU_R_MS_REV_FLAGS);
fixup_raddr_conflict(c, dst, &src[0], &src[1],
qinst, &unpack);
queue(c, qpu_a_AND(qpu_ra(QPU_W_MS_FLAGS),
src[0], src[1]) | unpack);
break;
case QOP_FRAG_Z:
case QOP_FRAG_W:
/* QOP_FRAG_Z/W don't emit instructions, just allocate
@@ -430,6 +438,13 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
}
break;
case QOP_TLB_COLOR_WRITE_MS:
queue(c, qpu_a_MOV(qpu_tlbc_ms(), src[0]));
if (discard) {
set_last_cond_add(c, QPU_COND_ZS);
}
break;
case QOP_VARY_ADD_C:
queue(c, qpu_a_FADD(dst, src[0], qpu_r5()) | unpack);
break;

@@ -295,6 +295,10 @@ process_waddr_deps(struct schedule_state *state, struct schedule_node *n,
add_write_dep(state, &state->last_tlb, n);
break;
case QPU_W_MS_FLAGS:
add_write_dep(state, &state->last_tlb, n);
break;
case QPU_W_NOP:
break;

@@ -22,6 +22,7 @@
* IN THE SOFTWARE.
*/
#include "util/u_blit.h"
#include "util/u_memory.h"
#include "util/u_format.h"
#include "util/u_inlines.h"
@@ -72,11 +73,18 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
{
struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_transfer *trans = vc4_transfer(ptrans);
struct pipe_resource *prsc = ptrans->resource;
struct vc4_resource *rsc = vc4_resource(prsc);
struct vc4_resource_slice *slice = &rsc->slices[ptrans->level];
if (trans->map) {
struct vc4_resource *rsc;
struct vc4_resource_slice *slice;
if (trans->ss_resource) {
rsc = vc4_resource(trans->ss_resource);
slice = &rsc->slices[0];
} else {
rsc = vc4_resource(ptrans->resource);
slice = &rsc->slices[ptrans->level];
}
if (ptrans->usage & PIPE_TRANSFER_WRITE) {
vc4_store_tiled_image(rsc->bo->map + slice->offset +
ptrans->box.z * rsc->cube_map_stride,
@@ -88,10 +96,52 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
free(trans->map);
}
if (trans->ss_resource && (ptrans->usage & PIPE_TRANSFER_WRITE)) {
struct pipe_blit_info blit;
memset(&blit, 0, sizeof(blit));
blit.src.resource = trans->ss_resource;
blit.src.format = trans->ss_resource->format;
blit.src.box.width = trans->ss_box.width;
blit.src.box.height = trans->ss_box.height;
blit.src.box.depth = 1;
blit.dst.resource = ptrans->resource;
blit.dst.format = ptrans->resource->format;
blit.dst.level = ptrans->level;
blit.dst.box = trans->ss_box;
blit.mask = util_format_get_mask(ptrans->resource->format);
blit.filter = PIPE_TEX_FILTER_NEAREST;
pctx->blit(pctx, &blit);
vc4_flush(pctx);
pipe_resource_reference(&trans->ss_resource, NULL);
}
pipe_resource_reference(&ptrans->resource, NULL);
util_slab_free(&vc4->transfer_pool, ptrans);
}
static struct pipe_resource *
vc4_get_temp_resource(struct pipe_context *pctx,
struct pipe_resource *prsc,
const struct pipe_box *box)
{
struct pipe_resource temp_setup;
memset(&temp_setup, 0, sizeof(temp_setup));
temp_setup.target = prsc->target;
temp_setup.format = prsc->format;
temp_setup.width0 = box->width;
temp_setup.height0 = box->height;
temp_setup.depth0 = 1;
temp_setup.array_size = 1;
return pctx->screen->resource_create(pctx->screen, &temp_setup);
}
static void *
vc4_resource_transfer_map(struct pipe_context *pctx,
struct pipe_resource *prsc,
@@ -101,7 +151,6 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
{
struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_resource *rsc = vc4_resource(prsc);
struct vc4_resource_slice *slice = &rsc->slices[level];
struct vc4_transfer *trans;
struct pipe_transfer *ptrans;
enum pipe_format format = prsc->format;
@@ -155,6 +204,50 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->usage = usage;
ptrans->box = *box;
/* If the resource is multisampled, we need to resolve to single
* sample. This seems like it should be handled at a higher layer.
*/
if (prsc->nr_samples) {
trans->ss_resource = vc4_get_temp_resource(pctx, prsc, box);
if (!trans->ss_resource)
goto fail;
assert(!trans->ss_resource->nr_samples);
/* The ptrans->box gets modified for tile alignment, so save
* the original box for unmap time.
*/
trans->ss_box = *box;
if (usage & PIPE_TRANSFER_READ) {
struct pipe_blit_info blit;
memset(&blit, 0, sizeof(blit));
blit.src.resource = ptrans->resource;
blit.src.format = ptrans->resource->format;
blit.src.level = ptrans->level;
blit.src.box = trans->ss_box;
blit.dst.resource = trans->ss_resource;
blit.dst.format = trans->ss_resource->format;
blit.dst.box.width = trans->ss_box.width;
blit.dst.box.height = trans->ss_box.height;
blit.dst.box.depth = 1;
blit.mask = util_format_get_mask(prsc->format);
blit.filter = PIPE_TEX_FILTER_NEAREST;
pctx->blit(pctx, &blit);
vc4_flush(pctx);
}
/* The rest of the mapping process should use our temporary. */
prsc = trans->ss_resource;
rsc = vc4_resource(prsc);
ptrans->box.x = 0;
ptrans->box.y = 0;
ptrans->box.z = 0;
}
/* Note that the current kernel implementation is synchronous, so no
* need to do syncing stuff here yet.
*/
@@ -170,6 +263,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
*pptrans = ptrans;
struct vc4_resource_slice *slice = &rsc->slices[level];
if (rsc->tiled) {
uint32_t utile_w = vc4_utile_width(rsc->cpp);
uint32_t utile_h = vc4_utile_height(rsc->cpp);
@@ -203,7 +297,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->box.height != orig_height) {
vc4_load_tiled_image(trans->map, ptrans->stride,
buf + slice->offset +
box->z * rsc->cube_map_stride,
ptrans->box.z * rsc->cube_map_stride,
slice->stride,
slice->tiling, rsc->cpp,
&ptrans->box);
@@ -216,9 +310,9 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->layer_stride = ptrans->stride;
return buf + slice->offset +
box->y / util_format_get_blockheight(format) * ptrans->stride +
box->x / util_format_get_blockwidth(format) * rsc->cpp +
box->z * rsc->cube_map_stride;
ptrans->box.y / util_format_get_blockheight(format) * ptrans->stride +
ptrans->box.x / util_format_get_blockwidth(format) * rsc->cpp +
ptrans->box.z * rsc->cube_map_stride;
}
@@ -283,7 +377,13 @@ vc4_setup_slices(struct vc4_resource *rsc)
if (!rsc->tiled) {
slice->tiling = VC4_TILING_FORMAT_LINEAR;
level_width = align(level_width, utile_w);
if (prsc->nr_samples) {
/* MSAA (4x) surfaces are stored as raw tile buffer contents. */
level_width = align(level_width, 32);
level_height = align(level_height, 32);
} else {
level_width = align(level_width, utile_w);
}
} else {
if (vc4_size_is_lt(level_width, level_height,
rsc->cpp)) {
@@ -300,7 +400,8 @@ vc4_setup_slices(struct vc4_resource *rsc)
}
slice->offset = offset;
slice->stride = level_width * rsc->cpp;
slice->stride = (level_width * rsc->cpp *
MAX2(prsc->nr_samples, 1));
slice->size = level_height * slice->stride;
offset += slice->size;
@@ -357,7 +458,10 @@ vc4_resource_setup(struct pipe_screen *pscreen,
prsc->screen = pscreen;
rsc->base.vtbl = &vc4_resource_vtbl;
rsc->cpp = util_format_get_blocksize(tmpl->format);
if (prsc->nr_samples == 0)
rsc->cpp = util_format_get_blocksize(tmpl->format);
else
rsc->cpp = sizeof(uint32_t);
assert(rsc->cpp);
@@ -371,8 +475,12 @@ get_resource_texture_format(struct pipe_resource *prsc)
uint8_t format = vc4_get_tex_format(prsc->format);
if (!rsc->tiled) {
assert(format == VC4_TEXTURE_TYPE_RGBA8888);
return VC4_TEXTURE_TYPE_RGBA32R;
if (prsc->nr_samples) {
return ~0;
} else {
assert(format == VC4_TEXTURE_TYPE_RGBA8888);
return VC4_TEXTURE_TYPE_RGBA32R;
}
}
return format;
@@ -389,6 +497,7 @@ vc4_resource_create(struct pipe_screen *pscreen,
* communicate metadata about tiling currently.
*/
if (tmpl->target == PIPE_BUFFER ||
tmpl->nr_samples ||
(tmpl->bind & (PIPE_BIND_SCANOUT |
PIPE_BIND_LINEAR |
PIPE_BIND_SHARED |
@@ -492,13 +601,9 @@ vc4_surface_destroy(struct pipe_context *pctx, struct pipe_surface *psurf)
FREE(psurf);
}
/** Debug routine to dump the contents of an 8888 surface to the console */
void
vc4_dump_surface(struct pipe_surface *psurf)
static void
vc4_dump_surface_non_msaa(struct pipe_surface *psurf)
{
if (!psurf)
return;
struct pipe_resource *prsc = psurf->texture;
struct vc4_resource *rsc = vc4_resource(prsc);
uint32_t *map = vc4_bo_map(rsc->bo);
@@ -592,6 +697,147 @@ vc4_dump_surface(struct pipe_surface *psurf)
}
}
static uint32_t
vc4_surface_msaa_get_sample(struct pipe_surface *psurf,
uint32_t x, uint32_t y, uint32_t sample)
{
struct pipe_resource *prsc = psurf->texture;
struct vc4_resource *rsc = vc4_resource(prsc);
uint32_t tile_w = 32, tile_h = 32;
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, 32);
uint32_t tile_x = x / tile_w;
uint32_t tile_y = y / tile_h;
uint32_t *tile = (vc4_bo_map(rsc->bo) +
VC4_TILE_BUFFER_SIZE * (tile_y * tiles_w + tile_x));
uint32_t subtile_x = x % tile_w;
uint32_t subtile_y = y % tile_h;
uint32_t quad_samples = VC4_MAX_SAMPLES * 4;
uint32_t tile_stride = quad_samples * tile_w / 2;
return *((uint32_t *)tile +
(subtile_y >> 1) * tile_stride +
(subtile_x >> 1) * quad_samples +
((subtile_y & 1) << 1) +
(subtile_x & 1) +
sample);
}
static void
vc4_dump_surface_msaa_char(struct pipe_surface *psurf,
uint32_t start_x, uint32_t start_y,
uint32_t w, uint32_t h)
{
bool all_same_color = true;
uint32_t all_pix = 0;
for (int y = start_y; y < start_y + h; y++) {
for (int x = start_x; x < start_x + w; x++) {
for (int s = 0; s < VC4_MAX_SAMPLES; s++) {
uint32_t pix = vc4_surface_msaa_get_sample(psurf,
x, y,
s);
if (x == start_x && y == start_y)
all_pix = pix;
else if (all_pix != pix)
all_same_color = false;
}
}
}
if (all_same_color) {
static const struct {
uint32_t val;
const char *c;
} named_colors[] = {
{ 0xff000000, "" },
{ 0x00000000, "" },
{ 0xffff0000, "r" },
{ 0xff00ff00, "g" },
{ 0xff0000ff, "b" },
{ 0xffffffff, "w" },
};
int i;
for (i = 0; i < ARRAY_SIZE(named_colors); i++) {
if (named_colors[i].val == all_pix) {
fprintf(stderr, "%s",
named_colors[i].c);
return;
}
}
fprintf(stderr, "x");
} else {
fprintf(stderr, ".");
}
}
static void
vc4_dump_surface_msaa(struct pipe_surface *psurf)
{
uint32_t tile_w = 32, tile_h = 32;
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, tile_w);
uint32_t tiles_h = DIV_ROUND_UP(psurf->height, tile_h);
uint32_t char_w = 140, char_h = 60;
uint32_t char_w_per_tile = char_w / tiles_w - 1;
uint32_t char_h_per_tile = char_h / tiles_h - 1;
uint32_t found_colors[10];
uint32_t num_found_colors = 0;
fprintf(stderr, "Surface: %dx%d (%dx MSAA)\n",
psurf->width, psurf->height, psurf->texture->nr_samples);
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
fprintf(stderr, "-");
fprintf(stderr, "\n");
for (int ty = 0; ty < psurf->height; ty += tile_h) {
for (int y = 0; y < char_h_per_tile; y++) {
for (int tx = 0; tx < psurf->width; tx += tile_w) {
for (int x = 0; x < char_w_per_tile; x++) {
uint32_t bx1 = (x * tile_w /
char_w_per_tile);
uint32_t bx2 = ((x + 1) * tile_w /
char_w_per_tile);
uint32_t by1 = (y * tile_h /
char_h_per_tile);
uint32_t by2 = ((y + 1) * tile_h /
char_h_per_tile);
vc4_dump_surface_msaa_char(psurf,
tx + bx1,
ty + by1,
bx2 - bx1,
by2 - by1);
}
fprintf(stderr, "|");
}
fprintf(stderr, "\n");
}
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
fprintf(stderr, "-");
fprintf(stderr, "\n");
}
for (int i = 0; i < num_found_colors; i++) {
fprintf(stderr, "color %d: 0x%08x\n", i, found_colors[i]);
}
}
/** Debug routine to dump the contents of an 8888 surface to the console */
void
vc4_dump_surface(struct pipe_surface *psurf)
{
if (!psurf)
return;
if (psurf->texture->nr_samples)
vc4_dump_surface_msaa(psurf);
else
vc4_dump_surface_non_msaa(psurf);
}
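For multisampled resources, vc4_setup_slices() keeps the surface linear, pads both dimensions to 32 pixels and multiplies the stride by the sample count. In vc4_resource_setup(), rsc->cpp is forced to 4 bytes for these resources. The sketch below covers the single-level case only, with hypothetical names. With that padding the slice size equals the tile-count product that ntq_emit_txf() uses as its clamp bound, for example 131072 bytes for a 100x50 surface.

```c
#include <assert.h>
#include <stdint.h>

struct msaa_slice_layout {
        uint32_t stride;  /* bytes per pixel row, all samples included */
        uint32_t size;    /* total bytes of the level-0 slice */
};

/* Round v up to a power-of-two alignment a. */
static uint32_t
align_pot(uint32_t v, uint32_t a)
{
        return (v + a - 1) & ~(a - 1);
}

/* Hypothetical restatement of the MSAA branch of vc4_setup_slices(). */
static struct msaa_slice_layout
msaa_slice_layout(uint32_t width, uint32_t height, uint32_t nr_samples)
{
        assert(nr_samples == 4);            /* only 4x is accepted */
        const uint32_t cpp = sizeof(uint32_t);
        uint32_t w = align_pot(width, 32);  /* raw tile-buffer width */
        uint32_t h = align_pot(height, 32);

        struct msaa_slice_layout l;
        l.stride = w * cpp * nr_samples;
        l.size = h * l.stride;
        return l;
}
```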
static void
vc4_flush_resource(struct pipe_context *pctx, struct pipe_resource *resource)
{

@@ -32,6 +32,9 @@
struct vc4_transfer {
struct pipe_transfer base;
void *map;
struct pipe_resource *ss_resource;
struct pipe_box ss_box;
};
struct vc4_resource_slice {

@@ -95,6 +95,7 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_BLEND_EQUATION_SEPARATE:
case PIPE_CAP_TWO_SIDED_STENCIL:
case PIPE_CAP_USER_INDEX_BUFFERS:
case PIPE_CAP_TEXTURE_MULTISAMPLE:
return 1;
/* lying for GL 2.0 */
@@ -140,7 +141,6 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_CONDITIONAL_RENDER:
case PIPE_CAP_PRIMITIVE_RESTART:
case PIPE_CAP_TEXTURE_MULTISAMPLE:
case PIPE_CAP_TEXTURE_BARRIER:
case PIPE_CAP_SM3:
case PIPE_CAP_INDEP_BLEND_ENABLE:
@@ -358,7 +358,6 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
unsigned retval = 0;
if ((target >= PIPE_MAX_TEXTURE_TYPES) ||
(sample_count > 1) ||
!util_format_is_supported(format, usage)) {
return FALSE;
}
@@ -417,11 +416,13 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
}
if ((usage & PIPE_BIND_RENDER_TARGET) &&
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
vc4_rt_format_supported(format)) {
retval |= PIPE_BIND_RENDER_TARGET;
}
if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
(vc4_tex_format_supported(format))) {
retval |= PIPE_BIND_SAMPLER_VIEW;
}

@@ -65,7 +65,7 @@ struct drm_device {
};
struct drm_gem_object {
uint32_t size;
size_t size;
struct drm_device *dev;
};

@@ -79,7 +79,7 @@ static void
vc4_set_sample_mask(struct pipe_context *pctx, unsigned sample_mask)
{
struct vc4_context *vc4 = vc4_context(pctx);
vc4->sample_mask = (uint16_t)sample_mask;
vc4->sample_mask = sample_mask & ((1 << VC4_MAX_SAMPLES) - 1);
vc4->dirty |= VC4_DIRTY_SAMPLE_MASK;
}
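vc4_set_sample_mask() now drops every bit above the four supported samples instead of truncating to 16 bits. With the mask normalized this way, vc4_update_compiled_fs() can decide whether sample coverage matters by comparing against (1 << VC4_MAX_SAMPLES) - 1. The following sketch restates both steps with hypothetical helper names and assumes VC4_MAX_SAMPLES == 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SAMPLES 4u

/* Keep only the bits for samples the hardware actually has. */
static uint8_t
clamp_sample_mask(unsigned sample_mask)
{
        return sample_mask & ((1u << SKETCH_MAX_SAMPLES) - 1);
}

/* Coverage only needs handling when multisampling with some samples off. */
static bool
needs_sample_coverage(bool multisample, uint8_t mask)
{
        assert(mask == clamp_sample_mask(mask)); /* already normalized */
        return multisample && mask != (1u << SKETCH_MAX_SAMPLES) - 1;
}
```

An application mask of 0xffffffff therefore reads as "all samples enabled" and does not trigger the coverage path.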
@@ -121,6 +121,9 @@ vc4_create_rasterizer_state(struct pipe_context *pctx,
so->offset_factor = float_to_187_half(cso->offset_scale);
}
if (cso->multisample)
so->config_bits[0] |= VC4_CONFIG_BITS_RASTERIZER_OVERSAMPLE_4X;
return so;
}
@@ -457,6 +460,22 @@ vc4_set_framebuffer_state(struct pipe_context *pctx,
rsc->cpp);
}
vc4->msaa = false;
if (cso->cbufs[0])
vc4->msaa = cso->cbufs[0]->texture->nr_samples != 0;
else if (cso->zsbuf)
vc4->msaa = cso->zsbuf->texture->nr_samples != 0;
if (vc4->msaa) {
vc4->tile_width = 32;
vc4->tile_height = 32;
} else {
vc4->tile_width = 64;
vc4->tile_height = 64;
}
vc4->draw_tiles_x = DIV_ROUND_UP(cso->width, vc4->tile_width);
vc4->draw_tiles_y = DIV_ROUND_UP(cso->height, vc4->tile_height);
vc4->dirty |= VC4_DIRTY_FRAMEBUFFER;
}
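vc4_set_framebuffer_state() now picks the binning tile size from the first color buffer, or the depth/stencil buffer if there is no color buffer. It uses 32x32 tiles when that surface is multisampled and 64x64 otherwise, then recounts the tile grid. A 32x32 tile with four samples per pixel stores as many samples as a 64x64 single-sample tile. The sketch below restates the grid computation with hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bin_grid {
        uint32_t tile_w, tile_h;
        uint32_t tiles_x, tiles_y;
};

static struct bin_grid
compute_bin_grid(uint32_t fb_width, uint32_t fb_height, bool msaa)
{
        assert(fb_width > 0 && fb_height > 0);
        struct bin_grid g;
        /* 32x32 pixels * 4 samples == 64x64 pixels * 1 sample. */
        g.tile_w = msaa ? 32 : 64;
        g.tile_h = msaa ? 32 : 64;
        /* Same rounding as DIV_ROUND_UP(). */
        g.tiles_x = (fb_width + g.tile_w - 1) / g.tile_w;
        g.tiles_y = (fb_height + g.tile_h - 1) / g.tile_h;
        return g;
}
```

For a 1920x1080 framebuffer this gives 30x17 tiles without MSAA and 60x34 with it.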

@@ -71,6 +71,18 @@ write_texture_p2(struct vc4_context *vc4,
VC4_SET_FIELD((data >> 16) & 1, VC4_TEX_P2_BSLOD));
}
static void
write_texture_msaa_addr(struct vc4_context *vc4,
struct vc4_cl_out **uniforms,
struct vc4_texture_stateobj *texstate,
uint32_t unit)
{
struct pipe_sampler_view *texture = texstate->textures[unit];
struct vc4_resource *rsc = vc4_resource(texture->texture);
cl_aligned_reloc(vc4, &vc4->uniforms, uniforms, rsc->bo, 0);
}
#define SWIZ(x,y,z,w) { \
UTIL_FORMAT_SWIZZLE_##x, \
@@ -244,6 +256,11 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
cl_aligned_reloc(vc4, &vc4->uniforms, &uniforms, ubo, 0);
break;
case QUNIFORM_TEXTURE_MSAA_ADDR:
write_texture_msaa_addr(vc4, &uniforms,
texstate, uinfo->data[i]);
break;
case QUNIFORM_TEXTURE_BORDER_COLOR:
write_texture_border_color(vc4, &uniforms,
texstate, uinfo->data[i]);
@@ -303,6 +320,10 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
cl_aligned_f(&uniforms,
vc4->zsa->base.alpha.ref_value);
break;
case QUNIFORM_SAMPLE_MASK:
cl_aligned_u32(&uniforms, vc4->sample_mask);
break;
}
#if 0
uint32_t written_val = *((uint32_t *)uniforms - 1);
@@ -345,6 +366,7 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
case QUNIFORM_TEXTURE_CONFIG_P1:
case QUNIFORM_TEXTURE_CONFIG_P2:
case QUNIFORM_TEXTURE_BORDER_COLOR:
case QUNIFORM_TEXTURE_MSAA_ADDR:
case QUNIFORM_TEXRECT_SCALE_X:
case QUNIFORM_TEXRECT_SCALE_Y:
dirty |= VC4_DIRTY_TEXSTATE;
@@ -363,6 +385,10 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
case QUNIFORM_ALPHA_REF:
dirty |= VC4_DIRTY_ZSA;
break;
case QUNIFORM_SAMPLE_MASK:
dirty |= VC4_DIRTY_SAMPLE_MASK;
break;
}
}

@@ -28,10 +28,14 @@
#include "pipe/p_screen.h"
#include "util/u_video.h"
#include "vl/vl_winsys.h"
#include "va_private.h"
DEBUG_GET_ONCE_BOOL_OPTION(mpeg4, "VAAPI_MPEG4_ENABLED", false)
VAStatus
vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles)
{
@@ -45,12 +49,16 @@ vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_
*num_profiles = 0;
pscreen = VL_VA_PSCREEN(ctx);
for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p)
for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p) {
if (u_reduce_video_profile(p) == PIPE_VIDEO_FORMAT_MPEG4 && !debug_get_option_mpeg4())
continue;
if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) {
vap = PipeToProfile(p);
if (vap != VAProfileNone)
profile_list[(*num_profiles)++] = vap;
}
}
/* Support postprocessing through vl_compositor */
profile_list[(*num_profiles)++] = VAProfileNone;

@@ -1737,7 +1737,7 @@ ast_function_expression::handle_method(exec_list *instructions,
result = new(ctx) ir_constant(op->type->array_size());
}
} else if (op->type->is_vector()) {
if (state->ARB_shading_language_420pack_enable) {
if (state->has_420pack()) {
/* .length() returns int. */
result = new(ctx) ir_constant((int) op->type->vector_elements);
} else {
@@ -1746,7 +1746,7 @@ ast_function_expression::handle_method(exec_list *instructions,
goto fail;
}
} else if (op->type->is_matrix()) {
if (state->ARB_shading_language_420pack_enable) {
if (state->has_420pack()) {
/* .length() returns int. */
result = new(ctx) ir_constant((int) op->type->matrix_columns);
} else {
@@ -2075,7 +2075,7 @@ ast_aggregate_initializer::hir(exec_list *instructions,
}
const glsl_type *const constructor_type = this->constructor_type;
if (!state->ARB_shading_language_420pack_enable) {
if (!state->has_420pack()) {
_mesa_glsl_error(&loc, state, "C-style initialization requires the "
"GL_ARB_shading_language_420pack extension");
return ir_rvalue::error_value(ctx);

@@ -2649,7 +2649,9 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
return;
}
} else if (state->is_version(420, 310) && base_type->is_image()) {
} else if ((state->is_version(420, 310) ||
state->ARB_shading_language_420pack_enable) &&
base_type->is_image()) {
assert(ctx->Const.MaxImageUnits <= MAX_IMAGE_UNITS);
if (max_index >= ctx->Const.MaxImageUnits) {
_mesa_glsl_error(loc, state, "Image binding %d exceeds the "
@@ -3736,7 +3738,7 @@ process_initializer(ir_variable *var, ast_declaration *decl,
* expressions. Const-qualified global variables must still be
* initialized with constant expressions.
*/
if (!state->ARB_shading_language_420pack_enable
if (!state->has_420pack()
|| state->current_function == NULL) {
_mesa_glsl_error(& initializer_loc, state,
"initializer of %s variable `%s' must be a "
@@ -5365,7 +5367,7 @@ ast_jump_statement::hir(exec_list *instructions,
if (state->current_function->return_type != ret_type) {
YYLTYPE loc = this->get_location();
if (state->ARB_shading_language_420pack_enable) {
if (state->has_420pack()) {
if (!apply_implicit_conversion(state->current_function->return_type,
ret, state)) {
_mesa_glsl_error(& loc, state,

@@ -948,7 +948,7 @@ parameter_qualifier:
if (($1.flags.q.in || $1.flags.q.out) && ($2.flags.q.in || $2.flags.q.out))
_mesa_glsl_error(&@1, state, "duplicate in/out/inout qualifier");
if (!state->has_420pack() && $2.flags.q.constant)
if (!state->has_420pack_or_es31() && $2.flags.q.constant)
_mesa_glsl_error(&@1, state, "in/out/inout must come after const "
"or precise");
@@ -960,7 +960,7 @@ parameter_qualifier:
if ($2.precision != ast_precision_none)
_mesa_glsl_error(&@1, state, "duplicate precision qualifier");
if (!(state->has_420pack() || state->is_version(420, 310)) &&
if (!state->has_420pack_or_es31() &&
$2.flags.i != 0)
_mesa_glsl_error(&@1, state, "precision qualifiers must come last");
@@ -1482,7 +1482,7 @@ layout_qualifier_id:
$$.index = $3;
}
if ((state->has_420pack() ||
if ((state->has_420pack_or_es31() ||
state->has_atomic_counters() ||
state->has_shader_storage_buffer_objects()) &&
match_layout_qualifier("binding", $1, state) == 0) {
@@ -1714,7 +1714,7 @@ type_qualifier:
if ($2.flags.q.invariant)
_mesa_glsl_error(&@1, state, "duplicate \"invariant\" qualifier");
if (!state->has_420pack() && $2.flags.q.precise)
if (!state->has_420pack_or_es31() && $2.flags.q.precise)
_mesa_glsl_error(&@1, state,
"\"invariant\" must come after \"precise\"");
@@ -1747,7 +1747,7 @@ type_qualifier:
if ($2.has_interpolation())
_mesa_glsl_error(&@1, state, "duplicate interpolation qualifier");
if (!state->has_420pack() &&
if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant)) {
_mesa_glsl_error(&@1, state, "interpolation qualifiers must come "
"after \"precise\" or \"invariant\"");
@@ -1767,7 +1767,7 @@ type_qualifier:
* precise qualifiers since these are useful in ARB_separate_shader_objects.
* There is no clear spec guidance on this either.
*/
if (!state->has_420pack() && $2.has_layout())
if (!state->has_420pack_or_es31() && $2.has_layout())
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
$$ = $1;
@@ -1785,7 +1785,7 @@ type_qualifier:
"duplicate auxiliary storage qualifier (centroid or sample)");
}
if (!state->has_420pack() &&
if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant ||
$2.has_interpolation() || $2.has_layout())) {
_mesa_glsl_error(&@1, state, "auxiliary storage qualifiers must come "
@@ -1803,7 +1803,7 @@ type_qualifier:
if ($2.has_storage())
_mesa_glsl_error(&@1, state, "duplicate storage qualifier");
if (!state->has_420pack() &&
if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant || $2.has_interpolation() ||
$2.has_layout() || $2.has_auxiliary_storage())) {
_mesa_glsl_error(&@1, state, "storage qualifiers must come after "
@@ -1819,7 +1819,7 @@ type_qualifier:
if ($2.precision != ast_precision_none)
_mesa_glsl_error(&@1, state, "duplicate precision qualifier");
if (!(state->has_420pack() || state->is_version(420, 310)) &&
if (!(state->has_420pack_or_es31()) &&
$2.flags.i != 0)
_mesa_glsl_error(&@1, state, "precision qualifiers must come last");
@@ -2575,7 +2575,7 @@ interface_block:
{
ast_interface_block *block = (ast_interface_block *) $2;
if (!state->has_420pack() && block->layout.has_layout() &&
if (!state->has_420pack_or_es31() && block->layout.has_layout() &&
!block->layout.is_default_qualifier) {
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
YYERROR;

@@ -255,6 +255,11 @@ struct _mesa_glsl_parse_state {
return ARB_shading_language_420pack_enable || is_version(420, 0);
}
bool has_420pack_or_es31() const
{
return ARB_shading_language_420pack_enable || is_version(420, 310);
}
bool has_compute_shader() const
{
return ARB_compute_shader_enable || is_version(430, 310);

@@ -57,8 +57,7 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
expr->primary_expression.identifier);
}
} else if (op->type->is_vector() ||
(state->ARB_shading_language_420pack_enable &&
op->type->is_scalar())) {
(state->has_420pack() && op->type->is_scalar())) {
ir_swizzle *swiz = ir_swizzle::create(op,
expr->primary_expression.identifier,
op->type->vector_elements);

@@ -1669,6 +1669,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name,
this->data.pixel_center_integer = false;
this->data.depth_layout = ir_depth_layout_none;
this->data.used = false;
this->data.always_active_io = false;
this->data.read_only = false;
this->data.centroid = false;
this->data.sample = false;

@@ -658,6 +658,13 @@ public:
*/
unsigned assigned:1;
/**
* When separate shader programs are enabled, only input/outputs between
* the stages of a multi-stage separate program can be safely removed
* from the shader interface. Other inputs/outputs must remain active.
*/
unsigned always_active_io:1;
/**
* Enum indicating how the variable was declared. See
* ir_var_declaration_type.

@@ -766,7 +766,7 @@ public:
gl_shader_stage consumer_stage);
~varying_matches();
void record(ir_variable *producer_var, ir_variable *consumer_var);
unsigned assign_locations(uint64_t reserved_slots);
unsigned assign_locations(uint64_t reserved_slots, bool separate_shader);
void store_locations() const;
private:
@@ -986,11 +986,36 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var)
* passed to varying_matches::record().
*/
unsigned
varying_matches::assign_locations(uint64_t reserved_slots)
varying_matches::assign_locations(uint64_t reserved_slots, bool separate_shader)
{
/* Sort varying matches into an order that makes them easy to pack. */
qsort(this->matches, this->num_matches, sizeof(*this->matches),
&varying_matches::match_comparator);
/* We disable varying sorting for separate shader programs for the
* following reasons:
*
* 1/ All programs must sort the code in the same order to guarantee the
* interface matching. However varying_matches::record() will change the
* interpolation qualifier of some stages.
*
* 2/ GLSL version 4.50 removes the matching constraint on the interpolation
* qualifier.
*
* From Section 4.5 (Interpolation Qualifiers) of the GLSL 4.40 spec:
*
* "The type and presence of interpolation qualifiers of variables with
* the same name declared in all linked shaders for the same cross-stage
* interface must match, otherwise the link command will fail.
*
* When comparing an output from one stage to an input of a subsequent
* stage, the input and output don't match if their interpolation
* qualifiers (or lack thereof) are not the same."
*
* "It is a link-time error if, within the same stage, the interpolation
* qualifiers of variables of the same name do not match."
*/
if (!separate_shader) {
/* Sort varying matches into an order that makes them easy to pack. */
qsort(this->matches, this->num_matches, sizeof(*this->matches),
&varying_matches::match_comparator);
}
unsigned generic_location = 0;
unsigned generic_patch_location = MAX_VARYING*4;
@@ -1590,7 +1615,8 @@ assign_varying_locations(struct gl_context *ctx,
reserved_varying_slot(producer, ir_var_shader_out) |
reserved_varying_slot(consumer, ir_var_shader_in);
const unsigned slots_used = matches.assign_locations(reserved_slots);
const unsigned slots_used = matches.assign_locations(reserved_slots,
prog->SeparateShader);
matches.store_locations();
for (unsigned i = 0; i < num_tfeedback_decls; ++i) {

@@ -3940,6 +3940,77 @@ split_ubos_and_ssbos(void *mem_ctx,
assert(*num_ubos + *num_ssbos == num_blocks);
}
static void
set_always_active_io(exec_list *ir, ir_variable_mode io_mode)
{
assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
foreach_in_list(ir_instruction, node, ir) {
ir_variable *const var = node->as_variable();
if (var == NULL || var->data.mode != io_mode)
continue;
/* Don't set always active on builtins that haven't been redeclared */
if (var->data.how_declared == ir_var_declared_implicitly)
continue;
var->data.always_active_io = true;
}
}
/**
* When separate shader programs are enabled, only input/outputs between
* the stages of a multi-stage separate program can be safely removed
* from the shader interface. Other inputs/outputs must remain active.
*/
static void
disable_varying_optimizations_for_sso(struct gl_shader_program *prog)
{
unsigned first, last;
assert(prog->SeparateShader);
first = MESA_SHADER_STAGES;
last = 0;
/* Determine the first and last stages, excluding the compute stage. */
for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
if (!prog->_LinkedShaders[i])
continue;
if (first == MESA_SHADER_STAGES)
first = i;
last = i;
}
if (first == MESA_SHADER_STAGES)
return;
for (unsigned stage = 0; stage < MESA_SHADER_STAGES; stage++) {
gl_shader *sh = prog->_LinkedShaders[stage];
if (!sh)
continue;
if (first == last) {
/* For a single shader program only allow inputs to the vertex shader
* and outputs from the fragment shader to be removed.
*/
if (stage != MESA_SHADER_VERTEX)
set_always_active_io(sh->ir, ir_var_shader_in);
if (stage != MESA_SHADER_FRAGMENT)
set_always_active_io(sh->ir, ir_var_shader_out);
} else {
/* For multi-stage separate shader programs only allow inputs and
* outputs between the shader stages to be removed as well as inputs
* to the vertex shader and outputs from the fragment shader.
*/
if (stage == first && stage != MESA_SHADER_VERTEX)
set_always_active_io(sh->ir, ir_var_shader_in);
else if (stage == last && stage != MESA_SHADER_FRAGMENT)
set_always_active_io(sh->ir, ir_var_shader_out);
}
}
}
void
link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
{
@@ -4199,6 +4270,9 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
}
}
if (prog->SeparateShader)
disable_varying_optimizations_for_sso(prog);
if (!interstage_cross_validate_uniform_blocks(prog))
goto done;

@@ -187,6 +187,7 @@ flatten_named_interface_blocks_declarations::run(exec_list *instructions)
new_var->data.sample = iface_t->fields.structure[i].sample;
new_var->data.patch = iface_t->fields.structure[i].patch;
new_var->data.stream = var->data.stream;
new_var->data.how_declared = var->data.how_declared;
new_var->init_interface_type(iface_t);
hash_table_insert(interface_namespace, new_var,

@@ -75,6 +75,20 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned)
|| !entry->declaration)
continue;
/* Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.5
* (Core Profile) spec says:
*
* "With separable program objects, interfaces between shader
* stages may involve the outputs from one program object and the
* inputs from a second program object. For such interfaces, it is
* not possible to detect mismatches at link time, because the
* programs are linked separately. When each such program is
* linked, all inputs or outputs interfacing with another program
* stage are treated as active."
*/
if (entry->var->data.always_active_io)
continue;
if (!entry->assign_list.is_empty()) {
/* Remove all the dead assignments to the variable we found.
* Don't do so if it's a shader or function output, though.

@@ -196,6 +196,24 @@ intel_update_state(struct gl_context * ctx, GLuint new_state)
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
}
/* Resolve color for each active shader image. */
for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
const struct gl_shader *shader = ctx->_Shader->CurrentProgram[i] ?
ctx->_Shader->CurrentProgram[i]->_LinkedShaders[i] : NULL;
if (unlikely(shader && shader->NumImages)) {
for (unsigned j = 0; j < shader->NumImages; j++) {
struct gl_image_unit *u = &ctx->ImageUnits[shader->ImageUnits[j]];
tex_obj = intel_texture_object(u->TexObj);
if (tex_obj && tex_obj->mt) {
intel_miptree_resolve_color(brw, tex_obj->mt);
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
}
}
}
}
_mesa_lock_context_textures(ctx);
}

@@ -1434,14 +1434,12 @@ void brw_create_constant_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch);
uint32_t *out_offset);
void brw_create_buffer_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch);
uint32_t *out_offset);
void brw_update_buffer_texture_surface(struct gl_context *ctx,
unsigned unit,
uint32_t *surf_offset);
@@ -1453,8 +1451,7 @@ brw_update_sol_surface(struct brw_context *brw,
void brw_upload_ubo_surfaces(struct brw_context *brw,
struct gl_shader *shader,
struct brw_stage_state *stage_state,
struct brw_stage_prog_data *prog_data,
bool dword_pitch);
struct brw_stage_prog_data *prog_data);
void brw_upload_abo_surfaces(struct brw_context *brw,
struct gl_shader *shader,
struct brw_stage_state *stage_state,

@@ -186,7 +186,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
* the redundant ones.
*/
fs_reg vec4_offset = vgrf(glsl_type::int_type);
bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~3));
bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~0xf));
int scale = 1;
if (devinfo->gen == 4 && bld.dispatch_width() == 8) {
@@ -218,7 +218,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
inst->mlen = 1 + bld.dispatch_width() / 8;
}
bld.MOV(dst, offset(vec4_result, bld, (const_offset & 3) * scale));
bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
}
/**
@@ -1999,10 +1999,12 @@ fs_visitor::demote_pull_constants()
/* Generate a pull load into dst. */
if (inst->src[i].reladdr) {
fs_reg indirect = ibld.vgrf(BRW_REGISTER_TYPE_D);
ibld.MUL(indirect, *inst->src[i].reladdr, brw_imm_d(4));
VARYING_PULL_CONSTANT_LOAD(ibld, dst,
brw_imm_ud(index),
*inst->src[i].reladdr,
pull_index);
indirect,
pull_index * 4);
inst->src[i].reladdr = NULL;
inst->src[i].stride = 1;
} else {
@@ -3038,13 +3040,11 @@ fs_visitor::lower_uniform_pull_constant_loads()
continue;
if (devinfo->gen >= 7) {
/* The offset arg before was a vec4-aligned byte offset. We need to
* turn it into a dword offset.
*/
/* The offset arg is a vec4-aligned immediate byte offset. */
fs_reg const_offset_reg = inst->src[1];
assert(const_offset_reg.file == IMM &&
const_offset_reg.type == BRW_REGISTER_TYPE_UD);
const_offset_reg.ud /= 4;
assert(const_offset_reg.ud % 16 == 0);
fs_reg payload, offset;
if (devinfo->gen >= 9) {

@@ -1101,28 +1101,6 @@ fs_visitor::nir_emit_undef(const fs_builder &bld, nir_ssa_undef_instr *instr)
instr->def.num_components);
}
static fs_reg
fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
unsigned base_offset, nir_src *indirect)
{
fs_reg reg;
assert(!nir_reg->is_global);
reg = v->nir_locals[nir_reg->index];
reg = offset(reg, v->bld, base_offset * nir_reg->num_components);
if (indirect) {
int multiplier = nir_reg->num_components * (v->dispatch_width / 8);
reg.reladdr = new(v->mem_ctx) fs_reg(v->vgrf(glsl_type::int_type));
v->bld.MUL(*reg.reladdr, v->get_nir_src(*indirect),
brw_imm_d(multiplier));
}
return reg;
}
fs_reg
fs_visitor::get_nir_src(nir_src src)
{
@@ -1130,8 +1108,10 @@ fs_visitor::get_nir_src(nir_src src)
if (src.is_ssa) {
reg = nir_ssa_values[src.ssa->index];
} else {
reg = fs_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset,
src.reg.indirect);
/* We don't handle indirects on locals */
assert(src.reg.indirect == NULL);
reg = offset(nir_locals[src.reg.reg->index], bld,
src.reg.base_offset * src.reg.reg->num_components);
}
/* to avoid floating-point denorm flushing problems, set the type by
@@ -1148,10 +1128,12 @@ fs_visitor::get_nir_dest(nir_dest dest)
nir_ssa_values[dest.ssa.index] = bld.vgrf(BRW_REGISTER_TYPE_F,
dest.ssa.num_components);
return nir_ssa_values[dest.ssa.index];
} else {
/* We don't handle indirects on locals */
assert(dest.reg.indirect == NULL);
return offset(nir_locals[dest.reg.reg->index], bld,
dest.reg.base_offset * dest.reg.reg->num_components);
}
return fs_reg_for_nir_reg(this, dest.reg.reg, dest.reg.base_offset,
dest.reg.indirect);
}
fs_reg
@@ -2368,16 +2350,13 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
}
if (has_indirect) {
/* Turn the byte offset into a dword offset. */
fs_reg base_offset = vgrf(glsl_type::int_type);
bld.SHR(base_offset, retype(get_nir_src(instr->src[1]),
BRW_REGISTER_TYPE_D),
brw_imm_d(2));
fs_reg base_offset = retype(get_nir_src(instr->src[1]),
BRW_REGISTER_TYPE_D);
unsigned vec4_offset = instr->const_index[0] / 4;
unsigned vec4_offset = instr->const_index[0];
for (int i = 0; i < instr->num_components; i++)
VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
base_offset, vec4_offset + i);
base_offset, vec4_offset + i * 4);
} else {
fs_reg packed_consts = vgrf(glsl_type::float_type);
packed_consts.type = dest.type;
@@ -2450,7 +2429,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
}
case nir_intrinsic_load_input_indirect:
has_indirect = true;
unreachable("Not allowed");
/* fallthrough */
case nir_intrinsic_load_input: {
unsigned index = 0;
@@ -2462,8 +2441,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
src = offset(retype(nir_inputs, dest.type), bld,
instr->const_index[0] + index);
}
if (has_indirect)
src.reladdr = new(mem_ctx) fs_reg(get_nir_src(instr->src[0]));
index++;
bld.MOV(dest, src);
@@ -2536,7 +2513,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
}
case nir_intrinsic_store_output_indirect:
has_indirect = true;
unreachable("Not allowed");
/* fallthrough */
case nir_intrinsic_store_output: {
fs_reg src = get_nir_src(instr->src[0]);
@@ -2544,8 +2521,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
for (unsigned j = 0; j < instr->num_components; j++) {
fs_reg new_dest = offset(retype(nir_outputs, src.type), bld,
instr->const_index[0] + index);
if (has_indirect)
src.reladdr = new(mem_ctx) fs_reg(get_nir_src(instr->src[1]));
index++;
bld.MOV(new_dest, src);
src = offset(src, bld, 1);

@@ -48,11 +48,10 @@ brw_upload_gs_pull_constants(struct brw_context *brw)
/* BRW_NEW_GS_PROG_DATA */
const struct brw_vue_prog_data *prog_data = &brw->gs.prog_data->base;
const bool dword_pitch = prog_data->dispatch_mode == DISPATCH_MODE_SIMD8;
/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_GS_CONSTBUF, &gp->program.Base,
stage_state, &prog_data->base, dword_pitch);
stage_state, &prog_data->base);
}
const struct brw_tracked_state brw_gs_pull_constants = {
@@ -79,10 +78,9 @@ brw_upload_gs_ubo_surfaces(struct brw_context *brw)
/* BRW_NEW_GS_PROG_DATA */
struct brw_vue_prog_data *prog_data = &brw->gs.prog_data->base;
bool dword_pitch = prog_data->dispatch_mode == DISPATCH_MODE_SIMD8;
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_GEOMETRY],
&brw->gs.base, &prog_data->base, dword_pitch);
&brw->gs.base, &prog_data->base);
}
const struct brw_tracked_state brw_gs_ubo_surfaces = {

@@ -494,7 +494,6 @@ fast_clear_attachments(struct brw_context *brw,
struct rect fast_clear_rect)
{
assert(brw->gen >= 9);
struct gl_context *ctx = &brw->ctx;
brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
@@ -511,7 +510,7 @@ fast_clear_attachments(struct brw_context *brw,
_mesa_meta_drawbuffers_from_bitfield(1 << index);
brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
brw_draw_rectlist(brw, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
/* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
* resolve them eventually.

@@ -357,8 +357,7 @@ brw_upload_pull_constants(struct brw_context *brw,
GLbitfield64 brw_new_constbuf,
const struct gl_program *prog,
struct brw_stage_state *stage_state,
const struct brw_stage_prog_data *prog_data,
bool dword_pitch);
const struct brw_stage_prog_data *prog_data);
/* gen7_vs_state.c */
void

@@ -901,8 +901,21 @@ generate_pull_constant_load(struct brw_codegen *p,
gen6_resolve_implied_move(p, &header, inst->base_mrf);
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_D),
offset);
if (devinfo->gen >= 6) {
if (offset.file == BRW_IMMEDIATE_VALUE) {
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1),
BRW_REGISTER_TYPE_D),
brw_imm_d(offset.ud >> 4));
} else {
brw_SHR(p, retype(brw_message_reg(inst->base_mrf + 1),
BRW_REGISTER_TYPE_D),
offset, brw_imm_d(4));
}
} else {
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1),
BRW_REGISTER_TYPE_D),
offset);
}
uint32_t msg_type;

@@ -787,11 +787,9 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
src_reg offset;
if (!has_indirect) {
offset = brw_imm_ud(const_offset / 16);
offset = brw_imm_ud(const_offset & ~15);
} else {
offset = src_reg(this, glsl_type::uint_type);
emit(SHR(dst_reg(offset), get_nir_src(instr->src[1], nir_type_int, 1),
brw_imm_ud(4u)));
offset = get_nir_src(instr->src[1], nir_type_int, 1);
}
src_reg packed_consts = src_reg(this, glsl_type::vec4_type);

@@ -1550,23 +1550,16 @@ vec4_visitor::get_pull_constant_offset(bblock_t * block, vec4_instruction *inst,
emit_before(block, inst, ADD(dst_reg(index), *reladdr,
brw_imm_d(reg_offset)));
/* Pre-gen6, the message header uses byte offsets instead of vec4
* (16-byte) offset units.
*/
if (devinfo->gen < 6) {
emit_before(block, inst, MUL(dst_reg(index), index, brw_imm_d(16)));
}
emit_before(block, inst, MUL(dst_reg(index), index, brw_imm_d(16)));
return index;
} else if (devinfo->gen >= 8) {
/* Store the offset in a GRF so we can send-from-GRF. */
src_reg offset = src_reg(this, glsl_type::int_type);
emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset)));
emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16)));
return offset;
} else {
int message_header_scale = devinfo->gen < 6 ? 16 : 1;
return brw_imm_d(reg_offset * message_header_scale);
return brw_imm_d(reg_offset * 16);
}
}

@@ -53,8 +53,7 @@ brw_upload_pull_constants(struct brw_context *brw,
GLbitfield64 brw_new_constbuf,
const struct gl_program *prog,
struct brw_stage_state *stage_state,
const struct brw_stage_prog_data *prog_data,
bool dword_pitch)
const struct brw_stage_prog_data *prog_data)
{
unsigned i;
uint32_t surf_index = prog_data->binding_table.pull_constants_start;
@@ -94,8 +93,7 @@ brw_upload_pull_constants(struct brw_context *brw,
}
brw_create_constant_surface(brw, const_bo, const_offset, size,
&stage_state->surf_offset[surf_index],
dword_pitch);
&stage_state->surf_offset[surf_index]);
drm_intel_bo_unreference(const_bo);
brw->ctx.NewDriverState |= brw_new_constbuf;
@@ -112,7 +110,6 @@ static void
brw_upload_vs_pull_constants(struct brw_context *brw)
{
struct brw_stage_state *stage_state = &brw->vs.base;
bool dword_pitch;
/* BRW_NEW_VERTEX_PROGRAM */
struct brw_vertex_program *vp =
@@ -121,11 +118,9 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
/* BRW_NEW_VS_PROG_DATA */
const struct brw_stage_prog_data *prog_data = &brw->vs.prog_data->base.base;
dword_pitch = brw->vs.prog_data->base.dispatch_mode == DISPATCH_MODE_SIMD8;
/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_VS_CONSTBUF, &vp->program.Base,
stage_state, prog_data, dword_pitch);
stage_state, prog_data);
}
const struct brw_tracked_state brw_vs_pull_constants = {
@@ -145,16 +140,13 @@ brw_upload_vs_ubo_surfaces(struct brw_context *brw)
/* _NEW_PROGRAM */
struct gl_shader_program *prog =
ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
bool dword_pitch;
if (!prog)
return;
/* BRW_NEW_VS_PROG_DATA */
dword_pitch = brw->vs.prog_data->base.dispatch_mode == DISPATCH_MODE_SIMD8;
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_VERTEX],
&brw->vs.base, &brw->vs.prog_data->base.base,
dword_pitch);
&brw->vs.base, &brw->vs.prog_data->base.base);
}
const struct brw_tracked_state brw_vs_ubo_surfaces = {

@@ -400,15 +400,11 @@ brw_create_constant_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch)
uint32_t *out_offset)
{
uint32_t stride = dword_pitch ? 4 : 16;
uint32_t elements = ALIGN(size, stride) / stride;
brw->vtbl.emit_buffer_surface_state(brw, out_offset, bo, offset,
BRW_SURFACEFORMAT_R32G32B32A32_FLOAT,
elements, stride, false);
size, 1, false);
}
/**
@@ -421,8 +417,7 @@ brw_create_buffer_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch)
uint32_t *out_offset)
{
/* Use a raw surface so we can reuse existing untyped read/write/atomic
* messages. We need these specifically for the fragment shader since they
@@ -537,7 +532,7 @@ brw_upload_wm_pull_constants(struct brw_context *brw)
/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &fp->program.Base,
stage_state, prog_data, true);
stage_state, prog_data);
}
const struct brw_tracked_state brw_wm_pull_constants = {
@@ -918,8 +913,7 @@ void
brw_upload_ubo_surfaces(struct brw_context *brw,
struct gl_shader *shader,
struct brw_stage_state *stage_state,
struct brw_stage_prog_data *prog_data,
bool dword_pitch)
struct brw_stage_prog_data *prog_data)
{
struct gl_context *ctx = &brw->ctx;
@@ -944,8 +938,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
binding->BufferObject->Size - binding->Offset);
brw_create_constant_surface(brw, bo, binding->Offset,
binding->BufferObject->Size - binding->Offset,
&ubo_surf_offsets[i],
dword_pitch);
&ubo_surf_offsets[i]);
}
}
@@ -967,8 +960,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
binding->BufferObject->Size - binding->Offset);
brw_create_buffer_surface(brw, bo, binding->Offset,
binding->BufferObject->Size - binding->Offset,
&ssbo_surf_offsets[i],
dword_pitch);
&ssbo_surf_offsets[i]);
}
}
@@ -988,7 +980,7 @@ brw_upload_wm_ubo_surfaces(struct brw_context *brw)
/* BRW_NEW_FS_PROG_DATA */
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_FRAGMENT],
&brw->wm.base, &brw->wm.prog_data->base, true);
&brw->wm.base, &brw->wm.prog_data->base);
}
const struct brw_tracked_state brw_wm_ubo_surfaces = {
@@ -1014,7 +1006,7 @@ brw_upload_cs_ubo_surfaces(struct brw_context *brw)
/* BRW_NEW_CS_PROG_DATA */
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_COMPUTE],
&brw->cs.base, &brw->cs.prog_data->base, true);
&brw->cs.base, &brw->cs.prog_data->base);
}
const struct brw_tracked_state brw_cs_ubo_surfaces = {

@@ -304,7 +304,7 @@ brw_upload_cs_pull_constants(struct brw_context *brw)
/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &cp->program.Base,
stage_state, prog_data, true);
stage_state, prog_data);
}
const struct brw_tracked_state brw_cs_pull_constants = {

@@ -898,6 +898,21 @@ _mesa_validate_program_pipeline(struct gl_context* ctx,
if (!_mesa_sampler_uniforms_pipeline_are_valid(pipe))
goto err;
/* Validate inputs against outputs, this cannot be done during linking
* since programs have been linked separately from each other.
*
* From OpenGL 4.5 Core spec:
* "Separable program objects may have validation failures that cannot be
* detected without the complete program pipeline. Mismatched interfaces,
* improper usage of program objects together, and the same
* state-dependent failures can result in validation errors for such
* program objects."
*
* OpenGL ES 3.1 specification has the same text.
*/
if (!_mesa_validate_pipeline_io(pipe))
goto err;
pipe->Validated = GL_TRUE;
return GL_TRUE;
@@ -928,23 +943,11 @@ _mesa_ValidateProgramPipeline(GLuint pipeline)
return;
}
_mesa_validate_program_pipeline(ctx, pipe,
(ctx->_Shader->Name == pipe->Name));
/* Validate inputs against outputs, this cannot be done during linking
* since programs have been linked separately from each other.
*
* From OpenGL 4.5 Core spec:
* "Separable program objects may have validation failures that cannot be
* detected without the complete program pipeline. Mismatched interfaces,
* improper usage of program objects together, and the same
* state-dependent failures can result in validation errors for such
* program objects."
*
* OpenGL ES 3.1 specification has the same text.
/* ValidateProgramPipeline should not throw errors when pipeline validation
* fails and should instead only update the validation status. We pass
* false for IsBound to avoid an error being thrown.
*/
if (!_mesa_validate_pipeline_io(pipe))
pipe->Validated = GL_FALSE;
_mesa_validate_program_pipeline(ctx, pipe, false);
}
void GLAPIENTRY

@@ -758,6 +758,10 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg,
return;
}
}
/* We need to reset the validate flag on changes to samplers in case
* two different sampler types are set to the same texture unit.
*/
ctx->_Shader->Validated = GL_FALSE;
}
if (uni->type->is_image()) {