Compare commits

155 Commits

Author SHA1 Message Date
Emil Velikov
5a616125ac docs: Update 11.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-15 14:49:25 +00:00
Emil Velikov
a8b2698494 Update version to 11.1.0(final)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-14 12:20:18 +00:00
Francisco Jerez
7753691f1a i965: Resolve color and flush for all active shader images in intel_update_state().
Fixes arb_shader_image_load_store/execution/load-from-cleared-image.shader_test.

Couldn't reproduce any significant FPS regression in CPU-bound
benchmarks from the Finnish benchmarking system on either VLV or BSW
after 30 runs with 95% confidence level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92849
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 595c818071)
2015-12-12 19:39:03 +00:00
Dave Airlie
ce914d941d radeonsi: handle loading doubles as geometry shader inputs.
This adds the double code to the geometry shader input handling.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e307cfa7d9)
2015-12-12 19:39:03 +00:00
Dave Airlie
300f807649 radeonsi: handle doubles in lds load path.
This handles loading doubles from LDS properly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8c9e40ac22)
2015-12-12 19:39:03 +00:00
Dave Airlie
61a275b789 r600: handle geometry dynamic input array index
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.

We need to load the AR register with the value from the index reg

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cce3864046)
2015-12-12 19:39:03 +00:00
Dave Airlie
0f3892ed9d r600g: fix geom shader input indirect indexing.
This fixes:
gs-input-array-vec4-index-rd

The others run out of gprs unfortunately.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 38542921c7)
2015-12-12 19:39:03 +00:00
Dave Airlie
3d942ee4e5 r600/shader: add utility functions to do single slot arithmetic
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.

It handles CAYMAN MULLO differences as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0696ebc899)
2015-12-12 19:39:03 +00:00
Dave Airlie
efdf841238 r600/shader: split address get out to a function.
This will be used in the tess shaders.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4d64459a92)
2015-12-12 19:39:02 +00:00
Dave Airlie
5913a8c9ec r600g: fix outputting to non-0 buffers for stream 0.
This fixes:
arb_transform_feedback3-ext_interleaved_two_bufs_gs
arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
transform-feedback-builtins

If we are only emitting one ring, then emit all output
buffers on it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e97ac006d7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Conflicts:
	src/gallium/drivers/r600/r600_shader.c
2015-12-12 19:39:02 +00:00
Ilia Mirkin
3c9e76fc24 nv50/ir: fix cutoff for using r63 vs r127 when replacing zero
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:

total bytes used in shared programs   : 44154976 -> 44139880 (-0.03%)

Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f920f8eb02)
2015-12-12 19:39:02 +00:00
Matt Turner
67b1e7b947 glsl: Relax qualifier ordering restriction in ES 3.1.
... and allow the "binding" qualifier in ES 3.1 as well.

GLSL ES 3.1 incorporates only a few features from the extension
ARB_shading_language_420pack: the relaxed qualifier ordering
requirements and the binding qualifier.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit eca846e7ae)
2015-12-12 19:39:02 +00:00
Matt Turner
0586c5844f glsl: Use has_420pack().
These features would not have been enabled with #version 420 otherwise.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 79da7220db)
2015-12-12 19:39:02 +00:00
Matt Turner
7d226ee279 glsl: Allow binding of image variables with 420pack.
This interaction was missed in the addition of ARB_image_load_store.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93266
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit c200e606f7)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
36ff210d0e i965/nir: Remove unused indirect handling
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it.  This commit gets rid of the dead indirect handling code.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 22c273de2b)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
017f4755fd i965/state: Get rid of dword_pitch arguments to buffer functions
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit abb569ca18)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
61cb4db868 i965/vec4: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92909
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 05bdc21f84)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
34785fb7b9 i965/fs: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 13ad8d03f2)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
22d6bf5078 i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge.  On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s.  We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e3e70698c3)
2015-12-12 19:39:02 +00:00
Nicolai Hähnle
9908d19699 radeonsi: last_gfx_fence is a winsys fence
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit d5a5dbd71f)
2015-12-12 19:39:02 +00:00
Ilia Mirkin
a500109aad gk110/ir: fix imad sat/hi flag emission for immediate args
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1d708aacb7)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
0e78a67709 gk104/ir: sampler doesn't matter for txf
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 63b850403c)
2015-12-12 19:39:01 +00:00
Marek Olšák
4bb16d712a radeonsi: disable DCC on Stoney
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 32f05fadbb)
2015-12-12 19:39:01 +00:00
Christian König
950e9886d0 st/va: disable MPEG4 by default v2
The workarounds are too hacky to enable them by default
and otherwise MPEG4 doesn't work reliably.

v2: add docs/envvars.html, CC stable and fix typos

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Cc: "11.1.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a2c5200a4b)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
dff89432d8 gk110/ir: fix imul hi emission with limm arg
The elemental demo hits this case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit db072d2086)
2015-12-12 19:39:01 +00:00
Timothy Arceri
499d409a20 mesa: move pipeline input/output validation inside _mesa_validate_program_pipeline()
This allows validation to be done on rendering calls also.

Fixes 3 dEQP-GLES31.functional.separate tests.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4dd096d741)
2015-12-12 19:39:01 +00:00
Timothy Arceri
a16f5195ef glsl: don't generate extra errors in ValidateProgramPipeline
From Section 11.1.3.11 (Validation) of the GLES 3.1 spec:

   "An INVALID_OPERATION error is generated by any command that trans-
   fers vertices to the GL or launches compute work if the current set
   of active program objects cannot be executed, for reasons including:"

It then goes on to list the rules we validate in the
_mesa_validate_program_pipeline() function.

For ValidateProgramPipeline the only mention of generating an error is:

   "An INVALID_OPERATION error is generated if pipeline is not a name re-
   turned from a previous call to GenProgramPipelines or if such a name has
   since been deleted by DeleteProgramPipelines,"

Which we handle separately.

This fixes:
ES31-CTS.sepshaderobjs.PipelineApi

No regressions on the dEQP 3.1 tests.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit c3ec12ec3c)
Nominated-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
f65b790089 glsl: re-validate program pipeline after sampler change
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
https://bugs.freedesktop.org/show_bug.cgi?id=93180
(cherry picked from commit da1a01361b)
2015-12-12 19:39:01 +00:00
Gregory Hainaut
aa19234943 glsl: don't sort varying in separate shader mode
This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.

It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.

V2: (by Timothy Arceri)
* reworked comment slightly

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 2ab9cd0c4d)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
66f216d8ce glsl: don't dead code remove SSO varyings marked as active
GL_ARB_separate_shader_objects allows matching by variable name or block
interface. Input varyings can't be removed because doing so would impact
the location assignment.

This fixes the bug 79783 and likely any application that uses
GL_ARB_separate_shader_objects extension.

V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=79783
(cherry picked from commit 8117f46f49)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
4d34038ae5 glsl: add always_active_io attribute to ir_variable
The value will be set in a separate-shader program when an input/output
must remain active, e.g. when dead code removal isn't allowed because
it would create an interface location/name-matching mismatch.

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file

v4:
* Fix variable name in assert

v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 618612f867)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
781a68555d glsl: copy how_declared when lowering interface blocks
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 76c09c1792)
2015-12-12 19:39:01 +00:00
Marek Olšák
e0b11bcc87 radeonsi: fix occlusion queries on Fiji
Tested.

(cherry picked from commit bfc14796b0)
2015-12-12 19:39:01 +00:00
Matt Turner
359679cb33 i965: Pass brw_context pointer, not gl_context pointer.
Fixes a warning introduced by commit dcadd855.

(cherry picked from commit f1b7fefd4e)
2015-12-12 19:39:00 +00:00
Marta Lofstedt
fcf6091521 gles2: Update gl2ext.h to revision: 32120
This is needed to be able to implement the accepted OES
extensions.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 1d5b88e33b)
2015-12-12 19:38:39 +00:00
Emil Velikov
aa5082b135 Revert "cherry-ignore: ignore unneeded header update"
This reverts commit 79f3aaca4f.

The commit (header update) was not needed for the 11.0 branch as opposed
to this one (11.1)
2015-12-12 19:38:39 +00:00
Eric Anholt
1df00e17d3 vc4: When doing algebraic optimization into a MOV, use the right MOV.
If there were src unpacks, changing to the integer MOV instead of float
(for example) would change the unpack operation.

(cherry picked from commit e3efc4b023)
2015-12-11 17:04:11 -08:00
Eric Anholt
ad3df9d168 vc4: Fix handling of src packs in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.

(cherry picked from commit 2591beef89)
2015-12-11 17:04:08 -08:00
Eric Anholt
e4cf550501 vc4: Add missing progress note in opt_algebraic.
(cherry picked from commit b70a2f4d81)
2015-12-11 17:04:00 -08:00
Eric Anholt
ecf2885d7f vc4: Fix handling of sample_mask output.
I apparently broke this in a late refactor, in such a way that I decided
its tests were some of those interminable ones that I should just
blacklist from my testing.  As a result, the refactors related to it were
totally wrong.

(cherry picked from commit 53b2523c6e)
2015-12-11 17:03:51 -08:00
Eric Anholt
fc59ca4064 vc4: Enable MSAA.
We still have several failures in the newly enabled tests in simulation:
sRGB downsampling is done as if it was just linear, stencil blits are not
supported on MSAA either, and derivatives are still not supported
(breaking some MSAA simulation shaders).  So, other than sRGB downsampling
quality, things seem to be in good shape.

(cherry picked from commit f61ceeb3fd)
2015-12-11 17:03:44 -08:00
Eric Anholt
396fbdc721 vc4: Add support for mapping of MSAA resources.
The pipe_transfer_map API requires that we do an implicit
downsample/upsample and return a mapping of that.

(cherry picked from commit fc4a1bfb88)
2015-12-11 17:03:40 -08:00
Eric Anholt
50ac2100df vc4: Add support for texel fetches from MSAA resources.
This is the core of ARB_texture_multisample.  Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves.  We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.

(cherry picked from commit 6b4dfd53ae)
2015-12-11 17:03:36 -08:00
Eric Anholt
08cf0f8529 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVERAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.

(cherry picked from commit a97b40dca4)
2015-12-11 17:03:31 -08:00
Eric Anholt
ba51596b1d vc4: Add a workaround for HW-2905, and an additional failure I saw with MSAA.
I only stumbled on this while experimenting due to reading about HW-2905.
I don't know if the EZ disable in the Z-clear is actually necessary, but
go with it for now.

(cherry picked from commit edc3305de7)
2015-12-11 17:03:03 -08:00
Eric Anholt
3d13bb8851 vc4: Add support for drawing in MSAA.
(cherry picked from commit edfd4d853a)
2015-12-11 17:03:03 -08:00
Eric Anholt
3bf2c6b96a vc4: Add kernel RCL support for MSAA rendering.
(cherry picked from commit e7c8ad0a6c)
2015-12-11 17:03:03 -08:00
Eric Anholt
5ab1bb4bec vc4: Rename color_ms_write to color_write.
I was thinking this was the only MSAA resolve thing, so it should be noted
separately, but actually load/store general also do MSAA resolve.

(cherry picked from commit 568d3a8e32)
2015-12-11 17:03:03 -08:00
Eric Anholt
c5ca18ec2f vc4: Allow RCL blits to the edge of the surface.
The recent unaligned fix successfully prevented RCL blits that weren't
aligned inside of the surface, but we also want to be able to do RCL blits
for the whole surface when the width or height of the surface aren't
aligned (we don't care what renders inside of the padding).
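
The alignment rule described above can be sketched as follows. This is a minimal illustration, not the vc4 driver code: `struct box`, `edge_ok()`, `box_fits_tiles()`, and the tile dimensions passed in are all hypothetical names and parameters. It covers only the alignment part of the decision: a blit box qualifies when it starts on a tile boundary and ends either on a tile boundary or exactly at the surface edge.

```c
#include <stdbool.h>

/* Hypothetical blit rectangle in pixels (not a vc4 type). */
struct box { int x, y, w, h; };

/* One axis: the start must sit on a tile boundary; the end must sit on
 * a tile boundary or coincide with the surface edge, since whatever is
 * rendered into the padding past the edge does not matter. */
static bool edge_ok(int start, int size, int surface_size, int tile)
{
    int end = start + size;

    if (size <= 0 || start % tile != 0)
        return false;
    return end % tile == 0 || end == surface_size;
}

/* Both axes must pass for the box to be handled tile by tile. */
static bool box_fits_tiles(struct box b, int surf_w, int surf_h,
                           int tile_w, int tile_h)
{
    return edge_ok(b.x, b.w, surf_w, tile_w) &&
           edge_ok(b.y, b.h, surf_h, tile_h);
}
```

With 64x64 tiles on a 100x100 surface, a full-surface box passes even though 100 is not a multiple of 64, while a 50x50 box at the origin does not.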

(cherry picked from commit bf92017ace)
2015-12-11 17:03:03 -08:00
Eric Anholt
f6cca7a0c9 vc4: Fix check for tile RCL blits with mismatched y.
This was a typo in 3a508a0d94 that didn't
show up in testcases at that moment.

(cherry picked from commit 2792d118f1)
2015-12-11 17:03:03 -08:00
Eric Anholt
ae649bf1ad vc4: Fix compiler warning from size_t change.
I missed this when bringing over the kernel changes.

(cherry picked from commit 1529f138ff)
2015-12-11 17:03:03 -08:00
Eric Anholt
132303cfe4 vc4: Fix accidental scissoring when scissor is disabled.
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.
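
A minimal sketch of the behaviour described above, assuming a hypothetical `struct rect` and helper `effective_clip()` (neither is taken from the vc4 source): stale scissor bounds are consulted only when scissoring is enabled, otherwise the full framebuffer is used.

```c
#include <stdbool.h>

/* Hypothetical clip rectangle: [minx, maxx) x [miny, maxy). */
struct rect { int minx, miny, maxx, maxy; };

/* Returns the bounds a draw should be clipped to. The scissor argument
 * may hold leftover values from an earlier state change, so it is
 * ignored entirely when scissoring is disabled. */
static struct rect effective_clip(bool scissor_enabled, struct rect scissor,
                                  int fb_width, int fb_height)
{
    struct rect r = { 0, 0, fb_width, fb_height };

    if (!scissor_enabled)
        return r;

    /* Intersect the framebuffer bounds with the scissor rectangle. */
    if (scissor.minx > r.minx) r.minx = scissor.minx;
    if (scissor.miny > r.miny) r.miny = scissor.miny;
    if (scissor.maxx < r.maxx) r.maxx = scissor.maxx;
    if (scissor.maxy < r.maxy) r.maxy = scissor.maxy;
    return r;
}
```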

Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.

(cherry picked from commit a4eff86f4a)
2015-12-11 17:03:03 -08:00
Eric Anholt
9df2431194 vc4: Disable RCL blitting when scissors are enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it.  If you're doing a scissored blit, you're
probably a testcase.

Fixes piglit's fbo-scissor-blit fbo

(cherry picked from commit d16d666776)
2015-12-11 17:03:03 -08:00
Eric Anholt
dd409e2a41 vc4: Bring over cleanups from submitting to the kernel.
(cherry picked from commit 0afe83078d)
2015-12-11 17:03:03 -08:00
Eric Anholt
38c770ec29 vc4: Add debug dumping of MSAA surfaces.
(cherry picked from commit a69ac4e89c)
2015-12-11 17:03:03 -08:00
Eric Anholt
d8450616d9 vc4: Add support for laying out MSAA resources.
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format.  Since they're full resolution buffers, we have to
align their size to full tiles.

(cherry picked from commit 3c3b1184eb)
2015-12-11 17:03:02 -08:00
Eric Anholt
c9fe9e4b42 vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
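
The AND described above can be illustrated with a small sketch. The function name `final_sample_mask()` and the one-bit-per-sample `uint8_t` representation are assumptions made for illustration, not the driver's actual code.

```c
#include <stdint.h>

/* Hypothetical coverage masks, one bit per sample.
 * payload_mask: samples the rasterizer marked as covered.
 * shader_mask:  value the shader wrote to its sample mask output. */
static inline uint8_t final_sample_mask(uint8_t payload_mask,
                                        uint8_t shader_mask)
{
    /* A 1 bit from the shader can keep a covered sample alive, but it
     * can never enable a sample that was not covered to begin with. */
    return payload_mask & shader_mask;
}
```

For example, a payload mask of `0x3` combined with a shader mask of `0xf` stays `0x3`: the two uncovered samples remain off.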

(cherry picked from commit 74c4b3b80c)
2015-12-11 17:03:02 -08:00
Eric Anholt
693e938321 vc4: Fix up tile alignment checks for blitting using just an RCL.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile.  We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.

(cherry picked from commit 3a508a0d94)
2015-12-11 17:03:02 -08:00
Eric Anholt
7a0661839b vc4: Add support for loading sample mask.
(cherry picked from commit a664233042)
2015-12-11 17:03:02 -08:00
Eric Anholt
4c234d183b vc4: Use nir_channel() to simplify all of our nir_swizzle() cases.
(cherry picked from commit 4cff16bc3a)
2015-12-11 17:03:02 -08:00
Eric Anholt
b37189523e vc4: Fix point size lookup.
I think I may have regressed this in the NIR conversion.  TGSI-to-NIR is
putting the PSIZ in the .x channel, not .w, so we were grabbing some
garbage for point size, which ended up meaning just not drawing points.

Fixes glean pointAtten and pointsprite.

(cherry picked from commit 81544f231a)
2015-12-11 16:57:39 -08:00
Emil Velikov
20db46c227 Update version to 11.1.0-rc3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-07 13:50:15 +00:00
Michel Dänzer
b2a5efb56f radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now
llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit d094631936)
2015-12-04 16:37:19 +00:00
Connor Abbott
38c645b60a i965: fix 64-bit immediates in brw_inst(_set)_bits
If we tried to get/set something that was exactly 64 bits, we would
try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
1's like we want.

v2 (Iago)
 - Replace ~0 by ~0ull
 - Removed unnecessary parentheses

v3 (Kristian)
 - Avoid the conditional
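
A minimal sketch of the masking problem described above, using hypothetical helpers (`field_mask()`, `get_bits()`, `set_bits()`) rather than the actual brw_inst code. Shifting `1` left by 64 is undefined in C, whereas shifting an all-ones 64-bit value right by `64 - width` is well defined for every width from 1 to 64 and needs no conditional.

```c
#include <stdint.h>

/* Hypothetical helper: right-aligned mask for the inclusive bit range
 * [low, high] of a 64-bit word. Requires low <= high <= 63. */
static inline uint64_t field_mask(unsigned high, unsigned low)
{
    unsigned width = high - low + 1;   /* 1..64 */

    /* (1ull << width) - 1 would be undefined for width == 64. */
    return ~UINT64_C(0) >> (64 - width);
}

/* Extract bits [low, high] of word. */
static inline uint64_t get_bits(uint64_t word, unsigned high, unsigned low)
{
    return (word >> low) & field_mask(high, low);
}

/* Return word with bits [low, high] replaced by value. */
static inline uint64_t set_bits(uint64_t word, unsigned high, unsigned low,
                                uint64_t value)
{
    uint64_t mask = field_mask(high, low) << low;

    return (word & ~mask) | ((value << low) & mask);
}
```

Using a 64-bit constant throughout also matters on 32-bit platforms, where a plain `~0` or `1` would be only 32 bits wide.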

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit b1a83b5d1b)

Squashed with commit

i965: Use ull immediates in brw_inst_bits

This fixes a regression introduced in b1a83b5d1 that caused basically all
shaders to fail to compile on 32-bit platforms.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 9d703de85a)
Nominated-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-04 16:37:07 +00:00
Emil Velikov
2dff4c6fa7 mesa: rework the meaning of gl_debug_message::length
Currently it stores strlen(buf) whenever the user originally provided a
negative value for length.

Although I've not seen any explicit text in the spec, CTS requires that
the very same length (be that negative value or not) is returned back on
Pop.

So let's push down the length < 0 checks, tweak the meaning of
gl_debug_message::length and fix GetDebugMessageLog to add and count the
null terminators, as required by the spec.
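
A minimal sketch of the length handling described above, assuming a hypothetical `struct debug_msg` that is much simpler than Mesa's `gl_debug_message`: the stored length is whatever the caller passed, the real text length is derived only when needed, and the length reported through the message log includes the null terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record, not Mesa's gl_debug_message. */
struct debug_msg {
    const char *text;
    int length;   /* exactly as supplied by the caller;
                   * negative means "text is null terminated" */
};

/* Number of characters in the message, excluding any terminator. */
static size_t text_length(const struct debug_msg *m)
{
    return m->length < 0 ? strlen(m->text) : (size_t)m->length;
}

/* Length as counted by a message-log query: text plus terminator. */
static size_t log_length(const struct debug_msg *m)
{
    return text_length(m) + 1;
}
```

Because `length` is never overwritten, the value handed back on Pop is the same one the caller originally supplied, negative or not.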

v2: return correct total length in GetDebugMessageLog
v3: rebase (drop _mesa_shader_debug hunk).

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 5a23f6bd8d)
2015-12-04 16:37:07 +00:00
Emil Velikov
d81ddb3ed8 mesa: errors: validate the length of null terminated string
We're about to rework the meaning of gl_debug_message::length to only
store the user provided data. Thus we should add an explicit validation
for null terminated strings.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 622186fbdf)
2015-12-04 16:37:07 +00:00
Emil Velikov
c25c1dbf51 mesa: accept TYPE_PUSH/POP_GROUP with glDebugMessageInsert
These new (relative to ARB_debug_output) tokens have been explicitly
separated from the existing ones in the spec text, and the reference
to glDebugMessageInsert was dropped.

At the same time, further down the spec says:
   "The value of <type> must be one of the values from Table 5.4"

... and these two are listed in Table 5.4.

The GL 4.3 and GLES 3.2 specs do not give any hints on the former
'definition', plus CTS requires that the tokens are valid values for
glDebugMessageInsert.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 66fea8bd96)
2015-12-04 16:37:07 +00:00
Emil Velikov
bed982c4b7 mesa: add SEVERITY_NOTIFICATION to default state
As per the spec quote:

    "All messages are initially enabled unless their assigned severity
    is DEBUG_SEVERITY_LOW"

We already had MEDIUM and HIGH set, let's toggle NOTIFICATION as well.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 53be28107b)
2015-12-04 16:37:06 +00:00
Emil Velikov
dcaf3989d1 mesa: return the correct value for GroupStackDepth
We already have one group (the default) as specified in the spec. So
let's return its size, rather than the index of the current group.
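
The index-versus-size distinction can be sketched as follows, assuming a hypothetical `struct debug_state` with a single `current_group` field (not Mesa's actual layout):

```c
/* Hypothetical state: index of the group on top of the stack.
 * Index 0 is the default group, which always exists. */
struct debug_state {
    int current_group;
};

/* The stack depth is a size, so it is one more than the top index. */
static int group_stack_depth(const struct debug_state *s)
{
    return s->current_group + 1;
}
```

A freshly created context therefore reports a depth of 1, not 0.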

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 078dd6a0b4)
2015-12-04 16:37:06 +00:00
Emil Velikov
996a4958da mesa: rename GroupStackDepth to CurrentGroup
The variable is used as the actual index, rather than the size of the
group stack - rename it to reflect that.

Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit f39954bf7c)
2015-12-04 16:37:06 +00:00
Emil Velikov
0cf5a8159f mesa: do not enable KHR_debug for ES 1.0
The extension requires (cough implements) GetPointervKHR (alias of
GetPointerv) which in itself is available for ES 1.1 enabled mesa.

Anyone willing to fish around and implement it for ES 1.0 is more than
welcome to revert this commit. Until then, let's restrict things.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 1ca735701b)
2015-12-04 16:37:06 +00:00
Emil Velikov
6cc9a53d84 glapi: add GetPointervKHR to the ES dispatch
The KHR_debug extension implements this.

Strictly speaking it could be used with ES 1.0, although as the original
function is available on ES 1.1, I'm inclined to lift the KHR_debug
requirement to ES 1.1.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit f53f9eb8d4)

Squashed with commit

mesa/tests: add KHR_debug GLES glGetPointervKHR entry points

Should have been part of commit f53f9eb8d4 "glapi: add GetPointervKHR
to the ES dispatch".

v2: comment out the ES1.1 symbol and use the same description (pattern)
as elsewhere (Matt)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93235
Fixes: f53f9eb8d4 "glapi: add GetPointervKHR to the ES dispatch".
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 1074e38fbb)
2015-12-04 16:36:45 +00:00
Emil Velikov
0a51e77fa1 mesa: remove len argument from _mesa_shader_debug()
There was only a single user which was using strlen(buf).
As this function is not user facing (i.e. we don't need to feed back
original length via a callback), we can simplify things.

Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit d37ebed470)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
ca6d0a3dbe nv50/ir: avoid looking at uninitialized srcMods entries
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2b98914fe0)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
4ae9142f8b nv50/ir: fix DCE to not generate 96-bit loads
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.
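
A minimal sketch of the splitting idea described above. `split_load()` is a hypothetical helper, not the nv50/ir pass: it ignores alignment and assumes only 4-, 8-, and 16-byte loads can be emitted, so a 12-byte (96-bit) request becomes an 8-byte load followed by a 4-byte one.

```c
#include <stddef.h>

/* Hypothetical helper: break a load of `bytes` bytes into pieces that
 * are each 4, 8, or 16 bytes wide. `bytes` must be a multiple of 4 and
 * at most 16. Piece sizes are written to out[]; the count is returned. */
static size_t split_load(unsigned bytes, unsigned out[4])
{
    size_t n = 0;

    while (bytes > 0) {
        unsigned piece;

        if (bytes >= 16)
            piece = 16;
        else if (bytes >= 8)
            piece = 8;    /* 12 bytes: scale the first load back to 8 */
        else
            piece = 4;

        out[n++] = piece;
        bytes -= piece;
    }
    return n;
}
```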

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 49692f86a1)
2015-12-04 16:36:45 +00:00
Marek Olšák
aff9f8a6f7 radeonsi: fix Fiji for LLVM <= 3.7
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dd27825c8c)
2015-12-04 16:36:45 +00:00
Nanley Chery
b0b163c82a mesa/version: Update gl_extensions::Version during version override
Commit a16ffb743c, which introduced
gl_extensions::Version, updates the field when the context version
is computed and when entering/exiting meta. Update this field when
the version is overridden as well.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
(cherry picked from commit 808e752796)
2015-12-04 16:36:45 +00:00
Tapani Pälli
f70574c835 i965: use _Shader to get fragment program when updating surface state
Atomic counters and images were using ctx::Shader, which does not take
program pipeline changes into account; ctx::_Shader must be used for SSO
to work. Commit c0347705 already changed UBOs to use this.

Fixes failures seen with following Piglit test:
	arb_separate_shader_object-atomic-counter

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 231db5869c)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
26dff8a7bb nv50/ir: don't forget to mark flagsDef on cvt in txb lowering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 101e315cc1)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
ea21336d15 nv50/ir: fix instruction permutation logic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 06055121e6)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
7f6e9c5f59 nv50/ir: the mad source might not have a defining instruction
For example if it's $r63 (aka 0), there won't be a definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 11fcf46590)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
0828391a34 nv50/ir: deal with loops with no breaks
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit adcc547bfb)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
75b6f14ab8 nvc0/ir: fold postfactor into immediate
SM20-SM50 can't emit a post-factor in the presence of a long immediate.
Make sure to fold it in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ff61ac4838)
2015-12-04 16:36:44 +00:00
Roland Scheidegger
69df6ac272 mesa: fix VIEWPORT_INDEX_PROVOKING_VERTEX and LAYER_PROVOKING_VERTEX queries
These are implementation-dependent queries, but so far we just returned the
value of whatever the current provoking vertex convention was set to, which
was clearly wrong.
Just make this a variable in the context constants like for other things
which are implementation dependent (I assume all drivers will want to set
this to the same value for both queries), and set it to GL_UNDEFINED_VERTEX
which is correct for everybody (and drivers can override it).

Reviewed-by: Brian Paul <brianp@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 09f74e6ef4)
2015-12-04 16:36:44 +00:00
Dave Airlie
0f53b2010c r600: SMX returns CONTEXT_DONE early workaround
streamout, gs rings bug on certain r600s, requires a wait idle
before each surface sync.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit af4013d26b)
2015-12-04 16:36:44 +00:00
Dave Airlie
67be605b96 r600: do SQ flush ES ring rolling workaround
Need to insert a SQ_NON_EVENT whenever geometry
shaders are enabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b63944e8b9)
2015-12-04 16:36:44 +00:00
Tom Stellard
be20f1d7c1 clover: Handle NULL devices returned by pipe_loader_probe() v2
When probing for devices, clover will call pipe_loader_probe() twice:
the first time to retrieve the number of devices, and the second time
to retrieve the device structures.

We currently assume that the return value of both calls will be the
same, but this will not be the case if a device happens to disappear
between the two calls.

When a device disappears, pipe_loader_probe() will add a NULL
device to the device list, so we need to handle this.

v2:
  - Keep range for loop

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9adbb9e713)
2015-12-04 16:36:44 +00:00
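The NULL handling described in the clover commit above can be sketched roughly as follows. This is only an illustration, not clover's code: `struct device` and `keep_valid_devices()` are hypothetical names, and the array `probed` stands in for whatever the second probe call filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device type; stands in for the loader's device structure. */
struct device { int id; };

/* Copy only the non-NULL entries of probed[0..n) into out[], preserving
 * order. Returns the number of entries kept. */
size_t keep_valid_devices(struct device *const *probed, size_t n,
                          struct device **out)
{
    size_t kept = 0;

    assert(probed != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* A device that vanished between the two probe calls shows up
         * as a NULL slot, so skip it instead of dereferencing it. */
        if (probed[i] != NULL)
            out[kept++] = probed[i];
    }
    return kept;
}
```

The caller would then size its device list from the returned count rather than from the count reported by the first probe call.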
Jonathan Gray
15344c978b automake: fix some occurrences of hardcoded -ldl and -lpthread
Correct some occurrences of -ldl and -lpthread to use
$(DLOPEN_LIBS) and $(PTHREAD_LIBS) respectively.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 99cd600835)
2015-12-04 16:36:44 +00:00
Dave Airlie
f1bb27acc5 r600: workaround empty geom shader.
We need to emit at least one cut/emit in every
geometry shader. The easiest workaround is to
stick a single CUT at the top of each geom shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4f34722575)
2015-12-04 16:36:44 +00:00
Dave Airlie
dd37db0c80 r600: rv670 use at least 16es/gs threads
This is specified in the docs for rv670 to work properly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 04efcc6c7a)
2015-12-04 16:36:43 +00:00
Dave Airlie
8e3fbb90a9 r600: geometry shader gsvs itemsize workaround
On some chips the GSVS itemsize needs to be aligned to a cacheline size.

This only applies to some of the r600 family chips.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8168dfdd4e)
2015-12-04 16:36:43 +00:00
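Aligning a size to a cacheline, as the GSVS itemsize commit above describes, usually amounts to rounding up to a power-of-two multiple. A minimal sketch, assuming the alignment is a power of two; the helper name `align_up()` is hypothetical, and the commit message does not state the cacheline size or the units of the itemsize.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With an assumed alignment of 64, an item size of 65 would be padded to 128, while 64 stays at 64.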
Emil Velikov
79f3aaca4f cherry-ignore: ignore unneeded header update
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-04 16:36:35 +00:00
Julien Isorce
f9a2bd212a vl/buffers: fixes vl_video_buffer_formats for RGBX
Fixes: 42a5e143a8 "vl/buffers: add RGBX and BGRX to the supported formats"
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 10c14919c8)
2015-12-04 16:33:10 +00:00
Emil Velikov
aefd6769e8 Update version to 11.1.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-30 00:13:23 +00:00
Neil Roberts
82a363b851 i965: Handle lum, intensity and missing components in the fast clear
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.

Fixes the ext_framebuffer_multisample-fast-clear Piglit test.

v2: Handle luminance and intensity formats
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
(cherry picked from commit 2010de4015)
2015-11-30 00:13:23 +00:00
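The override described in the fast clear commit above can be pictured with a small sketch. It assumes the usual GL sampling rules (missing color components read as 0, missing alpha as 1, luminance replicated to RGB, intensity replicated to all four channels); the enum and the function `override_clear_color()` are hypothetical and are not the driver's code.

```c
#include <assert.h>

/* Hypothetical base formats, for illustration only. */
enum base_format {
    FMT_RED, FMT_RG, FMT_RGB, FMT_RGBA, FMT_LUMINANCE, FMT_INTENSITY
};

/* Compute the color a sampler should see for a surface of the given base
 * format that was cleared to in[]. */
void override_clear_color(enum base_format fmt, const float in[4],
                          float out[4])
{
    assert(in != out);

    /* Start from "missing components" defaults: 0 for color, 1 for alpha. */
    out[0] = in[0];
    out[1] = 0.0f;
    out[2] = 0.0f;
    out[3] = 1.0f;

    switch (fmt) {
    case FMT_RED:
        break;
    case FMT_RG:
        out[1] = in[1];
        break;
    case FMT_RGB:
        out[1] = in[1];
        out[2] = in[2];
        break;
    case FMT_RGBA:
        out[1] = in[1];
        out[2] = in[2];
        out[3] = in[3];
        break;
    case FMT_LUMINANCE:
        out[1] = in[0];
        out[2] = in[0];
        break;
    case FMT_INTENSITY:
        out[1] = in[0];
        out[2] = in[0];
        out[3] = in[0];
        break;
    }
}
```

For the commit's own example, a GL_RED surface cleared to 1,1,1,1 maps to 1,0,0,1.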
Nanley Chery
b3183c81c4 mesa/teximage: Fix S3TC regression due to ASTC interaction
A prior, literal reading of the ASTC spec led to the prohibition
of some compressed formats being used against the targets:
TEXTURE_CUBE_MAP_ARRAY and TEXTURE_3D. Since the spec does not specify
interactions with other extensions for specific compressed textures,
remove such interactions.

Fixes the following Piglit tests on Gen9:
piglit.spec.arb_direct_state_access.getcompressedtextureimage
piglit.spec.arb_get_texture_sub_image.arb_get_texture_sub_image-getcompressed
piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array s3tc_dxt1
piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc

v2. Don't interact with other specific compressed formats (Ian).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91927
Suggested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit d1212abf50)
2015-11-30 00:13:23 +00:00
Nanley Chery
f5e508649d mesa/extensions: Enable overriding permanently enabled extensions
Provide the ability to prevent any permanently enabled extension
from appearing in the string returned by glGetString[i]().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 21d43fe51a)
2015-11-30 00:13:23 +00:00
Leo Liu
31546c0e8f radeon/vce: disable Stoney VCE for 11.0
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-30 00:13:23 +00:00
Emil Velikov
6b149bedc3 auxiliary/vl/dri: fd management cleanups
Analogous to previous commit, minus the extra dup. We are the one
opening the device thus we can directly use the fd.

Spotted by Coverity (CID 1339867, 1339877)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 5d294d9fa3)
2015-11-30 00:13:23 +00:00
Emil Velikov
7a4ba7bfad auxiliary/vl/drm: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339868)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 151290c154)
2015-11-30 00:13:23 +00:00
Emil Velikov
ef6769f18f st/xa: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339866)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit fe71059388)
2015-11-30 00:13:23 +00:00
Emil Velikov
a71db1c46e st/dri: fd management cleanups
Add some checks if the original/dup'd fd is valid and ensure that we
don't leak it on error. The former is implicitly handled within the
pipe_loader, although let's make things explicit and check beforehand.

Spotted by Coverity (CID 1339865)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit d90ba57c08)
2015-11-30 00:13:23 +00:00
Emil Velikov
88cd21fefb pipe-loader: check if winsys.name is non-null prior to strcmp
In theory this wouldn't be an issue, as we'll find the correct name and
break out of the loop before we hit the sentinel.

Let's fix this and avoid issues in the future.

Spotted by Coverity (CID 1339869, 1339870, 1339871)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 5f92906b87)
2015-11-30 00:13:23 +00:00
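The check added by the pipe-loader commit above guards a lookup over a sentinel-terminated table. A minimal sketch of that pattern, with a hypothetical `struct winsys_entry` and `find_winsys()` rather than the loader's real types:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; the last entry has name == NULL (sentinel). */
struct winsys_entry {
    const char *name;
    int id;
};

/* Return the entry whose name matches wanted, or NULL if none does. */
const struct winsys_entry *find_winsys(const struct winsys_entry *table,
                                       const char *wanted)
{
    assert(table != NULL && wanted != NULL);

    /* Test the name before calling strcmp(): passing the sentinel's NULL
     * name to strcmp() would be undefined behaviour. */
    for (const struct winsys_entry *e = table; e->name != NULL; e++) {
        if (strcmp(e->name, wanted) == 0)
            return e;
    }
    return NULL;
}
```

Without the NULL test the loop only stays safe as long as every lookup finds a match before reaching the sentinel, which is the "in theory" case the commit message mentions.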
Ilia Mirkin
97d4954f3f mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93126
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0396eaaf80)
2015-11-30 00:13:23 +00:00
Nicolai Hähnle
3d525c8650 radeon: only suspend queries on flush if they haven't been suspended yet
Non-timer queries are suspended during blits. When the blits end, the queries
are resumed, but this resume operation itself might run out of CS space and
trigger a flush. When this happens, we must prevent a duplicate suspend during
preflush suspend, and we must also prevent a duplicate resume when the CS flush
returns back to the original resume operation.

This fixes a regression that was introduced by:

commit 8a125afa6e
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date:   Wed Nov 18 18:40:22 2015 +0100

    radeon: ensure that timing/profiling queries are suspended on flush

    The queries_suspended_for_flush flag is redundant because suspended queries
    are not removed from their respective linked list.

    Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reported-by: Axel Davy <axel.davy@ens.fr>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9e5e702cfb)
2015-11-30 00:13:22 +00:00
Emil Velikov
9b9fff6830 targets: use the non-inline sw helpers
Previously (with the inline ones) things were embedded into the
pipe-loader, which means that we cannot control/select what we want in
each target.

That also meant that at runtime we ended up with the empty
sw_screen_create() as the GALLIUM_SOFTPIPE/LLVMPIPE were not set.

v2: Cover all the targets, not just dri.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
(cherry picked from commit 59cfb21d46)

Squashed with commit

targets/xvmc: use the non-inline sw helpers

This was missed in commit 59cfb21d ("targets: use the non-inline sw
helpers").

Fixes build failure:

  CXXLD    libXvMCgallium.la
../../../../src/gallium/auxiliary/pipe-loader/.libs/libpipe_loader_static.a(libpipe_loader_static_la-pipe_loader_sw.o):(.data.rel.ro+0x0): undefined reference to `sw_screen_create'
collect2: error: ld returned 1 exit status
Makefile:756: recipe for target 'libXvMCgallium.la' failed
make[3]: *** [libXvMCgallium.la] Error 1

Trivial.

(cherry picked from commit 22d2dda03b)
2015-11-30 00:12:58 +00:00
Emil Velikov
3d09bede30 target-helpers: add non inline sw helpers
Feeling rather dirty copying the inline ones, yet we need the inline
ones for swrast only targets like libgl-xlib, osmesa.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
(cherry picked from commit fbc6447c3d)
2015-11-29 19:36:56 +00:00
Emil Velikov
aad5c7d1ca pipe-loader: fix off-by one error
With an earlier commit we dropped the manual iteration over the fixed-size
array and preemptively set the variable storing the size that is to be
returned. Yet we forgot to adjust the comparison: before we were comparing
the index, now we're comparing the size.

Fixes: ff9cd8a67c "pipe-loader: directly use
pipe_loader_sw_probe_null() at probe time"
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93091
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

(cherry picked from commit f623517188)
2015-11-29 19:36:55 +00:00
Kenneth Graunke
323161333c i965: Fix scalar vertex shader struct outputs.
While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.

Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.

Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3810c15614)
2015-11-29 19:36:55 +00:00
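The recursion into structs mentioned in the commit above can be illustrated with a toy type model. This is a sketch under simplifying assumptions (each vector occupies one output slot; matrices and doubles are ignored); `struct type` and `fill_output_components()` are hypothetical and do not mirror the GLSL compiler's types.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_VECTOR, KIND_ARRAY, KIND_STRUCT };

/* Hypothetical, simplified type description. */
struct type {
    enum kind kind;
    unsigned vector_elements;         /* KIND_VECTOR: 1..4 components */
    unsigned length;                  /* array length or field count */
    const struct type *element;       /* KIND_ARRAY: element type */
    const struct type *const *fields; /* KIND_STRUCT: field types */
};

/* Record the component count of every vector reachable from t, one entry
 * per slot starting at slot. Returns the next free slot index. */
unsigned fill_output_components(const struct type *t, unsigned slot,
                                unsigned *components)
{
    assert(t != NULL && components != NULL);

    switch (t->kind) {
    case KIND_VECTOR:
        components[slot] = t->vector_elements;
        return slot + 1;
    case KIND_ARRAY:
        for (unsigned i = 0; i < t->length; i++)
            slot = fill_output_components(t->element, slot, components);
        return slot;
    case KIND_STRUCT:
        /* Each field can have a different width, so recurse per field. */
        for (unsigned i = 0; i < t->length; i++)
            slot = fill_output_components(t->fields[i], slot, components);
        return slot;
    }
    return slot;
}
```

A struct holding a vec3, a float[2] and a vec4 would yield the per-slot counts 3, 1, 1, 4 in this model.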
Kenneth Graunke
80febef0ad i965: Fix fragment shader struct inputs.
Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.

Normally, varying packing splits up structures for us, so we don't see
them in the backend.  However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.

This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs, arrays, and so on.
(It's easier to read with diff -b, as indentation changes.)

When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive.  (The scalar VS backend requires additional
bug fixes in the next patch.)

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3e9003e9cf)
2015-11-29 19:36:55 +00:00
Tom Stellard
cf70584907 radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values
The compiler has more information and is able to optimize the bits
it sets in these registers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 89851a2965)
2015-11-29 19:36:55 +00:00
Tom Stellard
96e1bf8791 radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}
In the future, these will be used by other shader types.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 95e0510916)
2015-11-29 19:36:55 +00:00
Ian Romanick
34521c2840 docs: add missed i965 feature to relnotes
Trivial.  GL_ARB_fragment_layer_viewport support was added in 8c902a58
by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9b41489cb5)
2015-11-29 17:59:46 +00:00
Timothy Arceri
6a71090002 glsl: implement recent spec update to SSO validation
Enables 200+ dEQP SSO tests to proceed past validation,
and fixes a ES31-CTS.sepshaderobjs.PipelineApi subtest.

V2: split out change that reverts a previous patch into its own commit,
move variable declaration to top of function, and fix some formatting
all suggested by Ian.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2571a768d6)
2015-11-29 17:59:22 +00:00
Timothy Arceri
88fd679706 Revert "mesa: return initial value for VALIDATE_STATUS if pipe not bound"
This reverts commit ba02f7a3b6.

The commit checked whether the pipeline was currently bound instead
of checking whether it had ever been bound.  The previous setting
of Validated during object creation makes this unnecessary.  The
real problem was that Validated was not properly set to false
elsewhere in the code.  This is fixed by a later patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3c4aa7aff2)
2015-11-29 17:58:57 +00:00
Boyuan Zhang
bb7a1ee11f radeon/uvd: uv pitch separation for stoney
v2: set the behaviour default for future ASICs.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f55f134a03)
2015-11-29 17:58:34 +00:00
Dave Airlie
7a41162b45 texgetimage: consolidate 1D array handling code.
This should fix the getteximage-depth test that currently asserts.

I was hitting a problem with virgl as well in this area.

This moves the 1D array handling code to a single place.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 237bcdbab5)
2015-11-29 17:58:09 +00:00
Ilia Mirkin
5e853a4f01 docs: add missed freedreno features to relnotes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e4c1221d36)
2015-11-29 17:57:46 +00:00
Ilia Mirkin
2e073938d0 freedreno/a4xx: use a factor of 32767 for snorm8 blending
It appears that the hardware wants the integer to be scaled the same way
that the hardware representation is. snorm16 uses one of the float
factors, so this is only relevant for snorm8.

This fixes a number of subcases of
  bin/fbo-blending-formats GL_EXT_texture_snorm

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 81b16350fa)
2015-11-29 17:57:23 +00:00
Emil Velikov
30e1c390b3 configure.ac: default to disabled dri3 when --disable-dri is set
Not too long ago, the dri3 code was living in src/glx, which in itself
was guarded by HAVE_DRI_GLX. As the name suggests we didn't dive into
the folder when dri was disabled, thus we missed that dri3 does not
consider/honour --enable-dri.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 6bd9ba7d07 "loader: Add dri3 helper"
Cc: Pali Rohár <pali.rohar@gmail.com>
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit b89d1b2ccf)
2015-11-29 17:57:00 +00:00
Emil Velikov
72e51e5dfa loader: unconditionally add AM_CPPFLAGS to libloader_la_CPPFLAGS
It seems that due to the conditional autotools is getting confused and
forgetting to add AM_CPPFLAGS when building libloader (when
HAVE_DRICOMMON is not set).

Cc: mesa-stable@lists.freedesktop.org
Fixes: 5a79e0a8e3 "automake: loader: rework the CPPFLAGS"
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b9b0a1f58e)
2015-11-29 17:56:36 +00:00
Emil Velikov
902378d6c8 pipe-loader: link against libloader regardless of libdrm presence
Whether or not the loader has libdrm support is up to it. Anyone using
the loader should just include it whenever they depend on it.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 0f39f9cb7a "pipe-loader: add a dummy 'static' pipe-loader"
Reported-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8a6d476588)
2015-11-29 17:56:13 +00:00
Ilia Mirkin
f6f127b597 nv50/ir: fix (un)spilling of 3-wide results
There are no 96-bit load/store operations, so we have to split it up
into 32-bit parts, with a split/merge around it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90348
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4deb118d06)
2015-11-29 17:55:51 +00:00
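The split/merge idea from the nv50/ir commit above, reduced to plain C: a 3-wide (96-bit) value is written to and read back from a spill slot as three separate 32-bit accesses. The function names and the flat `uint32_t` slot are hypothetical; the real change operates on compiler IR, not on memory directly.

```c
#include <assert.h>
#include <stdint.h>

/* Spill: split the 3-wide value and store each 32-bit part on its own. */
void spill_3wide(const uint32_t value[3], uint32_t *slot)
{
    assert(value != 0 && slot != 0);
    for (int i = 0; i < 3; i++)
        slot[i] = value[i];
}

/* Unspill: load each 32-bit part and merge them back into a 3-wide value. */
void unspill_3wide(const uint32_t *slot, uint32_t value[3])
{
    assert(value != 0 && slot != 0);
    for (int i = 0; i < 3; i++)
        value[i] = slot[i];
}
```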
Ilia Mirkin
a2f2329cdd nv50,nvc0: properly handle buffer storage invalidation on dsa buffer
In case that the buffer has no bind at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ad5f6b03e7)
2015-11-29 17:55:27 +00:00
Ilia Mirkin
642b66291c nouveau: use the buffer usage to determine placement when no binding
With ARB_direct_state_access, buffers can be created without any binding
hints at all. We still need to allocate these buffers to VRAM or GART,
as we don't have logic down the line to place them into GPU-mappable
space. Ideally we'd be able to shift these things around based on usage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 079f713754)
2015-11-29 17:55:04 +00:00
Eric Anholt
06c3ed8d21 vc4: Take precedence over ilo when in simulator mode.
They're exclusive at build time, but the ilo entry is always present, so
we'd try to use it and fail out.

v2: Add comment in the code, from Emil.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1b62a4e885)
2015-11-29 17:54:41 +00:00
Eric Anholt
cfbb08168a vc4: Just put USE_VC4_SIMULATOR in DEFINES.
In the pipe-loader reworks, it was missed in one of the new directories
where it was used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit a39eac80fd)
2015-11-29 17:54:18 +00:00
Igor Gnatenko
43b0b8a9a3 virgl: pipe_virgl_create_screen is not static
Cc: mesa-stable@lists.freedesktop.org
Fixes: 17d3a5f857 "target-helpers: add a non-inline drm_helper.h"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93063
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 05eed0eca7)
2015-11-29 17:53:55 +00:00
Ilia Mirkin
85b6f905e1 freedreno/a4xx: disable blending and alphatest for integer rt0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 22aeb0c568)
2015-11-29 17:53:31 +00:00
Ilia Mirkin
6a6326dcd4 freedreno/a4xx: fix independent blend
This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4c170d9e1d)
2015-11-29 17:53:08 +00:00
Ilia Mirkin
17a64701cb freedreno/a4xx: fix 3d texture setup
Same fix as on a3xx - set the second (tiny) layer size bitfield to the
smallest level's size so that the hw knows not to minify beyond that.

This fixes texelFetch sampler3D piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 740eb63aa7)
2015-11-29 17:52:43 +00:00
Ilia Mirkin
cb4f6e2a30 freedreno/a4xx: only align slices in non-layer_first textures
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.

This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ecb0dcd34c)
2015-11-29 17:48:28 +00:00
Ian Romanick
8c564f0376 meta: Don't save or restore the active client texture
This setting is only used by glTexCoordPointer and related glEnable
calls.  Since the preceding commits removed all of those, it is not
necessary to save, reset to default, or restore this state.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 47b3a0d235)
2015-11-24 11:36:06 -08:00
Ian Romanick
3d2bf5a5f5 meta: Don't save or restore the VBO binding
Nothing left in meta does anything with the VBO binding, so we don't
need to save or restore it.  The VAO binding is still modified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit c63f9c735d)
2015-11-24 11:36:06 -08:00
Ian Romanick
d1b7a1f5af meta/TexSubImage: Don't pollute the buffer object namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 58aa56d40b)
2015-11-24 11:36:06 -08:00
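The problem scenario in the commit above can be reproduced with a toy name table. This is a sketch only: `struct name_table`, `gen_name()` and `store_data()` are hypothetical and model just the part of the namespace rules the message describes (names may be used without being generated first).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NAMES 16

/* Toy object namespace shared by the application and by meta. */
struct name_table {
    bool in_use[MAX_NAMES]; /* index 0 is never handed out */
    int data[MAX_NAMES];
};

/* Gen-style call: return the lowest unused non-zero name, or 0 if full. */
unsigned gen_name(struct name_table *t)
{
    for (unsigned name = 1; name < MAX_NAMES; name++) {
        if (!t->in_use[name]) {
            t->in_use[name] = true;
            return name;
        }
    }
    return 0;
}

/* Bind-and-upload style call: accepts any name, generated or not. */
void store_data(struct name_table *t, unsigned name, int data)
{
    assert(name > 0 && name < MAX_NAMES);
    t->in_use[name] = true;
    t->data[name] = data;
}
```

If meta generates a name (1 in this model), the application then stores its own data under name 1 without calling Gen, and meta later stores again under its generated name, the application's data is silently overwritten. Tracking the object through an internal pointer instead of a name from the shared namespace avoids the collision.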
Ian Romanick
9c2a7cfbbf meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 76cfe2bc44)
2015-11-24 11:36:06 -08:00
Ian Romanick
089fa07dee meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a222d4cbc3)
2015-11-24 11:36:06 -08:00
Ian Romanick
7ebc8c36a0 meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit b8a7369fb7)
2015-11-24 11:36:06 -08:00
Ian Romanick
79468fac69 meta: Partially convert _mesa_meta_DrawTex to DSA
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d5225ee5d9)
2015-11-24 11:36:06 -08:00
Ian Romanick
756e323f2c meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 37d11b13ce)
2015-11-24 11:36:06 -08:00
Ian Romanick
507732ea3d meta: Use internal functions for buffer object and VAO access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit b1b73a42c8)
2015-11-24 11:36:06 -08:00
Ian Romanick
01909c1f29 meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects
The fixed-function attribute paths don't get the DSA treatment because
there are no DSA entry-points for fixed-function attributes.  These
could have been added, but this is a temporary patch intended to make
later patches easier to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 52921f8e08)
2015-11-24 11:36:06 -08:00
Ian Romanick
76b155c9cd meta: Track VBO using gl_buffer_object instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 1035e00a81)
2015-11-24 11:36:06 -08:00
Ian Romanick
4a5c29d877 meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects
Meta currently does this, but future changes will make this impossible.
Explicitly do it as a step in the patch series now to catch any possible
kinks.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 3b5a7d450d)
2015-11-24 11:36:06 -08:00
Ian Romanick
bf3f0b9e9b i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit ed0bd6573b)
2015-11-24 11:36:06 -08:00
Ian Romanick
e097324fee meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 7f2f300071)
2015-11-24 11:36:05 -08:00
Ian Romanick
aa607c69af meta: Use DSA functions for PBO in create_texture_for_pbo
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 89a61afdd7)
2015-11-24 11:36:05 -08:00
Ian Romanick
72470a9c37 i965: Don't pollute the buffer object namespace in brw_meta_fast_clear
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.
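
The scenario above can be reproduced with a toy model. This is only a sketch, not Mesa code: `toy_namespace`, `toy_meta`, `toy_gen`, `toy_set_data`, `toy_data_is` and `toy_meta_draw` are hypothetical names, and the model assumes that Gen hands out the lowest unused name and that a name chosen by the application is accepted without Gen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NAMES 16

/* Toy model of the application-visible buffer namespace (hypothetical). */
struct toy_namespace {
    bool in_use[TOY_MAX_NAMES];
    const char *data[TOY_MAX_NAMES];
};

/* State of a meta helper that keeps its VBO in that same namespace. */
struct toy_meta {
    unsigned vbo_name; /* 0 means "not created yet" */
};

/* Stand-in for a Gen call: return the lowest unused non-zero name. */
static unsigned toy_gen(struct toy_namespace *ns)
{
    for (unsigned name = 1; name < TOY_MAX_NAMES; name++) {
        if (!ns->in_use[name]) {
            ns->in_use[name] = true;
            return name;
        }
    }
    return 0;
}

/* Stand-in for bind + upload with any name, generated or not. */
static void toy_set_data(struct toy_namespace *ns, unsigned name,
                         const char *data)
{
    assert(name > 0 && name < TOY_MAX_NAMES);
    ns->in_use[name] = true;
    ns->data[name] = data;
}

/* True when the object called `name` currently holds `expected`. */
static bool toy_data_is(const struct toy_namespace *ns, unsigned name,
                        const char *expected)
{
    return ns->data[name] != NULL && strcmp(ns->data[name], expected) == 0;
}

/* Meta operation: generate a name on first use, then (re)upload its data. */
static void toy_meta_draw(struct toy_namespace *ns, struct toy_meta *meta)
{
    if (meta->vbo_name == 0)
        meta->vbo_name = toy_gen(ns);
    toy_set_data(ns, meta->vbo_name, "meta vertices");
}
```

Calling `toy_meta_draw`, then `toy_set_data(&ns, 1, "app data")`, then `toy_meta_draw` again leaves name 1 holding the meta data: the application's upload is silently replaced, which is the failure described above.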

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4e6b9c11fc)
2015-11-24 11:36:05 -08:00
Ian Romanick
de299e1e2e i965: Use internal functions for buffer object access
Instead of going through the GL API implementation functions, use the
lower-level functions.  This means that we have to keep track of a
pointer to the gl_buffer_object and the gl_vertex_array_object.

This has two advantages.  First, it avoids a bunch of CPU overhead in
looking up objects and validating API parameters.  Second, and much more
importantly, it will allow us to stop calling _mesa_GenBuffers /
_mesa_CreateBuffers and polluting the buffer namespace (next patch).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit e62799bd4e)
2015-11-24 11:36:05 -08:00
Ian Romanick
ded66b1451 i965: Use DSA functions for VBOs in brw_meta_fast_clear
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 1c5423d3a0)
2015-11-24 11:36:05 -08:00
Ian Romanick
b7b4104a7f i965: Pass brw_context instead of gl_context to brw_draw_rectlist
Future patches will use the brw_context instead.  Keeping this
non-functional change separate should make the functional changes easier
to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit dcadd855f1)
2015-11-24 11:36:05 -08:00
Ian Romanick
236fb067a5 mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib
Pulls the parts of enable_vertex_array_attrib that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

_mesa_enable_vertex_array_attrib can also be used to enable
fixed-function arrays.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4a644f1caa)
2015-11-24 11:36:05 -08:00
Ian Romanick
2d9093fdf0 mesa: Refactor update_array_format to make _mesa_update_array_format_public
Pulls the parts of update_array_format that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a336fcd36a)
2015-11-24 11:36:05 -08:00
Ian Romanick
d757c04215 mesa: Make bind_vertex_buffer available outside varray.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 8fae494df2)
2015-11-24 11:36:05 -08:00
Emil Velikov
f9339359d5 Update version to 11.1.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 13:00:25 +00:00
148 changed files with 4032 additions and 1027 deletions

View File

@@ -1 +1 @@
11.1.0-devel 11.1.0

View File

@@ -767,6 +767,11 @@ linux*)
dri3_default=no dri3_default=no
;; ;;
esac esac
if test "x$enable_dri" = xno; then
dri3_default=no
fi
AC_ARG_ENABLE([dri3], AC_ARG_ENABLE([dri3],
[AS_HELP_STRING([--enable-dri3], [AS_HELP_STRING([--enable-dri3],
[enable DRI3 @<:@default=auto@:>@])], [enable DRI3 @<:@default=auto@:>@])],
@@ -2173,7 +2178,9 @@ if test -n "$with_gallium_drivers"; then
gallium_require_drm_loader gallium_require_drm_loader
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose], PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes], [USE_VC4_SIMULATOR=no]) [USE_VC4_SIMULATOR=yes;
DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
[USE_VC4_SIMULATOR=no])
;; ;;
xvirgl) xvirgl)
HAVE_GALLIUM_VIRGL=yes HAVE_GALLIUM_VIRGL=yes

View File

@@ -238,6 +238,12 @@ for details.
</ul> </ul>
<h3>VA-API state tracker environment variables</h3>
<ul>
<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.
</ul>
<p> <p>
Other Gallium drivers have their own environment variables. These may change Other Gallium drivers have their own environment variables. These may change
frequently so the source code should be consulted for details. frequently so the source code should be consulted for details.

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe> <iframe src="../contents.html"></iframe>
<div class="content"> <div class="content">
<h1>Mesa 11.1.0 Release Notes / TBD</h1> <h1>Mesa 11.1.0 Release Notes / 15 December 2015</h1>
<p> <p>
Mesa 11.1.0 is a new development release. Mesa 11.1.0 is a new development release.
@@ -51,14 +51,20 @@ Note: some of the new features are only available with certain drivers.
<li>GL_ARB_arrays_of_arrays on i965</li> <li>GL_ARB_arrays_of_arrays on i965</li>
<li>GL_ARB_blend_func_extended on freedreno (a3xx)</li> <li>GL_ARB_blend_func_extended on freedreno (a3xx)</li>
<li>GL_ARB_clear_texture on nv50, nvc0</li> <li>GL_ARB_clear_texture on nv50, nvc0</li>
<li>GL_ARB_clip_control on freedreno/a4xx</li>
<li>GL_ARB_copy_image on nv50, nvc0, radeonsi</li> <li>GL_ARB_copy_image on nv50, nvc0, radeonsi</li>
<li>GL_ARB_depth_clamp on freedreno/a4xx</li>
<li>GL_ARB_fragment_layer_viewport on i965 (gen6+)</li>
<li>GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips</li> <li>GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips</li>
<li>GL_ARB_gpu_shader5 on r600 for Evergreen and later chips</li> <li>GL_ARB_gpu_shader5 on r600 for Evergreen and later chips</li>
<li>GL_ARB_seamless_cubemap_per_texture on freedreno/a4xx</li>
<li>GL_ARB_shader_clock on i965 (gen7+)</li> <li>GL_ARB_shader_clock on i965 (gen7+)</li>
<li>GL_ARB_shader_stencil_export on i965 (gen9+)</li> <li>GL_ARB_shader_stencil_export on i965 (gen9+)</li>
<li>GL_ARB_shader_storage_buffer_object on i965</li> <li>GL_ARB_shader_storage_buffer_object on i965</li>
<li>GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, radeonsi</li> <li>GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, radeonsi</li>
<li>GL_ARB_texture_barrier / GL_NV_texture_barrier on i965</li> <li>GL_ARB_texture_barrier / GL_NV_texture_barrier on i965</li>
<li>GL_ARB_texture_buffer_range on freedreno/a3xx</li>
<li>GL_ARB_texture_compression_bptc on freedreno/a4xx</li>
<li>GL_ARB_texture_query_lod on softpipe</li> <li>GL_ARB_texture_query_lod on softpipe</li>
<li>GL_ARB_texture_view on radeonsi and r600 (for evergeen and newer)</li> <li>GL_ARB_texture_view on radeonsi and r600 (for evergeen and newer)</li>
<li>GL_ARB_vertex_type_2_10_10_10_rev on freedreno (a3xx, a4xx)</li> <li>GL_ARB_vertex_type_2_10_10_10_rev on freedreno (a3xx, a4xx)</li>
@@ -78,11 +84,196 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2> <h2>Bug fixes</h2>
TBD. <p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=28130">Bug 28130</a> - vbo: premature flushing breaks GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=49779">Bug 49779</a> - Extra line segments in GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors &quot;work&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80821">Bug 80821</a> - When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81174">Bug 81174</a> - Gallium: GL_LINE_LOOP broken with more than 512 points</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83508">Bug 83508</a> - [UBO] Assertion for array of blocks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86469">Bug 86469</a> - Unreal Engine demo doesn't run</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89014">Bug 89014</a> - PIPE_QUERY_GPU_FINISHED is not acting as expected on SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90175">Bug 90175</a> - [hsw bisected][PATCH] atomic counters doesn't work for a binding point different to zero</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90631">Bug 90631</a> - Compilation failure for fragment shader with many branches on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91785">Bug 91785</a> - make check DispatchSanity_test.GLES31 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image &amp; DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91898">Bug 91898</a> - src/util/mesa-sha1.c:250:25: fatal error: openssl/sha.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail with kernel upgrade</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91930">Bug 91930</a> - Program with GtkGLArea widget does not redraw</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91985">Bug 91985</a> - [regression, bisected] FTBFS with commit f9caabe8f1: R600_UCP_CONST_BUFFER is undefined</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92033">Bug 92033</a> - [SNB,regression,dEQP,bisected] functional.shaders.random tests regressed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92052">Bug 92052</a> - nir/nir_builder.h:79: error: expected primary-expression before . token</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92054">Bug 92054</a> - make check gbm-symbols-check regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92066">Bug 92066</a> - [ILK,G45,regression] New assertion on BRW_MAX_MRF breaks ilk and g45</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92122">Bug 92122</a> - [bisected, cts] Regression with Assault Android Cactus</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92124">Bug 92124</a> - shader_query.cpp:841:34: error: strndup was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92183">Bug 92183</a> - linker.cpp:3187:46: error: strtok_r was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92221">Bug 92221</a> - Unintended code changes in _mesa_base_tex_format commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92621">Bug 92621</a> - [G965 ILK G45] Regression: 24 piglit regressions in glsl-1.10</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92639">Bug 92639</a> - [Regression bisected] Ogles1conform mustpass.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92641">Bug 92641</a> - [SKL BSW] [Regression] Ogles1conform userclip.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92645">Bug 92645</a> - kodi vdpau interop fails since mesa,meta: move gl_texture_object::TargetIndex initializations</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92705">Bug 92705</a> - [clover] fail to build with llvm-svn/clang-svn 3.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - &quot;LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main&quot; when starting race in stuntrally</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92744">Bug 92744</a> - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92770">Bug 92770</a> - [SNB, regression, dEQP] deqp-gles3.functional.shaders.discard.dynamic_loop_texture</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92824">Bug 92824</a> - [regression, bisected] `make check` dispatch-sanity broken by GL_EXT_buffer_storage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92859">Bug 92859</a> - [regression, bisected] validate_intrinsic_instr: Assertion triggered</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92985">Bug 92985</a> - Mac OS X build error &quot;ar: no archive members specified&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93015">Bug 93015</a> - Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of pipe_virgl_create_screen follows non-static declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2> <h2>Changes</h2>
TBD. <li>MPEG4 decoding has been disabled by default in the VAAPI driver</li>
</div> </div>
</body> </body>

File diff suppressed because it is too large

View File

@@ -37,12 +37,12 @@ libpipe_loader_static_la_SOURCES += \
libpipe_loader_dynamic_la_SOURCES += \ libpipe_loader_dynamic_la_SOURCES += \
$(DRM_SOURCES) $(DRM_SOURCES)
endif
libpipe_loader_static_la_LIBADD = \ libpipe_loader_static_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la $(top_builddir)/src/loader/libloader.la
libpipe_loader_dynamic_la_LIBADD = \ libpipe_loader_dynamic_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la $(top_builddir)/src/loader/libloader.la
endif
EXTRA_DIST = SConscript EXTRA_DIST = SConscript

View File

@@ -94,6 +94,18 @@ static const struct drm_driver_descriptor driver_descriptors[] = {
.create_screen = pipe_i915_create_screen, .create_screen = pipe_i915_create_screen,
.configuration = configuration_query, .configuration = configuration_query,
}, },
#ifdef USE_VC4_SIMULATOR
/* VC4 simulator and ILO (i965) are mutually exclusive (error at
* configure). As the latter is unconditionally added, keep this one above
* it.
*/
{
.name = "i965",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
#endif
{ {
.name = "i965", .name = "i965",
.driver_name = "i915", .driver_name = "i915",
@@ -154,14 +166,6 @@ static const struct drm_driver_descriptor driver_descriptors[] = {
.create_screen = pipe_vc4_create_screen, .create_screen = pipe_vc4_create_screen,
.configuration = configuration_query, .configuration = configuration_query,
}, },
#ifdef USE_VC4_SIMULATOR
{
.name = "i965",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
#endif
}; };
#endif #endif

View File

@@ -33,9 +33,10 @@
#include "sw/kms-dri/kms_dri_sw_winsys.h" #include "sw/kms-dri/kms_dri_sw_winsys.h"
#include "sw/null/null_sw_winsys.h" #include "sw/null/null_sw_winsys.h"
#include "sw/wrapper/wrapper_sw_winsys.h" #include "sw/wrapper/wrapper_sw_winsys.h"
#include "target-helpers/inline_sw_helper.h" #include "target-helpers/sw_helper_public.h"
#include "state_tracker/drisw_api.h" #include "state_tracker/drisw_api.h"
#include "state_tracker/sw_driver.h" #include "state_tracker/sw_driver.h"
#include "state_tracker/sw_winsys.h"
struct pipe_loader_sw_device { struct pipe_loader_sw_device {
struct pipe_loader_device base; struct pipe_loader_device base;
@@ -136,7 +137,7 @@ pipe_loader_sw_probe_dri(struct pipe_loader_device **devs, struct drisw_loader_f
if (!pipe_loader_sw_probe_init_common(sdev)) if (!pipe_loader_sw_probe_init_common(sdev))
goto fail; goto fail;
for (i = 0; sdev->dd->winsys; i++) { for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "dri") == 0) { if (strcmp(sdev->dd->winsys[i].name, "dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(drisw_lf); sdev->ws = sdev->dd->winsys[i].create_winsys(drisw_lf);
break; break;
@@ -168,7 +169,7 @@ pipe_loader_sw_probe_kms(struct pipe_loader_device **devs, int fd)
if (!pipe_loader_sw_probe_init_common(sdev)) if (!pipe_loader_sw_probe_init_common(sdev))
goto fail; goto fail;
for (i = 0; sdev->dd->winsys; i++) { for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "kms_dri") == 0) { if (strcmp(sdev->dd->winsys[i].name, "kms_dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(fd); sdev->ws = sdev->dd->winsys[i].create_winsys(fd);
break; break;
@@ -199,7 +200,7 @@ pipe_loader_sw_probe_null(struct pipe_loader_device **devs)
if (!pipe_loader_sw_probe_init_common(sdev)) if (!pipe_loader_sw_probe_init_common(sdev))
goto fail; goto fail;
for (i = 0; sdev->dd->winsys; i++) { for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "null") == 0) { if (strcmp(sdev->dd->winsys[i].name, "null") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(); sdev->ws = sdev->dd->winsys[i].create_winsys();
break; break;
@@ -222,7 +223,7 @@ pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev)
{ {
int i = 1; int i = 1;
if (i < ndev) { if (i <= ndev) {
if (!pipe_loader_sw_probe_null(devs)) { if (!pipe_loader_sw_probe_null(devs)) {
i--; i--;
} }
@@ -244,7 +245,7 @@ pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
if (!pipe_loader_sw_probe_init_common(sdev)) if (!pipe_loader_sw_probe_init_common(sdev))
goto fail; goto fail;
for (i = 0; sdev->dd->winsys; i++) { for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "wrapped") == 0) { if (strcmp(sdev->dd->winsys[i].name, "wrapped") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(screen); sdev->ws = sdev->dd->winsys[i].create_winsys(screen);
break; break;
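
The loop-condition change in the hunks above tests `winsys[i].name` instead of the array pointer `winsys`, which is never NULL, so the old loop could not stop at the end of the table. A minimal standalone sketch of the corrected pattern follows. It assumes the table ends with an entry whose `name` is NULL, which the fixed condition implies but this diff does not show; `toy_winsys` and `toy_find_winsys` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one winsys descriptor entry. */
struct toy_winsys {
    const char *name; /* a NULL name marks the end of the table */
    int id;
};

/* Return the id of the entry called `wanted`, or -1 if it is absent. */
static int toy_find_winsys(const struct toy_winsys *table, const char *wanted)
{
    assert(table != NULL && wanted != NULL);

    /* Testing table[i].name stops at the sentinel; testing `table`
     * itself would always be true and run past the end of the array. */
    for (int i = 0; table[i].name; i++) {
        if (strcmp(table[i].name, wanted) == 0)
            return table[i].id;
    }
    return -1;
}
```

With a table holding "dri", "kms_dri" and "null" followed by the sentinel, looking up "wrapped" returns -1 instead of reading beyond the last entry.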

View File

@@ -223,7 +223,7 @@ pipe_freedreno_create_screen(int fd)
#include "virgl/drm/virgl_drm_public.h" #include "virgl/drm/virgl_drm_public.h"
#include "virgl/virgl_public.h" #include "virgl/virgl_public.h"
static struct pipe_screen * struct pipe_screen *
pipe_virgl_create_screen(int fd) pipe_virgl_create_screen(int fd)
{ {
struct virgl_winsys *vws; struct virgl_winsys *vws;

View File

@@ -0,0 +1,73 @@
#ifndef SW_HELPER_H
#define SW_HELPER_H
#include "pipe/p_compiler.h"
#include "util/u_debug.h"
#include "target-helpers/sw_helper_public.h"
#include "state_tracker/sw_winsys.h"
/* Helper function to choose and instantiate one of the software rasterizers:
* llvmpipe, softpipe.
*/
#ifdef GALLIUM_SOFTPIPE
#include "softpipe/sp_public.h"
#endif
#ifdef GALLIUM_LLVMPIPE
#include "llvmpipe/lp_public.h"
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/virgl_public.h"
#include "virgl/vtest/virgl_vtest_public.h"
#endif
static inline struct pipe_screen *
sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
{
struct pipe_screen *screen = NULL;
#if defined(GALLIUM_LLVMPIPE)
if (screen == NULL && strcmp(driver, "llvmpipe") == 0)
screen = llvmpipe_create_screen(winsys);
#endif
#if defined(GALLIUM_VIRGL)
if (screen == NULL && strcmp(driver, "virpipe") == 0) {
struct virgl_winsys *vws;
vws = virgl_vtest_winsys_wrap(winsys);
screen = virgl_create_screen(vws);
}
#endif
#if defined(GALLIUM_SOFTPIPE)
if (screen == NULL)
screen = softpipe_create_screen(winsys);
#endif
return screen;
}
struct pipe_screen *
sw_screen_create(struct sw_winsys *winsys)
{
const char *default_driver;
const char *driver;
#if defined(GALLIUM_LLVMPIPE)
default_driver = "llvmpipe";
#elif defined(GALLIUM_SOFTPIPE)
default_driver = "softpipe";
#else
default_driver = "";
#endif
driver = debug_get_option("GALLIUM_DRIVER", default_driver);
return sw_screen_create_named(winsys, driver);
}
#endif

View File

@@ -0,0 +1,10 @@
#ifndef _SW_HELPER_PUBLIC_H
#define _SW_HELPER_PUBLIC_H
struct pipe_screen;
struct sw_winsys;
struct pipe_screen *
sw_screen_create(struct sw_winsys *winsys);
#endif /* _SW_HELPER_PUBLIC_H */

View File

@@ -115,7 +115,7 @@ vl_video_buffer_formats(struct pipe_screen *screen, enum pipe_format format)
return const_resource_formats_VUYA; return const_resource_formats_VUYA;
case PIPE_FORMAT_R8G8B8X8_UNORM: case PIPE_FORMAT_R8G8B8X8_UNORM:
return const_resource_formats_VUYX; return const_resource_formats_YUVX;
case PIPE_FORMAT_B8G8R8X8_UNORM: case PIPE_FORMAT_B8G8R8X8_UNORM:
return const_resource_formats_VUYX; return const_resource_formats_VUYX;

View File

@@ -392,7 +392,7 @@ vl_dri2_screen_create(Display *display, int screen)
goto free_connect; goto free_connect;
if (drmGetMagic(fd, &magic)) if (drmGetMagic(fd, &magic))
goto free_connect; goto close_fd;
authenticate_cookie = xcb_dri2_authenticate_unchecked(scrn->conn, authenticate_cookie = xcb_dri2_authenticate_unchecked(scrn->conn,
get_xcb_screen(s, screen)->root, get_xcb_screen(s, screen)->root,
@@ -402,7 +402,7 @@ vl_dri2_screen_create(Display *display, int screen)
if (authenticate == NULL || !authenticate->authenticated) if (authenticate == NULL || !authenticate->authenticated)
goto free_authenticate; goto free_authenticate;
if (pipe_loader_drm_probe_fd(&scrn->base.dev, dup(fd))) if (pipe_loader_drm_probe_fd(&scrn->base.dev, fd))
scrn->base.pscreen = pipe_loader_create_screen(scrn->base.dev); scrn->base.pscreen = pipe_loader_create_screen(scrn->base.dev);
if (!scrn->base.pscreen) if (!scrn->base.pscreen)
@@ -428,8 +428,11 @@ vl_dri2_screen_create(Display *display, int screen)
release_pipe: release_pipe:
if (scrn->base.dev) if (scrn->base.dev)
pipe_loader_release(&scrn->base.dev, 1); pipe_loader_release(&scrn->base.dev, 1);
fd = -1;
free_authenticate: free_authenticate:
free(authenticate); free(authenticate);
close_fd:
close(fd);
free_connect: free_connect:
free(connect); free(connect);
free_query: free_query:

View File

@@ -41,12 +41,16 @@ struct vl_screen *
vl_drm_screen_create(int fd) vl_drm_screen_create(int fd)
{ {
struct vl_screen *vscreen; struct vl_screen *vscreen;
int new_fd = -1;
vscreen = CALLOC_STRUCT(vl_screen); vscreen = CALLOC_STRUCT(vl_screen);
if (!vscreen) if (!vscreen)
return NULL; return NULL;
if (pipe_loader_drm_probe_fd(&vscreen->dev, dup(fd))) if (fd < 0 || (new_fd = dup(fd)) < 0)
goto error;
if (pipe_loader_drm_probe_fd(&vscreen->dev, new_fd))
vscreen->pscreen = pipe_loader_create_screen(vscreen->dev); vscreen->pscreen = pipe_loader_create_screen(vscreen->dev);
if (!vscreen->pscreen) if (!vscreen->pscreen)
@@ -63,6 +67,8 @@ vl_drm_screen_create(int fd)
error: error:
if (vscreen->dev) if (vscreen->dev)
pipe_loader_release(&vscreen->dev, 1); pipe_loader_release(&vscreen->dev, 1);
else
close(new_fd);
FREE(vscreen); FREE(vscreen);
return NULL; return NULL;
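
The descriptor handling in the hunks above duplicates the caller's fd, checks the result of `dup()`, and closes the duplicate only when no device took it over. The sketch below shows that ownership rule in isolation. It uses POSIX `dup()` and `close()`; `toy_probe` and `toy_screen_create` are hypothetical stand-ins, and the assumption that a successful probe owns the duplicate is inferred from the error path in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical consumer: it owns `fd` only when it returns true. */
static bool toy_probe(int fd, bool succeed, int *owned_fd)
{
    assert(owned_fd != NULL);
    if (!succeed)
        return false;
    *owned_fd = fd;
    return true;
}

/* Return the descriptor now owned by the "device", or -1 on failure.
 * The caller's own fd is never closed here. */
static int toy_screen_create(int fd, bool probe_ok)
{
    int new_fd = -1;
    int owned_fd = -1;

    /* Reject invalid input and a failed dup() before probing. */
    if (fd < 0 || (new_fd = dup(fd)) < 0)
        return -1;

    if (!toy_probe(new_fd, probe_ok, &owned_fd)) {
        close(new_fd); /* nobody took ownership, so release the duplicate */
        return -1;
    }
    return owned_fd;
}
```

On success the returned descriptor differs from the caller's fd, so the caller can keep using and later close its own descriptor independently.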

View File

@@ -627,7 +627,7 @@ static inline uint32_t A4XX_RB_FS_OUTPUT_ENABLE_BLEND(uint32_t val)
{ {
return ((val) << A4XX_RB_FS_OUTPUT_ENABLE_BLEND__SHIFT) & A4XX_RB_FS_OUTPUT_ENABLE_BLEND__MASK; return ((val) << A4XX_RB_FS_OUTPUT_ENABLE_BLEND__SHIFT) & A4XX_RB_FS_OUTPUT_ENABLE_BLEND__MASK;
} }
#define A4XX_RB_FS_OUTPUT_FAST_CLEAR 0x00000100 #define A4XX_RB_FS_OUTPUT_INDEPENDENT_BLEND 0x00000100
#define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__MASK 0xffff0000 #define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__MASK 0xffff0000
#define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__SHIFT 16 #define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__SHIFT 16
static inline uint32_t A4XX_RB_FS_OUTPUT_SAMPLE_MASK(uint32_t val) static inline uint32_t A4XX_RB_FS_OUTPUT_SAMPLE_MASK(uint32_t val)


@@ -137,7 +137,8 @@ fd4_blend_state_create(struct pipe_context *pctx,
so->rb_mrt[i].buf_info |= A4XX_RB_MRT_BUF_INFO_DITHER_MODE(DITHER_ALWAYS); so->rb_mrt[i].buf_info |= A4XX_RB_MRT_BUF_INFO_DITHER_MODE(DITHER_ALWAYS);
} }
so->rb_fs_output = A4XX_RB_FS_OUTPUT_ENABLE_BLEND(mrt_blend); so->rb_fs_output = A4XX_RB_FS_OUTPUT_ENABLE_BLEND(mrt_blend) |
COND(cso->independent_blend_enable, A4XX_RB_FS_OUTPUT_INDEPENDENT_BLEND);
return so; return so;
} }


@@ -194,7 +194,7 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
if (view->base.texture) { if (view->base.texture) {
struct fd_resource *rsc = fd_resource(view->base.texture); struct fd_resource *rsc = fd_resource(view->base.texture);
uint32_t offset = fd_resource_offset(rsc, start, 0); uint32_t offset = fd_resource_offset(rsc, start, 0);
OUT_RELOC(ring, rsc->bo, offset, view->textconst4, 0); OUT_RELOC(ring, rsc->bo, offset, view->texconst4, 0);
} else { } else {
OUT_RING(ring, 0x00000000); OUT_RING(ring, 0x00000000);
} }
@@ -497,11 +497,16 @@ fd4_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
OUT_RINGP(ring, val, &fd4_context(ctx)->rbrc_patches); OUT_RINGP(ring, val, &fd4_context(ctx)->rbrc_patches);
} }
if (dirty & FD_DIRTY_ZSA) { if (dirty & (FD_DIRTY_ZSA | FD_DIRTY_FRAMEBUFFER)) {
struct fd4_zsa_stateobj *zsa = fd4_zsa_stateobj(ctx->zsa); struct fd4_zsa_stateobj *zsa = fd4_zsa_stateobj(ctx->zsa);
struct pipe_framebuffer_state *pfb = &ctx->framebuffer;
uint32_t rb_alpha_control = zsa->rb_alpha_control;
if (util_format_is_pure_integer(pipe_surface_format(pfb->cbufs[0])))
rb_alpha_control &= ~A4XX_RB_ALPHA_CONTROL_ALPHA_TEST;
OUT_PKT0(ring, REG_A4XX_RB_ALPHA_CONTROL, 1); OUT_PKT0(ring, REG_A4XX_RB_ALPHA_CONTROL, 1);
OUT_RING(ring, zsa->rb_alpha_control); OUT_RING(ring, rb_alpha_control);
OUT_PKT0(ring, REG_A4XX_RB_STENCIL_CONTROL, 2); OUT_PKT0(ring, REG_A4XX_RB_STENCIL_CONTROL, 2);
OUT_RING(ring, zsa->rb_stencil_control); OUT_RING(ring, zsa->rb_stencil_control);
@@ -628,10 +633,16 @@ fd4_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
for (i = 0; i < A4XX_MAX_RENDER_TARGETS; i++) { for (i = 0; i < A4XX_MAX_RENDER_TARGETS; i++) {
enum pipe_format format = pipe_surface_format( enum pipe_format format = pipe_surface_format(
ctx->framebuffer.cbufs[i]); ctx->framebuffer.cbufs[i]);
bool is_int = util_format_is_pure_integer(format);
bool has_alpha = util_format_has_alpha(format); bool has_alpha = util_format_has_alpha(format);
uint32_t control = blend->rb_mrt[i].control; uint32_t control = blend->rb_mrt[i].control;
uint32_t blend_control = blend->rb_mrt[i].blend_control_alpha; uint32_t blend_control = blend->rb_mrt[i].blend_control_alpha;
if (is_int) {
control &= A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE__MASK;
control |= A4XX_RB_MRT_CONTROL_ROP_CODE(ROP_COPY);
}
if (has_alpha) { if (has_alpha) {
blend_control |= blend->rb_mrt[i].blend_control_rgb; blend_control |= blend->rb_mrt[i].blend_control_rgb;
} else { } else {
@@ -651,19 +662,48 @@ fd4_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
A4XX_RB_FS_OUTPUT_SAMPLE_MASK(0xffff)); A4XX_RB_FS_OUTPUT_SAMPLE_MASK(0xffff));
} }
if (dirty & FD_DIRTY_BLEND_COLOR) { if (dirty & (FD_DIRTY_BLEND_COLOR | FD_DIRTY_FRAMEBUFFER)) {
struct pipe_blend_color *bcolor = &ctx->blend_color; struct pipe_blend_color *bcolor = &ctx->blend_color;
struct pipe_framebuffer_state *pfb = &ctx->framebuffer;
float factor = 65535.0;
int i;
for (i = 0; i < pfb->nr_cbufs; i++) {
enum pipe_format format = pipe_surface_format(pfb->cbufs[i]);
const struct util_format_description *desc =
util_format_description(format);
int j;
if (desc->is_mixed)
continue;
j = util_format_get_first_non_void_channel(format);
if (j == -1)
continue;
if (desc->channel[j].size > 8 || !desc->channel[j].normalized ||
desc->channel[j].pure_integer)
continue;
/* Just use the first unorm8/snorm8 render buffer. Can't keep
* everyone happy.
*/
if (desc->channel[j].type == UTIL_FORMAT_TYPE_SIGNED)
factor = 32767.0;
break;
}
OUT_PKT0(ring, REG_A4XX_RB_BLEND_RED, 8); OUT_PKT0(ring, REG_A4XX_RB_BLEND_RED, 8);
OUT_RING(ring, A4XX_RB_BLEND_RED_UINT(bcolor->color[0] * 65535.0) | OUT_RING(ring, A4XX_RB_BLEND_RED_UINT(bcolor->color[0] * factor) |
A4XX_RB_BLEND_RED_FLOAT(bcolor->color[0])); A4XX_RB_BLEND_RED_FLOAT(bcolor->color[0]));
OUT_RING(ring, A4XX_RB_BLEND_RED_F32(bcolor->color[0])); OUT_RING(ring, A4XX_RB_BLEND_RED_F32(bcolor->color[0]));
OUT_RING(ring, A4XX_RB_BLEND_GREEN_UINT(bcolor->color[1] * 65535.0) | OUT_RING(ring, A4XX_RB_BLEND_GREEN_UINT(bcolor->color[1] * factor) |
A4XX_RB_BLEND_GREEN_FLOAT(bcolor->color[1])); A4XX_RB_BLEND_GREEN_FLOAT(bcolor->color[1]));
OUT_RING(ring, A4XX_RB_BLEND_GREEN_F32(bcolor->color[1])); OUT_RING(ring, A4XX_RB_BLEND_GREEN_F32(bcolor->color[1]));
OUT_RING(ring, A4XX_RB_BLEND_BLUE_UINT(bcolor->color[2] * 65535.0) | OUT_RING(ring, A4XX_RB_BLEND_BLUE_UINT(bcolor->color[2] * factor) |
A4XX_RB_BLEND_BLUE_FLOAT(bcolor->color[2])); A4XX_RB_BLEND_BLUE_FLOAT(bcolor->color[2]));
OUT_RING(ring, A4XX_RB_BLEND_BLUE_F32(bcolor->color[2])); OUT_RING(ring, A4XX_RB_BLEND_BLUE_F32(bcolor->color[2]));
OUT_RING(ring, A4XX_RB_BLEND_ALPHA_UINT(bcolor->color[3] * 65535.0) | OUT_RING(ring, A4XX_RB_BLEND_ALPHA_UINT(bcolor->color[3] * factor) |
A4XX_RB_BLEND_ALPHA_FLOAT(bcolor->color[3])); A4XX_RB_BLEND_ALPHA_FLOAT(bcolor->color[3]));
OUT_RING(ring, A4XX_RB_BLEND_ALPHA_F32(bcolor->color[3])); OUT_RING(ring, A4XX_RB_BLEND_ALPHA_F32(bcolor->color[3]));
} }
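
The blend-color hunk above picks the integer scale factor from the first 8-bit normalized render target: 65535.0 by default, 32767.0 when that target is signed. A simplified sketch of the selection loop, assuming a hypothetical `struct channel_sketch` that carries only the fields the hunk inspects (it is not the Gallium `util_format_description`, and the `is_mixed` and void-channel checks are omitted):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced description of a render target's first channel. */
struct channel_sketch {
    unsigned size;      /* channel width in bits */
    bool normalized;
    bool pure_integer;
    bool is_signed;
};

/* Walk the render targets and use the first unorm8/snorm8 one to pick
 * the factor; every other format is skipped. */
static float blend_factor_sketch(const struct channel_sketch *bufs, int n)
{
    float factor = 65535.0f;
    int i;

    for (i = 0; i < n; i++) {
        const struct channel_sketch *c = &bufs[i];

        if (c->size > 8 || !c->normalized || c->pure_integer)
            continue;
        if (c->is_signed)
            factor = 32767.0f;
        break;
    }
    return factor;
}
```

As the comment in the hunk notes, only one factor can be programmed, so mixed unorm/snorm framebuffers follow whichever qualifying target comes first.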


@@ -214,6 +214,7 @@ fd4_sampler_view_create(struct pipe_context *pctx, struct pipe_resource *prsc,
struct fd_resource *rsc = fd_resource(prsc); struct fd_resource *rsc = fd_resource(prsc);
unsigned lvl = fd_sampler_first_level(cso); unsigned lvl = fd_sampler_first_level(cso);
unsigned miplevels = fd_sampler_last_level(cso) - lvl; unsigned miplevels = fd_sampler_last_level(cso) - lvl;
uint32_t sz2 = 0;
if (!so) if (!so)
return NULL; return NULL;
@@ -259,7 +260,10 @@ fd4_sampler_view_create(struct pipe_context *pctx, struct pipe_resource *prsc,
case PIPE_TEXTURE_3D: case PIPE_TEXTURE_3D:
so->texconst3 = so->texconst3 =
A4XX_TEX_CONST_3_DEPTH(u_minify(prsc->depth0, lvl)) | A4XX_TEX_CONST_3_DEPTH(u_minify(prsc->depth0, lvl)) |
A4XX_TEX_CONST_3_LAYERSZ(rsc->slices[0].size0); A4XX_TEX_CONST_3_LAYERSZ(rsc->slices[lvl].size0);
while (lvl < cso->u.tex.last_level && sz2 != rsc->slices[lvl+1].size0)
sz2 = rsc->slices[++lvl].size0;
so->texconst4 = A4XX_TEX_CONST_4_LAYERSZ(sz2);
break; break;
default: default:
so->texconst3 = 0x00000000; so->texconst3 = 0x00000000;


@@ -51,7 +51,7 @@ fd4_sampler_stateobj(struct pipe_sampler_state *samp)
struct fd4_pipe_sampler_view { struct fd4_pipe_sampler_view {
struct pipe_sampler_view base; struct pipe_sampler_view base;
uint32_t texconst0, texconst1, texconst2, texconst3, textconst4; uint32_t texconst0, texconst1, texconst2, texconst3, texconst4;
}; };
static inline struct fd4_pipe_sampler_view * static inline struct fd4_pipe_sampler_view *


@@ -551,7 +551,7 @@ fd_resource_create(struct pipe_screen *pscreen,
struct fd_resource *rsc = CALLOC_STRUCT(fd_resource); struct fd_resource *rsc = CALLOC_STRUCT(fd_resource);
struct pipe_resource *prsc = &rsc->base.b; struct pipe_resource *prsc = &rsc->base.b;
enum pipe_format format = tmpl->format; enum pipe_format format = tmpl->format;
uint32_t size; uint32_t size, alignment;
DBG("target=%d, format=%s, %ux%ux%u, array_size=%u, last_level=%u, " DBG("target=%d, format=%s, %ux%ux%u, array_size=%u, last_level=%u, "
"nr_samples=%u, usage=%u, bind=%x, flags=%x", "nr_samples=%u, usage=%u, bind=%x, flags=%x",
@@ -583,6 +583,7 @@ fd_resource_create(struct pipe_screen *pscreen,
assert(rsc->cpp); assert(rsc->cpp);
alignment = slice_alignment(pscreen, tmpl);
if (is_a4xx(fd_screen(pscreen))) { if (is_a4xx(fd_screen(pscreen))) {
switch (tmpl->target) { switch (tmpl->target) {
case PIPE_TEXTURE_3D: case PIPE_TEXTURE_3D:
@@ -590,11 +591,12 @@ fd_resource_create(struct pipe_screen *pscreen,
break; break;
default: default:
rsc->layer_first = true; rsc->layer_first = true;
alignment = 1;
break; break;
} }
} }
size = setup_slices(rsc, slice_alignment(pscreen, tmpl), format); size = setup_slices(rsc, alignment, format);
if (rsc->layer_first) { if (rsc->layer_first) {
rsc->layer_size = align(size, 4096); rsc->layer_size = align(size, 4096);


@@ -291,7 +291,7 @@ void BasicBlock::permuteAdjacent(Instruction *a, Instruction *b)
if (b->prev) if (b->prev)
b->prev->next = b; b->prev->next = b;
if (a->prev) if (a->next)
a->next->prev = a; a->next->prev = a;
} }
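
The one-line change above matters because, by the time the check runs, `a->prev` has already been set to `b` and is never NULL; the pointer that may be NULL is `a->next`. A minimal sketch of swapping two adjacent nodes in a doubly linked list, assuming a hypothetical `struct node_sketch` rather than the nv50_ir `Instruction` class, and leaving out the entry/exit bookkeeping:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; only the link fields matter here. */
struct node_sketch {
    struct node_sketch *prev;
    struct node_sketch *next;
};

/* Swap a and b, where a->next == b on entry. */
static void swap_adjacent_sketch(struct node_sketch *a, struct node_sketch *b)
{
    b->prev = a->prev;
    a->next = b->next;
    b->next = a;
    a->prev = b;

    /* Fix up the outer neighbours, if there are any. */
    if (b->prev)
        b->prev->next = b;
    if (a->next)            /* a is now last when b used to be the tail */
        a->next->prev = a;
}
```

Swapping the last two nodes of a list exercises the corrected check: `a->next` is NULL afterwards, so the fix-up must be skipped.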


@@ -575,8 +575,8 @@ CodeEmitterGK110::emitIMUL(const Instruction *i)
if (isLIMM(i->src(1), TYPE_S32)) { if (isLIMM(i->src(1), TYPE_S32)) {
emitForm_L(i, 0x280, 2, Modifier(0)); emitForm_L(i, 0x280, 2, Modifier(0));
assert(i->subOp != NV50_IR_SUBOP_MUL_HIGH); if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
code[1] |= 1 << 24;
if (i->sType == TYPE_S32) if (i->sType == TYPE_S32)
code[1] |= 3 << 25; code[1] |= 3 << 25;
} else { } else {
@@ -695,14 +695,9 @@ CodeEmitterGK110::emitIMAD(const Instruction *i)
if (i->sType == TYPE_S32) if (i->sType == TYPE_S32)
code[1] |= (1 << 19) | (1 << 24); code[1] |= (1 << 19) | (1 << 24);
if (code[0] & 0x1) { if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
assert(!i->subOp); code[1] |= 1 << 25;
SAT_(39); SAT_(35);
} else {
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
code[1] |= 1 << 25;
SAT_(35);
}
} }
void void


@@ -2893,6 +2893,12 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
bb->cfg.attach(&loopBB->cfg, Graph::Edge::BACK); bb->cfg.attach(&loopBB->cfg, Graph::Edge::BACK);
} }
setPosition(reinterpret_cast<BasicBlock *>(breakBBs.pop().u.p), true); setPosition(reinterpret_cast<BasicBlock *>(breakBBs.pop().u.p), true);
// If the loop never breaks (e.g. only has RET's inside), then there
// will be no way to get to the break bb. However BGNLOOP will have
// already made a PREBREAK to it, so it must be in the CFG.
if (getBB()->cfg.incidentCount() == 0)
loopBB->cfg.attach(&getBB()->cfg, Graph::Edge::TREE);
} }
break; break;
case TGSI_OPCODE_BRK: case TGSI_OPCODE_BRK:


@@ -202,7 +202,8 @@ NV50LegalizePostRA::visit(Function *fn)
Program *prog = fn->getProgram(); Program *prog = fn->getProgram();
r63 = new_LValue(fn, FILE_GPR); r63 = new_LValue(fn, FILE_GPR);
if (prog->maxGPR < 63) // GPR units on nv50 are in half-regs
if (prog->maxGPR < 126)
r63->reg.data.id = 63; r63->reg.data.id = 63;
else else
r63->reg.data.id = 127; r63->reg.data.id = 127;
@@ -832,7 +833,7 @@ NV50LoweringPreSSA::handleTXB(TexInstruction *i)
} }
Value *flags = bld.getScratch(1, FILE_FLAGS); Value *flags = bld.getScratch(1, FILE_FLAGS);
bld.setPosition(cond, true); bld.setPosition(cond, true);
bld.mkCvt(OP_CVT, TYPE_U8, flags, TYPE_U32, cond->getDef(0)); bld.mkCvt(OP_CVT, TYPE_U8, flags, TYPE_U32, cond->getDef(0))->flagsDef = 0;
Instruction *tex[4]; Instruction *tex[4];
for (l = 0; l < 4; ++l) { for (l = 0; l < 4; ++l) {


@@ -686,7 +686,7 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
i->tex.s = 0x1f; i->tex.s = 0x1f;
i->setIndirectR(hnd); i->setIndirectR(hnd);
i->setIndirectS(NULL); i->setIndirectS(NULL);
} else if (i->tex.r == i->tex.s) { } else if (i->tex.r == i->tex.s || i->op == OP_TXF) {
i->tex.r += prog->driver->io.texBindBase / 4; i->tex.r += prog->driver->io.texBindBase / 4;
i->tex.s = 0; // only a single cX[] value possible here i->tex.s = 0; // only a single cX[] value possible here
} else { } else {


@@ -858,6 +858,12 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
i->src(0).mod = i->src(t).mod; i->src(0).mod = i->src(t).mod;
i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32)); i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32));
i->src(1).mod = 0; i->src(1).mod = 0;
} else
if (i->postFactor && i->sType == TYPE_F32) {
/* Can't emit a postfactor with an immediate, have to fold it in */
i->setSrc(s, new_ImmediateValue(
prog, imm0.reg.data.f32 * exp2f(i->postFactor)));
i->postFactor = 0;
} }
break; break;
case OP_MAD: case OP_MAD:
@@ -2654,7 +2660,7 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
break; break;
def = i->getSrc(1)->getInsn(); def = i->getSrc(1)->getInsn();
if (def->op == OP_MOV && def->src(0).getFile() == FILE_IMMEDIATE) { if (def && def->op == OP_MOV && def->src(0).getFile() == FILE_IMMEDIATE) {
vtmp = i->getSrc(1); vtmp = i->getSrc(1);
i->setSrc(1, def->getSrc(0)); i->setSrc(1, def->getSrc(0));
@@ -2956,6 +2962,16 @@ DeadCodeElim::visit(BasicBlock *bb)
return true; return true;
} }
// Each load can go into up to 4 destinations, any of which might potentially
// be dead (i.e. a hole). These can always be split into 2 loads, independent
// of where the holes are. We find the first contiguous region, put it into
// the first load, and then put the second contiguous region into the second
// load. There can be at most 2 contiguous regions.
//
// Note that there are some restrictions, for example it's not possible to do
// a 64-bit load that's not 64-bit aligned, so such a load has to be split
// up. Also hardware doesn't support 96-bit loads, so those also have to be
// split into a 64-bit and 32-bit load.
void void
DeadCodeElim::checkSplitLoad(Instruction *ld1) DeadCodeElim::checkSplitLoad(Instruction *ld1)
{ {
@@ -2976,6 +2992,8 @@ DeadCodeElim::checkSplitLoad(Instruction *ld1)
addr1 = ld1->getSrc(0)->reg.data.offset; addr1 = ld1->getSrc(0)->reg.data.offset;
n1 = n2 = 0; n1 = n2 = 0;
size1 = size2 = 0; size1 = size2 = 0;
// Compute address/width for first load
for (d = 0; ld1->defExists(d); ++d) { for (d = 0; ld1->defExists(d); ++d) {
if (mask & (1 << d)) { if (mask & (1 << d)) {
if (size1 && (addr1 & 0x7)) if (size1 && (addr1 & 0x7))
@@ -2989,16 +3007,34 @@ DeadCodeElim::checkSplitLoad(Instruction *ld1)
break; break;
} }
} }
// Scale back the size of the first load until it can be loaded. This
// typically happens for TYPE_B96 loads.
while (n1 &&
!prog->getTarget()->isAccessSupported(ld1->getSrc(0)->reg.file,
typeOfSize(size1))) {
size1 -= def1[--n1]->reg.size;
d--;
}
// Compute address/width for second load
for (addr2 = addr1 + size1; ld1->defExists(d); ++d) { for (addr2 = addr1 + size1; ld1->defExists(d); ++d) {
if (mask & (1 << d)) { if (mask & (1 << d)) {
assert(!size2 || !(addr2 & 0x7));
def2[n2] = ld1->getDef(d); def2[n2] = ld1->getDef(d);
size2 += def2[n2++]->reg.size; size2 += def2[n2++]->reg.size;
} else { } else if (!n2) {
assert(!n2); assert(!n2);
addr2 += ld1->getDef(d)->reg.size; addr2 += ld1->getDef(d)->reg.size;
} else {
break;
} }
} }
// Make sure that we've processed all the values
for (; ld1->defExists(d); ++d)
assert(!(mask & (1 << d)));
updateLdStOffset(ld1, addr1, func); updateLdStOffset(ld1, addr1, func);
ld1->setType(typeOfSize(size1)); ld1->setType(typeOfSize(size1));
for (d = 0; d < 4; ++d) for (d = 0; d < 4; ++d)
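
The comment added to `checkSplitLoad` describes finding the first contiguous region of live destinations, then the second. A rough sketch of just that scan over a liveness bitmask, assuming hypothetical names (`def_run`, `next_run`, `split_live_defs`) and deliberately ignoring the 64-bit alignment and 96-bit restrictions the real code also handles:

```c
#include <assert.h>

/* Hypothetical result type: one run of consecutive live destinations. */
struct def_run {
    int first;  /* index of the first live destination, -1 if none */
    int count;  /* number of consecutive live destinations */
};

/* Starting at *d, skip dead slots (holes), then collect live ones. */
static struct def_run next_run(unsigned mask, int ndefs, int *d)
{
    struct def_run run = { -1, 0 };

    while (*d < ndefs && !(mask & (1u << *d)))
        ++*d;
    if (*d < ndefs)
        run.first = *d;
    while (*d < ndefs && (mask & (1u << *d))) {
        ++run.count;
        ++*d;
    }
    return run;
}

/* Returns 1 when the two runs together cover every live bit in mask. */
static int split_live_defs(unsigned mask, int ndefs,
                           struct def_run *r1, struct def_run *r2)
{
    int d = 0;

    *r1 = next_run(mask, ndefs, &d);
    *r2 = next_run(mask, ndefs, &d);
    for (; d < ndefs; ++d)
        if (mask & (1u << d))
            return 0;
    return 1;
}
```

With four destinations there can be at most two runs, so the final loop plays the same role as the trailing assert in the hunk: it only fires if a live value was left unprocessed.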


@@ -1573,10 +1573,28 @@ SpillCodeInserter::spill(Instruction *defi, Value *slot, LValue *lval)
Instruction *st; Instruction *st;
if (slot->reg.file == FILE_MEMORY_LOCAL) { if (slot->reg.file == FILE_MEMORY_LOCAL) {
st = new_Instruction(func, OP_STORE, ty);
st->setSrc(0, slot);
st->setSrc(1, lval);
lval->noSpill = 1; lval->noSpill = 1;
if (ty != TYPE_B96) {
st = new_Instruction(func, OP_STORE, ty);
st->setSrc(0, slot);
st->setSrc(1, lval);
} else {
st = new_Instruction(func, OP_SPLIT, ty);
st->setSrc(0, lval);
for (int d = 0; d < lval->reg.size / 4; ++d)
st->setDef(d, new_LValue(func, FILE_GPR));
for (int d = lval->reg.size / 4 - 1; d >= 0; --d) {
Value *tmp = cloneShallow(func, slot);
tmp->reg.size = 4;
tmp->reg.data.offset += 4 * d;
Instruction *s = new_Instruction(func, OP_STORE, TYPE_U32);
s->setSrc(0, tmp);
s->setSrc(1, st->getDef(d));
defi->bb->insertAfter(defi, s);
}
}
} else { } else {
st = new_Instruction(func, OP_CVT, ty); st = new_Instruction(func, OP_CVT, ty);
st->setDef(0, slot); st->setDef(0, slot);
@@ -1596,7 +1614,27 @@ SpillCodeInserter::unspill(Instruction *usei, LValue *lval, Value *slot)
Instruction *ld; Instruction *ld;
if (slot->reg.file == FILE_MEMORY_LOCAL) { if (slot->reg.file == FILE_MEMORY_LOCAL) {
lval->noSpill = 1; lval->noSpill = 1;
ld = new_Instruction(func, OP_LOAD, ty); if (ty != TYPE_B96) {
ld = new_Instruction(func, OP_LOAD, ty);
} else {
ld = new_Instruction(func, OP_MERGE, ty);
for (int d = 0; d < lval->reg.size / 4; ++d) {
Value *tmp = cloneShallow(func, slot);
LValue *val;
tmp->reg.size = 4;
tmp->reg.data.offset += 4 * d;
Instruction *l = new_Instruction(func, OP_LOAD, TYPE_U32);
l->setDef(0, (val = new_LValue(func, FILE_GPR)));
l->setSrc(0, tmp);
usei->bb->insertBefore(usei, l);
ld->setSrc(d, val);
val->noSpill = 1;
}
ld->setDef(0, lval);
usei->bb->insertBefore(usei, ld);
return lval;
}
} else { } else {
ld = new_Instruction(func, OP_CVT, ty); ld = new_Instruction(func, OP_CVT, ty);
} }


@@ -454,7 +454,7 @@ TargetNV50::isModSupported(const Instruction *insn, int s, Modifier mod) const
return false; return false;
} }
} }
if (s >= 3) if (s >= opInfo[insn->op].srcNr || s >= 3)
return false; return false;
return (mod & Modifier(opInfo[insn->op].srcMods[s])) == mod; return (mod & Modifier(opInfo[insn->op].srcMods[s])) == mod;
} }


@@ -439,7 +439,7 @@ TargetNVC0::isModSupported(const Instruction *insn, int s, Modifier mod) const
return false; return false;
} }
} }
if (s >= 3) if (s >= opInfo[insn->op].srcNr || s >= 3)
return false; return false;
return (mod & Modifier(opInfo[insn->op].srcMods[s])) == mod; return (mod & Modifier(opInfo[insn->op].srcMods[s])) == mod;
} }


@@ -657,8 +657,8 @@ nouveau_buffer_create(struct pipe_screen *pscreen,
if (buffer->base.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT | if (buffer->base.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
PIPE_RESOURCE_FLAG_MAP_COHERENT)) { PIPE_RESOURCE_FLAG_MAP_COHERENT)) {
buffer->domain = NOUVEAU_BO_GART; buffer->domain = NOUVEAU_BO_GART;
} else if (buffer->base.bind & } else if (buffer->base.bind == 0 || (buffer->base.bind &
(screen->vidmem_bindings & screen->sysmem_bindings)) { (screen->vidmem_bindings & screen->sysmem_bindings))) {
switch (buffer->base.usage) { switch (buffer->base.usage) {
case PIPE_USAGE_DEFAULT: case PIPE_USAGE_DEFAULT:
case PIPE_USAGE_IMMUTABLE: case PIPE_USAGE_IMMUTABLE:
@@ -685,6 +685,10 @@ nouveau_buffer_create(struct pipe_screen *pscreen,
if (buffer->base.bind & screen->sysmem_bindings) if (buffer->base.bind & screen->sysmem_bindings)
buffer->domain = NOUVEAU_BO_GART; buffer->domain = NOUVEAU_BO_GART;
} }
/* There can be very special situations where we want non-gpu-mapped
* buffers, but never through this interface.
*/
assert(buffer->domain);
ret = nouveau_buffer_allocate(screen, buffer, buffer->domain); ret = nouveau_buffer_allocate(screen, buffer, buffer->domain);
if (ret == false) if (ret == false)


@@ -168,9 +168,10 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx,
int ref) int ref)
{ {
struct nv50_context *nv50 = nv50_context(&ctx->pipe); struct nv50_context *nv50 = nv50_context(&ctx->pipe);
unsigned bind = res->bind ? res->bind : PIPE_BIND_VERTEX_BUFFER;
unsigned s, i; unsigned s, i;
if (res->bind & PIPE_BIND_RENDER_TARGET) { if (bind & PIPE_BIND_RENDER_TARGET) {
assert(nv50->framebuffer.nr_cbufs <= PIPE_MAX_COLOR_BUFS); assert(nv50->framebuffer.nr_cbufs <= PIPE_MAX_COLOR_BUFS);
for (i = 0; i < nv50->framebuffer.nr_cbufs; ++i) { for (i = 0; i < nv50->framebuffer.nr_cbufs; ++i) {
if (nv50->framebuffer.cbufs[i] && if (nv50->framebuffer.cbufs[i] &&
@@ -182,7 +183,7 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx,
} }
} }
} }
if (res->bind & PIPE_BIND_DEPTH_STENCIL) { if (bind & PIPE_BIND_DEPTH_STENCIL) {
if (nv50->framebuffer.zsbuf && if (nv50->framebuffer.zsbuf &&
nv50->framebuffer.zsbuf->texture == res) { nv50->framebuffer.zsbuf->texture == res) {
nv50->dirty |= NV50_NEW_FRAMEBUFFER; nv50->dirty |= NV50_NEW_FRAMEBUFFER;
@@ -192,11 +193,11 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx,
} }
} }
if (res->bind & (PIPE_BIND_VERTEX_BUFFER | if (bind & (PIPE_BIND_VERTEX_BUFFER |
PIPE_BIND_INDEX_BUFFER | PIPE_BIND_INDEX_BUFFER |
PIPE_BIND_CONSTANT_BUFFER | PIPE_BIND_CONSTANT_BUFFER |
PIPE_BIND_STREAM_OUTPUT | PIPE_BIND_STREAM_OUTPUT |
PIPE_BIND_SAMPLER_VIEW)) { PIPE_BIND_SAMPLER_VIEW)) {
assert(nv50->num_vtxbufs <= PIPE_MAX_ATTRIBS); assert(nv50->num_vtxbufs <= PIPE_MAX_ATTRIBS);
for (i = 0; i < nv50->num_vtxbufs; ++i) { for (i = 0; i < nv50->num_vtxbufs; ++i) {


@@ -180,9 +180,10 @@ nvc0_invalidate_resource_storage(struct nouveau_context *ctx,
int ref) int ref)
{ {
struct nvc0_context *nvc0 = nvc0_context(&ctx->pipe); struct nvc0_context *nvc0 = nvc0_context(&ctx->pipe);
unsigned bind = res->bind ? res->bind : PIPE_BIND_VERTEX_BUFFER;
unsigned s, i; unsigned s, i;
if (res->bind & PIPE_BIND_RENDER_TARGET) { if (bind & PIPE_BIND_RENDER_TARGET) {
for (i = 0; i < nvc0->framebuffer.nr_cbufs; ++i) { for (i = 0; i < nvc0->framebuffer.nr_cbufs; ++i) {
if (nvc0->framebuffer.cbufs[i] && if (nvc0->framebuffer.cbufs[i] &&
nvc0->framebuffer.cbufs[i]->texture == res) { nvc0->framebuffer.cbufs[i]->texture == res) {
@@ -193,7 +194,7 @@ nvc0_invalidate_resource_storage(struct nouveau_context *ctx,
} }
} }
} }
if (res->bind & PIPE_BIND_DEPTH_STENCIL) { if (bind & PIPE_BIND_DEPTH_STENCIL) {
if (nvc0->framebuffer.zsbuf && if (nvc0->framebuffer.zsbuf &&
nvc0->framebuffer.zsbuf->texture == res) { nvc0->framebuffer.zsbuf->texture == res) {
nvc0->dirty |= NVC0_NEW_FRAMEBUFFER; nvc0->dirty |= NVC0_NEW_FRAMEBUFFER;
@@ -203,12 +204,12 @@ nvc0_invalidate_resource_storage(struct nouveau_context *ctx,
} }
} }
if (res->bind & (PIPE_BIND_VERTEX_BUFFER | if (bind & (PIPE_BIND_VERTEX_BUFFER |
PIPE_BIND_INDEX_BUFFER | PIPE_BIND_INDEX_BUFFER |
PIPE_BIND_CONSTANT_BUFFER | PIPE_BIND_CONSTANT_BUFFER |
PIPE_BIND_STREAM_OUTPUT | PIPE_BIND_STREAM_OUTPUT |
PIPE_BIND_COMMAND_ARGS_BUFFER | PIPE_BIND_COMMAND_ARGS_BUFFER |
PIPE_BIND_SAMPLER_VIEW)) { PIPE_BIND_SAMPLER_VIEW)) {
for (i = 0; i < nvc0->num_vtxbufs; ++i) { for (i = 0; i < nvc0->num_vtxbufs; ++i) {
if (nvc0->vtxbuf[i].buffer == res) { if (nvc0->vtxbuf[i].buffer == res) {
nvc0->dirty |= NVC0_NEW_ARRAYS; nvc0->dirty |= NVC0_NEW_ARRAYS;


@@ -59,7 +59,7 @@
/* the number of CS dwords for flushing and drawing */ /* the number of CS dwords for flushing and drawing */
#define R600_MAX_FLUSH_CS_DWORDS 16 #define R600_MAX_FLUSH_CS_DWORDS 16
#define R600_MAX_DRAW_CS_DWORDS 47 #define R600_MAX_DRAW_CS_DWORDS 52
#define R600_TRACE_CS_DWORDS 7 #define R600_TRACE_CS_DWORDS 7
#define R600_MAX_USER_CONST_BUFFERS 13 #define R600_MAX_USER_CONST_BUFFERS 13


@@ -598,6 +598,106 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
return 0; return 0;
} }
/* execute a single slot ALU calculation */
static int single_alu_op2(struct r600_shader_ctx *ctx, int op,
int dst_sel, int dst_chan,
int src0_sel, unsigned src0_chan_val,
int src1_sel, unsigned src1_chan_val)
{
struct r600_bytecode_alu alu;
int r, i;
if (ctx->bc->chip_class == CAYMAN && op == ALU_OP2_MULLO_INT) {
for (i = 0; i < 4; i++) {
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = i;
alu.dst.write = i == dst_chan;
alu.last = (i == 3);
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
}
return 0;
}
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = dst_chan;
alu.dst.write = 1;
alu.last = 1;
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
return 0;
}
/* execute a single slot ALU calculation */
static int single_alu_op3(struct r600_shader_ctx *ctx, int op,
int dst_sel, int dst_chan,
int src0_sel, unsigned src0_chan_val,
int src1_sel, unsigned src1_chan_val,
int src2_sel, unsigned src2_chan_val)
{
struct r600_bytecode_alu alu;
int r;
/* validate this for other ops */
assert(op == ALU_OP3_MULADD_UINT24);
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = op;
alu.src[0].sel = src0_sel;
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[0].value = src0_chan_val;
else
alu.src[0].chan = src0_chan_val;
alu.src[1].sel = src1_sel;
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[1].value = src1_chan_val;
else
alu.src[1].chan = src1_chan_val;
alu.src[2].sel = src2_sel;
if (src2_sel == V_SQ_ALU_SRC_LITERAL)
alu.src[2].value = src2_chan_val;
else
alu.src[2].chan = src2_chan_val;
alu.dst.sel = dst_sel;
alu.dst.chan = dst_chan;
alu.is_op3 = 1;
alu.last = 1;
r = r600_bytecode_add_alu(ctx->bc, &alu);
if (r)
return r;
return 0;
}
static inline int get_address_file_reg(struct r600_shader_ctx *ctx, int index)
{
return index > 0 ? ctx->bc->index_reg[index - 1] : ctx->bc->ar_reg;
}
static int vs_add_primid_output(struct r600_shader_ctx *ctx, int prim_id_sid) static int vs_add_primid_output(struct r600_shader_ctx *ctx, int prim_id_sid)
{ {
int i; int i;
@@ -1129,6 +1229,7 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
unsigned vtx_id = src->Dimension.Index; unsigned vtx_id = src->Dimension.Index;
int offset_reg = vtx_id / 3; int offset_reg = vtx_id / 3;
int offset_chan = vtx_id % 3; int offset_chan = vtx_id % 3;
int t2 = 0;
/* offsets of per-vertex data in ESGS ring are passed to GS in R0.x, R0.y, /* offsets of per-vertex data in ESGS ring are passed to GS in R0.x, R0.y,
* R0.w, R1.x, R1.y, R1.z (it seems R0.z is used for PrimitiveID) */ * R0.w, R1.x, R1.y, R1.z (it seems R0.z is used for PrimitiveID) */
@@ -1136,13 +1237,24 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
if (offset_reg == 0 && offset_chan == 2) if (offset_reg == 0 && offset_chan == 2)
offset_chan = 3; offset_chan = 3;
if (src->Dimension.Indirect || src->Register.Indirect)
t2 = r600_get_temp(ctx);
if (src->Dimension.Indirect) { if (src->Dimension.Indirect) {
int treg[3]; int treg[3];
int t2;
struct r600_bytecode_alu alu; struct r600_bytecode_alu alu;
int r, i; int r, i;
unsigned addr_reg;
/* you have got to be shitting me - addr_reg = get_address_file_reg(ctx, src->DimIndirect.Index);
if (src->DimIndirect.Index > 0) {
r = single_alu_op2(ctx, ALU_OP1_MOV,
ctx->bc->ar_reg, 0,
addr_reg, 0,
0, 0);
if (r)
return r;
}
/*
we have to put the R0.x/y/w into Rt.x Rt+1.x Rt+2.x then index reg from Rt. we have to put the R0.x/y/w into Rt.x Rt+1.x Rt+2.x then index reg from Rt.
at least this is what fglrx seems to do. */ at least this is what fglrx seems to do. */
for (i = 0; i < 3; i++) { for (i = 0; i < 3; i++) {
@@ -1150,7 +1262,6 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
} }
r600_add_gpr_array(ctx->shader, treg[0], 3, 0x0F); r600_add_gpr_array(ctx->shader, treg[0], 3, 0x0F);
t2 = r600_get_temp(ctx);
for (i = 0; i < 3; i++) { for (i = 0; i < 3; i++) {
memset(&alu, 0, sizeof(struct r600_bytecode_alu)); memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = ALU_OP1_MOV; alu.op = ALU_OP1_MOV;
@@ -1175,8 +1286,33 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
if (r) if (r)
return r; return r;
offset_reg = t2; offset_reg = t2;
offset_chan = 0;
} }
if (src->Register.Indirect) {
int addr_reg;
unsigned first = ctx->info.input_array_first[src->Indirect.ArrayID];
addr_reg = get_address_file_reg(ctx, src->Indirect.Index);
/* pull the value from index_reg */
r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
t2, 1,
addr_reg, 0,
V_SQ_ALU_SRC_LITERAL, first);
if (r)
return r;
r = single_alu_op3(ctx, ALU_OP3_MULADD_UINT24,
t2, 0,
t2, 1,
V_SQ_ALU_SRC_LITERAL, 4,
offset_reg, offset_chan);
if (r)
return r;
offset_reg = t2;
offset_chan = 0;
index = src->Register.Index - first;
}
memset(&vtx, 0, sizeof(vtx)); memset(&vtx, 0, sizeof(vtx));
vtx.buffer_id = R600_GS_RING_CONST_BUFFER; vtx.buffer_id = R600_GS_RING_CONST_BUFFER;
@@ -1222,6 +1358,7 @@ static int tgsi_split_gs_inputs(struct r600_shader_ctx *ctx)
fetch_gs_input(ctx, src, treg); fetch_gs_input(ctx, src, treg);
ctx->src[i].sel = treg; ctx->src[i].sel = treg;
ctx->src[i].rel = 0;
} }
} }
return 0; return 0;
@@ -1498,7 +1635,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
*last_exp_pos = NULL, *last_exp_param = NULL; *last_exp_pos = NULL, *last_exp_param = NULL;
int i, j, next_clip_pos = 61, next_param = 0; int i, j, next_clip_pos = 61, next_param = 0;
int ring; int ring;
bool only_ring_0 = true;
cshader = calloc(1, sizeof(struct r600_pipe_shader)); cshader = calloc(1, sizeof(struct r600_pipe_shader));
if (!cshader) if (!cshader)
return 0; return 0;
@@ -1570,6 +1707,8 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
for (i = 0; i < so->num_outputs; i++) { for (i = 0; i < so->num_outputs; i++) {
if (so->output[i].stream == ring) { if (so->output[i].stream == ring) {
enabled = true; enabled = true;
if (ring > 0)
only_ring_0 = false;
break; break;
} }
} }
@@ -1604,7 +1743,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
cf_jump = ctx.bc->cf_last; cf_jump = ctx.bc->cf_last;
if (enabled) if (enabled)
emit_streamout(&ctx, so, ring, &cshader->shader.ring_item_sizes[ring]); emit_streamout(&ctx, so, only_ring_0 ? -1 : ring, &cshader->shader.ring_item_sizes[ring]);
cshader->shader.ring_item_sizes[ring] = ocnt * 16; cshader->shader.ring_item_sizes[ring] = ocnt * 16;
} }
@@ -2206,6 +2345,11 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
if (ctx.type == TGSI_PROCESSOR_GEOMETRY) { if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
struct r600_bytecode_alu alu; struct r600_bytecode_alu alu;
int r; int r;
/* GS thread with no output workaround - emit a cut at start of GS */
if (ctx.bc->chip_class == R600)
r600_bytecode_add_cfinst(ctx.bc, CF_OP_CUT_VERTEX);
for (j = 0; j < 4; j++) { for (j = 0; j < 4; j++) {
memset(&alu, 0, sizeof(struct r600_bytecode_alu)); memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = ALU_OP1_MOV; alu.op = ALU_OP1_MOV;
@@ -7180,7 +7324,7 @@ static int tgsi_eg_arl(struct r600_shader_ctx *ctx)
struct r600_bytecode_alu alu; struct r600_bytecode_alu alu;
int r; int r;
int i, lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); int i, lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
unsigned reg = inst->Dst[0].Register.Index > 0 ? ctx->bc->index_reg[inst->Dst[0].Register.Index - 1] : ctx->bc->ar_reg; unsigned reg = get_address_file_reg(ctx, inst->Dst[0].Register.Index);
assert(inst->Dst[0].Register.Index < 3); assert(inst->Dst[0].Register.Index < 3);
memset(&alu, 0, sizeof(struct r600_bytecode_alu)); memset(&alu, 0, sizeof(struct r600_bytecode_alu));


@@ -2213,10 +2213,11 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
num_temp_gprs = 4; num_temp_gprs = 4;
num_gs_gprs = 0; num_gs_gprs = 0;
num_es_gprs = 0; num_es_gprs = 0;
num_ps_threads = 136; /* use limits 40 VS and at least 16 ES/GS */
num_vs_threads = 48; num_ps_threads = 120;
num_gs_threads = 4; num_vs_threads = 40;
num_es_threads = 4; num_gs_threads = 16;
num_es_threads = 16;
num_ps_stack_entries = 40; num_ps_stack_entries = 40;
num_vs_stack_entries = 40; num_vs_stack_entries = 40;
num_gs_stack_entries = 32; num_gs_stack_entries = 32;
@@ -2675,6 +2676,9 @@ void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
S_02881C_USE_VTX_VIEWPORT_INDX(rshader->vs_out_viewport); S_02881C_USE_VTX_VIEWPORT_INDX(rshader->vs_out_viewport);
} }
#define RV610_GSVS_ALIGN 32
#define R600_GSVS_ALIGN 16
void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader) void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{ {
struct r600_context *rctx = (struct r600_context *)ctx; struct r600_context *rctx = (struct r600_context *)ctx;
@@ -2684,6 +2688,23 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
unsigned gsvs_itemsize = unsigned gsvs_itemsize =
(cp_shader->ring_item_sizes[0] * shader->selector->gs_max_out_vertices) >> 2; (cp_shader->ring_item_sizes[0] * shader->selector->gs_max_out_vertices) >> 2;
/* Some R600s need the GSVS itemsize aligned to the cacheline size;
this was fixed in RS780 and above. */
switch (rctx->b.family) {
case CHIP_RV610:
gsvs_itemsize = align(gsvs_itemsize, RV610_GSVS_ALIGN);
break;
case CHIP_R600:
case CHIP_RV630:
case CHIP_RV670:
case CHIP_RV620:
case CHIP_RV635:
gsvs_itemsize = align(gsvs_itemsize, R600_GSVS_ALIGN);
break;
default:
break;
}
r600_init_command_buffer(cb, 64); r600_init_command_buffer(cb, 64);
/* VGT_GS_MODE is written by r600_emit_shader_stages */ /* VGT_GS_MODE is written by r600_emit_shader_stages */
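The alignment step added to r600_update_gs_state can be sketched in isolation as follows. This is only an illustration: `align_pot` and `gsvs_itemsize_for` are hypothetical names, the 32 and 16 values mirror RV610_GSVS_ALIGN and R600_GSVS_ALIGN from the hunk above, and the round-up formula is the usual power-of-two idiom, assumed here to behave like the driver's align() helper.

```c
#include <assert.h>

/* Hypothetical helper: round value up to the next multiple of a
 * power-of-two alignment. */
static unsigned align_pot(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Mirrors the gsvs_itemsize computation in the diff: ring item size times
 * the maximum number of output vertices, shifted right by 2, then aligned.
 * Pass alignment == 0 for chips that need no alignment (the default case). */
static unsigned gsvs_itemsize_for(unsigned ring_item_size,
                                  unsigned max_out_vertices,
                                  unsigned alignment)
{
    unsigned itemsize = (ring_item_size * max_out_vertices) >> 2;

    return alignment ? align_pot(itemsize, alignment) : itemsize;
}
```

With a 16-byte ring item and 25 output vertices the unaligned size is 100, which becomes 128 under the RV610 alignment and 112 under the R600 one.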


@@ -1770,6 +1770,24 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
(info.count_from_stream_output ? S_0287F0_USE_OPAQUE(1) : 0); (info.count_from_stream_output ? S_0287F0_USE_OPAQUE(1) : 0);
} }
/* SMX returns CONTEXT_DONE too early workaround */
if (rctx->b.family == CHIP_R600 ||
rctx->b.family == CHIP_RV610 ||
rctx->b.family == CHIP_RV630 ||
rctx->b.family == CHIP_RV635) {
/* if we have gs shader or streamout
we need to do a wait idle after every draw */
if (rctx->gs_shader || rctx->b.streamout.streamout_enabled) {
radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1));
}
}
/* ES ring rolling over at EOP - workaround */
if (rctx->b.chip_class == R600) {
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_SQ_NON_EVENT);
}
if (rctx->screen->b.trace_bo) { if (rctx->screen->b.trace_bo) {
r600_trace_emit(rctx); r600_trace_emit(rctx);
} }


@@ -130,6 +130,7 @@
#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS 0x20 #define EVENT_TYPE_SAMPLE_STREAMOUTSTATS 0x20
#define EVENT_TYPE_FLUSH_AND_INV_DB_META 0x2c /* supported on r700+ */ #define EVENT_TYPE_FLUSH_AND_INV_DB_META 0x2c /* supported on r700+ */
#define EVENT_TYPE_VGT_FLUSH 0x24 #define EVENT_TYPE_VGT_FLUSH 0x24
#define EVENT_TYPE_SQ_NON_EVENT 0x26
#define EVENT_TYPE_FLUSH_AND_INV_CB_META 46 /* supported on r700+ */ #define EVENT_TYPE_FLUSH_AND_INV_CB_META 46 /* supported on r700+ */
#define EVENT_TYPE(x) ((x) << 0) #define EVENT_TYPE(x) ((x) << 0)
#define EVENT_INDEX(x) ((x) << 8) #define EVENT_INDEX(x) ((x) << 8)


@@ -136,8 +136,12 @@ static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags)
void r600_preflush_suspend_features(struct r600_common_context *ctx) void r600_preflush_suspend_features(struct r600_common_context *ctx)
{ {
/* suspend queries */ /* suspend queries */
if (!LIST_IS_EMPTY(&ctx->active_nontimer_queries)) if (ctx->num_cs_dw_nontimer_queries_suspend) {
/* Since non-timer queries are suspended during blits,
* we have to guard against double-suspends. */
r600_suspend_nontimer_queries(ctx); r600_suspend_nontimer_queries(ctx);
ctx->nontimer_queries_suspended_by_flush = true;
}
if (!LIST_IS_EMPTY(&ctx->active_timer_queries)) if (!LIST_IS_EMPTY(&ctx->active_timer_queries))
r600_suspend_timer_queries(ctx); r600_suspend_timer_queries(ctx);
@@ -158,8 +162,10 @@ void r600_postflush_resume_features(struct r600_common_context *ctx)
/* resume queries */ /* resume queries */
if (!LIST_IS_EMPTY(&ctx->active_timer_queries)) if (!LIST_IS_EMPTY(&ctx->active_timer_queries))
r600_resume_timer_queries(ctx); r600_resume_timer_queries(ctx);
if (!LIST_IS_EMPTY(&ctx->active_nontimer_queries)) if (ctx->nontimer_queries_suspended_by_flush) {
ctx->nontimer_queries_suspended_by_flush = false;
r600_resume_nontimer_queries(ctx); r600_resume_nontimer_queries(ctx);
}
} }
static void r600_flush_from_st(struct pipe_context *ctx, static void r600_flush_from_st(struct pipe_context *ctx,
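The double-suspend guard introduced in the two hunks above can be modelled with a small state machine. This is a sketch under stated assumptions, not the driver code: `struct query_state` and its functions are hypothetical, `running_dw` stands in for num_cs_dw_nontimer_queries_suspend, and the sketch assumes that suspending queries drops that counter to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the non-timer query bookkeeping. */
struct query_state {
    unsigned running_dw;       /* nonzero only while queries are running */
    unsigned saved_dw;         /* remembered across a suspend */
    bool suspended_by_flush;   /* set only when the flush did the suspend */
    unsigned suspends;
    unsigned resumes;
};

static void suspend_queries(struct query_state *s)
{
    s->saved_dw = s->running_dw;
    s->running_dw = 0;
    s->suspends++;
}

static void resume_queries(struct query_state *s)
{
    s->running_dw = s->saved_dw;
    s->resumes++;
}

/* Suspend only if queries are actually running (e.g. not already
 * suspended by a blit), and remember that the flush did it. */
static void preflush(struct query_state *s)
{
    if (s->running_dw) {
        suspend_queries(s);
        s->suspended_by_flush = true;
    }
}

/* Resume only what the flush itself suspended. */
static void postflush(struct query_state *s)
{
    if (s->suspended_by_flush) {
        s->suspended_by_flush = false;
        resume_queries(s);
    }
}
```

If a blit has already suspended the queries, a flush in between leaves them alone, so they are neither suspended nor resumed twice.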
@@ -233,8 +239,8 @@ bool r600_common_context_init(struct r600_common_context *rctx,
rctx->family = rscreen->family; rctx->family = rscreen->family;
rctx->chip_class = rscreen->chip_class; rctx->chip_class = rscreen->chip_class;
if (rscreen->family == CHIP_HAWAII) if (rscreen->chip_class >= CIK)
rctx->max_db = 16; rctx->max_db = MAX2(8, rscreen->info.r600_num_backends);
else if (rscreen->chip_class >= EVERGREEN) else if (rscreen->chip_class >= EVERGREEN)
rctx->max_db = 8; rctx->max_db = 8;
else else
@@ -550,10 +556,11 @@ const char *r600_get_llvm_processor_name(enum radeon_family family)
case CHIP_TONGA: return "tonga"; case CHIP_TONGA: return "tonga";
case CHIP_ICELAND: return "iceland"; case CHIP_ICELAND: return "iceland";
case CHIP_CARRIZO: return "carrizo"; case CHIP_CARRIZO: return "carrizo";
case CHIP_FIJI: return "fiji";
#if HAVE_LLVM <= 0x0307 #if HAVE_LLVM <= 0x0307
case CHIP_FIJI: return "tonga";
case CHIP_STONEY: return "carrizo"; case CHIP_STONEY: return "carrizo";
#else #else
case CHIP_FIJI: return "fiji";
case CHIP_STONEY: return "stoney"; case CHIP_STONEY: return "stoney";
#endif #endif
default: return ""; default: return "";


@@ -392,6 +392,7 @@ struct r600_common_context {
struct list_head active_nontimer_queries; struct list_head active_nontimer_queries;
struct list_head active_timer_queries; struct list_head active_timer_queries;
unsigned num_cs_dw_nontimer_queries_suspend; unsigned num_cs_dw_nontimer_queries_suspend;
bool nontimer_queries_suspended_by_flush;
unsigned num_cs_dw_timer_queries_suspend; unsigned num_cs_dw_timer_queries_suspend;
/* Additional hardware info. */ /* Additional hardware info. */
unsigned backend_mask; unsigned backend_mask;


@@ -489,6 +489,10 @@ static void vi_texture_alloc_dcc_separate(struct r600_common_screen *rscreen,
if (rscreen->debug_flags & DBG_NO_DCC) if (rscreen->debug_flags & DBG_NO_DCC)
return; return;
/* TODO: DCC is broken on Stoney */
if (rscreen->family == CHIP_STONEY)
return;
rtex->dcc_buffer = (struct r600_resource *) rtex->dcc_buffer = (struct r600_resource *)
r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM, r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM,
PIPE_USAGE_DEFAULT, rtex->surface.dcc_size, rtex->surface.dcc_alignment); PIPE_USAGE_DEFAULT, rtex->surface.dcc_size, rtex->surface.dcc_alignment);


@@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit; bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit; bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem; bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32"; bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem; bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32"; bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem; bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;


@@ -958,6 +958,8 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
dec->msg->body.decode.db_pitch = dec->base.width; dec->msg->body.decode.db_pitch = dec->base.width;
dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target); dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target);
if (((struct r600_common_screen*)dec->screen)->family >= CHIP_STONEY)
dec->msg->body.decode.dt_wa_chroma_top_offset = dec->msg->body.decode.dt_pitch / 2;
switch (u_reduce_video_profile(picture->profile)) { switch (u_reduce_video_profile(picture->profile)) {
case PIPE_VIDEO_FORMAT_MPEG4_AVC: case PIPE_VIDEO_FORMAT_MPEG4_AVC:


@@ -394,7 +394,10 @@ struct ruvd_msg {
uint32_t dt_chroma_top_offset; uint32_t dt_chroma_top_offset;
uint32_t dt_chroma_bottom_offset; uint32_t dt_chroma_bottom_offset;
uint32_t dt_surf_tile_config; uint32_t dt_surf_tile_config;
uint32_t dt_reserved[3]; uint32_t dt_uv_surf_tile_config;
// re-use dt_wa_chroma_top_offset as dt_ext_info for UV pitch in stoney
uint32_t dt_wa_chroma_top_offset;
uint32_t dt_wa_chroma_bottom_offset;
uint32_t reserved[16]; uint32_t reserved[16];


@@ -389,6 +389,11 @@ struct pipe_video_codec *rvce_create_encoder(struct pipe_context *context,
struct radeon_surf *tmp_surf; struct radeon_surf *tmp_surf;
unsigned cpb_size; unsigned cpb_size;
if (rscreen->info.family == CHIP_STONEY) {
RVID_ERR("Stoney VCE is not supported!\n");
return NULL;
}
if (!rscreen->info.vce_fw_version) { if (!rscreen->info.vce_fw_version) {
RVID_ERR("Kernel doesn't support VCE!\n"); RVID_ERR("Kernel doesn't support VCE!\n");
return NULL; return NULL;


@@ -34,11 +34,6 @@
#define MAX_GLOBAL_BUFFERS 20 #define MAX_GLOBAL_BUFFERS 20
/* XXX: Even though we don't pass the scratch buffer via user sgprs any more
* LLVM still expects that we specify 4 USER_SGPRS so it can remain compatible
* with older mesa. */
#define NUM_USER_SGPRS 4
struct si_compute { struct si_compute {
struct si_context *ctx; struct si_context *ctx;
@@ -238,7 +233,6 @@ static void si_launch_grid(
uint64_t kernel_args_va; uint64_t kernel_args_va;
uint64_t scratch_buffer_va = 0; uint64_t scratch_buffer_va = 0;
uint64_t shader_va; uint64_t shader_va;
unsigned arg_user_sgpr_count = NUM_USER_SGPRS;
unsigned i; unsigned i;
struct si_shader *shader = &program->shader; struct si_shader *shader = &program->shader;
unsigned lds_blocks; unsigned lds_blocks;
@@ -366,20 +360,7 @@ static void si_launch_grid(
si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, shader_va >> 8); si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, shader_va >> 8);
si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40); si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40);
si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1, si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1, shader->rsrc1);
/* We always use at least 3 VGPRS, these come from
* TIDIG_COMP_CNT.
* XXX: The compiler should account for this.
*/
S_00B848_VGPRS((MAX2(3, shader->num_vgprs) - 1) / 4)
/* We always use at least 4 + arg_user_sgpr_count. The 4 extra
* sgprs are from TGID_X_EN, TGID_Y_EN, TGID_Z_EN, TG_SIZE_EN
* XXX: The compiler should account for this.
*/
| S_00B848_SGPRS(((MAX2(4 + arg_user_sgpr_count,
shader->num_sgprs)) - 1) / 8)
| S_00B028_FLOAT_MODE(shader->float_mode))
;
lds_blocks = shader->lds_size; lds_blocks = shader->lds_size;
/* XXX: We are over allocating LDS. For SI, the shader reports LDS in /* XXX: We are over allocating LDS. For SI, the shader reports LDS in
@@ -395,17 +376,10 @@ static void si_launch_grid(
assert(lds_blocks <= 0xFF); assert(lds_blocks <= 0xFF);
si_pm4_set_reg(pm4, R_00B84C_COMPUTE_PGM_RSRC2, shader->rsrc2 &= C_00B84C_LDS_SIZE;
S_00B84C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0) shader->rsrc2 |= S_00B84C_LDS_SIZE(lds_blocks);
| S_00B84C_USER_SGPR(arg_user_sgpr_count)
| S_00B84C_TGID_X_EN(1) si_pm4_set_reg(pm4, R_00B84C_COMPUTE_PGM_RSRC2, shader->rsrc2);
| S_00B84C_TGID_Y_EN(1)
| S_00B84C_TGID_Z_EN(1)
| S_00B84C_TG_SIZE_EN(1)
| S_00B84C_TIDIG_COMP_CNT(2)
| S_00B84C_LDS_SIZE(lds_blocks)
| S_00B84C_EXCP_EN(0))
;
si_pm4_set_reg(pm4, R_00B854_COMPUTE_RESOURCE_LIMITS, 0); si_pm4_set_reg(pm4, R_00B854_COMPUTE_RESOURCE_LIMITS, 0);
si_pm4_set_reg(pm4, R_00B858_COMPUTE_STATIC_THREAD_MGMT_SE0, si_pm4_set_reg(pm4, R_00B858_COMPUTE_STATIC_THREAD_MGMT_SE0,
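The new si_launch_grid code keeps the compiler-provided RSRC2 value and only rewrites its LDS_SIZE field. A minimal sketch of that read-modify-write pattern follows. The field position used here (9 bits starting at bit 15) is an assumption for illustration, and `set_lds_size` is a hypothetical name; the real layout comes from the C_00B84C_LDS_SIZE and S_00B84C_LDS_SIZE macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define LDS_SIZE_SHIFT 15u
#define LDS_SIZE_MASK  (0x1FFu << LDS_SIZE_SHIFT)

/* Clear the LDS_SIZE field, then store the new block count, leaving
 * every other bit of the register value untouched. */
static uint32_t set_lds_size(uint32_t rsrc2, uint32_t lds_blocks)
{
    rsrc2 &= ~LDS_SIZE_MASK;
    rsrc2 |= (lds_blocks << LDS_SIZE_SHIFT) & LDS_SIZE_MASK;
    return rsrc2;
}
```

Clearing before OR-ing matters because the value read from the shader binary may already carry a nonzero LDS size.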


@@ -632,7 +632,7 @@ void si_check_vm_faults(struct si_context *sctx)
/* Use conservative timeout 800ms, after which we won't wait any /* Use conservative timeout 800ms, after which we won't wait any
* longer and assume the GPU is hung. * longer and assume the GPU is hung.
*/ */
screen->fence_finish(screen, sctx->last_gfx_fence, 800*1000*1000); sctx->b.ws->fence_wait(sctx->b.ws, sctx->last_gfx_fence, 800*1000*1000);
if (!si_vm_fault_occured(sctx, &addr)) if (!si_vm_fault_occured(sctx, &addr))
return; return;


@@ -594,6 +594,14 @@ static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
lp_build_const_int32(gallivm, swizzle)); lp_build_const_int32(gallivm, swizzle));
value = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr); value = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
if (type == TGSI_TYPE_DOUBLE) {
LLVMValueRef value2;
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
lp_build_const_int32(gallivm, swizzle + 1));
value2 = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
return radeon_llvm_emit_fetch_double(bld_base, value, value2);
}
return LLVMBuildBitCast(gallivm->builder, value, return LLVMBuildBitCast(gallivm->builder, value,
tgsi2llvmtype(bld_base, type), ""); tgsi2llvmtype(bld_base, type), "");
} }
@@ -733,6 +741,7 @@ static LLVMValueRef fetch_input_gs(
unsigned semantic_name = info->input_semantic_name[reg->Register.Index]; unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
unsigned semantic_index = info->input_semantic_index[reg->Register.Index]; unsigned semantic_index = info->input_semantic_index[reg->Register.Index];
unsigned param; unsigned param;
LLVMValueRef value;
if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID) if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
return get_primitive_id(bld_base, swizzle); return get_primitive_id(bld_base, swizzle);
@@ -774,11 +783,22 @@ static LLVMValueRef fetch_input_gs(
args[7] = uint->zero; /* SLC */ args[7] = uint->zero; /* SLC */
args[8] = uint->zero; /* TFE */ args[8] = uint->zero; /* TFE */
value = lp_build_intrinsic(gallivm->builder,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
if (type == TGSI_TYPE_DOUBLE) {
LLVMValueRef value2;
args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle + 1) * 256);
value2 = lp_build_intrinsic(gallivm->builder,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
return radeon_llvm_emit_fetch_double(bld_base,
value, value2);
}
return LLVMBuildBitCast(gallivm->builder, return LLVMBuildBitCast(gallivm->builder,
lp_build_intrinsic(gallivm->builder, value,
"llvm.SI.buffer.load.dword.i32.i32",
i32, args, 9,
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute),
tgsi2llvmtype(bld_base, type), ""); tgsi2llvmtype(bld_base, type), "");
} }
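Both double paths above load two consecutive 32-bit values and hand them to radeon_llvm_emit_fetch_double. The CPU-side sketch below shows the idea of reassembling a double from two dwords. `combine_dwords` is a hypothetical name, and it assumes IEEE-754 doubles with the first dword holding the low half, which is how a two-element i32 vector would bitcast on a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reassemble a 64-bit double from two 32-bit halves.
 * Assumption: lo is the first dword loaded, hi the second. */
static double combine_dwords(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;

    /* memcpy avoids strict-aliasing problems with pointer casts. */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, a high dword of 0x3FF00000 with a zero low dword is the bit pattern of 1.0.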
@@ -3745,12 +3765,14 @@ void si_shader_binary_read_config(const struct si_screen *sscreen,
shader->num_sgprs = MAX2(shader->num_sgprs, (G_00B028_SGPRS(value) + 1) * 8); shader->num_sgprs = MAX2(shader->num_sgprs, (G_00B028_SGPRS(value) + 1) * 8);
shader->num_vgprs = MAX2(shader->num_vgprs, (G_00B028_VGPRS(value) + 1) * 4); shader->num_vgprs = MAX2(shader->num_vgprs, (G_00B028_VGPRS(value) + 1) * 4);
shader->float_mode = G_00B028_FLOAT_MODE(value); shader->float_mode = G_00B028_FLOAT_MODE(value);
shader->rsrc1 = value;
break; break;
case R_00B02C_SPI_SHADER_PGM_RSRC2_PS: case R_00B02C_SPI_SHADER_PGM_RSRC2_PS:
shader->lds_size = MAX2(shader->lds_size, G_00B02C_EXTRA_LDS_SIZE(value)); shader->lds_size = MAX2(shader->lds_size, G_00B02C_EXTRA_LDS_SIZE(value));
break; break;
case R_00B84C_COMPUTE_PGM_RSRC2: case R_00B84C_COMPUTE_PGM_RSRC2:
shader->lds_size = MAX2(shader->lds_size, G_00B84C_LDS_SIZE(value)); shader->lds_size = MAX2(shader->lds_size, G_00B84C_LDS_SIZE(value));
shader->rsrc2 = value;
break; break;
case R_0286CC_SPI_PS_INPUT_ENA: case R_0286CC_SPI_PS_INPUT_ENA:
shader->spi_ps_input_ena = value; shader->spi_ps_input_ena = value;


@@ -290,8 +290,8 @@ struct si_shader {
bool is_gs_copy_shader; bool is_gs_copy_shader;
bool dx10_clamp_mode; /* convert NaNs to 0 */ bool dx10_clamp_mode; /* convert NaNs to 0 */
unsigned ls_rsrc1; unsigned rsrc1;
unsigned ls_rsrc2; unsigned rsrc2;
}; };
static inline struct tgsi_shader_info *si_get_vs_info(struct si_context *sctx) static inline struct tgsi_shader_info *si_get_vs_info(struct si_context *sctx)


@@ -163,7 +163,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size; perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;
lds_size = output_patch0_offset + output_patch_size * *num_patches; lds_size = output_patch0_offset + output_patch_size * *num_patches;
ls_rsrc2 = ls->current->ls_rsrc2; ls_rsrc2 = ls->current->rsrc2;
if (sctx->b.chip_class >= CIK) { if (sctx->b.chip_class >= CIK) {
assert(lds_size <= 65536); assert(lds_size <= 65536);
@@ -178,7 +178,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
if (sctx->b.chip_class == CIK && sctx->b.family != CHIP_HAWAII) if (sctx->b.chip_class == CIK && sctx->b.family != CHIP_HAWAII)
radeon_set_sh_reg(cs, R_00B52C_SPI_SHADER_PGM_RSRC2_LS, ls_rsrc2); radeon_set_sh_reg(cs, R_00B52C_SPI_SHADER_PGM_RSRC2_LS, ls_rsrc2);
radeon_set_sh_reg_seq(cs, R_00B528_SPI_SHADER_PGM_RSRC1_LS, 2); radeon_set_sh_reg_seq(cs, R_00B528_SPI_SHADER_PGM_RSRC1_LS, 2);
radeon_emit(cs, ls->current->ls_rsrc1); radeon_emit(cs, ls->current->rsrc1);
radeon_emit(cs, ls_rsrc2); radeon_emit(cs, ls_rsrc2);
/* Compute userdata SGPRs. */ /* Compute userdata SGPRs. */


@@ -121,11 +121,11 @@ static void si_shader_ls(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B520_SPI_SHADER_PGM_LO_LS, va >> 8); si_pm4_set_reg(pm4, R_00B520_SPI_SHADER_PGM_LO_LS, va >> 8);
si_pm4_set_reg(pm4, R_00B524_SPI_SHADER_PGM_HI_LS, va >> 40); si_pm4_set_reg(pm4, R_00B524_SPI_SHADER_PGM_HI_LS, va >> 40);
shader->ls_rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) | shader->rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) |
S_00B528_SGPRS((num_sgprs - 1) / 8) | S_00B528_SGPRS((num_sgprs - 1) / 8) |
S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) | S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
S_00B528_DX10_CLAMP(shader->dx10_clamp_mode); S_00B528_DX10_CLAMP(shader->dx10_clamp_mode);
shader->ls_rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) | shader->rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0); S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0);
} }


@@ -6,8 +6,4 @@ TARGET_LIB_DEPS += \
$(top_builddir)/src/gallium/winsys/vc4/drm/libvc4drm.la \ $(top_builddir)/src/gallium/winsys/vc4/drm/libvc4drm.la \
$(top_builddir)/src/gallium/drivers/vc4/libvc4.la $(top_builddir)/src/gallium/drivers/vc4/libvc4.la
if USE_VC4_SIMULATOR
TARGET_CPPFLAGS += -DUSE_VC4_SIMULATOR
endif
endif endif


@@ -23,7 +23,6 @@ include Makefile.sources
include $(top_srcdir)/src/gallium/Automake.inc include $(top_srcdir)/src/gallium/Automake.inc
if USE_VC4_SIMULATOR if USE_VC4_SIMULATOR
SIM_CFLAGS = -DUSE_VC4_SIMULATOR=1
SIM_LDFLAGS = -lsimpenrose SIM_LDFLAGS = -lsimpenrose
endif endif


@@ -21,6 +21,7 @@ C_SOURCES := \
vc4_job.c \ vc4_job.c \
vc4_nir_lower_blend.c \ vc4_nir_lower_blend.c \
vc4_nir_lower_io.c \ vc4_nir_lower_io.c \
vc4_nir_lower_txf_ms.c \
vc4_opt_algebraic.c \ vc4_opt_algebraic.c \
vc4_opt_constant_folding.c \ vc4_opt_constant_folding.c \
vc4_opt_copy_propagation.c \ vc4_opt_copy_propagation.c \


@@ -121,6 +121,11 @@ enum vc4_packet {
#define VC4_PACKET_TILE_COORDINATES_SIZE 3 #define VC4_PACKET_TILE_COORDINATES_SIZE 3
#define VC4_PACKET_GEM_HANDLES_SIZE 9 #define VC4_PACKET_GEM_HANDLES_SIZE 9
/* Number of multisamples supported. */
#define VC4_MAX_SAMPLES 4
/* Size of a full resolution color or Z tile buffer load/store. */
#define VC4_TILE_BUFFER_SIZE (64 * 64 * 4)
#define VC4_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low)) #define VC4_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))
/* Using the GNU statement expression extension */ /* Using the GNU statement expression extension */
#define VC4_SET_FIELD(value, field) \ #define VC4_SET_FIELD(value, field) \
@@ -151,6 +156,16 @@ enum vc4_packet {
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1) #define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0) #define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
/** @{
*
* low bits of VC4_PACKET_STORE_FULL_RES_TILE_BUFFER and
* VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER.
*/
#define VC4_LOADSTORE_FULL_RES_EOF (1 << 3)
#define VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL (1 << 2)
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
/** @{ /** @{
* *
* byte 2 of VC4_PACKET_STORE_TILE_BUFFER_GENERAL and * byte 2 of VC4_PACKET_STORE_TILE_BUFFER_GENERAL and


@@ -36,9 +36,11 @@
struct vc4_rcl_setup { struct vc4_rcl_setup {
struct drm_gem_cma_object *color_read; struct drm_gem_cma_object *color_read;
struct drm_gem_cma_object *color_ms_write; struct drm_gem_cma_object *color_write;
struct drm_gem_cma_object *zs_read; struct drm_gem_cma_object *zs_read;
struct drm_gem_cma_object *zs_write; struct drm_gem_cma_object *zs_write;
struct drm_gem_cma_object *msaa_color_write;
struct drm_gem_cma_object *msaa_zs_write;
struct drm_gem_cma_object *rcl; struct drm_gem_cma_object *rcl;
u32 next_offset; u32 next_offset;
@@ -62,7 +64,6 @@ static inline void rcl_u32(struct vc4_rcl_setup *setup, u32 val)
setup->next_offset += 4; setup->next_offset += 4;
} }
/* /*
* Emits a no-op STORE_TILE_BUFFER_GENERAL. * Emits a no-op STORE_TILE_BUFFER_GENERAL.
* *
@@ -81,6 +82,22 @@ static void vc4_store_before_load(struct vc4_rcl_setup *setup)
rcl_u32(setup, 0); /* no address, since we're in None mode */ rcl_u32(setup, 0); /* no address, since we're in None mode */
} }
/*
* Calculates the physical address of the start of a tile in a RCL surface.
*
* Unlike the other load/store packets,
* VC4_PACKET_LOAD/STORE_FULL_RES_TILE_BUFFER don't look at the tile
* coordinates packet, and instead just store to the address given.
*/
static uint32_t vc4_full_res_offset(struct vc4_exec_info *exec,
struct drm_gem_cma_object *bo,
struct drm_vc4_submit_rcl_surface *surf,
uint8_t x, uint8_t y)
{
return bo->paddr + surf->offset + VC4_TILE_BUFFER_SIZE *
(DIV_ROUND_UP(exec->args->width, 32) * y + x);
}
/* /*
* Emits a PACKET_TILE_COORDINATES if one isn't already pending. * Emits a PACKET_TILE_COORDINATES if one isn't already pending.
* *
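The address arithmetic in vc4_full_res_offset can be tried out on its own. The sketch below is an illustration with plain integers instead of the kernel structures: `full_res_offset` is a hypothetical name, while the 64 * 64 * 4 tile size and the division of the width by 32 are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_BUFFER_SIZE (64 * 64 * 4)   /* VC4_TILE_BUFFER_SIZE in the diff */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Address of tile (x, y) in a full-resolution surface: tiles are laid
 * out row by row, each occupying one full tile buffer. */
static uint32_t full_res_offset(uint32_t base, uint32_t surf_offset,
                                uint32_t width, uint8_t x, uint8_t y)
{
    uint32_t tiles_per_row = DIV_ROUND_UP(width, 32);

    return base + surf_offset + TILE_BUFFER_SIZE * (tiles_per_row * y + x);
}
```

A 100-pixel-wide surface has 4 tiles per row, so tile (1, 2) is the tenth tile and starts 9 * 16384 bytes past the surface start.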
@@ -108,22 +125,41 @@ static void emit_tile(struct vc4_exec_info *exec,
* may be outstanding at a time. * may be outstanding at a time.
*/ */
if (setup->color_read) { if (setup->color_read) {
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL); if (args->color_read.flags &
rcl_u16(setup, args->color_read.bits); VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
rcl_u32(setup, rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
setup->color_read->paddr + args->color_read.offset); rcl_u32(setup,
vc4_full_res_offset(exec, setup->color_read,
&args->color_read, x, y) |
VC4_LOADSTORE_FULL_RES_DISABLE_ZS);
} else {
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->color_read.bits);
rcl_u32(setup, setup->color_read->paddr +
args->color_read.offset);
}
} }
if (setup->zs_read) { if (setup->zs_read) {
if (setup->color_read) { if (args->zs_read.flags &
/* Exec previous load. */ VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
vc4_tile_coordinates(setup, x, y); rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
vc4_store_before_load(setup); rcl_u32(setup,
} vc4_full_res_offset(exec, setup->zs_read,
&args->zs_read, x, y) |
VC4_LOADSTORE_FULL_RES_DISABLE_COLOR);
} else {
if (setup->color_read) {
/* Exec previous load. */
vc4_tile_coordinates(setup, x, y);
vc4_store_before_load(setup);
}
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL); rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->zs_read.bits); rcl_u16(setup, args->zs_read.bits);
rcl_u32(setup, setup->zs_read->paddr + args->zs_read.offset); rcl_u32(setup, setup->zs_read->paddr +
args->zs_read.offset);
}
} }
/* Clipping depends on tile coordinates having been /* Clipping depends on tile coordinates having been
@@ -144,20 +180,60 @@ static void emit_tile(struct vc4_exec_info *exec,
(y * exec->bin_tiles_x + x) * 32)); (y * exec->bin_tiles_x + x) * 32));
} }
if (setup->msaa_color_write) {
bool last_tile_write = (!setup->msaa_zs_write &&
!setup->zs_write &&
!setup->color_write);
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_ZS;
if (!last_tile_write)
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
else if (last)
bits |= VC4_LOADSTORE_FULL_RES_EOF;
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->msaa_color_write,
&args->msaa_color_write, x, y) |
bits);
}
if (setup->msaa_zs_write) {
bool last_tile_write = (!setup->zs_write &&
!setup->color_write);
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_COLOR;
if (setup->msaa_color_write)
vc4_tile_coordinates(setup, x, y);
if (!last_tile_write)
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
else if (last)
bits |= VC4_LOADSTORE_FULL_RES_EOF;
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
rcl_u32(setup,
vc4_full_res_offset(exec, setup->msaa_zs_write,
&args->msaa_zs_write, x, y) |
bits);
}
if (setup->zs_write) { if (setup->zs_write) {
bool last_tile_write = !setup->color_write;
if (setup->msaa_color_write || setup->msaa_zs_write)
vc4_tile_coordinates(setup, x, y);
rcl_u8(setup, VC4_PACKET_STORE_TILE_BUFFER_GENERAL); rcl_u8(setup, VC4_PACKET_STORE_TILE_BUFFER_GENERAL);
rcl_u16(setup, args->zs_write.bits | rcl_u16(setup, args->zs_write.bits |
(setup->color_ms_write ? (last_tile_write ?
VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR : 0)); 0 : VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR));
rcl_u32(setup, rcl_u32(setup,
(setup->zs_write->paddr + args->zs_write.offset) | (setup->zs_write->paddr + args->zs_write.offset) |
((last && !setup->color_ms_write) ? ((last && last_tile_write) ?
VC4_LOADSTORE_TILE_BUFFER_EOF : 0)); VC4_LOADSTORE_TILE_BUFFER_EOF : 0));
} }
if (setup->color_ms_write) { if (setup->color_write) {
if (setup->zs_write) { if (setup->msaa_color_write || setup->msaa_zs_write ||
/* Reset after previous store */ setup->zs_write) {
vc4_tile_coordinates(setup, x, y); vc4_tile_coordinates(setup, x, y);
} }
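The flag selection for the MSAA color store in emit_tile can be sketched as a pure function. The bit values are the VC4_LOADSTORE_FULL_RES_* defines from the diff; `msaa_color_store_bits` is a hypothetical name and its boolean parameters stand in for the setup pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FULL_RES_EOF               (1u << 3)
#define FULL_RES_DISABLE_CLEAR_ALL (1u << 2)
#define FULL_RES_DISABLE_ZS        (1u << 1)
#define FULL_RES_DISABLE_COLOR     (1u << 0)

/* Low bits for the full-res color store of one tile.
 * If another store follows for this tile, the tile buffer must not be
 * cleared yet; if this is the final store of the final tile, mark EOF. */
static uint32_t msaa_color_store_bits(bool has_msaa_zs, bool has_zs,
                                      bool has_color, bool last_tile)
{
    bool last_tile_write = !has_msaa_zs && !has_zs && !has_color;
    uint32_t bits = FULL_RES_DISABLE_ZS;

    if (!last_tile_write)
        bits |= FULL_RES_DISABLE_CLEAR_ALL;
    else if (last_tile)
        bits |= FULL_RES_EOF;

    return bits;
}
```

The MSAA Z/S store follows the same pattern with DISABLE_COLOR as its base bit and one fewer candidate for a later store.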
@@ -192,14 +268,26 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
} }
if (setup->color_read) { if (setup->color_read) {
loop_body_size += (VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE); if (args->color_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
} else {
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
}
} }
if (setup->zs_read) { if (setup->zs_read) {
if (setup->color_read) { if (args->zs_read.flags &
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE; VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE; loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
} else {
if (setup->color_read &&
!(args->color_read.flags &
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES)) {
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
}
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
} }
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
} }
if (has_bin) { if (has_bin) {
@@ -207,13 +295,23 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
loop_body_size += VC4_PACKET_BRANCH_TO_SUB_LIST_SIZE; loop_body_size += VC4_PACKET_BRANCH_TO_SUB_LIST_SIZE;
} }
if (setup->msaa_color_write)
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
if (setup->msaa_zs_write)
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
if (setup->zs_write) if (setup->zs_write)
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE; loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
if (setup->color_ms_write) { if (setup->color_write)
if (setup->zs_write)
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
loop_body_size += VC4_PACKET_STORE_MS_TILE_BUFFER_SIZE; loop_body_size += VC4_PACKET_STORE_MS_TILE_BUFFER_SIZE;
}
/* We need a VC4_PACKET_TILE_COORDINATES in between each store. */
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE *
((setup->msaa_color_write != NULL) +
(setup->msaa_zs_write != NULL) +
(setup->color_write != NULL) +
(setup->zs_write != NULL) - 1);
size += xtiles * ytiles * loop_body_size; size += xtiles * ytiles * loop_body_size;
setup->rcl = drm_gem_cma_create(dev, size); setup->rcl = drm_gem_cma_create(dev, size);
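The "one tile coordinates packet between each store" sizing rule above amounts to counting the configured stores and subtracting one. A minimal sketch, assuming at least one store is configured (the expression in the diff relies on the same condition); `coords_bytes_between_stores` is a hypothetical name and the packet size of 3 is VC4_PACKET_TILE_COORDINATES_SIZE from the diff.

```c
#include <assert.h>
#include <stdbool.h>

#define TILE_COORDINATES_SIZE 3   /* VC4_PACKET_TILE_COORDINATES_SIZE */

/* Bytes of tile-coordinate packets needed between the stores of one tile:
 * N stores need N - 1 separators. */
static unsigned coords_bytes_between_stores(bool msaa_color, bool msaa_zs,
                                            bool color, bool zs)
{
    unsigned stores = msaa_color + msaa_zs + color + zs;

    assert(stores >= 1);   /* assumption: a job always has some store */
    return TILE_COORDINATES_SIZE * (stores - 1);
}
```

A job that writes only the resolved color buffer needs no separator, while one that writes all four surfaces needs three, or 9 bytes per tile.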
@@ -224,13 +322,12 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
rcl_u8(setup, VC4_PACKET_TILE_RENDERING_MODE_CONFIG); rcl_u8(setup, VC4_PACKET_TILE_RENDERING_MODE_CONFIG);
rcl_u32(setup, rcl_u32(setup,
(setup->color_ms_write ? (setup->color_write ? (setup->color_write->paddr +
(setup->color_ms_write->paddr + args->color_write.offset) :
args->color_ms_write.offset) :
0)); 0));
rcl_u16(setup, args->width); rcl_u16(setup, args->width);
rcl_u16(setup, args->height); rcl_u16(setup, args->height);
rcl_u16(setup, args->color_ms_write.bits); rcl_u16(setup, args->color_write.bits);
/* The tile buffer gets cleared when the previous tile is stored. If /* The tile buffer gets cleared when the previous tile is stored. If
* the clear values changed between frames, then the tile buffer has * the clear values changed between frames, then the tile buffer has
@@ -255,6 +352,7 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
for (x = min_x_tile; x <= max_x_tile; x++) { for (x = min_x_tile; x <= max_x_tile; x++) {
bool first = (x == min_x_tile && y == min_y_tile); bool first = (x == min_x_tile && y == min_y_tile);
bool last = (x == max_x_tile && y == max_y_tile); bool last = (x == max_x_tile && y == max_y_tile);
emit_tile(exec, setup, x, y, first, last); emit_tile(exec, setup, x, y, first, last);
} }
} }
@@ -266,6 +364,56 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
return 0; return 0;
} }
static int vc4_full_res_bounds_check(struct vc4_exec_info *exec,
struct drm_gem_cma_object *obj,
struct drm_vc4_submit_rcl_surface *surf)
{
struct drm_vc4_submit_cl *args = exec->args;
u32 render_tiles_stride = DIV_ROUND_UP(exec->args->width, 32);
if (surf->offset > obj->base.size) {
DRM_ERROR("surface offset %d > BO size %zd\n",
surf->offset, obj->base.size);
return -EINVAL;
}
if ((obj->base.size - surf->offset) / VC4_TILE_BUFFER_SIZE <
render_tiles_stride * args->max_y_tile + args->max_x_tile) {
DRM_ERROR("MSAA tile %d, %d out of bounds "
"(bo size %zd, offset %d).\n",
args->max_x_tile, args->max_y_tile,
obj->base.size,
surf->offset);
return -EINVAL;
}
return 0;
}
static int vc4_rcl_msaa_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
{
if (surf->flags != 0 || surf->bits != 0) {
DRM_ERROR("MSAA surface had nonzero flags/bits\n");
return -EINVAL;
}
if (surf->hindex == ~0)
return 0;
*obj = vc4_use_bo(exec, surf->hindex);
if (!*obj)
return -EINVAL;
if (surf->offset & 0xf) {
DRM_ERROR("MSAA write must be 16b aligned.\n");
return -EINVAL;
}
return vc4_full_res_bounds_check(exec, *obj, surf);
}
static int vc4_rcl_surface_setup(struct vc4_exec_info *exec, static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj, struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf) struct drm_vc4_submit_rcl_surface *surf)
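The validation done by vc4_full_res_bounds_check can be reproduced with plain integers. The sketch below mirrors the two comparisons from the diff as written; `full_res_bounds_check` and its parameter list are hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_BUFFER_SIZE (64 * 64 * 4)   /* VC4_TILE_BUFFER_SIZE in the diff */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Returns 0 if the buffer passes the checks, -EINVAL otherwise. */
static int full_res_bounds_check(size_t bo_size, uint32_t offset,
                                 uint32_t width,
                                 uint32_t max_x_tile, uint32_t max_y_tile)
{
    uint32_t render_tiles_stride = DIV_ROUND_UP(width, 32);

    /* The surface must start inside the buffer. */
    if (offset > bo_size)
        return -EINVAL;

    /* Whole tiles available after the offset, compared against the
     * index of the last tile, using the same comparison as the diff. */
    if ((bo_size - offset) / TILE_BUFFER_SIZE <
        render_tiles_stride * max_y_tile + max_x_tile)
        return -EINVAL;

    return 0;
}
```

For a 128-pixel-wide job ending at tile (3, 3), a 16-tile buffer passes while an 8-tile buffer is rejected.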
@@ -277,9 +425,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
uint8_t format = VC4_GET_FIELD(surf->bits, uint8_t format = VC4_GET_FIELD(surf->bits,
VC4_LOADSTORE_TILE_BUFFER_FORMAT); VC4_LOADSTORE_TILE_BUFFER_FORMAT);
int cpp; int cpp;
int ret;
if (surf->pad != 0) { if (surf->flags & ~VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
DRM_ERROR("Padding unset\n"); DRM_ERROR("Extra flags set\n");
return -EINVAL; return -EINVAL;
} }
@@ -290,6 +439,25 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
if (!*obj) if (!*obj)
return -EINVAL; return -EINVAL;
if (surf->flags & VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
if (surf == &exec->args->zs_write) {
DRM_ERROR("general zs write may not be a full-res.\n");
return -EINVAL;
}
if (surf->bits != 0) {
DRM_ERROR("load/store general bits set with "
"full res load/store.\n");
return -EINVAL;
}
ret = vc4_full_res_bounds_check(exec, *obj, surf);
if (ret)
return ret;
return 0;
}
if (surf->bits & ~(VC4_LOADSTORE_TILE_BUFFER_TILING_MASK | if (surf->bits & ~(VC4_LOADSTORE_TILE_BUFFER_TILING_MASK |
VC4_LOADSTORE_TILE_BUFFER_BUFFER_MASK | VC4_LOADSTORE_TILE_BUFFER_BUFFER_MASK |
VC4_LOADSTORE_TILE_BUFFER_FORMAT_MASK)) { VC4_LOADSTORE_TILE_BUFFER_FORMAT_MASK)) {
@@ -341,9 +509,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
} }
static int static int
vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec, vc4_rcl_render_config_surface_setup(struct vc4_exec_info *exec,
struct drm_gem_cma_object **obj, struct vc4_rcl_setup *setup,
struct drm_vc4_submit_rcl_surface *surf) struct drm_gem_cma_object **obj,
struct drm_vc4_submit_rcl_surface *surf)
{ {
uint8_t tiling = VC4_GET_FIELD(surf->bits, uint8_t tiling = VC4_GET_FIELD(surf->bits,
VC4_RENDER_CONFIG_MEMORY_FORMAT); VC4_RENDER_CONFIG_MEMORY_FORMAT);
@@ -351,13 +520,15 @@ vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec,
VC4_RENDER_CONFIG_FORMAT); VC4_RENDER_CONFIG_FORMAT);
int cpp; int cpp;
if (surf->pad != 0) { if (surf->flags != 0) {
DRM_ERROR("Padding unset\n"); DRM_ERROR("No flags supported on render config.\n");
return -EINVAL; return -EINVAL;
} }
if (surf->bits & ~(VC4_RENDER_CONFIG_MEMORY_FORMAT_MASK | if (surf->bits & ~(VC4_RENDER_CONFIG_MEMORY_FORMAT_MASK |
VC4_RENDER_CONFIG_FORMAT_MASK)) { VC4_RENDER_CONFIG_FORMAT_MASK |
VC4_RENDER_CONFIG_MS_MODE_4X |
VC4_RENDER_CONFIG_DECIMATE_MODE_4X)) {
DRM_ERROR("Unknown bits in render config: 0x%04x\n", DRM_ERROR("Unknown bits in render config: 0x%04x\n",
surf->bits); surf->bits);
return -EINVAL; return -EINVAL;
@@ -414,18 +585,20 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
if (has_bin && if (has_bin &&
(args->max_x_tile > exec->bin_tiles_x || (args->max_x_tile > exec->bin_tiles_x ||
args->max_y_tile > exec->bin_tiles_y)) { args->max_y_tile > exec->bin_tiles_y)) {
DRM_ERROR("Render tiles (%d,%d) outside of bin config (%d,%d)\n", DRM_ERROR("Render tiles (%d,%d) outside of bin config "
"(%d,%d)\n",
args->max_x_tile, args->max_y_tile, args->max_x_tile, args->max_y_tile,
exec->bin_tiles_x, exec->bin_tiles_y); exec->bin_tiles_x, exec->bin_tiles_y);
return -EINVAL; return -EINVAL;
} }
ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read); ret = vc4_rcl_render_config_surface_setup(exec, &setup,
&setup.color_write,
&args->color_write);
if (ret) if (ret)
return ret; return ret;
ret = vc4_rcl_ms_surface_setup(exec, &setup.color_ms_write, ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read);
&args->color_ms_write);
if (ret) if (ret)
return ret; return ret;
@@ -437,10 +610,21 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
if (ret) if (ret)
return ret; return ret;
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_color_write,
&args->msaa_color_write);
if (ret)
return ret;
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_zs_write,
&args->msaa_zs_write);
if (ret)
return ret;
/* We shouldn't even have the job submitted to us if there's no /* We shouldn't even have the job submitted to us if there's no
* surface to write out. * surface to write out.
*/ */
if (!setup.color_ms_write && !setup.zs_write) { if (!setup.color_write && !setup.zs_write &&
!setup.msaa_color_write && !setup.msaa_zs_write) {
DRM_ERROR("RCL requires color or Z/S write\n"); DRM_ERROR("RCL requires color or Z/S write\n");
return -EINVAL; return -EINVAL;
} }
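
The full-resolution bounds check added above can be read as plain arithmetic on tile indices. The following is a minimal standalone sketch, not the driver code: the `demo_` names are hypothetical, and the 16384-byte tile size is an assumption standing in for `VC4_TILE_BUFFER_SIZE`. It mirrors the comparison as written in the diff (reject when fewer whole tiles fit after the offset than the index of the last tile).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for VC4_TILE_BUFFER_SIZE (illustration only). */
#define DEMO_TILE_BUFFER_SIZE 16384u
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Returns true when the surface passes the same test as the diff's
 * bounds check: the offset lies inside the BO, and the number of whole
 * tile buffers after the offset is not smaller than the linear index
 * of the last tile (max_x_tile, max_y_tile).
 */
static bool
demo_full_res_surface_fits(size_t bo_size, uint32_t offset,
                           uint32_t width_px,
                           uint32_t max_x_tile, uint32_t max_y_tile)
{
        /* Full-res MSAA tiles are 32 pixels wide in this sketch. */
        uint32_t tiles_stride = DEMO_DIV_ROUND_UP(width_px, 32);
        size_t tiles_available;
        size_t last_tile_index;

        if (offset > bo_size)
                return false;

        tiles_available = (bo_size - offset) / DEMO_TILE_BUFFER_SIZE;
        last_tile_index = (size_t)tiles_stride * max_y_tile + max_x_tile;

        return tiles_available >= last_tile_index;
}
```

Subtracting the offset before dividing keeps the arithmetic free of overflow, which is why the offset test comes first.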


@@ -47,7 +47,6 @@
void *validated, \ void *validated, \
void *untrusted void *untrusted
/** Return the width in pixels of a 64-byte microtile. */ /** Return the width in pixels of a 64-byte microtile. */
static uint32_t static uint32_t
utile_width(int cpp) utile_width(int cpp)
@@ -191,7 +190,7 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
if (size + offset < size || if (size + offset < size ||
size + offset > fbo->base.size) { size + offset > fbo->base.size) {
DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %d)\n", DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %zd)\n",
width, height, width, height,
aligned_width, aligned_height, aligned_width, aligned_height,
size, offset, fbo->base.size); size, offset, fbo->base.size);
@@ -201,7 +200,6 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
return true; return true;
} }
static int static int
validate_flush(VALIDATE_ARGS) validate_flush(VALIDATE_ARGS)
{ {
@@ -270,7 +268,7 @@ validate_indexed_prim_list(VALIDATE_ARGS)
if (offset > ib->base.size || if (offset > ib->base.size ||
(ib->base.size - offset) / index_size < length) { (ib->base.size - offset) / index_size < length) {
DRM_ERROR("IB access overflow (%d + %d*%d > %d)\n", DRM_ERROR("IB access overflow (%d + %d*%d > %zd)\n",
offset, length, index_size, ib->base.size); offset, length, index_size, ib->base.size);
return -EINVAL; return -EINVAL;
} }
@@ -361,9 +359,8 @@ validate_tile_binning_config(VALIDATE_ARGS)
} }
if (flags & (VC4_BIN_CONFIG_DB_NON_MS | if (flags & (VC4_BIN_CONFIG_DB_NON_MS |
VC4_BIN_CONFIG_TILE_BUFFER_64BIT | VC4_BIN_CONFIG_TILE_BUFFER_64BIT)) {
VC4_BIN_CONFIG_MS_MODE_4X)) { DRM_ERROR("unsupported binning config flags 0x%02x\n", flags);
DRM_ERROR("unsupported bining config flags 0x%02x\n", flags);
return -EINVAL; return -EINVAL;
} }
@@ -424,8 +421,8 @@ validate_gem_handles(VALIDATE_ARGS)
return 0; return 0;
} }
#define VC4_DEFINE_PACKET(packet, name, func) \ #define VC4_DEFINE_PACKET(packet, func) \
[packet] = { packet ## _SIZE, name, func } [packet] = { packet ## _SIZE, #packet, func }
static const struct cmd_info { static const struct cmd_info {
uint16_t len; uint16_t len;
@@ -433,42 +430,42 @@ static const struct cmd_info {
int (*func)(struct vc4_exec_info *exec, void *validated, int (*func)(struct vc4_exec_info *exec, void *validated,
void *untrusted); void *untrusted);
} cmd_info[] = { } cmd_info[] = {
VC4_DEFINE_PACKET(VC4_PACKET_HALT, "halt", NULL), VC4_DEFINE_PACKET(VC4_PACKET_HALT, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_NOP, "nop", NULL), VC4_DEFINE_PACKET(VC4_PACKET_NOP, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, "flush", validate_flush), VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, validate_flush),
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, "flush all state", NULL), VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING, "start tile binning", validate_start_tile_binning), VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING,
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE, "increment semaphore", validate_increment_semaphore), validate_start_tile_binning),
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE,
validate_increment_semaphore),
VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE, "Indexed Primitive List", validate_indexed_prim_list), VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE,
validate_indexed_prim_list),
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE,
validate_gl_array_primitive),
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE, "Vertex Array Primitives", validate_gl_array_primitive), VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, NULL),
/* This is only used by clipped primitives (packets 48 and 49), which VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, validate_gl_shader_state),
* we don't support parsing yet.
*/
VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, "primitive list format", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, "GL Shader State", validate_gl_shader_state), VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, NULL),
/* We don't support validating NV shader states. */ VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, "configuration bits", NULL), VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, "flat shade flags", NULL), VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, "point size", NULL), VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, "line width", NULL), VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, "RHT X boundary", NULL), VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, "Depth Offset", NULL), VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, "Clip Window", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, "Viewport Offset", NULL),
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, "Clipper XY Scaling", NULL),
/* Note: The docs say this was also 105, but it was 106 in the /* Note: The docs say this was also 105, but it was 106 in the
* initial userland code drop. * initial userland code drop.
*/ */
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, "Clipper Z Scale and Offset", NULL), VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, NULL),
VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG, "tile binning configuration", validate_tile_binning_config), VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG,
validate_tile_binning_config),
VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, "GEM handles", validate_gem_handles), VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, validate_gem_handles),
}; };
int int
@@ -500,11 +497,6 @@ vc4_validate_bin_cl(struct drm_device *dev,
return -EINVAL; return -EINVAL;
} }
#if 0
DRM_INFO("0x%08x: packet %d (%s) size %d processing...\n",
src_offset, cmd, info->name, info->len);
#endif
if (src_offset + info->len > len) { if (src_offset + info->len > len) {
DRM_ERROR("0x%08x: packet %d (%s) length 0x%08x " DRM_ERROR("0x%08x: packet %d (%s) length 0x%08x "
"exceeds bounds (0x%08x)\n", "exceeds bounds (0x%08x)\n",
@@ -519,8 +511,7 @@ vc4_validate_bin_cl(struct drm_device *dev,
if (info->func && info->func(exec, if (info->func && info->func(exec,
dst_pkt + 1, dst_pkt + 1,
src_pkt + 1)) { src_pkt + 1)) {
DRM_ERROR("0x%08x: packet %d (%s) failed to " DRM_ERROR("0x%08x: packet %d (%s) failed to validate\n",
"validate\n",
src_offset, cmd, info->name); src_offset, cmd, info->name);
return -EINVAL; return -EINVAL;
} }
@@ -588,12 +579,14 @@ reloc_tex(struct vc4_exec_info *exec,
if (sample->is_direct) { if (sample->is_direct) {
uint32_t remaining_size = tex->base.size - p0; uint32_t remaining_size = tex->base.size - p0;
if (p0 > tex->base.size - 4) { if (p0 > tex->base.size - 4) {
DRM_ERROR("UBO offset greater than UBO size\n"); DRM_ERROR("UBO offset greater than UBO size\n");
goto fail; goto fail;
} }
if (p1 > remaining_size - 4) { if (p1 > remaining_size - 4) {
DRM_ERROR("UBO clamp would allow reads outside of UBO\n"); DRM_ERROR("UBO clamp would allow reads "
"outside of UBO\n");
goto fail; goto fail;
} }
*validated_p0 = tex->paddr + p0; *validated_p0 = tex->paddr + p0;
@@ -866,7 +859,7 @@ validate_gl_shader_rec(struct drm_device *dev,
if (vbo->base.size < offset || if (vbo->base.size < offset ||
vbo->base.size - offset < attr_size) { vbo->base.size - offset < attr_size) {
DRM_ERROR("BO offset overflow (%d + %d > %d)\n", DRM_ERROR("BO offset overflow (%d + %d > %zd)\n",
offset, attr_size, vbo->base.size); offset, attr_size, vbo->base.size);
return -EINVAL; return -EINVAL;
} }
@@ -875,7 +868,8 @@ validate_gl_shader_rec(struct drm_device *dev,
max_index = ((vbo->base.size - offset - attr_size) / max_index = ((vbo->base.size - offset - attr_size) /
stride); stride);
if (state->max_index > max_index) { if (state->max_index > max_index) {
DRM_ERROR("primitives use index %d out of supplied %d\n", DRM_ERROR("primitives use index %d out of "
"supplied %d\n",
state->max_index, max_index); state->max_index, max_index);
return -EINVAL; return -EINVAL;
} }
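
The reworked `VC4_DEFINE_PACKET` macro derives each table entry's name from the packet identifier itself via the preprocessor's `#` operator, instead of taking a hand-written string. A small standalone sketch of that pattern follows; every `DEMO_`/`demo_` name, opcode value, and size is hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes and packet sizes (not the real VC4 values). */
enum {
        DEMO_PACKET_HALT = 0,
        DEMO_PACKET_NOP = 1,
        DEMO_PACKET_FLUSH = 4,
};
#define DEMO_PACKET_HALT_SIZE 1
#define DEMO_PACKET_NOP_SIZE 1
#define DEMO_PACKET_FLUSH_SIZE 1

struct demo_cmd_info {
        uint16_t len;
        const char *name;
        int (*func)(void *validated, void *untrusted);
};

/* "packet ## _SIZE" pastes the size macro name; "#packet" turns the
 * identifier into a string literal, so the name cannot drift from the
 * opcode it describes.
 */
#define DEMO_DEFINE_PACKET(packet, func) \
        [packet] = { packet ## _SIZE, #packet, func }

static int
demo_validate_flush(void *validated, void *untrusted)
{
        (void)validated;
        (void)untrusted;
        return 0;
}

/* Designated initializers leave the unlisted slots (2 and 3) zeroed. */
static const struct demo_cmd_info demo_cmd_info[] = {
        DEMO_DEFINE_PACKET(DEMO_PACKET_HALT, NULL),
        DEMO_DEFINE_PACKET(DEMO_PACKET_NOP, NULL),
        DEMO_DEFINE_PACKET(DEMO_PACKET_FLUSH, demo_validate_flush),
};
```

The trade-off is that error messages now print the raw identifier rather than a prose label.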


@@ -24,24 +24,16 @@
/** /**
* DOC: Shader validator for VC4. * DOC: Shader validator for VC4.
* *
* The VC4 has no IOMMU between it and system memory. So, a user with access * The VC4 has no IOMMU between it and system memory, so a user with
* to execute shaders could escalate privilege by overwriting system memory * access to execute shaders could escalate privilege by overwriting
* (using the VPM write address register in the general-purpose DMA mode) or * system memory (using the VPM write address register in the
* reading system memory it shouldn't (reading it as a texture, or uniform * general-purpose DMA mode) or reading system memory it shouldn't
* data, or vertex data). * (reading it as a texture, or uniform data, or vertex data).
* *
* This walks over a shader starting from some offset within a BO, ensuring * This walks over a shader BO, ensuring that its accesses are
* that its accesses are appropriately bounded, and recording how many texture * appropriately bounded, and recording how many texture accesses are
* accesses are made and where so that we can do relocations for them in the * made and where so that we can do relocations for them in the
* uniform stream. * uniform stream.
*
* The kernel API has shaders stored in user-mapped BOs. The BOs will be
* forcibly unmapped from the process before validation, and any cache of
* validated state will be flushed if the mapping is faulted back in.
*
* Storing the shaders in BOs means that the validation process will be slow
* due to uncached reads, but since shaders are long-lived and shader BOs are
* never actually modified, this shouldn't be a problem.
*/ */
#include "vc4_drv.h" #include "vc4_drv.h"
@@ -71,7 +63,6 @@ waddr_to_live_reg_index(uint32_t waddr, bool is_b)
else else
return waddr; return waddr;
} else if (waddr <= QPU_W_ACC3) { } else if (waddr <= QPU_W_ACC3) {
return 64 + waddr - QPU_W_ACC0; return 64 + waddr - QPU_W_ACC0;
} else { } else {
return ~0; return ~0;
@@ -86,15 +77,14 @@ raddr_add_a_to_live_reg_index(uint64_t inst)
uint32_t raddr_a = QPU_GET_FIELD(inst, QPU_RADDR_A); uint32_t raddr_a = QPU_GET_FIELD(inst, QPU_RADDR_A);
uint32_t raddr_b = QPU_GET_FIELD(inst, QPU_RADDR_B); uint32_t raddr_b = QPU_GET_FIELD(inst, QPU_RADDR_B);
if (add_a == QPU_MUX_A) { if (add_a == QPU_MUX_A)
return raddr_a; return raddr_a;
} else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM) { else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM)
return 32 + raddr_b; return 32 + raddr_b;
} else if (add_a <= QPU_MUX_R3) { else if (add_a <= QPU_MUX_R3)
return 64 + add_a; return 64 + add_a;
} else { else
return ~0; return ~0;
}
} }
static bool static bool
@@ -112,9 +102,9 @@ is_tmu_write(uint32_t waddr)
} }
static bool static bool
record_validated_texture_sample(struct vc4_validated_shader_info *validated_shader, record_texture_sample(struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state, struct vc4_shader_validation_state *validation_state,
int tmu) int tmu)
{ {
uint32_t s = validated_shader->num_texture_samples; uint32_t s = validated_shader->num_texture_samples;
int i; int i;
@@ -227,8 +217,8 @@ check_tmu_write(uint64_t inst,
validated_shader->uniforms_size += 4; validated_shader->uniforms_size += 4;
if (submit) { if (submit) {
if (!record_validated_texture_sample(validated_shader, if (!record_texture_sample(validated_shader,
validation_state, tmu)) { validation_state, tmu)) {
return false; return false;
} }
@@ -239,10 +229,10 @@ check_tmu_write(uint64_t inst,
} }
static bool static bool
check_register_write(uint64_t inst, check_reg_write(uint64_t inst,
struct vc4_validated_shader_info *validated_shader, struct vc4_validated_shader_info *validated_shader,
struct vc4_shader_validation_state *validation_state, struct vc4_shader_validation_state *validation_state,
bool is_mul) bool is_mul)
{ {
uint32_t waddr = (is_mul ? uint32_t waddr = (is_mul ?
QPU_GET_FIELD(inst, QPU_WADDR_MUL) : QPU_GET_FIELD(inst, QPU_WADDR_MUL) :
@@ -298,7 +288,7 @@ check_register_write(uint64_t inst,
return true; return true;
case QPU_W_TLB_STENCIL_SETUP: case QPU_W_TLB_STENCIL_SETUP:
return true; return true;
} }
return true; return true;
@@ -361,7 +351,7 @@ track_live_clamps(uint64_t inst,
} }
validation_state->live_max_clamp_regs[lri_add] = true; validation_state->live_max_clamp_regs[lri_add] = true;
} if (op_add == QPU_A_MIN) { } else if (op_add == QPU_A_MIN) {
/* Track live clamps of a value clamped to a minimum of 0 and /* Track live clamps of a value clamped to a minimum of 0 and
* a maximum of some uniform's offset. * a maximum of some uniform's offset.
*/ */
@@ -393,8 +383,10 @@ check_instruction_writes(uint64_t inst,
return false; return false;
} }
ok = (check_register_write(inst, validated_shader, validation_state, false) && ok = (check_reg_write(inst, validated_shader, validation_state,
check_register_write(inst, validated_shader, validation_state, true)); false) &&
check_reg_write(inst, validated_shader, validation_state,
true));
track_live_clamps(inst, validated_shader, validation_state); track_live_clamps(inst, validated_shader, validation_state);
@@ -442,7 +434,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
shader = shader_obj->vaddr; shader = shader_obj->vaddr;
max_ip = shader_obj->base.size / sizeof(uint64_t); max_ip = shader_obj->base.size / sizeof(uint64_t);
validated_shader = kcalloc(sizeof(*validated_shader), 1, GFP_KERNEL); validated_shader = kcalloc(1, sizeof(*validated_shader), GFP_KERNEL);
if (!validated_shader) if (!validated_shader)
return NULL; return NULL;
@@ -498,7 +490,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
if (ip == max_ip) { if (ip == max_ip) {
DRM_ERROR("shader failed to terminate before " DRM_ERROR("shader failed to terminate before "
"shader BO end at %d\n", "shader BO end at %zd\n",
shader_obj->base.size); shader_obj->base.size);
goto fail; goto fail;
} }
@@ -514,6 +506,9 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
return validated_shader; return validated_shader;
fail: fail:
kfree(validated_shader); if (validated_shader) {
kfree(validated_shader->texture_samples);
kfree(validated_shader);
}
return NULL; return NULL;
} }


@@ -41,24 +41,53 @@ vc4_get_blit_surface(struct pipe_context *pctx,
return pctx->create_surface(pctx, prsc, &tmpl); return pctx->create_surface(pctx, prsc, &tmpl);
} }
static bool
is_tile_unaligned(unsigned size, unsigned tile_size)
{
return size & (tile_size - 1);
}
static bool static bool
vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info) vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
{ {
struct vc4_context *vc4 = vc4_context(pctx); struct vc4_context *vc4 = vc4_context(pctx);
bool old_msaa = vc4->msaa;
int old_tile_width = vc4->tile_width;
int old_tile_height = vc4->tile_height;
bool msaa = (info->src.resource->nr_samples ||
info->dst.resource->nr_samples);
int tile_width = msaa ? 32 : 64;
int tile_height = msaa ? 32 : 64;
if (util_format_is_depth_or_stencil(info->dst.resource->format)) if (util_format_is_depth_or_stencil(info->dst.resource->format))
return false; return false;
if (info->scissor_enable)
return false;
if ((info->mask & PIPE_MASK_RGBA) == 0) if ((info->mask & PIPE_MASK_RGBA) == 0)
return false; return false;
if (info->dst.box.x != 0 || info->dst.box.y != 0 || if (info->dst.box.x != info->src.box.x ||
info->src.box.x != 0 || info->src.box.y != 0 || info->dst.box.y != info->src.box.y ||
info->dst.box.width != info->src.box.width || info->dst.box.width != info->src.box.width ||
info->dst.box.height != info->src.box.height) { info->dst.box.height != info->src.box.height) {
return false; return false;
} }
int dst_surface_width = u_minify(info->dst.resource->width0,
info->dst.level);
int dst_surface_height = u_minify(info->dst.resource->height0,
info->dst.level);
if (is_tile_unaligned(info->dst.box.x, tile_width) ||
is_tile_unaligned(info->dst.box.y, tile_height) ||
(is_tile_unaligned(info->dst.box.width, tile_width) &&
info->dst.box.x + info->dst.box.width != dst_surface_width) ||
(is_tile_unaligned(info->dst.box.height, tile_height) &&
info->dst.box.y + info->dst.box.height != dst_surface_height)) {
return false;
}
if (info->dst.resource->format != info->src.resource->format) if (info->dst.resource->format != info->src.resource->format)
return false; return false;
@@ -70,18 +99,32 @@ vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
vc4_get_blit_surface(pctx, info->src.resource, info->src.level); vc4_get_blit_surface(pctx, info->src.resource, info->src.level);
pipe_surface_reference(&vc4->color_read, src_surf); pipe_surface_reference(&vc4->color_read, src_surf);
pipe_surface_reference(&vc4->color_write, dst_surf); pipe_surface_reference(&vc4->color_write,
dst_surf->texture->nr_samples ? NULL : dst_surf);
pipe_surface_reference(&vc4->msaa_color_write,
dst_surf->texture->nr_samples ? dst_surf : NULL);
pipe_surface_reference(&vc4->zs_read, NULL); pipe_surface_reference(&vc4->zs_read, NULL);
pipe_surface_reference(&vc4->zs_write, NULL); pipe_surface_reference(&vc4->zs_write, NULL);
vc4->draw_min_x = 0; pipe_surface_reference(&vc4->msaa_zs_write, NULL);
vc4->draw_min_y = 0;
vc4->draw_max_x = dst_surf->width; vc4->draw_min_x = info->dst.box.x;
vc4->draw_max_y = dst_surf->height; vc4->draw_min_y = info->dst.box.y;
vc4->draw_max_x = info->dst.box.x + info->dst.box.width;
vc4->draw_max_y = info->dst.box.y + info->dst.box.height;
vc4->draw_width = dst_surf->width; vc4->draw_width = dst_surf->width;
vc4->draw_height = dst_surf->height; vc4->draw_height = dst_surf->height;
vc4->tile_width = tile_width;
vc4->tile_height = tile_height;
vc4->msaa = msaa;
vc4->needs_flush = true; vc4->needs_flush = true;
vc4_job_submit(vc4); vc4_job_submit(vc4);
vc4->msaa = old_msaa;
vc4->tile_width = old_tile_width;
vc4->tile_height = old_tile_height;
pipe_surface_reference(&dst_surf, NULL); pipe_surface_reference(&dst_surf, NULL);
pipe_surface_reference(&src_surf, NULL); pipe_surface_reference(&src_surf, NULL);
@@ -131,14 +174,6 @@ vc4_blit(struct pipe_context *pctx, const struct pipe_blit_info *blit_info)
{ {
struct pipe_blit_info info = *blit_info; struct pipe_blit_info info = *blit_info;
if (info.src.resource->nr_samples > 1 &&
info.dst.resource->nr_samples <= 1 &&
!util_format_is_depth_or_stencil(info.src.resource->format) &&
!util_format_is_pure_integer(info.src.resource->format)) {
fprintf(stderr, "color resolve unimplemented\n");
return;
}
if (vc4_tile_blit(pctx, blit_info)) if (vc4_tile_blit(pctx, blit_info))
return; return;
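
The new eligibility test in `vc4_tile_blit()` accepts a destination box whose origin is tile-aligned and whose extent is either tile-aligned or runs to the edge of the surface. The sketch below restates that condition in isolation, under the assumption (taken from the diff) that tiles are 32x32 with MSAA and 64x64 otherwise; `demo_box_is_tile_blittable` is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stdbool.h>

/* Same mask trick as the diff; only valid for power-of-two tile sizes. */
static bool
is_tile_unaligned(unsigned size, unsigned tile_size)
{
        assert(tile_size != 0 && (tile_size & (tile_size - 1)) == 0);
        return (size & (tile_size - 1)) != 0;
}

/* Hypothetical helper: true when the box can be copied tile by tile. */
static bool
demo_box_is_tile_blittable(unsigned x, unsigned y,
                           unsigned width, unsigned height,
                           unsigned surf_width, unsigned surf_height,
                           bool msaa)
{
        unsigned tile = msaa ? 32 : 64;

        /* The box must start on a tile boundary. */
        if (is_tile_unaligned(x, tile) || is_tile_unaligned(y, tile))
                return false;

        /* A partial tile is only allowed at the surface's far edge. */
        if (is_tile_unaligned(width, tile) && x + width != surf_width)
                return false;
        if (is_tile_unaligned(height, tile) && y + height != surf_height)
                return false;

        return true;
}
```

For example, a 100x100 box at the origin qualifies on a 100x100 surface but not on a 200x200 one, because the partial tiles would then fall in the interior.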


@@ -67,8 +67,16 @@ vc4_flush(struct pipe_context *pctx)
cl_u8(&bcl, VC4_PACKET_FLUSH); cl_u8(&bcl, VC4_PACKET_FLUSH);
cl_end(&vc4->bcl, bcl); cl_end(&vc4->bcl, bcl);
vc4->msaa = false;
if (cbuf && (vc4->resolve & PIPE_CLEAR_COLOR0)) { if (cbuf && (vc4->resolve & PIPE_CLEAR_COLOR0)) {
pipe_surface_reference(&vc4->color_write, cbuf); pipe_surface_reference(&vc4->color_write,
cbuf->texture->nr_samples ? NULL : cbuf);
pipe_surface_reference(&vc4->msaa_color_write,
cbuf->texture->nr_samples ? cbuf : NULL);
if (cbuf->texture->nr_samples)
vc4->msaa = true;
if (!(vc4->cleared & PIPE_CLEAR_COLOR0)) { if (!(vc4->cleared & PIPE_CLEAR_COLOR0)) {
pipe_surface_reference(&vc4->color_read, cbuf); pipe_surface_reference(&vc4->color_read, cbuf);
} else { } else {
@@ -78,11 +86,21 @@ vc4_flush(struct pipe_context *pctx)
} else { } else {
pipe_surface_reference(&vc4->color_write, NULL); pipe_surface_reference(&vc4->color_write, NULL);
pipe_surface_reference(&vc4->color_read, NULL); pipe_surface_reference(&vc4->color_read, NULL);
pipe_surface_reference(&vc4->msaa_color_write, NULL);
} }
if (vc4->framebuffer.zsbuf && if (vc4->framebuffer.zsbuf &&
(vc4->resolve & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) { (vc4->resolve & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
pipe_surface_reference(&vc4->zs_write, zsbuf); pipe_surface_reference(&vc4->zs_write,
zsbuf->texture->nr_samples ?
NULL : zsbuf);
pipe_surface_reference(&vc4->msaa_zs_write,
zsbuf->texture->nr_samples ?
zsbuf : NULL);
if (zsbuf->texture->nr_samples)
vc4->msaa = true;
if (!(vc4->cleared & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) { if (!(vc4->cleared & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
pipe_surface_reference(&vc4->zs_read, zsbuf); pipe_surface_reference(&vc4->zs_read, zsbuf);
} else { } else {
@@ -91,6 +109,7 @@ vc4_flush(struct pipe_context *pctx)
} else { } else {
pipe_surface_reference(&vc4->zs_write, NULL); pipe_surface_reference(&vc4->zs_write, NULL);
pipe_surface_reference(&vc4->zs_read, NULL); pipe_surface_reference(&vc4->zs_read, NULL);
pipe_surface_reference(&vc4->msaa_zs_write, NULL);
} }
vc4_job_submit(vc4); vc4_job_submit(vc4);
@@ -245,6 +264,8 @@ vc4_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
vc4_debug |= saved_shaderdb_flag; vc4_debug |= saved_shaderdb_flag;
vc4->sample_mask = (1 << VC4_MAX_SAMPLES) - 1;
return &vc4->base; return &vc4->base;
fail: fail:


@@ -206,6 +206,8 @@ struct vc4_context {
struct pipe_surface *color_write; struct pipe_surface *color_write;
struct pipe_surface *zs_read; struct pipe_surface *zs_read;
struct pipe_surface *zs_write; struct pipe_surface *zs_write;
struct pipe_surface *msaa_color_write;
struct pipe_surface *msaa_zs_write;
/** @} */ /** @} */
/** @{ /** @{
* Bounding box of the scissor across all queued drawing. * Bounding box of the scissor across all queued drawing.
@@ -224,6 +226,15 @@ struct vc4_context {
uint32_t draw_width; uint32_t draw_width;
uint32_t draw_height; uint32_t draw_height;
/** @} */ /** @} */
/** @{ Tile information, depending on MSAA and float color buffer. */
uint32_t draw_tiles_x; /** @< Number of tiles wide for framebuffer. */
uint32_t draw_tiles_y; /** @< Number of tiles high for framebuffer. */
uint32_t tile_width; /** @< Width of a tile. */
uint32_t tile_height; /** @< Height of a tile. */
/** Whether the current rendering is in a 4X MSAA tile buffer. */
bool msaa;
/** @} */
struct util_slab_mempool transfer_pool; struct util_slab_mempool transfer_pool;
struct blitter_context *blitter; struct blitter_context *blitter;


@@ -68,21 +68,17 @@ vc4_start_draw(struct vc4_context *vc4)
vc4_get_draw_cl_space(vc4); vc4_get_draw_cl_space(vc4);
uint32_t width = vc4->framebuffer.width;
uint32_t height = vc4->framebuffer.height;
uint32_t tilew = align(width, 64) / 64;
uint32_t tileh = align(height, 64) / 64;
struct vc4_cl_out *bcl = cl_start(&vc4->bcl); struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
// Tile state data is 48 bytes per tile, I think it can be thrown away // Tile state data is 48 bytes per tile, I think it can be thrown away
// as soon as binning is finished. // as soon as binning is finished.
cl_u8(&bcl, VC4_PACKET_TILE_BINNING_MODE_CONFIG); cl_u8(&bcl, VC4_PACKET_TILE_BINNING_MODE_CONFIG);
cl_u32(&bcl, 0); /* tile alloc addr, filled by kernel */ cl_u32(&bcl, 0); /* tile alloc addr, filled by kernel */
cl_u32(&bcl, 0); /* tile alloc size, filled by kernel */ cl_u32(&bcl, 0); /* tile alloc size, filled by kernel */
cl_u32(&bcl, 0); /* tile state addr, filled by kernel */ cl_u32(&bcl, 0); /* tile state addr, filled by kernel */
cl_u8(&bcl, tilew); cl_u8(&bcl, vc4->draw_tiles_x);
cl_u8(&bcl, tileh); cl_u8(&bcl, vc4->draw_tiles_y);
cl_u8(&bcl, 0); /* flags, filled by kernel. */ /* Other flags are filled by kernel. */
cl_u8(&bcl, vc4->msaa ? VC4_BIN_CONFIG_MS_MODE_4X : 0);
/* START_TILE_BINNING resets the statechange counters in the hardware, /* START_TILE_BINNING resets the statechange counters in the hardware,
* which are what is used when a primitive is binned to a tile to * which are what is used when a primitive is binned to a tile to
@@ -102,8 +98,8 @@ vc4_start_draw(struct vc4_context *vc4)
vc4->needs_flush = true; vc4->needs_flush = true;
vc4->draw_calls_queued++; vc4->draw_calls_queued++;
vc4->draw_width = width; vc4->draw_width = vc4->framebuffer.width;
vc4->draw_height = height; vc4->draw_height = vc4->framebuffer.height;
cl_end(&vc4->bcl, bcl); cl_end(&vc4->bcl, bcl);
} }


@@ -44,10 +44,13 @@ struct drm_vc4_submit_rcl_surface {
uint32_t hindex; /* Handle index, or ~0 if not present. */ uint32_t hindex; /* Handle index, or ~0 if not present. */
uint32_t offset; /* Offset to start of buffer. */ uint32_t offset; /* Offset to start of buffer. */
/* /*
* Bits for either render config (color_ms_write) or load/store packet. * Bits for either render config (color_write) or load/store packet.
* Bits should all be 0 for MSAA load/stores.
*/ */
uint16_t bits; uint16_t bits;
uint16_t pad;
#define VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES (1 << 0)
uint16_t flags;
}; };
/** /**
@@ -126,9 +129,11 @@ struct drm_vc4_submit_cl {
uint8_t max_x_tile; uint8_t max_x_tile;
uint8_t max_y_tile; uint8_t max_y_tile;
struct drm_vc4_submit_rcl_surface color_read; struct drm_vc4_submit_rcl_surface color_read;
struct drm_vc4_submit_rcl_surface color_ms_write; struct drm_vc4_submit_rcl_surface color_write;
struct drm_vc4_submit_rcl_surface zs_read; struct drm_vc4_submit_rcl_surface zs_read;
struct drm_vc4_submit_rcl_surface zs_write; struct drm_vc4_submit_rcl_surface zs_write;
struct drm_vc4_submit_rcl_surface msaa_color_write;
struct drm_vc4_submit_rcl_surface msaa_zs_write;
uint32_t clear_color[2]; uint32_t clear_color[2];
uint32_t clear_z; uint32_t clear_z;
uint8_t clear_s; uint8_t clear_s;


@@ -29,17 +29,35 @@ vc4_emit_state(struct pipe_context *pctx)
struct vc4_context *vc4 = vc4_context(pctx); struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_cl_out *bcl = cl_start(&vc4->bcl); struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT)) { if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT |
VC4_DIRTY_RASTERIZER)) {
float *vpscale = vc4->viewport.scale; float *vpscale = vc4->viewport.scale;
float *vptranslate = vc4->viewport.translate; float *vptranslate = vc4->viewport.translate;
float vp_minx = -fabsf(vpscale[0]) + vptranslate[0]; float vp_minx = -fabsf(vpscale[0]) + vptranslate[0];
float vp_maxx = fabsf(vpscale[0]) + vptranslate[0]; float vp_maxx = fabsf(vpscale[0]) + vptranslate[0];
float vp_miny = -fabsf(vpscale[1]) + vptranslate[1]; float vp_miny = -fabsf(vpscale[1]) + vptranslate[1];
float vp_maxy = fabsf(vpscale[1]) + vptranslate[1]; float vp_maxy = fabsf(vpscale[1]) + vptranslate[1];
uint32_t minx = MAX2(vc4->scissor.minx, vp_minx);
uint32_t miny = MAX2(vc4->scissor.miny, vp_miny); /* Clip to the scissor if it's enabled, but still clip to the
uint32_t maxx = MIN2(vc4->scissor.maxx, vp_maxx); * drawable regardless since that controls where the binner
uint32_t maxy = MIN2(vc4->scissor.maxy, vp_maxy); * tries to put things.
*
* Additionally, always clip the rendering to the viewport,
* since the hardware does guardband clipping, meaning
* primitives would rasterize outside of the view volume.
*/
uint32_t minx, miny, maxx, maxy;
if (!vc4->rasterizer->base.scissor) {
minx = MAX2(vp_minx, 0);
miny = MAX2(vp_miny, 0);
maxx = MIN2(vp_maxx, vc4->draw_width);
maxy = MIN2(vp_maxy, vc4->draw_height);
} else {
minx = MAX2(vp_minx, vc4->scissor.minx);
miny = MAX2(vp_miny, vc4->scissor.miny);
maxx = MIN2(vp_maxx, vc4->scissor.maxx);
maxy = MIN2(vp_maxy, vc4->scissor.maxy);
}
cl_u8(&bcl, VC4_PACKET_CLIP_WINDOW); cl_u8(&bcl, VC4_PACKET_CLIP_WINDOW);
cl_u16(&bcl, minx); cl_u16(&bcl, minx);
@@ -54,6 +72,20 @@ vc4_emit_state(struct pipe_context *pctx)
} }
if (vc4->dirty & (VC4_DIRTY_RASTERIZER | VC4_DIRTY_ZSA)) { if (vc4->dirty & (VC4_DIRTY_RASTERIZER | VC4_DIRTY_ZSA)) {
uint8_t ez_enable_mask_out = ~0;
/* HW-2905: If the RCL ends up doing a full-res load when
* multisampling, then early Z tracking may end up with values
* from the previous tile due to a HW bug. Disable it to
* avoid that.
*
* We should be able to skip this when the Z is cleared, but I
* was seeing bad rendering on glxgears -samples 4 even in
* that case.
*/
if (vc4->msaa)
ez_enable_mask_out &= ~VC4_CONFIG_BITS_EARLY_Z;
cl_u8(&bcl, VC4_PACKET_CONFIGURATION_BITS); cl_u8(&bcl, VC4_PACKET_CONFIGURATION_BITS);
cl_u8(&bcl, cl_u8(&bcl,
vc4->rasterizer->config_bits[0] | vc4->rasterizer->config_bits[0] |
@@ -62,8 +94,8 @@ vc4_emit_state(struct pipe_context *pctx)
vc4->rasterizer->config_bits[1] | vc4->rasterizer->config_bits[1] |
vc4->zsa->config_bits[1]); vc4->zsa->config_bits[1]);
cl_u8(&bcl, cl_u8(&bcl,
vc4->rasterizer->config_bits[2] | (vc4->rasterizer->config_bits[2] |
vc4->zsa->config_bits[2]); vc4->zsa->config_bits[2]) & ez_enable_mask_out);
} }
if (vc4->dirty & VC4_DIRTY_RASTERIZER) { if (vc4->dirty & VC4_DIRTY_RASTERIZER) {
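
The clip-window change in vc4_emit_state() above can be sketched as a standalone function. This is a minimal illustration of the arithmetic only, not the driver code: `struct clip_rect`, `compute_clip_window()` and the small float helpers are hypothetical names, and the clamp of negative values to zero before the integer conversion is an added safety assumption that the diff itself does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four values sent with the clip window. */
struct clip_rect {
    uint32_t minx, miny, maxx, maxy;
};

static float absf(float v) { return v < 0.0f ? -v : v; }
static float maxf(float a, float b) { return a > b ? a : b; }
static float minf(float a, float b) { return a < b ? a : b; }

/* Assumption of this sketch: negative results are clamped to 0 so the
 * float-to-unsigned conversion is always well defined. */
static uint32_t to_u32(float v) { return v <= 0.0f ? 0u : (uint32_t)v; }

static struct clip_rect
compute_clip_window(const float scale[2], const float translate[2],
                    bool scissor_enabled, struct clip_rect scissor,
                    uint32_t draw_width, uint32_t draw_height)
{
    assert(scale && translate);

    /* Viewport extents, as in the diff: translate -/+ |scale|. */
    float vp_minx = -absf(scale[0]) + translate[0];
    float vp_maxx = absf(scale[0]) + translate[0];
    float vp_miny = -absf(scale[1]) + translate[1];
    float vp_maxy = absf(scale[1]) + translate[1];

    /* Clip to the scissor when enabled, otherwise to the drawable. */
    struct clip_rect bound = { 0, 0, draw_width, draw_height };
    if (scissor_enabled)
        bound = scissor;

    struct clip_rect out;
    out.minx = to_u32(maxf(vp_minx, (float)bound.minx));
    out.miny = to_u32(maxf(vp_miny, (float)bound.miny));
    out.maxx = to_u32(minf(vp_maxx, (float)bound.maxx));
    out.maxy = to_u32(minf(vp_maxy, (float)bound.maxy));
    return out;
}
```

With a 100x100 viewport on an 80x60 drawable and no scissor, the result is the drawable itself; enabling a scissor replaces the drawable as the bounding rectangle.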


@@ -89,31 +89,37 @@ vc4_submit_setup_rcl_surface(struct vc4_context *vc4,
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo); submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset; submit_surf->offset = surf->offset;
if (is_depth) { if (psurf->texture->nr_samples == 0) {
submit_surf->bits = if (is_depth) {
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS, submit_surf->bits =
VC4_LOADSTORE_TILE_BUFFER_BUFFER); VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS,
VC4_LOADSTORE_TILE_BUFFER_BUFFER);
} else {
submit_surf->bits =
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR,
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
}
submit_surf->bits |=
VC4_SET_FIELD(surf->tiling,
VC4_LOADSTORE_TILE_BUFFER_TILING);
} else { } else {
submit_surf->bits = assert(!is_write);
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR, submit_surf->flags |= VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES;
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
} }
submit_surf->bits |=
VC4_SET_FIELD(surf->tiling, VC4_LOADSTORE_TILE_BUFFER_TILING);
if (is_write) if (is_write)
rsc->writes++; rsc->writes++;
} }
static void static void
vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4, vc4_submit_setup_rcl_render_config_surface(struct vc4_context *vc4,
struct drm_vc4_submit_rcl_surface *submit_surf, struct drm_vc4_submit_rcl_surface *submit_surf,
struct pipe_surface *psurf) struct pipe_surface *psurf)
{ {
struct vc4_surface *surf = vc4_surface(psurf); struct vc4_surface *surf = vc4_surface(psurf);
@@ -126,16 +132,38 @@ vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4,
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo); submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset; submit_surf->offset = surf->offset;
submit_surf->bits = if (psurf->texture->nr_samples == 0) {
VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ? submit_surf->bits =
VC4_RENDER_CONFIG_FORMAT_BGR565 : VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ?
VC4_RENDER_CONFIG_FORMAT_RGBA8888, VC4_RENDER_CONFIG_FORMAT_BGR565 :
VC4_RENDER_CONFIG_FORMAT) | VC4_RENDER_CONFIG_FORMAT_RGBA8888,
VC4_SET_FIELD(surf->tiling, VC4_RENDER_CONFIG_MEMORY_FORMAT); VC4_RENDER_CONFIG_FORMAT) |
VC4_SET_FIELD(surf->tiling,
VC4_RENDER_CONFIG_MEMORY_FORMAT);
}
rsc->writes++; rsc->writes++;
} }
static void
vc4_submit_setup_rcl_msaa_surface(struct vc4_context *vc4,
struct drm_vc4_submit_rcl_surface *submit_surf,
struct pipe_surface *psurf)
{
struct vc4_surface *surf = vc4_surface(psurf);
if (!surf) {
submit_surf->hindex = ~0;
return;
}
struct vc4_resource *rsc = vc4_resource(psurf->texture);
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
submit_surf->offset = surf->offset;
submit_surf->bits = 0;
rsc->writes++;
}
/** /**
* Submits the job to the kernel and then reinitializes it. * Submits the job to the kernel and then reinitializes it.
*/ */
@@ -150,18 +178,35 @@ vc4_job_submit(struct vc4_context *vc4)
struct drm_vc4_submit_cl submit; struct drm_vc4_submit_cl submit;
memset(&submit, 0, sizeof(submit)); memset(&submit, 0, sizeof(submit));
cl_ensure_space(&vc4->bo_handles, 4 * sizeof(uint32_t)); cl_ensure_space(&vc4->bo_handles, 6 * sizeof(uint32_t));
cl_ensure_space(&vc4->bo_pointers, 4 * sizeof(struct vc4_bo *)); cl_ensure_space(&vc4->bo_pointers, 6 * sizeof(struct vc4_bo *));
vc4_submit_setup_rcl_surface(vc4, &submit.color_read, vc4_submit_setup_rcl_surface(vc4, &submit.color_read,
vc4->color_read, false, false); vc4->color_read, false, false);
vc4_submit_setup_ms_rcl_surface(vc4, &submit.color_ms_write, vc4_submit_setup_rcl_render_config_surface(vc4, &submit.color_write,
vc4->color_write); vc4->color_write);
vc4_submit_setup_rcl_surface(vc4, &submit.zs_read, vc4_submit_setup_rcl_surface(vc4, &submit.zs_read,
vc4->zs_read, true, false); vc4->zs_read, true, false);
vc4_submit_setup_rcl_surface(vc4, &submit.zs_write, vc4_submit_setup_rcl_surface(vc4, &submit.zs_write,
vc4->zs_write, true, true); vc4->zs_write, true, true);
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_color_write,
vc4->msaa_color_write);
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_zs_write,
vc4->msaa_zs_write);
if (vc4->msaa) {
/* This bit controls how many pixels the general
* (i.e. subsampled) loads/stores are iterating over
* (multisample loads replicate out to the other samples).
*/
submit.color_write.bits |= VC4_RENDER_CONFIG_MS_MODE_4X;
/* Controls whether color_write's
* VC4_PACKET_STORE_MS_TILE_BUFFER does 4x decimation
*/
submit.color_write.bits |= VC4_RENDER_CONFIG_DECIMATE_MODE_4X;
}
submit.bo_handles = (uintptr_t)vc4->bo_handles.base; submit.bo_handles = (uintptr_t)vc4->bo_handles.base;
submit.bo_handle_count = cl_offset(&vc4->bo_handles) / 4; submit.bo_handle_count = cl_offset(&vc4->bo_handles) / 4;
submit.bin_cl = (uintptr_t)vc4->bcl.base; submit.bin_cl = (uintptr_t)vc4->bcl.base;
@@ -173,10 +218,10 @@ vc4_job_submit(struct vc4_context *vc4)
submit.uniforms_size = cl_offset(&vc4->uniforms); submit.uniforms_size = cl_offset(&vc4->uniforms);
assert(vc4->draw_min_x != ~0 && vc4->draw_min_y != ~0); assert(vc4->draw_min_x != ~0 && vc4->draw_min_y != ~0);
submit.min_x_tile = vc4->draw_min_x / 64; submit.min_x_tile = vc4->draw_min_x / vc4->tile_width;
submit.min_y_tile = vc4->draw_min_y / 64; submit.min_y_tile = vc4->draw_min_y / vc4->tile_height;
submit.max_x_tile = (vc4->draw_max_x - 1) / 64; submit.max_x_tile = (vc4->draw_max_x - 1) / vc4->tile_width;
submit.max_y_tile = (vc4->draw_max_y - 1) / 64; submit.max_y_tile = (vc4->draw_max_y - 1) / vc4->tile_height;
submit.width = vc4->draw_width; submit.width = vc4->draw_width;
submit.height = vc4->draw_height; submit.height = vc4->draw_height;
if (vc4->cleared) { if (vc4->cleared) {
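
The last hunk above replaces the hard-coded divisor 64 with the per-job tile size. A small sketch of that conversion from pixel bounds to inclusive tile indices follows. `struct tile_bounds` and `draw_bounds_to_tiles()` are hypothetical names, and the 32x32 tile used in the usage note is an assumption for the multisampled case, not something this hunk states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: inclusive tile indices covered by a draw. */
struct tile_bounds {
    uint32_t min_x, min_y, max_x, max_y;
};

static struct tile_bounds
draw_bounds_to_tiles(uint32_t draw_min_x, uint32_t draw_min_y,
                     uint32_t draw_max_x, uint32_t draw_max_y,
                     uint32_t tile_w, uint32_t tile_h)
{
    assert(tile_w > 0 && tile_h > 0);
    /* draw_max_* is exclusive, so it must be at least 1. */
    assert(draw_max_x > 0 && draw_max_y > 0);

    struct tile_bounds t;
    t.min_x = draw_min_x / tile_w;
    t.min_y = draw_min_y / tile_h;
    /* Subtract 1 so a draw ending exactly on a tile edge does not
     * pull in the next tile. */
    t.max_x = (draw_max_x - 1) / tile_w;
    t.max_y = (draw_max_y - 1) / tile_h;
    return t;
}
```

For a draw covering pixels 0..99 in both directions, 64x64 tiles give a maximum tile index of 1, while 32x32 tiles give 3, which is why the divisor has to follow the tile size.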


@@ -29,6 +29,10 @@
* from the tile buffer after having waited for the scoreboard (which is * from the tile buffer after having waited for the scoreboard (which is
* handled by vc4_qpu_emit.c), then do math using your output color and that * handled by vc4_qpu_emit.c), then do math using your output color and that
* destination value, and update the output color appropriately. * destination value, and update the output color appropriately.
*
* Once this pass is done, the color write will either have one component (for
* single sample) with packed argb8888, or 4 components with the per-sample
* argb8888 result.
*/ */
/** /**
@@ -40,15 +44,23 @@
#include "glsl/nir/nir_builder.h" #include "glsl/nir/nir_builder.h"
#include "vc4_context.h" #include "vc4_context.h"
static bool
blend_depends_on_dst_color(struct vc4_compile *c)
{
return (c->fs_key->blend.blend_enable ||
c->fs_key->blend.colormask != 0xf ||
c->fs_key->logicop_func != PIPE_LOGICOP_COPY);
}
/** Emits a load of the previous fragment color from the tile buffer. */ /** Emits a load of the previous fragment color from the tile buffer. */
static nir_ssa_def * static nir_ssa_def *
vc4_nir_get_dst_color(nir_builder *b) vc4_nir_get_dst_color(nir_builder *b, int sample)
{ {
nir_intrinsic_instr *load = nir_intrinsic_instr *load =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_instr_create(b->shader,
nir_intrinsic_load_input); nir_intrinsic_load_input);
load->num_components = 1; load->num_components = 1;
load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT; load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT + sample;
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL); nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
nir_builder_instr_insert(b, &load->instr); nir_builder_instr_insert(b, &load->instr);
return &load->dest.ssa; return &load->dest.ssa;
@@ -496,23 +508,26 @@ vc4_nir_swizzle_and_pack(struct vc4_compile *c, nir_builder *b,
} }
static void static nir_ssa_def *
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b, vc4_nir_blend_pipeline(struct vc4_compile *c, nir_builder *b, nir_ssa_def *src,
nir_intrinsic_instr *intr) int sample)
{ {
enum pipe_format color_format = c->fs_key->color_format; enum pipe_format color_format = c->fs_key->color_format;
const uint8_t *format_swiz = vc4_get_format_swizzle(color_format); const uint8_t *format_swiz = vc4_get_format_swizzle(color_format);
bool srgb = util_format_is_srgb(color_format); bool srgb = util_format_is_srgb(color_format);
/* Pull out the float src/dst color components. */ /* Pull out the float src/dst color components. */
nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b); nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b, sample);
nir_ssa_def *dst_vec4 = nir_unpack_unorm_4x8(b, packed_dst_color); nir_ssa_def *dst_vec4 = nir_unpack_unorm_4x8(b, packed_dst_color);
nir_ssa_def *src_color[4], *unpacked_dst_color[4]; nir_ssa_def *src_color[4], *unpacked_dst_color[4];
for (unsigned i = 0; i < 4; i++) { for (unsigned i = 0; i < 4; i++) {
src_color[i] = nir_swizzle(b, intr->src[0].ssa, &i, 1, false); src_color[i] = nir_channel(b, src, i);
unpacked_dst_color[i] = nir_swizzle(b, dst_vec4, &i, 1, false); unpacked_dst_color[i] = nir_channel(b, dst_vec4, i);
} }
if (c->fs_key->sample_alpha_to_one && c->fs_key->msaa)
src_color[3] = nir_imm_float(b, 1.0);
vc4_nir_emit_alpha_test_discard(c, b, src_color[3]); vc4_nir_emit_alpha_test_discard(c, b, src_color[3]);
nir_ssa_def *packed_color; nir_ssa_def *packed_color;
@@ -560,16 +575,100 @@ vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
colormask &= ~(0xff << (i * 8)); colormask &= ~(0xff << (i * 8));
} }
} }
packed_color = nir_ior(b,
nir_iand(b, packed_color,
nir_imm_int(b, colormask)),
nir_iand(b, packed_dst_color,
nir_imm_int(b, ~colormask)));
/* Turn the old vec4 output into a store of the packed color. */ return nir_ior(b,
nir_instr_rewrite_src(&intr->instr, &intr->src[0], nir_iand(b, packed_color,
nir_src_for_ssa(packed_color)); nir_imm_int(b, colormask)),
nir_iand(b, packed_dst_color,
nir_imm_int(b, ~colormask)));
}
static int
vc4_nir_next_output_driver_location(nir_shader *s)
{
int maxloc = -1;
nir_foreach_variable(var, &s->outputs)
maxloc = MAX2(maxloc, (int)var->data.driver_location);
return maxloc + 1;
}
static void
vc4_nir_store_sample_mask(struct vc4_compile *c, nir_builder *b,
nir_ssa_def *val)
{
nir_variable *sample_mask = nir_variable_create(c->s, nir_var_shader_out,
glsl_uint_type(),
"sample_mask");
sample_mask->data.driver_location =
vc4_nir_next_output_driver_location(c->s);
sample_mask->data.location = FRAG_RESULT_SAMPLE_MASK;
nir_intrinsic_instr *intr =
nir_intrinsic_instr_create(c->s, nir_intrinsic_store_output);
intr->num_components = 1; intr->num_components = 1;
intr->const_index[0] = sample_mask->data.driver_location;
intr->src[0] = nir_src_for_ssa(val);
nir_builder_instr_insert(b, &intr->instr);
}
static void
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
nir_intrinsic_instr *intr)
{
nir_ssa_def *frag_color = intr->src[0].ssa;
if (c->fs_key->sample_coverage) {
nir_intrinsic_instr *load =
nir_intrinsic_instr_create(b->shader,
nir_intrinsic_load_sample_mask_in);
load->num_components = 1;
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
nir_builder_instr_insert(b, &load->instr);
nir_ssa_def *bitmask = &load->dest.ssa;
vc4_nir_store_sample_mask(c, b, bitmask);
} else if (c->fs_key->sample_alpha_to_coverage) {
nir_ssa_def *a = nir_channel(b, frag_color, 3);
/* XXX: We should do a nice dither based on the fragment
* coordinate, instead.
*/
nir_ssa_def *num_samples = nir_imm_float(b, VC4_MAX_SAMPLES);
nir_ssa_def *num_bits = nir_f2i(b, nir_fmul(b, a, num_samples));
nir_ssa_def *bitmask = nir_isub(b,
nir_ishl(b,
nir_imm_int(b, 1),
num_bits),
nir_imm_int(b, 1));
vc4_nir_store_sample_mask(c, b, bitmask);
}
/* The TLB color read returns each sample in turn, so if our blending
* depends on the destination color, we're going to have to run the
* blending function separately for each destination sample value, and
* then output the per-sample color using TLB_COLOR_MS.
*/
nir_ssa_def *blend_output;
if (c->fs_key->msaa && blend_depends_on_dst_color(c)) {
c->msaa_per_sample_output = true;
nir_ssa_def *samples[4];
for (int i = 0; i < VC4_MAX_SAMPLES; i++)
samples[i] = vc4_nir_blend_pipeline(c, b, frag_color, i);
blend_output = nir_vec4(b,
samples[0], samples[1],
samples[2], samples[3]);
} else {
blend_output = vc4_nir_blend_pipeline(c, b, frag_color, 0);
}
nir_instr_rewrite_src(&intr->instr, &intr->src[0],
nir_src_for_ssa(blend_output));
intr->num_components = blend_output->num_components;
} }
static bool static bool
@@ -577,7 +676,7 @@ vc4_nir_lower_blend_block(nir_block *block, void *state)
{ {
struct vc4_compile *c = state; struct vc4_compile *c = state;
nir_foreach_instr(block, instr) { nir_foreach_instr_safe(block, instr) {
if (instr->type != nir_instr_type_intrinsic) if (instr->type != nir_instr_type_intrinsic)
continue; continue;
nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
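
The alpha-to-coverage branch of vc4_nir_lower_blend_instr() above builds a sample mask by scaling alpha by the sample count, truncating to an integer, and setting that many low bits. A CPU-side sketch of the same arithmetic is below. `alpha_to_coverage_mask()` is a hypothetical helper, `MAX_SAMPLES` stands in for VC4_MAX_SAMPLES, and the clamp of alpha to [0, 1] is an assumption added here to keep the shift well defined; the NIR sequence in the diff does not clamp.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLES 4 /* assumption: mirrors VC4_MAX_SAMPLES */

static uint32_t
alpha_to_coverage_mask(float alpha)
{
    /* Added for this sketch only: clamp (and reject NaN) so that the
     * conversion and the shift below stay in range. */
    if (!(alpha >= 0.0f))
        alpha = 0.0f;
    if (alpha > 1.0f)
        alpha = 1.0f;

    /* Truncating conversion, like the f2i in the lowered shader. */
    int num_bits = (int)(alpha * MAX_SAMPLES);
    assert(num_bits >= 0 && num_bits <= MAX_SAMPLES);

    /* (1 << n) - 1 sets the n lowest bits. */
    return (1u << num_bits) - 1u;
}
```

Because the low bits are always the ones set, every fragment with the same alpha covers the same samples, which is the limitation the XXX comment about dithering by fragment coordinate refers to.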


@@ -84,7 +84,7 @@ vc4_nir_unpack_16u(nir_builder *b, nir_ssa_def *src, unsigned chan)
static nir_ssa_def * static nir_ssa_def *
vc4_nir_unpack_8f(nir_builder *b, nir_ssa_def *src, unsigned chan) vc4_nir_unpack_8f(nir_builder *b, nir_ssa_def *src, unsigned chan)
{ {
return nir_swizzle(b, nir_unpack_unorm_4x8(b, src), &chan, 1, false); return nir_channel(b, nir_unpack_unorm_4x8(b, src), chan);
} }
static nir_ssa_def * static nir_ssa_def *
@@ -226,7 +226,9 @@ vc4_nir_lower_fs_input(struct vc4_compile *c, nir_builder *b,
{ {
b->cursor = nir_before_instr(&intr->instr); b->cursor = nir_before_instr(&intr->instr);
if (intr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) { if (intr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT &&
intr->const_index[0] < (VC4_NIR_TLB_COLOR_READ_INPUT +
VC4_MAX_SAMPLES)) {
/* This doesn't need any lowering. */ /* This doesn't need any lowering. */
return; return;
} }
@@ -309,7 +311,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
/* Color output is lowered by vc4_nir_lower_blend(). */ /* Color output is lowered by vc4_nir_lower_blend(). */
if (c->stage == QSTAGE_FRAG && if (c->stage == QSTAGE_FRAG &&
(output_var->data.location == FRAG_RESULT_COLOR || (output_var->data.location == FRAG_RESULT_COLOR ||
output_var->data.location == FRAG_RESULT_DATA0)) { output_var->data.location == FRAG_RESULT_DATA0 ||
output_var->data.location == FRAG_RESULT_SAMPLE_MASK)) {
intr->const_index[0] *= 4; intr->const_index[0] *= 4;
return; return;
} }
@@ -326,9 +329,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
intr_comp->const_index[0] = intr->const_index[0] * 4 + i; intr_comp->const_index[0] = intr->const_index[0] * 4 + i;
assert(intr->src[0].is_ssa); assert(intr->src[0].is_ssa);
intr_comp->src[0] = nir_src_for_ssa(nir_swizzle(b, intr_comp->src[0] =
intr->src[0].ssa, nir_src_for_ssa(nir_channel(b, intr->src[0].ssa, i));
&i, 1, false));
nir_builder_instr_insert(b, &intr_comp->instr); nir_builder_instr_insert(b, &intr_comp->instr);
} }


@@ -0,0 +1,172 @@
/*
* Copyright © 2015 Broadcom
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "vc4_qir.h"
#include "kernel/vc4_packet.h"
#include "tgsi/tgsi_info.h"
#include "glsl/nir/nir_builder.h"
/** @file vc4_nir_lower_txf_ms.c
* Walks the NIR generated by TGSI-to-NIR to lower its nir_texop_txf_ms
* coordinates to do the math necessary and use a plain nir_texop_txf instead.
*
* MSAA textures are laid out as 32x32-aligned blocks of RGBA8888 or Z24S8.
* We can't load them through the normal sampler path because of the lack of
* linear support in the hardware. So, we treat MSAA textures as a giant UBO
* and do the math in the shader.
*/
static void
vc4_nir_lower_txf_ms_instr(struct vc4_compile *c, nir_builder *b,
nir_tex_instr *txf_ms)
{
if (txf_ms->op != nir_texop_txf_ms)
return;
b->cursor = nir_before_instr(&txf_ms->instr);
nir_tex_instr *txf = nir_tex_instr_create(c->s, 1);
txf->op = nir_texop_txf;
txf->sampler = txf_ms->sampler;
txf->sampler_index = txf_ms->sampler_index;
txf->coord_components = txf_ms->coord_components;
txf->is_shadow = txf_ms->is_shadow;
txf->is_new_style_shadow = txf_ms->is_new_style_shadow;
nir_ssa_def *coord = NULL, *sample_index = NULL;
for (int i = 0; i < txf_ms->num_srcs; i++) {
assert(txf_ms->src[i].src.is_ssa);
switch (txf_ms->src[i].src_type) {
case nir_tex_src_coord:
coord = txf_ms->src[i].src.ssa;
break;
case nir_tex_src_ms_index:
sample_index = txf_ms->src[i].src.ssa;
break;
default:
unreachable("Unknown txf_ms src\n");
}
}
assert(coord);
assert(sample_index);
nir_ssa_def *x = nir_channel(b, coord, 0);
nir_ssa_def *y = nir_channel(b, coord, 1);
uint32_t tile_w = 32;
uint32_t tile_h = 32;
uint32_t tile_w_shift = 5;
uint32_t tile_h_shift = 5;
uint32_t tile_size = (tile_h * tile_w *
VC4_MAX_SAMPLES * sizeof(uint32_t));
unsigned unit = txf_ms->sampler_index;
uint32_t w = align(c->key->tex[unit].msaa_width, tile_w);
uint32_t w_tiles = w / tile_w;
nir_ssa_def *x_tile = nir_ushr(b, x, nir_imm_int(b, tile_w_shift));
nir_ssa_def *y_tile = nir_ushr(b, y, nir_imm_int(b, tile_h_shift));
nir_ssa_def *tile_addr = nir_iadd(b,
nir_imul(b, x_tile,
nir_imm_int(b, tile_size)),
nir_imul(b, y_tile,
nir_imm_int(b, (w_tiles *
tile_size))));
nir_ssa_def *x_subspan = nir_iand(b, x,
nir_imm_int(b, (tile_w - 1) & ~1));
nir_ssa_def *y_subspan = nir_iand(b, y,
nir_imm_int(b, (tile_h - 1) & ~1));
nir_ssa_def *subspan_addr = nir_iadd(b,
nir_imul(b, x_subspan,
nir_imm_int(b, 2 * VC4_MAX_SAMPLES * sizeof(uint32_t))),
nir_imul(b, y_subspan,
nir_imm_int(b,
tile_w *
VC4_MAX_SAMPLES *
sizeof(uint32_t))));
nir_ssa_def *pixel_addr = nir_ior(b,
nir_iand(b,
nir_ishl(b, x,
nir_imm_int(b, 2)),
nir_imm_int(b, (1 << 2))),
nir_iand(b,
nir_ishl(b, y,
nir_imm_int(b, 3)),
nir_imm_int(b, (1 << 3))));
nir_ssa_def *sample_addr = nir_ishl(b, sample_index, nir_imm_int(b, 4));
nir_ssa_def *addr = nir_iadd(b,
nir_ior(b, sample_addr, pixel_addr),
nir_iadd(b, subspan_addr, tile_addr));
txf->src[0].src_type = nir_tex_src_coord;
txf->src[0].src = nir_src_for_ssa(nir_vec2(b, addr, nir_imm_int(b, 0)));
nir_ssa_dest_init(&txf->instr, &txf->dest, 4, NULL);
nir_builder_instr_insert(b, &txf->instr);
nir_ssa_def_rewrite_uses(&txf_ms->dest.ssa,
nir_src_for_ssa(&txf->dest.ssa));
nir_instr_remove(&txf_ms->instr);
}
static bool
vc4_nir_lower_txf_ms_block(nir_block *block, void *arg)
{
struct vc4_compile *c = arg;
nir_function_impl *impl =
nir_cf_node_get_function(&block->cf_node);
nir_builder b;
nir_builder_init(&b, impl);
nir_foreach_instr_safe(block, instr) {
if (instr->type == nir_instr_type_tex) {
vc4_nir_lower_txf_ms_instr(c, &b,
nir_instr_as_tex(instr));
}
}
return true;
}
static bool
vc4_nir_lower_txf_ms_impl(struct vc4_compile *c, nir_function_impl *impl)
{
nir_foreach_block(impl, vc4_nir_lower_txf_ms_block, c);
nir_metadata_preserve(impl,
nir_metadata_block_index |
nir_metadata_dominance);
return true;
}
void
vc4_nir_lower_txf_ms(struct vc4_compile *c)
{
nir_foreach_overload(c->s, overload) {
if (overload->impl)
vc4_nir_lower_txf_ms_impl(c, overload->impl);
}
}
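
The address math that vc4_nir_lower_txf_ms_instr() above emits as NIR can be mirrored on the CPU to see how a texel coordinate and sample index map to a byte offset. This is a sketch for illustration only: `msaa_texel_offset()` and `align_up()` are hypothetical helpers, `MAX_SAMPLES` and `BYTES_PER_SAMPLE` stand in for VC4_MAX_SAMPLES and sizeof(uint32_t), and the clamping that ntq_emit_txf() applies for kernel validation is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLES 4u      /* assumption: mirrors VC4_MAX_SAMPLES */
#define BYTES_PER_SAMPLE 4u /* one 32-bit RGBA8888 or Z24S8 value */

static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1); /* a must be a power of two */
}

static uint32_t
msaa_texel_offset(uint32_t x, uint32_t y, uint32_t sample,
                  uint32_t msaa_width)
{
    const uint32_t tile_w = 32, tile_h = 32;
    const uint32_t tile_size = tile_w * tile_h * MAX_SAMPLES * BYTES_PER_SAMPLE;
    const uint32_t w_tiles = align_up(msaa_width, tile_w) / tile_w;

    assert(sample < MAX_SAMPLES);

    /* Which 32x32 tile the texel lives in. */
    uint32_t tile_addr = (x >> 5) * tile_size +
                         (y >> 5) * (w_tiles * tile_size);

    /* 2x2 subspan within the tile: coordinates with bit 0 cleared. */
    uint32_t x_subspan = x & ((tile_w - 1) & ~1u);
    uint32_t y_subspan = y & ((tile_h - 1) & ~1u);
    uint32_t subspan_addr =
        x_subspan * (2 * MAX_SAMPLES * BYTES_PER_SAMPLE) +
        y_subspan * (tile_w * MAX_SAMPLES * BYTES_PER_SAMPLE);

    /* Pixel within the subspan: bit 0 of x selects byte 4, bit 0 of y
     * selects byte 8. */
    uint32_t pixel_addr = ((x << 2) & (1u << 2)) | ((y << 3) & (1u << 3));

    /* Each sample is a 16-byte step. */
    uint32_t sample_addr = sample << 4;

    return (sample_addr | pixel_addr) + subspan_addr + tile_addr;
}
```

Under these assumptions the last sample of texel (31, 31) lands at offset 16380, the final 32-bit word of the first 16384-byte tile, and texel (32, 0) starts the next tile at 16384.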


@@ -94,7 +94,12 @@ static void
replace_with_mov(struct vc4_compile *c, struct qinst *inst, struct qreg arg) replace_with_mov(struct vc4_compile *c, struct qinst *inst, struct qreg arg)
{ {
dump_from(c, inst); dump_from(c, inst);
inst->op = QOP_MOV; if (qir_is_mul(inst))
inst->op = QOP_MMOV;
else if (qir_is_float_input(inst))
inst->op = QOP_FMOV;
else
inst->op = QOP_MOV;
inst->src[0] = arg; inst->src[0] = arg;
inst->src[1] = c->undef; inst->src[1] = c->undef;
dump_to(c, inst); dump_to(c, inst);
@@ -181,6 +186,7 @@ qir_opt_algebraic(struct vc4_compile *c)
case QOP_SUB: case QOP_SUB:
if (is_zero(c, inst->src[1])) { if (is_zero(c, inst->src[1])) {
replace_with_mov(c, inst, inst->src[0]); replace_with_mov(c, inst, inst->src[0]);
progress = true;
} }
break; break;


@@ -294,6 +294,76 @@ ntq_umul(struct vc4_compile *c, struct qreg src0, struct qreg src1)
qir_uniform_ui(c, 24))); qir_uniform_ui(c, 24)));
} }
static struct qreg
ntq_scale_depth_texture(struct vc4_compile *c, struct qreg src)
{
struct qreg depthf = qir_ITOF(c, qir_SHR(c, src,
qir_uniform_ui(c, 8)));
return qir_FMUL(c, depthf, qir_uniform_f(c, 1.0f/0xffffff));
}
/**
* Emits a lowered TXF_MS from an MSAA texture.
*
* The addressing math has been lowered in NIR, and now we just need to read
* it like a UBO.
*/
static void
ntq_emit_txf(struct vc4_compile *c, nir_tex_instr *instr)
{
uint32_t tile_width = 32;
uint32_t tile_height = 32;
uint32_t tile_size = (tile_height * tile_width *
VC4_MAX_SAMPLES * sizeof(uint32_t));
unsigned unit = instr->sampler_index;
uint32_t w = align(c->key->tex[unit].msaa_width, tile_width);
uint32_t w_tiles = w / tile_width;
uint32_t h = align(c->key->tex[unit].msaa_height, tile_height);
uint32_t h_tiles = h / tile_height;
uint32_t size = w_tiles * h_tiles * tile_size;
struct qreg addr;
assert(instr->num_srcs == 1);
assert(instr->src[0].src_type == nir_tex_src_coord);
addr = ntq_get_src(c, instr->src[0].src, 0);
/* Perform the clamping required by kernel validation. */
addr = qir_MAX(c, addr, qir_uniform_ui(c, 0));
addr = qir_MIN(c, addr, qir_uniform_ui(c, size - 4));
qir_TEX_DIRECT(c, addr, qir_uniform(c, QUNIFORM_TEXTURE_MSAA_ADDR, unit));
struct qreg tex = qir_TEX_RESULT(c);
c->num_texture_samples++;
struct qreg texture_output[4];
enum pipe_format format = c->key->tex[unit].format;
if (util_format_is_depth_or_stencil(format)) {
struct qreg scaled = ntq_scale_depth_texture(c, tex);
for (int i = 0; i < 4; i++)
texture_output[i] = scaled;
} else {
struct qreg tex_result_unpacked[4];
for (int i = 0; i < 4; i++)
tex_result_unpacked[i] = qir_UNPACK_8_F(c, tex, i);
const uint8_t *format_swiz =
vc4_get_format_swizzle(c->key->tex[unit].format);
for (int i = 0; i < 4; i++) {
texture_output[i] =
get_swizzled_channel(c, tex_result_unpacked,
format_swiz[i]);
}
}
struct qreg *dest = ntq_get_dest(c, &instr->dest);
for (int i = 0; i < 4; i++) {
dest[i] = get_swizzled_channel(c, texture_output,
c->key->tex[unit].swizzle[i]);
}
}
static void static void
ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr) ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
{ {
@@ -301,6 +371,11 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
bool is_txb = false, is_txl = false, has_proj = false; bool is_txb = false, is_txl = false, has_proj = false;
unsigned unit = instr->sampler_index; unsigned unit = instr->sampler_index;
if (instr->op == nir_texop_txf) {
ntq_emit_txf(c, instr);
return;
}
for (unsigned i = 0; i < instr->num_srcs; i++) { for (unsigned i = 0; i < instr->num_srcs; i++) {
switch (instr->src[i].src_type) { switch (instr->src[i].src_type) {
case nir_tex_src_coord: case nir_tex_src_coord:
@@ -396,11 +471,7 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
struct qreg unpacked[4]; struct qreg unpacked[4];
if (util_format_is_depth_or_stencil(format)) { if (util_format_is_depth_or_stencil(format)) {
struct qreg depthf = qir_ITOF(c, qir_SHR(c, tex, struct qreg normalized = ntq_scale_depth_texture(c, tex);
qir_uniform_ui(c, 8)));
struct qreg normalized = qir_FMUL(c, depthf,
qir_uniform_f(c, 1.0f/0xffffff));
struct qreg depth_output; struct qreg depth_output;
struct qreg one = qir_uniform_f(c, 1.0f); struct qreg one = qir_uniform_f(c, 1.0f);
@@ -1109,6 +1180,10 @@ emit_frag_end(struct vc4_compile *c)
} }
} }
if (c->output_sample_mask_index != -1) {
qir_MS_MASK(c, c->outputs[c->output_sample_mask_index]);
}
if (c->fs_key->depth_enabled) { if (c->fs_key->depth_enabled) {
struct qreg z; struct qreg z;
if (c->output_position_index != -1) { if (c->output_position_index != -1) {
@@ -1120,7 +1195,12 @@ emit_frag_end(struct vc4_compile *c)
qir_TLB_Z_WRITE(c, z); qir_TLB_Z_WRITE(c, z);
} }
qir_TLB_COLOR_WRITE(c, color); if (!c->msaa_per_sample_output) {
qir_TLB_COLOR_WRITE(c, color);
} else {
for (int i = 0; i < VC4_MAX_SAMPLES; i++)
qir_TLB_COLOR_WRITE_MS(c, c->sample_colors[i]);
}
} }
static void static void
@@ -1171,7 +1251,7 @@ emit_point_size_write(struct vc4_compile *c)
struct qreg point_size; struct qreg point_size;
if (c->output_point_size_index != -1) if (c->output_point_size_index != -1)
point_size = c->outputs[c->output_point_size_index + 3]; point_size = c->outputs[c->output_point_size_index];
else else
point_size = qir_uniform_f(c, 1.0); point_size = qir_uniform_f(c, 1.0);
@@ -1359,6 +1439,9 @@ ntq_setup_outputs(struct vc4_compile *c)
case FRAG_RESULT_DEPTH: case FRAG_RESULT_DEPTH:
c->output_position_index = loc; c->output_position_index = loc;
break; break;
case FRAG_RESULT_SAMPLE_MASK:
c->output_sample_mask_index = loc;
break;
} }
} else { } else {
switch (var->data.location) { switch (var->data.location) {
@@ -1462,20 +1545,48 @@ ntq_emit_intrinsic(struct vc4_compile *c, nir_intrinsic_instr *instr)
instr->const_index[0]); instr->const_index[0]);
break; break;
case nir_intrinsic_load_sample_mask_in:
*dest = qir_uniform(c, QUNIFORM_SAMPLE_MASK, 0);
break;
case nir_intrinsic_load_input: case nir_intrinsic_load_input:
assert(instr->num_components == 1); assert(instr->num_components == 1);
if (instr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) { if (instr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT) {
*dest = qir_TLB_COLOR_READ(c); /* Reads of the per-sample color need to be done in
* order.
*/
int sample_index = (instr->const_index[0] -
VC4_NIR_TLB_COLOR_READ_INPUT);
for (int i = 0; i <= sample_index; i++) {
if (c->color_reads[i].file == QFILE_NULL) {
c->color_reads[i] =
qir_TLB_COLOR_READ(c);
}
}
*dest = c->color_reads[sample_index];
} else { } else {
*dest = c->inputs[instr->const_index[0]]; *dest = c->inputs[instr->const_index[0]];
} }
break; break;
case nir_intrinsic_store_output: case nir_intrinsic_store_output:
assert(instr->num_components == 1); /* MSAA color outputs are the only case where we have an
c->outputs[instr->const_index[0]] = * output that's not lowered to being a store of a single 32
qir_MOV(c, ntq_get_src(c, instr->src[0], 0)); * bit value.
c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1); */
if (c->stage == QSTAGE_FRAG && instr->num_components == 4) {
assert(instr->const_index[0] == c->output_color_index);
for (int i = 0; i < 4; i++) {
c->sample_colors[i] =
qir_MOV(c, ntq_get_src(c, instr->src[0],
i));
}
} else {
assert(instr->num_components == 1);
c->outputs[instr->const_index[0]] =
qir_MOV(c, ntq_get_src(c, instr->src[0], 0));
c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1);
}
break; break;
case nir_intrinsic_discard: case nir_intrinsic_discard:
@@ -1672,6 +1783,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage,
nir_lower_clip_vs(c->s, c->key->ucp_enables); nir_lower_clip_vs(c->s, c->key->ucp_enables);
vc4_nir_lower_io(c); vc4_nir_lower_io(c);
vc4_nir_lower_txf_ms(c);
nir_lower_idiv(c->s); nir_lower_idiv(c->s);
nir_lower_load_const_to_scalar(c->s); nir_lower_load_const_to_scalar(c->s);
@@ -1907,12 +2019,19 @@ vc4_setup_shared_key(struct vc4_context *vc4, struct vc4_key *key,
struct pipe_sampler_state *sampler_state = struct pipe_sampler_state *sampler_state =
texstate->samplers[i]; texstate->samplers[i];
if (sampler) { if (!sampler)
key->tex[i].format = sampler->format; continue;
key->tex[i].swizzle[0] = sampler->swizzle_r;
key->tex[i].swizzle[1] = sampler->swizzle_g; key->tex[i].format = sampler->format;
key->tex[i].swizzle[2] = sampler->swizzle_b; key->tex[i].swizzle[0] = sampler->swizzle_r;
key->tex[i].swizzle[3] = sampler->swizzle_a; key->tex[i].swizzle[1] = sampler->swizzle_g;
key->tex[i].swizzle[2] = sampler->swizzle_b;
key->tex[i].swizzle[3] = sampler->swizzle_a;
if (sampler->texture->nr_samples) {
key->tex[i].msaa_width = sampler->texture->width0;
key->tex[i].msaa_height = sampler->texture->height0;
} else if (sampler){
key->tex[i].compare_mode = sampler_state->compare_mode; key->tex[i].compare_mode = sampler_state->compare_mode;
key->tex[i].compare_func = sampler_state->compare_func; key->tex[i].compare_func = sampler_state->compare_func;
key->tex[i].wrap_s = sampler_state->wrap_s; key->tex[i].wrap_s = sampler_state->wrap_s;
@@ -1952,6 +2071,11 @@ vc4_update_compiled_fs(struct vc4_context *vc4, uint8_t prim_mode)
} else { } else {
key->logicop_func = PIPE_LOGICOP_COPY; key->logicop_func = PIPE_LOGICOP_COPY;
} }
key->msaa = vc4->rasterizer->base.multisample;
key->sample_coverage = (vc4->rasterizer->base.multisample &&
vc4->sample_mask != (1 << VC4_MAX_SAMPLES) - 1);
key->sample_alpha_to_coverage = vc4->blend->alpha_to_coverage;
key->sample_alpha_to_one = vc4->blend->alpha_to_one;
if (vc4->framebuffer.cbufs[0]) if (vc4->framebuffer.cbufs[0])
key->color_format = vc4->framebuffer.cbufs[0]->format; key->color_format = vc4->framebuffer.cbufs[0]->format;


@@ -86,7 +86,9 @@ static const struct qir_op_info qir_op_info[] = {
[QOP_TLB_STENCIL_SETUP] = { "tlb_stencil_setup", 0, 1, true }, [QOP_TLB_STENCIL_SETUP] = { "tlb_stencil_setup", 0, 1, true },
[QOP_TLB_Z_WRITE] = { "tlb_z", 0, 1, true }, [QOP_TLB_Z_WRITE] = { "tlb_z", 0, 1, true },
[QOP_TLB_COLOR_WRITE] = { "tlb_color", 0, 1, true }, [QOP_TLB_COLOR_WRITE] = { "tlb_color", 0, 1, true },
[QOP_TLB_COLOR_WRITE_MS] = { "tlb_color_ms", 0, 1, true },
[QOP_TLB_COLOR_READ] = { "tlb_color_read", 1, 0 }, [QOP_TLB_COLOR_READ] = { "tlb_color_read", 1, 0 },
[QOP_MS_MASK] = { "ms_mask", 0, 1, true },
[QOP_VARY_ADD_C] = { "vary_add_c", 1, 1 }, [QOP_VARY_ADD_C] = { "vary_add_c", 1, 1 },
[QOP_FRAG_X] = { "frag_x", 1, 0 }, [QOP_FRAG_X] = { "frag_x", 1, 0 },
@@ -399,6 +401,7 @@ qir_compile_init(void)
c->output_position_index = -1; c->output_position_index = -1;
c->output_color_index = -1; c->output_color_index = -1;
c->output_point_size_index = -1; c->output_point_size_index = -1;
c->output_sample_mask_index = -1;
c->def_ht = _mesa_hash_table_create(c, _mesa_hash_pointer, c->def_ht = _mesa_hash_table_create(c, _mesa_hash_pointer,
_mesa_key_pointer_equal); _mesa_key_pointer_equal);
@@ -420,13 +423,19 @@ qir_remove_instruction(struct vc4_compile *c, struct qinst *qinst)
struct qreg struct qreg
qir_follow_movs(struct vc4_compile *c, struct qreg reg) qir_follow_movs(struct vc4_compile *c, struct qreg reg)
{ {
int pack = reg.pack;
while (reg.file == QFILE_TEMP && while (reg.file == QFILE_TEMP &&
c->defs[reg.index] && c->defs[reg.index] &&
c->defs[reg.index]->op == QOP_MOV && (c->defs[reg.index]->op == QOP_MOV ||
!c->defs[reg.index]->dst.pack) { c->defs[reg.index]->op == QOP_FMOV ||
c->defs[reg.index]->op == QOP_MMOV)&&
!c->defs[reg.index]->dst.pack &&
!c->defs[reg.index]->src[0].pack) {
reg = c->defs[reg.index]->src[0]; reg = c->defs[reg.index]->src[0];
} }
reg.pack = pack;
return reg; return reg;
} }
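
The `qir_follow_movs()` change above can be read as: walk back through MOV-like definitions (MOV, FMOV, MMOV) as long as neither side packs or unpacks, then restore the caller's `pack` field on whatever register the walk ends at. A minimal sketch of that idea follows, using simplified stand-in types. The names `struct reg`, `struct inst`, `is_plain_mov`, and `follow_movs` are hypothetical and are not the driver's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QIR types in the diff. */
enum file { FILE_NULL, FILE_TEMP, FILE_UNIF };
enum op { OP_MOV, OP_FMOV, OP_MMOV, OP_FADD };

struct reg { enum file file; int index; int pack; };
struct inst { enum op op; struct reg dst; struct reg src0; };

/* A definition can be skipped only if it is a MOV variant and neither
 * its destination nor its source carries a pack/unpack mode. */
static bool is_plain_mov(const struct inst *def)
{
    return (def->op == OP_MOV || def->op == OP_FMOV || def->op == OP_MMOV) &&
           !def->dst.pack && !def->src0.pack;
}

/* defs[i] is the single defining instruction of temp i, or NULL. */
static struct reg follow_movs(struct inst *const *defs, struct reg reg)
{
    assert(defs != NULL);
    int pack = reg.pack;            /* remember the caller's pack mode */

    while (reg.file == FILE_TEMP && defs[reg.index] &&
           is_plain_mov(defs[reg.index]))
        reg = defs[reg.index]->src0;

    reg.pack = pack;                /* restore it on the final register */
    return reg;
}
```

With a chain such as `t2 = mov t1; t1 = fmov u7`, following `t2` would end at `u7` while keeping the pack value the caller passed in.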


@@ -38,6 +38,7 @@
#include "vc4_screen.h" #include "vc4_screen.h"
#include "vc4_qpu_defines.h" #include "vc4_qpu_defines.h"
#include "kernel/vc4_packet.h"
#include "pipe/p_state.h" #include "pipe/p_state.h"
struct nir_builder; struct nir_builder;
@@ -121,7 +122,9 @@ enum qop {
QOP_TLB_STENCIL_SETUP, QOP_TLB_STENCIL_SETUP,
QOP_TLB_Z_WRITE, QOP_TLB_Z_WRITE,
QOP_TLB_COLOR_WRITE, QOP_TLB_COLOR_WRITE,
QOP_TLB_COLOR_WRITE_MS,
QOP_TLB_COLOR_READ, QOP_TLB_COLOR_READ,
QOP_MS_MASK,
QOP_VARY_ADD_C, QOP_VARY_ADD_C,
QOP_FRAG_X, QOP_FRAG_X,
@@ -230,6 +233,8 @@ enum quniform_contents {
/** A reference to a texture config parameter 2 cubemap stride uniform */ /** A reference to a texture config parameter 2 cubemap stride uniform */
QUNIFORM_TEXTURE_CONFIG_P2, QUNIFORM_TEXTURE_CONFIG_P2,
QUNIFORM_TEXTURE_MSAA_ADDR,
QUNIFORM_UBO_ADDR, QUNIFORM_UBO_ADDR,
QUNIFORM_TEXRECT_SCALE_X, QUNIFORM_TEXRECT_SCALE_X,
@@ -247,6 +252,7 @@ enum quniform_contents {
QUNIFORM_STENCIL, QUNIFORM_STENCIL,
QUNIFORM_ALPHA_REF, QUNIFORM_ALPHA_REF,
QUNIFORM_SAMPLE_MASK,
}; };
struct vc4_varying_slot { struct vc4_varying_slot {
@@ -283,11 +289,18 @@ struct vc4_key {
struct vc4_uncompiled_shader *shader_state; struct vc4_uncompiled_shader *shader_state;
struct { struct {
enum pipe_format format; enum pipe_format format;
unsigned compare_mode:1;
unsigned compare_func:3;
unsigned wrap_s:3;
unsigned wrap_t:3;
uint8_t swizzle[4]; uint8_t swizzle[4];
union {
struct {
unsigned compare_mode:1;
unsigned compare_func:3;
unsigned wrap_s:3;
unsigned wrap_t:3;
};
struct {
uint16_t msaa_width, msaa_height;
};
};
} tex[VC4_MAX_TEXTURE_SAMPLERS]; } tex[VC4_MAX_TEXTURE_SAMPLERS];
uint8_t ucp_enables; uint8_t ucp_enables;
}; };
@@ -304,6 +317,10 @@ struct vc4_fs_key {
bool alpha_test; bool alpha_test;
bool point_coord_upper_left; bool point_coord_upper_left;
bool light_twoside; bool light_twoside;
bool msaa;
bool sample_coverage;
bool sample_alpha_to_coverage;
bool sample_alpha_to_one;
uint8_t alpha_test_func; uint8_t alpha_test_func;
uint8_t logicop_func; uint8_t logicop_func;
uint32_t point_sprite_mask; uint32_t point_sprite_mask;
@@ -348,6 +365,9 @@ struct vc4_compile {
*/ */
struct qreg *inputs; struct qreg *inputs;
struct qreg *outputs; struct qreg *outputs;
bool msaa_per_sample_output;
struct qreg color_reads[VC4_MAX_SAMPLES];
struct qreg sample_colors[VC4_MAX_SAMPLES];
uint32_t inputs_array_size; uint32_t inputs_array_size;
uint32_t outputs_array_size; uint32_t outputs_array_size;
uint32_t uniforms_array_size; uint32_t uniforms_array_size;
@@ -396,6 +416,7 @@ struct vc4_compile {
uint32_t output_position_index; uint32_t output_position_index;
uint32_t output_color_index; uint32_t output_color_index;
uint32_t output_point_size_index; uint32_t output_point_size_index;
uint32_t output_sample_mask_index;
struct qreg undef; struct qreg undef;
enum qstage stage; enum qstage stage;
@@ -418,6 +439,8 @@ struct vc4_compile {
*/ */
#define VC4_NIR_TLB_COLOR_READ_INPUT 2000000000 #define VC4_NIR_TLB_COLOR_READ_INPUT 2000000000
#define VC4_NIR_MS_MASK_OUTPUT 2000000000
/* Special offset for nir_load_uniform values to get a QUNIFORM_* /* Special offset for nir_load_uniform values to get a QUNIFORM_*
* state-dependent value. * state-dependent value.
*/ */
@@ -476,6 +499,7 @@ nir_ssa_def *vc4_nir_get_state_uniform(struct nir_builder *b,
enum quniform_contents contents); enum quniform_contents contents);
nir_ssa_def *vc4_nir_get_swizzled_channel(struct nir_builder *b, nir_ssa_def *vc4_nir_get_swizzled_channel(struct nir_builder *b,
nir_ssa_def **srcs, int swiz); nir_ssa_def **srcs, int swiz);
void vc4_nir_lower_txf_ms(struct vc4_compile *c);
void qir_lower_uniforms(struct vc4_compile *c); void qir_lower_uniforms(struct vc4_compile *c);
void qpu_schedule_instructions(struct vc4_compile *c); void qpu_schedule_instructions(struct vc4_compile *c);
@@ -616,9 +640,11 @@ QIR_ALU0(FRAG_REV_FLAG)
QIR_ALU0(TEX_RESULT) QIR_ALU0(TEX_RESULT)
QIR_ALU0(TLB_COLOR_READ) QIR_ALU0(TLB_COLOR_READ)
QIR_NODST_1(TLB_COLOR_WRITE) QIR_NODST_1(TLB_COLOR_WRITE)
QIR_NODST_1(TLB_COLOR_WRITE_MS)
QIR_NODST_1(TLB_Z_WRITE) QIR_NODST_1(TLB_Z_WRITE)
QIR_NODST_1(TLB_DISCARD_SETUP) QIR_NODST_1(TLB_DISCARD_SETUP)
QIR_NODST_1(TLB_STENCIL_SETUP) QIR_NODST_1(TLB_STENCIL_SETUP)
QIR_NODST_1(MS_MASK)
static inline struct qreg static inline struct qreg
qir_UNPACK_8_F(struct vc4_compile *c, struct qreg src, int i) qir_UNPACK_8_F(struct vc4_compile *c, struct qreg src, int i)


@@ -116,6 +116,17 @@ qpu_tlbc()
return r; return r;
} }
static inline struct qpu_reg
qpu_tlbc_ms()
{
struct qpu_reg r = {
QPU_MUX_A,
QPU_W_TLB_COLOR_MS,
};
return r;
}
static inline struct qpu_reg qpu_r0(void) { return qpu_rn(0); } static inline struct qpu_reg qpu_r0(void) { return qpu_rn(0); }
static inline struct qpu_reg qpu_r1(void) { return qpu_rn(1); } static inline struct qpu_reg qpu_r1(void) { return qpu_rn(1); }
static inline struct qpu_reg qpu_r2(void) { return qpu_rn(2); } static inline struct qpu_reg qpu_r2(void) { return qpu_rn(2); }


@@ -387,6 +387,14 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
qpu_rb(QPU_R_MS_REV_FLAGS))); qpu_rb(QPU_R_MS_REV_FLAGS)));
break; break;
case QOP_MS_MASK:
src[1] = qpu_ra(QPU_R_MS_REV_FLAGS);
fixup_raddr_conflict(c, dst, &src[0], &src[1],
qinst, &unpack);
queue(c, qpu_a_AND(qpu_ra(QPU_W_MS_FLAGS),
src[0], src[1]) | unpack);
break;
case QOP_FRAG_Z: case QOP_FRAG_Z:
case QOP_FRAG_W: case QOP_FRAG_W:
/* QOP_FRAG_Z/W don't emit instructions, just allocate /* QOP_FRAG_Z/W don't emit instructions, just allocate
@@ -430,6 +438,13 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
} }
break; break;
case QOP_TLB_COLOR_WRITE_MS:
queue(c, qpu_a_MOV(qpu_tlbc_ms(), src[0]));
if (discard) {
set_last_cond_add(c, QPU_COND_ZS);
}
break;
case QOP_VARY_ADD_C: case QOP_VARY_ADD_C:
queue(c, qpu_a_FADD(dst, src[0], qpu_r5()) | unpack); queue(c, qpu_a_FADD(dst, src[0], qpu_r5()) | unpack);
break; break;


@@ -295,6 +295,10 @@ process_waddr_deps(struct schedule_state *state, struct schedule_node *n,
add_write_dep(state, &state->last_tlb, n); add_write_dep(state, &state->last_tlb, n);
break; break;
case QPU_W_MS_FLAGS:
add_write_dep(state, &state->last_tlb, n);
break;
case QPU_W_NOP: case QPU_W_NOP:
break; break;


@@ -22,6 +22,7 @@
* IN THE SOFTWARE. * IN THE SOFTWARE.
*/ */
#include "util/u_blit.h"
#include "util/u_memory.h" #include "util/u_memory.h"
#include "util/u_format.h" #include "util/u_format.h"
#include "util/u_inlines.h" #include "util/u_inlines.h"
@@ -72,11 +73,18 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
{ {
struct vc4_context *vc4 = vc4_context(pctx); struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_transfer *trans = vc4_transfer(ptrans); struct vc4_transfer *trans = vc4_transfer(ptrans);
struct pipe_resource *prsc = ptrans->resource;
struct vc4_resource *rsc = vc4_resource(prsc);
struct vc4_resource_slice *slice = &rsc->slices[ptrans->level];
if (trans->map) { if (trans->map) {
struct vc4_resource *rsc;
struct vc4_resource_slice *slice;
if (trans->ss_resource) {
rsc = vc4_resource(trans->ss_resource);
slice = &rsc->slices[0];
} else {
rsc = vc4_resource(ptrans->resource);
slice = &rsc->slices[ptrans->level];
}
if (ptrans->usage & PIPE_TRANSFER_WRITE) { if (ptrans->usage & PIPE_TRANSFER_WRITE) {
vc4_store_tiled_image(rsc->bo->map + slice->offset + vc4_store_tiled_image(rsc->bo->map + slice->offset +
ptrans->box.z * rsc->cube_map_stride, ptrans->box.z * rsc->cube_map_stride,
@@ -88,10 +96,52 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
free(trans->map); free(trans->map);
} }
if (trans->ss_resource && (ptrans->usage & PIPE_TRANSFER_WRITE)) {
struct pipe_blit_info blit;
memset(&blit, 0, sizeof(blit));
blit.src.resource = trans->ss_resource;
blit.src.format = trans->ss_resource->format;
blit.src.box.width = trans->ss_box.width;
blit.src.box.height = trans->ss_box.height;
blit.src.box.depth = 1;
blit.dst.resource = ptrans->resource;
blit.dst.format = ptrans->resource->format;
blit.dst.level = ptrans->level;
blit.dst.box = trans->ss_box;
blit.mask = util_format_get_mask(ptrans->resource->format);
blit.filter = PIPE_TEX_FILTER_NEAREST;
pctx->blit(pctx, &blit);
vc4_flush(pctx);
pipe_resource_reference(&trans->ss_resource, NULL);
}
pipe_resource_reference(&ptrans->resource, NULL); pipe_resource_reference(&ptrans->resource, NULL);
util_slab_free(&vc4->transfer_pool, ptrans); util_slab_free(&vc4->transfer_pool, ptrans);
} }
static struct pipe_resource *
vc4_get_temp_resource(struct pipe_context *pctx,
struct pipe_resource *prsc,
const struct pipe_box *box)
{
struct pipe_resource temp_setup;
memset(&temp_setup, 0, sizeof(temp_setup));
temp_setup.target = prsc->target;
temp_setup.format = prsc->format;
temp_setup.width0 = box->width;
temp_setup.height0 = box->height;
temp_setup.depth0 = 1;
temp_setup.array_size = 1;
return pctx->screen->resource_create(pctx->screen, &temp_setup);
}
static void * static void *
vc4_resource_transfer_map(struct pipe_context *pctx, vc4_resource_transfer_map(struct pipe_context *pctx,
struct pipe_resource *prsc, struct pipe_resource *prsc,
@@ -101,7 +151,6 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
{ {
struct vc4_context *vc4 = vc4_context(pctx); struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_resource *rsc = vc4_resource(prsc); struct vc4_resource *rsc = vc4_resource(prsc);
struct vc4_resource_slice *slice = &rsc->slices[level];
struct vc4_transfer *trans; struct vc4_transfer *trans;
struct pipe_transfer *ptrans; struct pipe_transfer *ptrans;
enum pipe_format format = prsc->format; enum pipe_format format = prsc->format;
@@ -155,6 +204,50 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->usage = usage; ptrans->usage = usage;
ptrans->box = *box; ptrans->box = *box;
/* If the resource is multisampled, we need to resolve to single
* sample. This seems like it should be handled at a higher layer.
*/
if (prsc->nr_samples) {
trans->ss_resource = vc4_get_temp_resource(pctx, prsc, box);
if (!trans->ss_resource)
goto fail;
assert(!trans->ss_resource->nr_samples);
/* The ptrans->box gets modified for tile alignment, so save
* the original box for unmap time.
*/
trans->ss_box = *box;
if (usage & PIPE_TRANSFER_READ) {
struct pipe_blit_info blit;
memset(&blit, 0, sizeof(blit));
blit.src.resource = ptrans->resource;
blit.src.format = ptrans->resource->format;
blit.src.level = ptrans->level;
blit.src.box = trans->ss_box;
blit.dst.resource = trans->ss_resource;
blit.dst.format = trans->ss_resource->format;
blit.dst.box.width = trans->ss_box.width;
blit.dst.box.height = trans->ss_box.height;
blit.dst.box.depth = 1;
blit.mask = util_format_get_mask(prsc->format);
blit.filter = PIPE_TEX_FILTER_NEAREST;
pctx->blit(pctx, &blit);
vc4_flush(pctx);
}
/* The rest of the mapping process should use our temporary. */
prsc = trans->ss_resource;
rsc = vc4_resource(prsc);
ptrans->box.x = 0;
ptrans->box.y = 0;
ptrans->box.z = 0;
}
/* Note that the current kernel implementation is synchronous, so no /* Note that the current kernel implementation is synchronous, so no
* need to do syncing stuff here yet. * need to do syncing stuff here yet.
*/ */
@@ -170,6 +263,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
*pptrans = ptrans; *pptrans = ptrans;
struct vc4_resource_slice *slice = &rsc->slices[level];
if (rsc->tiled) { if (rsc->tiled) {
uint32_t utile_w = vc4_utile_width(rsc->cpp); uint32_t utile_w = vc4_utile_width(rsc->cpp);
uint32_t utile_h = vc4_utile_height(rsc->cpp); uint32_t utile_h = vc4_utile_height(rsc->cpp);
@@ -203,7 +297,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->box.height != orig_height) { ptrans->box.height != orig_height) {
vc4_load_tiled_image(trans->map, ptrans->stride, vc4_load_tiled_image(trans->map, ptrans->stride,
buf + slice->offset + buf + slice->offset +
box->z * rsc->cube_map_stride, ptrans->box.z * rsc->cube_map_stride,
slice->stride, slice->stride,
slice->tiling, rsc->cpp, slice->tiling, rsc->cpp,
&ptrans->box); &ptrans->box);
@@ -216,9 +310,9 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
ptrans->layer_stride = ptrans->stride; ptrans->layer_stride = ptrans->stride;
return buf + slice->offset + return buf + slice->offset +
box->y / util_format_get_blockheight(format) * ptrans->stride + ptrans->box.y / util_format_get_blockheight(format) * ptrans->stride +
box->x / util_format_get_blockwidth(format) * rsc->cpp + ptrans->box.x / util_format_get_blockwidth(format) * rsc->cpp +
box->z * rsc->cube_map_stride; ptrans->box.z * rsc->cube_map_stride;
} }
@@ -283,7 +377,13 @@ vc4_setup_slices(struct vc4_resource *rsc)
if (!rsc->tiled) { if (!rsc->tiled) {
slice->tiling = VC4_TILING_FORMAT_LINEAR; slice->tiling = VC4_TILING_FORMAT_LINEAR;
level_width = align(level_width, utile_w); if (prsc->nr_samples) {
/* MSAA (4x) surfaces are stored as raw tile buffer contents. */
level_width = align(level_width, 32);
level_height = align(level_height, 32);
} else {
level_width = align(level_width, utile_w);
}
} else { } else {
if (vc4_size_is_lt(level_width, level_height, if (vc4_size_is_lt(level_width, level_height,
rsc->cpp)) { rsc->cpp)) {
@@ -300,7 +400,8 @@ vc4_setup_slices(struct vc4_resource *rsc)
} }
slice->offset = offset; slice->offset = offset;
slice->stride = level_width * rsc->cpp; slice->stride = (level_width * rsc->cpp *
MAX2(prsc->nr_samples, 1));
slice->size = level_height * slice->stride; slice->size = level_height * slice->stride;
offset += slice->size; offset += slice->size;
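
For the linear (non-tiled) path, the two `vc4_setup_slices()` hunks above amount to: align MSAA surfaces to 32x32 tiles, then scale the stride by the sample count. The sketch below works through that arithmetic under stated assumptions. `struct slice_layout`, `align_u32`, and `linear_slice_layout` are hypothetical names, and `utile_w` is passed in as a parameter rather than derived from `cpp`.

```c
#include <assert.h>
#include <stdint.h>

struct slice_layout { uint32_t stride; uint32_t size; };

/* Round v up to a multiple of a, where a is a power of two. */
static uint32_t align_u32(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Layout of one linear slice, mirroring the arithmetic in the diff. */
static struct slice_layout
linear_slice_layout(uint32_t width, uint32_t height, uint32_t cpp,
                    uint32_t nr_samples, uint32_t utile_w)
{
    struct slice_layout s;

    if (nr_samples) {
        /* MSAA surfaces hold raw tile buffer contents: 32x32 tiles. */
        width = align_u32(width, 32);
        height = align_u32(height, 32);
    } else {
        width = align_u32(width, utile_w);
    }

    s.stride = width * cpp * (nr_samples > 1 ? nr_samples : 1);
    s.size = height * s.stride;
    return s;
}
```

For a 100x50 surface with 4 samples and `cpp` of 4 (the value the diff forces for MSAA resources via `sizeof(uint32_t)`), this gives a stride of 2048 bytes and a slice size of 131072 bytes.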
@@ -357,7 +458,10 @@ vc4_resource_setup(struct pipe_screen *pscreen,
prsc->screen = pscreen; prsc->screen = pscreen;
rsc->base.vtbl = &vc4_resource_vtbl; rsc->base.vtbl = &vc4_resource_vtbl;
rsc->cpp = util_format_get_blocksize(tmpl->format); if (prsc->nr_samples == 0)
rsc->cpp = util_format_get_blocksize(tmpl->format);
else
rsc->cpp = sizeof(uint32_t);
assert(rsc->cpp); assert(rsc->cpp);
@@ -371,8 +475,12 @@ get_resource_texture_format(struct pipe_resource *prsc)
uint8_t format = vc4_get_tex_format(prsc->format); uint8_t format = vc4_get_tex_format(prsc->format);
if (!rsc->tiled) { if (!rsc->tiled) {
assert(format == VC4_TEXTURE_TYPE_RGBA8888); if (prsc->nr_samples) {
return VC4_TEXTURE_TYPE_RGBA32R; return ~0;
} else {
assert(format == VC4_TEXTURE_TYPE_RGBA8888);
return VC4_TEXTURE_TYPE_RGBA32R;
}
} }
return format; return format;
@@ -389,6 +497,7 @@ vc4_resource_create(struct pipe_screen *pscreen,
* communicate metadata about tiling currently. * communicate metadata about tiling currently.
*/ */
if (tmpl->target == PIPE_BUFFER || if (tmpl->target == PIPE_BUFFER ||
tmpl->nr_samples ||
(tmpl->bind & (PIPE_BIND_SCANOUT | (tmpl->bind & (PIPE_BIND_SCANOUT |
PIPE_BIND_LINEAR | PIPE_BIND_LINEAR |
PIPE_BIND_SHARED | PIPE_BIND_SHARED |
@@ -492,13 +601,9 @@ vc4_surface_destroy(struct pipe_context *pctx, struct pipe_surface *psurf)
FREE(psurf); FREE(psurf);
} }
/** Debug routine to dump the contents of an 8888 surface to the console */ static void
void vc4_dump_surface_non_msaa(struct pipe_surface *psurf)
vc4_dump_surface(struct pipe_surface *psurf)
{ {
if (!psurf)
return;
struct pipe_resource *prsc = psurf->texture; struct pipe_resource *prsc = psurf->texture;
struct vc4_resource *rsc = vc4_resource(prsc); struct vc4_resource *rsc = vc4_resource(prsc);
uint32_t *map = vc4_bo_map(rsc->bo); uint32_t *map = vc4_bo_map(rsc->bo);
@@ -592,6 +697,147 @@ vc4_dump_surface(struct pipe_surface *psurf)
} }
} }
static uint32_t
vc4_surface_msaa_get_sample(struct pipe_surface *psurf,
uint32_t x, uint32_t y, uint32_t sample)
{
struct pipe_resource *prsc = psurf->texture;
struct vc4_resource *rsc = vc4_resource(prsc);
uint32_t tile_w = 32, tile_h = 32;
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, 32);
uint32_t tile_x = x / tile_w;
uint32_t tile_y = y / tile_h;
uint32_t *tile = (vc4_bo_map(rsc->bo) +
VC4_TILE_BUFFER_SIZE * (tile_y * tiles_w + tile_x));
uint32_t subtile_x = x % tile_w;
uint32_t subtile_y = y % tile_h;
uint32_t quad_samples = VC4_MAX_SAMPLES * 4;
uint32_t tile_stride = quad_samples * tile_w / 2;
return *((uint32_t *)tile +
(subtile_y >> 1) * tile_stride +
(subtile_x >> 1) * quad_samples +
((subtile_y & 1) << 1) +
(subtile_x & 1) +
sample);
}
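
The index arithmetic in `vc4_surface_msaa_get_sample()` is easier to follow with concrete numbers. The sketch below reproduces the same expression for the 32-bit word offset inside one 32x32 tile, exactly as written in the diff. It assumes `VC4_MAX_SAMPLES` is 4 (the diff only mentions "4x" MSAA), and `msaa_word_offset` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLES 4u  /* assumption: VC4_MAX_SAMPLES == 4 */
#define TILE_W 32u
#define TILE_H 32u

/* Word offset of one sample within its tile, following the diff's
 * expression term by term. */
static uint32_t msaa_word_offset(uint32_t x, uint32_t y, uint32_t sample)
{
    assert(sample < MAX_SAMPLES);

    uint32_t subtile_x = x % TILE_W;
    uint32_t subtile_y = y % TILE_H;
    uint32_t quad_samples = MAX_SAMPLES * 4;          /* words per 2x2 quad */
    uint32_t tile_stride = quad_samples * TILE_W / 2; /* words per quad row */

    return (subtile_y >> 1) * tile_stride +
           (subtile_x >> 1) * quad_samples +
           ((subtile_y & 1) << 1) +
           (subtile_x & 1) +
           sample;
}
```

With these constants a quad occupies 16 words and a row of quads 256 words, so pixel (3, 5), sample 2 lands at word 2 * 256 + 1 * 16 + 2 + 1 + 2 = 533 within its tile.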
static void
vc4_dump_surface_msaa_char(struct pipe_surface *psurf,
uint32_t start_x, uint32_t start_y,
uint32_t w, uint32_t h)
{
bool all_same_color = true;
uint32_t all_pix = 0;
for (int y = start_y; y < start_y + h; y++) {
for (int x = start_x; x < start_x + w; x++) {
for (int s = 0; s < VC4_MAX_SAMPLES; s++) {
uint32_t pix = vc4_surface_msaa_get_sample(psurf,
x, y,
s);
if (x == start_x && y == start_y)
all_pix = pix;
else if (all_pix != pix)
all_same_color = false;
}
}
}
if (all_same_color) {
static const struct {
uint32_t val;
const char *c;
} named_colors[] = {
{ 0xff000000, "" },
{ 0x00000000, "" },
{ 0xffff0000, "r" },
{ 0xff00ff00, "g" },
{ 0xff0000ff, "b" },
{ 0xffffffff, "w" },
};
int i;
for (i = 0; i < ARRAY_SIZE(named_colors); i++) {
if (named_colors[i].val == all_pix) {
fprintf(stderr, "%s",
named_colors[i].c);
return;
}
}
fprintf(stderr, "x");
} else {
fprintf(stderr, ".");
}
}
static void
vc4_dump_surface_msaa(struct pipe_surface *psurf)
{
uint32_t tile_w = 32, tile_h = 32;
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, tile_w);
uint32_t tiles_h = DIV_ROUND_UP(psurf->height, tile_h);
uint32_t char_w = 140, char_h = 60;
uint32_t char_w_per_tile = char_w / tiles_w - 1;
uint32_t char_h_per_tile = char_h / tiles_h - 1;
uint32_t found_colors[10];
uint32_t num_found_colors = 0;
fprintf(stderr, "Surface: %dx%d (%dx MSAA)\n",
psurf->width, psurf->height, psurf->texture->nr_samples);
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
fprintf(stderr, "-");
fprintf(stderr, "\n");
for (int ty = 0; ty < psurf->height; ty += tile_h) {
for (int y = 0; y < char_h_per_tile; y++) {
for (int tx = 0; tx < psurf->width; tx += tile_w) {
for (int x = 0; x < char_w_per_tile; x++) {
uint32_t bx1 = (x * tile_w /
char_w_per_tile);
uint32_t bx2 = ((x + 1) * tile_w /
char_w_per_tile);
uint32_t by1 = (y * tile_h /
char_h_per_tile);
uint32_t by2 = ((y + 1) * tile_h /
char_h_per_tile);
vc4_dump_surface_msaa_char(psurf,
tx + bx1,
ty + by1,
bx2 - bx1,
by2 - by1);
}
fprintf(stderr, "|");
}
fprintf(stderr, "\n");
}
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
fprintf(stderr, "-");
fprintf(stderr, "\n");
}
for (int i = 0; i < num_found_colors; i++) {
fprintf(stderr, "color %d: 0x%08x\n", i, found_colors[i]);
}
}
/** Debug routine to dump the contents of an 8888 surface to the console */
void
vc4_dump_surface(struct pipe_surface *psurf)
{
if (!psurf)
return;
if (psurf->texture->nr_samples)
vc4_dump_surface_msaa(psurf);
else
vc4_dump_surface_non_msaa(psurf);
}
static void static void
vc4_flush_resource(struct pipe_context *pctx, struct pipe_resource *resource) vc4_flush_resource(struct pipe_context *pctx, struct pipe_resource *resource)
{ {


@@ -32,6 +32,9 @@
struct vc4_transfer { struct vc4_transfer {
struct pipe_transfer base; struct pipe_transfer base;
void *map; void *map;
struct pipe_resource *ss_resource;
struct pipe_box ss_box;
}; };
struct vc4_resource_slice { struct vc4_resource_slice {


@@ -95,6 +95,7 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_BLEND_EQUATION_SEPARATE: case PIPE_CAP_BLEND_EQUATION_SEPARATE:
case PIPE_CAP_TWO_SIDED_STENCIL: case PIPE_CAP_TWO_SIDED_STENCIL:
case PIPE_CAP_USER_INDEX_BUFFERS: case PIPE_CAP_USER_INDEX_BUFFERS:
case PIPE_CAP_TEXTURE_MULTISAMPLE:
return 1; return 1;
/* lying for GL 2.0 */ /* lying for GL 2.0 */
@@ -140,7 +141,6 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER: case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_CONDITIONAL_RENDER: case PIPE_CAP_CONDITIONAL_RENDER:
case PIPE_CAP_PRIMITIVE_RESTART: case PIPE_CAP_PRIMITIVE_RESTART:
case PIPE_CAP_TEXTURE_MULTISAMPLE:
case PIPE_CAP_TEXTURE_BARRIER: case PIPE_CAP_TEXTURE_BARRIER:
case PIPE_CAP_SM3: case PIPE_CAP_SM3:
case PIPE_CAP_INDEP_BLEND_ENABLE: case PIPE_CAP_INDEP_BLEND_ENABLE:
@@ -358,7 +358,6 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
unsigned retval = 0; unsigned retval = 0;
if ((target >= PIPE_MAX_TEXTURE_TYPES) || if ((target >= PIPE_MAX_TEXTURE_TYPES) ||
(sample_count > 1) ||
!util_format_is_supported(format, usage)) { !util_format_is_supported(format, usage)) {
return FALSE; return FALSE;
} }
@@ -417,11 +416,13 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
} }
if ((usage & PIPE_BIND_RENDER_TARGET) && if ((usage & PIPE_BIND_RENDER_TARGET) &&
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
vc4_rt_format_supported(format)) { vc4_rt_format_supported(format)) {
retval |= PIPE_BIND_RENDER_TARGET; retval |= PIPE_BIND_RENDER_TARGET;
} }
if ((usage & PIPE_BIND_SAMPLER_VIEW) && if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
(vc4_tex_format_supported(format))) { (vc4_tex_format_supported(format))) {
retval |= PIPE_BIND_SAMPLER_VIEW; retval |= PIPE_BIND_SAMPLER_VIEW;
} }


@@ -65,7 +65,7 @@ struct drm_device {
}; };
struct drm_gem_object { struct drm_gem_object {
uint32_t size; size_t size;
struct drm_device *dev; struct drm_device *dev;
}; };


@@ -79,7 +79,7 @@ static void
vc4_set_sample_mask(struct pipe_context *pctx, unsigned sample_mask) vc4_set_sample_mask(struct pipe_context *pctx, unsigned sample_mask)
{ {
struct vc4_context *vc4 = vc4_context(pctx); struct vc4_context *vc4 = vc4_context(pctx);
vc4->sample_mask = (uint16_t)sample_mask; vc4->sample_mask = sample_mask & ((1 << VC4_MAX_SAMPLES) - 1);
vc4->dirty |= VC4_DIRTY_SAMPLE_MASK; vc4->dirty |= VC4_DIRTY_SAMPLE_MASK;
} }
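
The new `vc4_set_sample_mask()` body keeps only the low `VC4_MAX_SAMPLES` bits, and the fragment shader key elsewhere in this comparison enables `sample_coverage` only when multisampling is on and the stored mask is not all ones. A small sketch of both checks follows, assuming `VC4_MAX_SAMPLES` is 4. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SAMPLES 4  /* assumption: VC4_MAX_SAMPLES == 4 */
#define FULL_MASK ((1u << MAX_SAMPLES) - 1)

/* Keep only the bits that correspond to real samples. */
static unsigned clamp_sample_mask(unsigned sample_mask)
{
    return sample_mask & FULL_MASK;
}

/* Coverage handling is only needed when some sample is masked off. */
static bool needs_sample_coverage(bool multisample, unsigned stored_mask)
{
    assert(stored_mask <= FULL_MASK);  /* expects an already clamped mask */
    return multisample && stored_mask != FULL_MASK;
}
```

Clamping first means a caller passing `~0u` ends up with the full mask, so the comparison against `(1 << VC4_MAX_SAMPLES) - 1` treats it as "no masking" rather than as a distinct value.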
@@ -121,6 +121,9 @@ vc4_create_rasterizer_state(struct pipe_context *pctx,
so->offset_factor = float_to_187_half(cso->offset_scale); so->offset_factor = float_to_187_half(cso->offset_scale);
} }
if (cso->multisample)
so->config_bits[0] |= VC4_CONFIG_BITS_RASTERIZER_OVERSAMPLE_4X;
return so; return so;
} }
@@ -457,6 +460,22 @@ vc4_set_framebuffer_state(struct pipe_context *pctx,
rsc->cpp); rsc->cpp);
} }
vc4->msaa = false;
if (cso->cbufs[0])
vc4->msaa = cso->cbufs[0]->texture->nr_samples != 0;
else if (cso->zsbuf)
vc4->msaa = cso->zsbuf->texture->nr_samples != 0;
if (vc4->msaa) {
vc4->tile_width = 32;
vc4->tile_height = 32;
} else {
vc4->tile_width = 64;
vc4->tile_height = 64;
}
vc4->draw_tiles_x = DIV_ROUND_UP(cso->width, vc4->tile_width);
vc4->draw_tiles_y = DIV_ROUND_UP(cso->height, vc4->tile_height);
vc4->dirty |= VC4_DIRTY_FRAMEBUFFER; vc4->dirty |= VC4_DIRTY_FRAMEBUFFER;
} }
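
The framebuffer hunk above halves the tile dimensions when MSAA is active and recomputes the tile counts with a rounding-up division. A minimal sketch of that calculation, with hypothetical names `struct tile_grid`, `div_round_up`, and `tile_grid_for`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tile_grid { uint32_t tile_w, tile_h, tiles_x, tiles_y; };

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* 32x32 tiles with MSAA, 64x64 without, as in the diff. */
static struct tile_grid
tile_grid_for(uint32_t width, uint32_t height, bool msaa)
{
    struct tile_grid g;

    g.tile_w = msaa ? 32 : 64;
    g.tile_h = msaa ? 32 : 64;
    g.tiles_x = div_round_up(width, g.tile_w);
    g.tiles_y = div_round_up(height, g.tile_h);
    return g;
}
```

For a 1920x1080 framebuffer this yields 30x17 tiles without MSAA and 60x34 tiles with it. The last row of tiles is partially covered in both cases.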


@@ -71,6 +71,18 @@ write_texture_p2(struct vc4_context *vc4,
VC4_SET_FIELD((data >> 16) & 1, VC4_TEX_P2_BSLOD)); VC4_SET_FIELD((data >> 16) & 1, VC4_TEX_P2_BSLOD));
} }
static void
write_texture_msaa_addr(struct vc4_context *vc4,
struct vc4_cl_out **uniforms,
struct vc4_texture_stateobj *texstate,
uint32_t unit)
{
struct pipe_sampler_view *texture = texstate->textures[unit];
struct vc4_resource *rsc = vc4_resource(texture->texture);
cl_aligned_reloc(vc4, &vc4->uniforms, uniforms, rsc->bo, 0);
}
#define SWIZ(x,y,z,w) { \ #define SWIZ(x,y,z,w) { \
UTIL_FORMAT_SWIZZLE_##x, \ UTIL_FORMAT_SWIZZLE_##x, \
@@ -244,6 +256,11 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
cl_aligned_reloc(vc4, &vc4->uniforms, &uniforms, ubo, 0); cl_aligned_reloc(vc4, &vc4->uniforms, &uniforms, ubo, 0);
break; break;
case QUNIFORM_TEXTURE_MSAA_ADDR:
write_texture_msaa_addr(vc4, &uniforms,
texstate, uinfo->data[i]);
break;
case QUNIFORM_TEXTURE_BORDER_COLOR: case QUNIFORM_TEXTURE_BORDER_COLOR:
write_texture_border_color(vc4, &uniforms, write_texture_border_color(vc4, &uniforms,
texstate, uinfo->data[i]); texstate, uinfo->data[i]);
@@ -303,6 +320,10 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
cl_aligned_f(&uniforms, cl_aligned_f(&uniforms,
vc4->zsa->base.alpha.ref_value); vc4->zsa->base.alpha.ref_value);
break; break;
case QUNIFORM_SAMPLE_MASK:
cl_aligned_u32(&uniforms, vc4->sample_mask);
break;
} }
#if 0 #if 0
uint32_t written_val = *((uint32_t *)uniforms - 1); uint32_t written_val = *((uint32_t *)uniforms - 1);
@@ -345,6 +366,7 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
case QUNIFORM_TEXTURE_CONFIG_P1: case QUNIFORM_TEXTURE_CONFIG_P1:
case QUNIFORM_TEXTURE_CONFIG_P2: case QUNIFORM_TEXTURE_CONFIG_P2:
case QUNIFORM_TEXTURE_BORDER_COLOR: case QUNIFORM_TEXTURE_BORDER_COLOR:
case QUNIFORM_TEXTURE_MSAA_ADDR:
case QUNIFORM_TEXRECT_SCALE_X: case QUNIFORM_TEXRECT_SCALE_X:
case QUNIFORM_TEXRECT_SCALE_Y: case QUNIFORM_TEXRECT_SCALE_Y:
dirty |= VC4_DIRTY_TEXSTATE; dirty |= VC4_DIRTY_TEXSTATE;
@@ -363,6 +385,10 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
case QUNIFORM_ALPHA_REF: case QUNIFORM_ALPHA_REF:
dirty |= VC4_DIRTY_ZSA; dirty |= VC4_DIRTY_ZSA;
break; break;
case QUNIFORM_SAMPLE_MASK:
dirty |= VC4_DIRTY_SAMPLE_MASK;
break;
} }
} }


@@ -32,7 +32,8 @@ platform::platform() : adaptor_range(evals(), devs) {
for (pipe_loader_device *ldev : ldevs) { for (pipe_loader_device *ldev : ldevs) {
try { try {
devs.push_back(create<device>(*this, ldev)); if (ldev)
devs.push_back(create<device>(*this, ldev));
} catch (error &) { } catch (error &) {
pipe_loader_release(&ldev, 1); pipe_loader_release(&ldev, 1);
} }


@@ -1446,6 +1446,7 @@ dri2_init_screen(__DRIscreen * sPriv)
struct pipe_screen *pscreen = NULL; struct pipe_screen *pscreen = NULL;
const struct drm_conf_ret *throttle_ret; const struct drm_conf_ret *throttle_ret;
const struct drm_conf_ret *dmabuf_ret; const struct drm_conf_ret *dmabuf_ret;
int fd = -1;
screen = CALLOC_STRUCT(dri_screen); screen = CALLOC_STRUCT(dri_screen);
if (!screen) if (!screen)
@@ -1457,7 +1458,10 @@ dri2_init_screen(__DRIscreen * sPriv)
sPriv->driverPrivate = (void *)screen; sPriv->driverPrivate = (void *)screen;
if (pipe_loader_drm_probe_fd(&screen->dev, dup(screen->fd))) if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)
goto fail;
if (pipe_loader_drm_probe_fd(&screen->dev, fd))
pscreen = pipe_loader_create_screen(screen->dev); pscreen = pipe_loader_create_screen(screen->dev);
if (!pscreen) if (!pscreen)
@@ -1502,6 +1506,8 @@ fail:
dri_destroy_screen_helper(screen); dri_destroy_screen_helper(screen);
if (screen->dev) if (screen->dev)
pipe_loader_release(&screen->dev, 1); pipe_loader_release(&screen->dev, 1);
else
close(fd);
FREE(screen); FREE(screen);
return NULL; return NULL;
} }
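
The `dri2_init_screen()` change replaces an unchecked `dup(screen->fd)` with an explicit validity check, and closes the duplicate on the failure path when no device took ownership of it. The same pattern in isolation might look like the sketch below. `dup_and_probe` and the two probe callbacks are hypothetical stand-ins for the `pipe_loader_*_probe_*` calls, not real Mesa API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

typedef bool (*probe_fn)(int fd);

/* Hypothetical probes: one takes ownership of the fd, one refuses it. */
static bool probe_accept(int fd) { return fd >= 0; }
static bool probe_reject(int fd) { (void)fd; return false; }

/* Returns the duplicated fd (now owned by the probe) or -1 on failure.
 * The duplicate is closed whenever the probe does not accept it. */
static int dup_and_probe(int src_fd, probe_fn probe)
{
    int fd = -1;

    assert(probe != NULL);
    if (src_fd < 0 || (fd = dup(src_fd)) < 0)
        return -1;          /* nothing was duplicated, nothing to close */

    if (!probe(fd)) {
        close(fd);          /* no owner, so release the duplicate here */
        return -1;
    }
    return fd;
}
```

Checking `src_fd < 0` before calling `dup()` avoids handing an invalid descriptor to the probe, and keeping the duplicate in a local variable is what makes the cleanup on failure possible.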
@@ -1519,6 +1525,7 @@ dri_kms_init_screen(__DRIscreen * sPriv)
struct dri_screen *screen; struct dri_screen *screen;
struct pipe_screen *pscreen = NULL; struct pipe_screen *pscreen = NULL;
uint64_t cap; uint64_t cap;
int fd = -1;
screen = CALLOC_STRUCT(dri_screen); screen = CALLOC_STRUCT(dri_screen);
if (!screen) if (!screen)
@@ -1529,7 +1536,10 @@ dri_kms_init_screen(__DRIscreen * sPriv)
sPriv->driverPrivate = (void *)screen; sPriv->driverPrivate = (void *)screen;
if (pipe_loader_sw_probe_kms(&screen->dev, dup(screen->fd))) if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)
goto fail;
if (pipe_loader_sw_probe_kms(&screen->dev, fd))
pscreen = pipe_loader_create_screen(screen->dev); pscreen = pipe_loader_create_screen(screen->dev);
if (!pscreen) if (!pscreen)
@@ -1557,6 +1567,8 @@ fail:
dri_destroy_screen_helper(screen); dri_destroy_screen_helper(screen);
if (screen->dev) if (screen->dev)
pipe_loader_release(&screen->dev, 1); pipe_loader_release(&screen->dev, 1);
else
close(fd);
FREE(screen); FREE(screen);
#endif // GALLIUM_SOFTPIPE #endif // GALLIUM_SOFTPIPE
return NULL; return NULL;


@@ -28,10 +28,14 @@
#include "pipe/p_screen.h" #include "pipe/p_screen.h"
#include "util/u_video.h"
#include "vl/vl_winsys.h" #include "vl/vl_winsys.h"
#include "va_private.h" #include "va_private.h"
DEBUG_GET_ONCE_BOOL_OPTION(mpeg4, "VAAPI_MPEG4_ENABLED", false)
VAStatus VAStatus
vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles) vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles)
{ {
@@ -45,12 +49,16 @@ vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_
*num_profiles = 0; *num_profiles = 0;
pscreen = VL_VA_PSCREEN(ctx); pscreen = VL_VA_PSCREEN(ctx);
for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p) for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p) {
if (u_reduce_video_profile(p) == PIPE_VIDEO_FORMAT_MPEG4 && !debug_get_option_mpeg4())
continue;
if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) { if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) {
vap = PipeToProfile(p); vap = PipeToProfile(p);
if (vap != VAProfileNone) if (vap != VAProfileNone)
profile_list[(*num_profiles)++] = vap; profile_list[(*num_profiles)++] = vap;
} }
}
/* Support postprocessing through vl_compositor */ /* Support postprocessing through vl_compositor */
profile_list[(*num_profiles)++] = VAProfileNone; profile_list[(*num_profiles)++] = VAProfileNone;


@@ -152,11 +152,15 @@ xa_tracker_create(int drm_fd)
struct xa_tracker *xa = calloc(1, sizeof(struct xa_tracker)); struct xa_tracker *xa = calloc(1, sizeof(struct xa_tracker));
enum xa_surface_type stype; enum xa_surface_type stype;
unsigned int num_formats; unsigned int num_formats;
int fd = -1;
if (!xa) if (!xa)
return NULL; return NULL;
if (pipe_loader_drm_probe_fd(&xa->dev, dup(drm_fd))) if (drm_fd < 0 || (fd = dup(drm_fd)) < 0)
goto out_no_fd;
if (pipe_loader_drm_probe_fd(&xa->dev, fd))
xa->screen = pipe_loader_create_screen(xa->dev); xa->screen = pipe_loader_create_screen(xa->dev);
if (!xa->screen) if (!xa->screen)
@@ -208,6 +212,9 @@ xa_tracker_create(int drm_fd)
out_no_screen: out_no_screen:
if (xa->dev) if (xa->dev)
pipe_loader_release(&xa->dev, 1); pipe_loader_release(&xa->dev, 1);
fd = -1;
out_no_fd:
close(fd);
free(xa); free(xa);
return NULL; return NULL;
} }


@@ -31,6 +31,7 @@
#include "pipe/p_state.h" #include "pipe/p_state.h"
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"
#include "state_tracker/drm_driver.h" #include "state_tracker/drm_driver.h"
#include "d3dadapter/d3dadapter9.h" #include "d3dadapter/d3dadapter9.h"


@@ -1,4 +1,5 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"
#include "dri_screen.h" #include "dri_screen.h"


@@ -1 +1,2 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"


@@ -20,7 +20,7 @@ lib@OPENCL_LIBNAME@_la_LIBADD = \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/util/libmesautil.la \ $(top_builddir)/src/util/libmesautil.la \
$(ELF_LIB) \ $(ELF_LIB) \
-ldl \ $(DLOPEN_LIBS) \
-lclangCodeGen \ -lclangCodeGen \
-lclangFrontendTool \ -lclangFrontendTool \
-lclangFrontend \ -lclangFrontend \


@@ -1 +1,2 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"

@@ -1 +1,2 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"

@@ -1 +1,2 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"

@@ -1 +1,2 @@
#include "target-helpers/drm_helper.h" #include "target-helpers/drm_helper.h"
#include "target-helpers/sw_helper.h"

@@ -1737,7 +1737,7 @@ ast_function_expression::handle_method(exec_list *instructions,
result = new(ctx) ir_constant(op->type->array_size()); result = new(ctx) ir_constant(op->type->array_size());
} }
} else if (op->type->is_vector()) { } else if (op->type->is_vector()) {
if (state->ARB_shading_language_420pack_enable) { if (state->has_420pack()) {
/* .length() returns int. */ /* .length() returns int. */
result = new(ctx) ir_constant((int) op->type->vector_elements); result = new(ctx) ir_constant((int) op->type->vector_elements);
} else { } else {
@@ -1746,7 +1746,7 @@ ast_function_expression::handle_method(exec_list *instructions,
goto fail; goto fail;
} }
} else if (op->type->is_matrix()) { } else if (op->type->is_matrix()) {
if (state->ARB_shading_language_420pack_enable) { if (state->has_420pack()) {
/* .length() returns int. */ /* .length() returns int. */
result = new(ctx) ir_constant((int) op->type->matrix_columns); result = new(ctx) ir_constant((int) op->type->matrix_columns);
} else { } else {
@@ -2075,7 +2075,7 @@ ast_aggregate_initializer::hir(exec_list *instructions,
} }
const glsl_type *const constructor_type = this->constructor_type; const glsl_type *const constructor_type = this->constructor_type;
if (!state->ARB_shading_language_420pack_enable) { if (!state->has_420pack()) {
_mesa_glsl_error(&loc, state, "C-style initialization requires the " _mesa_glsl_error(&loc, state, "C-style initialization requires the "
"GL_ARB_shading_language_420pack extension"); "GL_ARB_shading_language_420pack extension");
return ir_rvalue::error_value(ctx); return ir_rvalue::error_value(ctx);

@@ -2649,7 +2649,9 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
return; return;
} }
} else if (state->is_version(420, 310) && base_type->is_image()) { } else if ((state->is_version(420, 310) ||
state->ARB_shading_language_420pack_enable) &&
base_type->is_image()) {
assert(ctx->Const.MaxImageUnits <= MAX_IMAGE_UNITS); assert(ctx->Const.MaxImageUnits <= MAX_IMAGE_UNITS);
if (max_index >= ctx->Const.MaxImageUnits) { if (max_index >= ctx->Const.MaxImageUnits) {
_mesa_glsl_error(loc, state, "Image binding %d exceeds the " _mesa_glsl_error(loc, state, "Image binding %d exceeds the "
@@ -3736,7 +3738,7 @@ process_initializer(ir_variable *var, ast_declaration *decl,
* expressions. Const-qualified global variables must still be * expressions. Const-qualified global variables must still be
* initialized with constant expressions. * initialized with constant expressions.
*/ */
if (!state->ARB_shading_language_420pack_enable if (!state->has_420pack()
|| state->current_function == NULL) { || state->current_function == NULL) {
_mesa_glsl_error(& initializer_loc, state, _mesa_glsl_error(& initializer_loc, state,
"initializer of %s variable `%s' must be a " "initializer of %s variable `%s' must be a "
@@ -5365,7 +5367,7 @@ ast_jump_statement::hir(exec_list *instructions,
if (state->current_function->return_type != ret_type) { if (state->current_function->return_type != ret_type) {
YYLTYPE loc = this->get_location(); YYLTYPE loc = this->get_location();
if (state->ARB_shading_language_420pack_enable) { if (state->has_420pack()) {
if (!apply_implicit_conversion(state->current_function->return_type, if (!apply_implicit_conversion(state->current_function->return_type,
ret, state)) { ret, state)) {
_mesa_glsl_error(& loc, state, _mesa_glsl_error(& loc, state,

@@ -948,7 +948,7 @@ parameter_qualifier:
if (($1.flags.q.in || $1.flags.q.out) && ($2.flags.q.in || $2.flags.q.out)) if (($1.flags.q.in || $1.flags.q.out) && ($2.flags.q.in || $2.flags.q.out))
_mesa_glsl_error(&@1, state, "duplicate in/out/inout qualifier"); _mesa_glsl_error(&@1, state, "duplicate in/out/inout qualifier");
if (!state->has_420pack() && $2.flags.q.constant) if (!state->has_420pack_or_es31() && $2.flags.q.constant)
_mesa_glsl_error(&@1, state, "in/out/inout must come after const " _mesa_glsl_error(&@1, state, "in/out/inout must come after const "
"or precise"); "or precise");
@@ -960,7 +960,7 @@ parameter_qualifier:
if ($2.precision != ast_precision_none) if ($2.precision != ast_precision_none)
_mesa_glsl_error(&@1, state, "duplicate precision qualifier"); _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
if (!(state->has_420pack() || state->is_version(420, 310)) && if (!state->has_420pack_or_es31() &&
$2.flags.i != 0) $2.flags.i != 0)
_mesa_glsl_error(&@1, state, "precision qualifiers must come last"); _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
@@ -1482,7 +1482,7 @@ layout_qualifier_id:
$$.index = $3; $$.index = $3;
} }
if ((state->has_420pack() || if ((state->has_420pack_or_es31() ||
state->has_atomic_counters() || state->has_atomic_counters() ||
state->has_shader_storage_buffer_objects()) && state->has_shader_storage_buffer_objects()) &&
match_layout_qualifier("binding", $1, state) == 0) { match_layout_qualifier("binding", $1, state) == 0) {
@@ -1714,7 +1714,7 @@ type_qualifier:
if ($2.flags.q.invariant) if ($2.flags.q.invariant)
_mesa_glsl_error(&@1, state, "duplicate \"invariant\" qualifier"); _mesa_glsl_error(&@1, state, "duplicate \"invariant\" qualifier");
if (!state->has_420pack() && $2.flags.q.precise) if (!state->has_420pack_or_es31() && $2.flags.q.precise)
_mesa_glsl_error(&@1, state, _mesa_glsl_error(&@1, state,
"\"invariant\" must come after \"precise\""); "\"invariant\" must come after \"precise\"");
@@ -1747,7 +1747,7 @@ type_qualifier:
if ($2.has_interpolation()) if ($2.has_interpolation())
_mesa_glsl_error(&@1, state, "duplicate interpolation qualifier"); _mesa_glsl_error(&@1, state, "duplicate interpolation qualifier");
if (!state->has_420pack() && if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant)) { ($2.flags.q.precise || $2.flags.q.invariant)) {
_mesa_glsl_error(&@1, state, "interpolation qualifiers must come " _mesa_glsl_error(&@1, state, "interpolation qualifiers must come "
"after \"precise\" or \"invariant\""); "after \"precise\" or \"invariant\"");
@@ -1767,7 +1767,7 @@ type_qualifier:
* precise qualifiers since these are useful in ARB_separate_shader_objects. * precise qualifiers since these are useful in ARB_separate_shader_objects.
* There is no clear spec guidance on this either. * There is no clear spec guidance on this either.
*/ */
if (!state->has_420pack() && $2.has_layout()) if (!state->has_420pack_or_es31() && $2.has_layout())
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers"); _mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
$$ = $1; $$ = $1;
@@ -1785,7 +1785,7 @@ type_qualifier:
"duplicate auxiliary storage qualifier (centroid or sample)"); "duplicate auxiliary storage qualifier (centroid or sample)");
} }
if (!state->has_420pack() && if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant || ($2.flags.q.precise || $2.flags.q.invariant ||
$2.has_interpolation() || $2.has_layout())) { $2.has_interpolation() || $2.has_layout())) {
_mesa_glsl_error(&@1, state, "auxiliary storage qualifiers must come " _mesa_glsl_error(&@1, state, "auxiliary storage qualifiers must come "
@@ -1803,7 +1803,7 @@ type_qualifier:
if ($2.has_storage()) if ($2.has_storage())
_mesa_glsl_error(&@1, state, "duplicate storage qualifier"); _mesa_glsl_error(&@1, state, "duplicate storage qualifier");
if (!state->has_420pack() && if (!state->has_420pack_or_es31() &&
($2.flags.q.precise || $2.flags.q.invariant || $2.has_interpolation() || ($2.flags.q.precise || $2.flags.q.invariant || $2.has_interpolation() ||
$2.has_layout() || $2.has_auxiliary_storage())) { $2.has_layout() || $2.has_auxiliary_storage())) {
_mesa_glsl_error(&@1, state, "storage qualifiers must come after " _mesa_glsl_error(&@1, state, "storage qualifiers must come after "
@@ -1819,7 +1819,7 @@ type_qualifier:
if ($2.precision != ast_precision_none) if ($2.precision != ast_precision_none)
_mesa_glsl_error(&@1, state, "duplicate precision qualifier"); _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
if (!(state->has_420pack() || state->is_version(420, 310)) && if (!(state->has_420pack_or_es31()) &&
$2.flags.i != 0) $2.flags.i != 0)
_mesa_glsl_error(&@1, state, "precision qualifiers must come last"); _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
@@ -2575,7 +2575,7 @@ interface_block:
{ {
ast_interface_block *block = (ast_interface_block *) $2; ast_interface_block *block = (ast_interface_block *) $2;
if (!state->has_420pack() && block->layout.has_layout() && if (!state->has_420pack_or_es31() && block->layout.has_layout() &&
!block->layout.is_default_qualifier) { !block->layout.is_default_qualifier) {
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers"); _mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
YYERROR; YYERROR;

@@ -477,7 +477,7 @@ _mesa_glsl_msg(const YYLTYPE *locp, _mesa_glsl_parse_state *state,
struct gl_context *ctx = state->ctx; struct gl_context *ctx = state->ctx;
/* Report the error via GL_ARB_debug_output. */ /* Report the error via GL_ARB_debug_output. */
_mesa_shader_debug(ctx, type, &msg_id, msg, strlen(msg)); _mesa_shader_debug(ctx, type, &msg_id, msg);
ralloc_strcat(&state->info_log, "\n"); ralloc_strcat(&state->info_log, "\n");
} }

@@ -255,6 +255,11 @@ struct _mesa_glsl_parse_state {
return ARB_shading_language_420pack_enable || is_version(420, 0); return ARB_shading_language_420pack_enable || is_version(420, 0);
} }
bool has_420pack_or_es31() const
{
return ARB_shading_language_420pack_enable || is_version(420, 310);
}
bool has_compute_shader() const bool has_compute_shader() const
{ {
return ARB_compute_shader_enable || is_version(430, 310); return ARB_compute_shader_enable || is_version(430, 310);

@@ -57,8 +57,7 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
expr->primary_expression.identifier); expr->primary_expression.identifier);
} }
} else if (op->type->is_vector() || } else if (op->type->is_vector() ||
(state->ARB_shading_language_420pack_enable && (state->has_420pack() && op->type->is_scalar())) {
op->type->is_scalar())) {
ir_swizzle *swiz = ir_swizzle::create(op, ir_swizzle *swiz = ir_swizzle::create(op,
expr->primary_expression.identifier, expr->primary_expression.identifier,
op->type->vector_elements); op->type->vector_elements);

@@ -1669,6 +1669,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name,
this->data.pixel_center_integer = false; this->data.pixel_center_integer = false;
this->data.depth_layout = ir_depth_layout_none; this->data.depth_layout = ir_depth_layout_none;
this->data.used = false; this->data.used = false;
this->data.always_active_io = false;
this->data.read_only = false; this->data.read_only = false;
this->data.centroid = false; this->data.centroid = false;
this->data.sample = false; this->data.sample = false;

@@ -658,6 +658,13 @@ public:
*/ */
unsigned assigned:1; unsigned assigned:1;
/**
* When separate shader programs are enabled, only input/outputs between
* the stages of a multi-stage separate program can be safely removed
* from the shader interface. Other input/outputs must remain active.
*/
unsigned always_active_io:1;
/** /**
* Enum indicating how the variable was declared. See * Enum indicating how the variable was declared. See
* ir_var_declaration_type. * ir_var_declaration_type.

Some files were not shown because too many files have changed in this diff.