This fixes a hang in
piglit/arb_blend_func_extended-fbo-extended-blend-pattern_gles2 on REDWOOD.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry-picked from commit b5b87c4ed1)
Conflicts:
src/gallium/drivers/r600/r600_shader.c
This reverts commit 34cbde2e63.
As mentioned in the beginning of this revert series - let's pull the lot
out, as they cause regressions.
Additionally they are bugfixes (as opposed to regression fixes), which
if needed will need to be reworked.
This reverts commit 4acb394f45.
As discussed with Jason on IRC. Earlier commit in the series, causes
regression, and "there's no point in having the others in there, if we
cannot get to the last patch."
This patch modifies the SSE4.1 test in configure.ac to use a global
variable to initialize vector variables. In addition, we now return the
value of the computation instead of 0.
This is done so gcc 4.9 (and lower) won't optimize the SSE4.1 assembly
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE4.1 instructions (the test will pass because the compiler does support the intrinsics).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 6e44bbe0f5)
GL_DRAW_FRAMEBUFFER does not exist in OpenGL ES 1.x, and since
_mesa_meta_begin hasn't been called yet, we have to work-around API
difficulties. The whole reason that GL_DRAW_FRAMEBUFFER is used instead
of GL_FRAMEBUFFER is that the read framebuffer may be different. This
is moot in OpenGL ES 1.x.
I have another patch series that would also fix this (by removing the
calls to _mesa_BindFramebuffer and friends), but it's not quite ready
yet... and I think it may be a bit heavy for some stable branches.
Consider this a stop-gap fix.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93215
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 96dc732ed8)
GRID Autosport uses SSO shaders. When a tessellation evaluation shader
is passed through this, it triggers assertion failures down the line
with unassigned varying locations. Make sure to do this when the first
shader in the pipeline is not a vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit eca8f38dcf)
commit 839793680f "MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals" causes a
handfull of regressions, some of which listed in fdo#92759.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
brw_init_surface_formats overrides the render format for RGBX formats
which aren't supported for rendering so that they internally use RGBA
instead. However, B8G8R8X8_SRGB was missing so it wasn't marked as a
renderable format. This patch just adds it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 43f4be5f06)
This is the recommended setting according to hw people and it makes Hyper-Z
stable. Just the two magic states.
This fixes Evergreen, Cayman, SI, CI, VI (using the Cayman code).
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d3c08309ab)
[Emil Velikov: s/radeon_set_context_reg/r600_write_context_reg/g]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/gallium/drivers/r600/evergreen_state.c
src/gallium/drivers/radeon/cayman_msaa.c
If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.
I think this code is also wrong for float matricies as well.
There is now a piglit for the float case.
This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test
[airlied: update with Tapani suggestion to clean it up].
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 18ad641c3b)
In case a state tracker unbinds every slot by a seperate
pipe->set_vertex_buffers() call, starting from slot zero, the number
of bound buffers would not reach zero at all.
The current algorithm does not account for pre-existing holes in the
buffer list.
Unbinding all buffers at once or starting at the top-most slot results
in correct behaviour.
Calculating the correct number of bound buffers fixes a NULL pointer
dereference in nvc0_validate_vertex_buffers_shared().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 79bff488bc)
Doesn't have any effect in practice I don't think, but
CTS reads back using GetVertexAttrib.
This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 21abaad8fe)
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.
We need to load the AR register with the value from the index reg
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cce3864046)
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.
It handles CAYMAN MULLO differences as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0696ebc899)
[Emil Velikov: requred by the next commit]
Nominated-by: Emil Velikov <emil.velikov@collabora.com>
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:
total bytes used in shared programs : 44154976 -> 44139880 (-0.03%)
Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f920f8eb02)
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it. This commit gets rid of the dead indirect handling code.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 22c273de2b)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_fs_nir.cpp
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge. On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s. We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e3e70698c3)
[Emil Velikov: s/brw_imm_ud/src_reg/g ,s/offset.ud/offset.dw1.ud/ ]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1d708aacb7)
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 63b850403c)
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 49692f86a1)
Atomic counters and Images were using ctx::Shader that does not take in
to account program pipeline changes, ctx::_Shader must be used for SSO to
work. Commit c0347705 already changed ubo's to use this.
Fixes failures seen with following Piglit test:
arb_separate_shader_object-atomic-counter
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 231db5869c)
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit adcc547bfb)
While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.
Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.
Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.
v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3810c15614)
Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.
Normally, varying packing splits up structures for us, so we don't see
them in the backend. However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.
This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs/arrays/and so on.
(It's easier to read with diff -b, as indentation changes.)
When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive. (The scalar VS backend requires additional
bug fixes in the next patch.)
v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3e9003e9cf)
Just point the hw to valid memory.
This fixes hangs in piglit/depthstencil-render-miplevel tests.
What's even more bizzare is that the hanging tests report "skip".
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The compiler has more information and is able to optimize the bits
it sets in these registers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 89851a2965)
[Emil Velikov: squash trivial conflict]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_compute.c
In the future, these will be used by other shaders types.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 95e0510916)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeonsi/si_state_draw.c
src/gallium/drivers/radeonsi/si_state_shaders.c
Looks like a4xx hw does this in a more standard way and we don't need to
hack around it like we do on a3xx. Fixes GL_ALPHA formats in
fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ff9450ecd1)
[Emil Velikov: squash trivial conflict]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/freedreno/a4xx/fd4_program.c
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 769b3ab6c5)
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0a4462ad6e)
This is a very rudimentary script that checks if any of the applied
cherry-picks have been referenced (fixed?) by another patch. With the
latter either missing the stable tag or hasn't yet been picked.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 5877a594d5)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93110
eglSwapBuffersWithDamage accepts damage-region rectangles to hint the
compositor that it only needs to redraw certain areas, which was passed
through the wl_surface_damage request, as designed.
Wayland also offers a buffer transformation interface, e.g. to allow
users to render pre-rotated buffers. Unfortunately, there is no way to
query buffer transforms, and the damage region was provided in surface,
rather than buffer, co-ordinate space.
Users could in theory account for this themselves, but EGL also requires
co-ordinates to be passed in GL/mathematical co-ordinate space, with an
inversion to Wayland's natural/scanout co-ordinate space, so
transformations other than a 180-degree rotation will fail as EGL
attempts to subtract the region from (its view of the) surface height.
Pending creation and acceptance of a wl_surface.buffer_damage request,
which will accept co-ordinates in buffer co-ordinate space, pessimise to
always sending full-surface damage.
bce64c6c provides the explanation for why we send maximum-range damage,
rather than the full size of the surface: in the presence of buffer
transformations, full-surface damage may not actually cover the entire
surface.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit d1314de293)
We need to emit at least one cut/emit in every
geometry shader, the easiest workaround it to
stick a single CUT at the top of each geom shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4f34722575)
[Emil Velikov: squash trivial conflict]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/r600/r600_shader.c
This should fix the getteximage-depth test that currently asserts.
I was hitting problem with virgl as well in this area.
This moves the 1D array handling code to a single place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 237bcdbab5)
In case that the buffer has no bind at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ad5f6b03e7)
With ARB_direct_state_access, buffers can be created without any binding
hints at all. We still need to allocate these buffers to VRAM or GART,
as we don't have logic down the line to place them into GPU-mappable
space. Ideally we'd be able to shift these things around based on usage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 079f713754)
In nv50, and in the python script that Rob circulated, we do:
bld.mkCmp(OP_SET, CC_GE, TYPE_U32, (s = bld.getSSA()), TYPE_U32, m, b);
Do the same in the nir div lowering pass. This fixes the large-udiv-udiv
piglit tests on freedreno.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robclark@freedesktop.org>
(cherry picked from commit b40e144a66)
This patch disables the use of VSX instructions, as they cause some
piglit tests to fail
For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7
With this patch, ppc64le reaches parity with x86-64 as far as piglit test
suite is concerned.
v2:
- Added check that we have at least LLVM 3.4
- Added the LLVM bug URL as a comment in the code
v3:
- Only disable VSX if Altivec is supported, because if Altivec support
is missing, then VSX support doesn't exist anyway.
- Change original patch description.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 4581f8428e)
A nomination unadorned with a specific version is now interpreted as
being aimed at the 11,0 branch, which was recently opened.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 58aa56d40b)
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 76cfe2bc44)
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 37d11b13ce)
The fixed-function attribute paths don't get the DSA treatment because
there are no DSA entry-points for fixed-function attributes. These
could have been added, but this is a temporary patch intended to make
later patches easier to review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 52921f8e08)
Meta currently does this, but future changes will make this impossible.
Explicitly do it as a step in the patch series now to catch any possible
kinks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 3b5a7d450d)
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4e6b9c11fc)
Instead of going through the GL API implementation functions, use the
lower-level functions. This means that we have to keep track of a
pointer to the gl_buffer_object and the gl_vertex_array_object.
This has two advantages. First, it avoids a bunch of CPU overhead in
looking up objects and validing API parameters. Second, and much more
importantly, it will allow us to stop calling _mesa_GenBuffers /
_mesa_CreateBuffers and pollute the buffer namespace (next patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit e62799bd4e)
Future patches will use the brw_context instead. Keeping this
non-functional change separate should make the function changes easier
to review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit dcadd855f1)
Pulls the parts of enable_vertex_array_attrib that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
_mesa_enable_vertex_array_attrib can also be used to enable
fixed-function arrays.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4a644f1caa)
Pulls the parts of update_array_format that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a336fcd36a)
If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:
==28136== Invalid read of size 8
==28136== at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136== at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
Fixes regression from commit 7f396189f0
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Mon Jan 5 18:17:04 2015 -0800
meta: Add a BlitFramebuffers-based implementation of TexSubImage
v2: However, the teximage we create does need to be width x full_height x 1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by Neil Roberts <neil@linux.intel.com>
(cherry picked from commit f30cf3258e)
With llvm 3.7 semi-dropping the autoconf build, we rely on their cmake
build. With the latter of which annoyingly using another (busted?)
SONAME.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c45b4257c2)
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
The decision which method to use is determined by the size of the vector
that is used by the platform.
For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.
For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.
This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).
This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.
The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:
- glxspheres, on ppc64le:
- original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
- with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
- Additional 0.8% performance boost
- glxspheres, on x86-64 without AVX:
- original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
- with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
- Additional 0.74% performance boost
- glmark2, on ppc64le:
- original code: score of 58
- with my change: score of 57
- glmark2, on x86-64 without AVX:
- original code: score of 175
- with the patch: score of 167
- Impact of of -4.5% on performance
- OpenArena, on ppc64le:
- original code: 3398 frames 1719.0 seconds 2.0 fps
255.0/505.9/2773.0/0.0 ms
- with the patch: 3398 frames 1690.4 seconds 2.0 fps
241.0/497.5/2563.0/0.2 ms
- 29 seconds faster with the patch, which is about 2%
- OpenArena, on x86-64 without AVX:
- original code: 3398 frames 239.6 seconds 14.2 fps
38.0/70.5/719.0/14.6 ms
- with the patch: 3398 frames 244.4 seconds 13.9 fps
38.0/71.9/697.0/14.3 ms
- 0.3 fps slower with the patch (about 2%)
Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 39b4dfe6ab)
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a4bf28178f)
[Emil Velikov: Resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/gallium/drivers/vc4/vc4_qpu_emit.c
Since 779cabfc7d the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there).
This is untested but essentially addressing the same bug as for radeon.
(I don't think that the second entry per le/be table is actually necessary,
but shouldn't hurt...)
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a2611ffe4b)
Since d21320f625 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there). This caused lots of piglit regressions (and probably lots of
trouble outside piglit too).
This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 983614dbed)
Previously GL_FRAMEBUFFER was used. However, if GL_EXT_framebuffer_blit
is supported (note: it is supported by every Mesa driver), this is
*sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes*
an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER
(setters). As a result, the code saved one binding but modified both.
If the bindings were different, the GL_READ_FRAMEBUFFER would be
incorrect on exit.
Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.
Ideally this function would use DSA functions and not modify the binding
at all. However, that would be a much more intrusive change because
_mesa_meta_bind_fbo_image would also need to be modified.
_mesa_meta_bind_fbo_image has a lot of callers. Much of this code is
about to get a major rework due to bug #92363, so I don't think it
matters too much. In fact, I discovered this bug while working on the
other bug. Le bon temps!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c40a88b6c5)
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them. (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)
This allows expressions such as:
int foo;
uint bar;
uint mod = foo % bar;
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 511de1a80c)
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets. This commit changes this to only remove it from the other
node's copy set (not the one we're lowering). At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies. This has two advantages:
1) It's more efficient because we're doing potentially half as many set
search operations.
2) It now properly handles copies from a node to itself. Perviously, it
would delete the copy from the set when processing the destinatioon and
then assert-fail when we couldn't find it for the source.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 226ba889a0)
The comment in the code details the restriction. Thanks to Ken for having a very
helpful conversation with me, and spotting the blurb in the link I sent him :P.
There are still stability problems for me on GT4, but this definitely helps with
some of the failures.
v2: Comment fixes
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 55314c5be4)
For compressed textures, the image size is not necessarily a multiple of
the block size (e.g. the last mip levels). Section 18.3.2 (Copying
Between Images) of the OpenGL 4.5 Core Profile spec says:
An INVALID_VALUE error is generated if the dimensions of either
subregion exceeds the boundaries of the corresponding image
object, or if the image format is compressed and the dimensions of
the subregion fail to meet the alignment constraints of the
format.
and Section 8.7 (Compressed Texture Images) says:
An INVALID_OPERATION error is generated if any of the following
conditions occurs:
* width is not a multiple of four, and width + xoffset is not
equal to the value of TEXTURE_WIDTH.
* height is not a multiple of four, and height + yoffset is not
equal to the value of TEXTURE_HEIGHT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 912babba7b)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/main/copyimage.c
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5980389bbf)
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit eb8fb0064d)
This reverts commit 2294f6f311.
It introduces a regression in the following test
piglit.spec.oes_compressed_paletted_texture.basic api
In general this commit is needed to prevent regressions in
GL_KHR_texture_compression_astc_ldr, which... isn't in 11.0
Reported-by: Mark Janes <mark.a.janes@intel.com>
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit d42029d2d9)
Nominated-by: Emil Velikov <emil.velikov@collabora.co.uk>
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bb73fc4cb8)
This greatly increases the pressure you can put on the driver before
create fails. Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6d3a24bce8)
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
(cherry picked from commit b7eeced3c7)
Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.
One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.
v2: No longer relevant.
v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
(cherry picked from commit 7cbd6608f5)
Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using NEG modifier.
However some shaders define -1.0 constant and want to use it as 1.0.
They do so by using ABS modifier. But r600_bytecode_special_constants()
set NEG in addition to ABS. Since NEG modifier have priority over ABS one,
we get -|1.0| as result, instead of |1.0|.
The patch simply prevents the additional switching of NEG when ABS is set.
[According to Ivan Kalvachev, this bug was fond via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f75f21a24a)
Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .
A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).
v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 24c90888ae)
Consider the case of two nearly identical GLSL fragment shaders:
out vec4 color;
void main() { color = vec4(1); }
and
layout(early_fragment_tests) in;
out vec4 color;
void main() { color = vec4(1); }
These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.
Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.
When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL. Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.
However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset. With reuse, both
programs have the same offset, but prog_data changed. We have to
flag it, but failed to.
To fix this, we simply need to check if the aux (prog_data) pointer
changed. If either the assembly or the prog_data differs, flag it.
This fixes a regression since 1bba29ed40,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly. Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!
Fixes GPU hangs in Dolphin on Broadwell.
Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
(cherry picked from commit bf05af3f0e)
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons. Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.
There are more checks in brw_render_target_supported, but I don't think
they are necessary here. A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.
Fixes:
ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7070c8879a)
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.
Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e05021ff72)
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919a it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 711489648b)
Nominated-by: Roland Scheidegger <sroland@vmware.com>
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.
v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well. Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7b63658125)
The commit base varies greatly between master and 11.0. It seems that
the commit (in it's current form) is not applicable for the branch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.co.uk>
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set. This fixes "make clean" on OpenBSD.
v2: use AC_CHECK_PROG
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 99c4079c37)
Patch also refactors name length queries which were using array size
in computation, this has to be done in same time to avoid regression in
arb_program_interface_query-resource-query Piglit test.
Fixes rest of the failures with
ES31-CTS.program_interface_query.no-locations
v2: make additional check only for GS inputs
v3: create helper function for resource name length
so that it gets calculated only in one place
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
(cherry picked from commit c0722be9f5)
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
(cherry picked from commit 70e91d61fd)
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 4de86e1371)
Nominated-by: Christoph Brill <egore911@egore911.de>
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0
v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit d4e29af234)
Nominated-by: Jason Ekstrand <jason@jlekstrand.net>
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.
I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720
Cc: 11.0 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 814f31457e)
Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).
This makes it currently impossible to run glslparsertest tests for
OpenGL ES when using version override.
Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
Before:
"3.1 Mesa 11.1.0-devel (git-24a1a15)"
After:
"OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
v2: only include api prefix for OpenGL ES (Boyan Ding)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit dc8c221e28)
pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.
Fixes Wasteland 2 Director's Cut crash on start.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 14f7ce4248)
The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
switch clause for all loop iterations.
This doesn't fix any known issues but was clearly incorrect.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit dd293d8aae)
Fixes:
ES3-CTS.shaders.negative.constant_sequence
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.vert
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.frag
v2: Fix a couple copy-and-paste mistake in the spec quotations.
Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 92635a84a7)
This will be used in the next patch to enforce some language sematics.
v2: Fix inverted logic in
ast_function_expression::has_sequence_subexpression. The method
originally had a different name and a different meaning. I fixed the
logic in ast_to_hir.cpp, but I only changed the names in
ast_function.cpp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 05e4601c6b)
v2: Combine this check with the existing const and uniform checks. This
change depends on the previous patch (glsl: Only set
ir_variable::constant_value for const-decorated variables).
Fixes:
ES2-CTS.shaders.negative.initialize
ES3-CTS.shaders.negative.initialize
spec/glsl-es-1.00/compiler/global-initializer/from-attribute.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-1.00/compiler/global-initializer/from-global.vert
spec/glsl-es-1.00/compiler/global-initializer/from-global.frag
spec/glsl-es-1.00/compiler/global-initializer/from-varying.frag
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-3.00/compiler/global-initializer/from-in.vert
spec/glsl-es-3.00/compiler/global-initializer/from-in.frag
spec/glsl-es-3.00/compiler/global-initializer/from-global.vert
spec/glsl-es-3.00/compiler/global-initializer/from-global.frag
Note: spec/glsl-es-3.00/compiler/global-initializer/from-sequence.*
still fail because the result of a sequence operator is still considered
to be a constant expression.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92304
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bb329f2ff6)
Right now we're also setting for uniforms, and that doesn't seem to hurt
things. The next patch will make general global variables in GLSL ES,
and those definitely should not have constant_value set!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3524d6df33)
This is the way layout(binding=xxx) works from GLSL. The old method
just happened to work (and significantly predated support for
layout(binding=xxx)), but future changes will break this.
v2: Remove some stale comments. Suggested by Matt and Chris Forbes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8acce5d53a)
In d4a24745 (August 2012), Paul made functions calls not be constant
expressions in GLSL ES 1.00. Since this feature was added in desktop
GLSL 1.20, we believed that it was added in GLSL ES 3.00. That turns
out to be completely wrong. Built-in functions have always been allowed
as constant expressions in GLSL ES, and the patch adds the (many) spec
quotations to prove it.
While we never previously encountered this, a later patch enforces a GLSL
ES 1.00 rule that global variable initializers must be constant
expressions. Without this fix, several dEQP tests fail.
Fixes:
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.vert
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.vert
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0 10.1 10.2 10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
Yes, I know we don't maintain stable branches that far back, but that
*is* how far back this bug goes!
(cherry picked from commit 43b07eb60f)
Vertex attributes of different categories (constant/per-instance/
per-vertex) go into different buffers for translation, and this is now
properly reflected in the vertex buffers passed to the driver.
Fixes e.g. piglit's point-vertex-id divisor test.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 45ed627d89)
The initial glGetUniformdv support didn't cover all the
casting cases that are apparantly legal, and cts seems to
test for them.
I've updated the piglit test to cover these cases now.
v2: fix indentation - it's all broken in this file (Ilia)
fix src/dst index tracking in light of fp64 support (Ilia)
cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit bcfaab3885)
The call to _mesa_test_texobj_completeness() is unnecessary if the
texture is already known to be complete. If the texture object is
dirtied in the meantime _BaseComplete and _MipmapComplete will be
reset to false. _mesa_is_image_unit_valid() will start to be called
more frequently in a future commit, so it seems desirable to avoid the
unnecessary work.
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 25d3338be3)
A future commit will remove all texture object-dependent derived state
from the image unit struct to make validation unnecessary on texture
state changes. Instead of checking gl_image_unit::_Valid drivers will
be required to call this function when needed to find out whether an
image unit is in a valid state and whether access from the shader is
allowed.
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 5152db415f)
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives. The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.
Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage. An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required". The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.
The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.
None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled. We do set the "Accesses UAV" bits
though since my commit ac7664e493 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:
> 3DSTATE_GS
>[...]
> 12 Accesses UAV
> Format: Enable
> This field must be set when GS has a UAV access.
There are similar comments in the documentation for the other
3DSTATE_*S commands. The "must" part is misleading and unjustified
AFAIK. Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty. There is an exception though. On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.
The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, what will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).
The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only. Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5346c11670)
Patch adds missing type (used with NV_read_depth) so that it gets
handled correctly. This fixes errors seen with following CTS test:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d8d0e4a81e)
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 47d11990b2)
Squashed with commit
nouveau: avoid emitting new fences unnecessarily
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.
This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8053c9208f)
Squashed with commit
nv50,nvc0: don't base decisions on available pushbuf space
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 9fe458335f)
Squashed with commit
nouveau: avoid double-emitting fence
The act of ensuring that there is space can cause a flush to happen,
which will emit the current screen fence. If that is the fence we're
trying to wait on, then it will have been emitted as a result of doing
the PUSH_SPACE. Don't attempt to emit it a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 8053c9208f (nouveau: avoid emitting new fences unnecessarily)
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit bf97f8d467)
This reverts commit 30570b2629.
As mentioned by Ilia Mirkin:
Please remove this one from your list of cherry-picked patches. While
it fixes real issues on nv30 (and probably the other generations too),
it appears to introduce some new ones on nvc0. I've figured out what's
causing it, but haven't figured out a proper fix. Not sure I'll be
able to before you do a release.
The EXT_texture_format_BGRA8888 extension (which mesa supports
unconditionally) adds a new format and internal format called GL_BGRA_EXT.
Previously, this was not really handled at all in
_mesa_ex3_error_check_format_and_type. When the checks were tightened in
commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT
would always cause an error to be thrown.
There were two primary issues here. First, is that
_mesa_es3_effective_internal_format_for_format_and_type didn't handle the
GL_BGRA_EXT format. Second is that it blindly uses _mesa_base_tex_format
which returns GL_RGBA for GL_BGRA_EXT. This commit fixes both of these
issues as well as adds explicit checks that GL_BGRA_EXT is only ever used
with GL_BGRA_EXT and GL_UNSIGNED_BYTE.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6ad9ebb073)
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 47d11990b2)
It seems like things are either coming in slighly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 78ec9e28ec)
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1fec05d114)
Drivers and state trackers that use LLVM for generating code, must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.
When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multihreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).
The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.
This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.
v2:
- Use call_once and remove mutexes and static initializations.
- Replace gallivm_init_llvm_{begin,end}() with
gallivm_init_llvm_targets().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 76cfd6f1da)
Add a macro GL_LIB_NAME to hold the filename that configure comes up with
based on the --with-gl-lib-name and --enable-mangling options.
In driOpenDriver, use the GL_LIB_NAME macro instead of hard-coding
"libGL.so.1".
v2: Add an #ifndef/#define for GL_LIB_NAME so that non-autoconf builds will
work.
v3: Fix the library filename in the Makefile.
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d35391cfda)
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.
This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 798f260a2f)
The old code had some significant problems with respect to
sampler2DArray textures. The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2. The resulting shader would not even compile.
Since there were not tests for this path, nobody noticed.
The input to the fragment shader is always treated as a vec3. If the
source data is only vec2, the vertex puller will supply 0 for the .z
component. The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer. If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.
Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.
Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9bd9cf1fa4)
i915 fragment programs utilize the texture coordinate registers
for both texture coordinates and varyings. Unfortunately the
code doesn't check if the same index might be in use for both.
It just naively uses the index to pick a texture unit, which
could lead to collisions.
Add an extra mapping step to allocate non conflicting texture
units for both uses.
The issue can be reproduced with a pair of simple shaders like
these:
attribute vec4 in_mod;
varying vec4 mod;
void main() {
mod = in_mod;
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
varying vec4 mod;
uniform sampler2D tex;
void main() {
gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod;
}
Fixes many piglit tests on i915:
glsl-link-varyings-2
glsl-orangebook-ch06-bump
interpolation-none-gl_frontcolor-smooth-fixed
interpolation-none-gl_frontcolor-smooth-none
interpolation-none-gl_frontcolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-fixed
interpolation-none-gl_frontsecondarycolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-none
interpolation-none-other-flat-fixed
interpolation-none-other-flat-none
interpolation-none-other-flat-vertex
interpolation-none-other-smooth-fixed
interpolation-none-other-smooth-none
interpolation-none-other-smooth-vertex
v2 [idr]: Minor formatting tweaks.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c349031c27)
I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy
the same bit. Move the texture bits upwards a bit to make room for
I830_UPLOAD_RASTER_RULES.
Now the driver will actually upload the raster rules which is rather
important to get the provoking vertex right. Fixes the appearance
of glxgears teeth on gen2.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9504740f3e)
For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag
to try to get a hardware format which also supports rendering (for FBO
textures). Do the same thing for floating point formats.
This allows the Redway3D Flat demo to run.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit cb758b892a)
The bo will often come from a slab in which case it doesn't matter. But
for larger allocations this will be in its own bo, and we have to make
sure to wait until it's no longer used in order for it to be freed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
(cherry picked from commit 1d8cba9b51)
If there is an unflushed fence on the bo, then the resource may still be
used in commands built up in the local pushbuf. Flushing can cause all
sorts of unwanted effects, so just free the bo when the relevant fence
is hit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
(cherry picked from commit 3a6b9a7830)
IVB and VLV hang sporadically when an untyped surface read or write
message is used to access a surface of format other than RAW, as may
happen when there is a mismatch between the format qualifier of the
image uniform and the format of the actual image bound to the
pipeline. According to the spec this condition gives undefined
results but may not lead to program termination (which is one of the
possible outcomes of the hang). Fix it by checking at runtime whether
the surface is of the right type.
Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
subtest.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit b61292296b)
If the immutable compressed texture didn't have the full mip pyramid,
this didn't work, because it tried to generate mip levels for non-existing
levels. _mesa_prepare_mipmap_level() would correctly handle this by returning
FALSE if the mip level didn't exist, however we actually created the
non-existing mip level right before that because we used _mesa_get_tex_image()
before calling _mesa_prepare_mipmap_level(). It would then proceed to crash
(we allocated the mip level, which is a bad idea on an immutable texture,
but didn't initialize the values, leading to assertion failures or segfaults).
Fix this by using _mesa_select_tex_image() instead and call it after
_mesa_prepare_mipmap_level(), as that function will allocate missing mip levels
for non-immutable textures already.
This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu
packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip
chains - I believe this app not doing it is actually unintentional, always one
level less than full mip chain...).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 19604d30e1)
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:
"if internalformat is a base internal format, the effective internal
format is a sized internal format that is derived from the format and
type for internal use by the GL. Table 3.12 specifies the mapping of
format and type to effective internal formats. The effective internal
format is used by the GL for purposes such as texture completeness or
type checks for CopyTex* commands. In these cases, the GL is required
to operate as if the effective internal format was used as the
internalformat when specifying the texture data."
v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsize format
instead.
v4: * Improved comments in
_mesa_es3_effective_internal_format_for_format_and_type().
* Splitted patch to separate chunk about reordering of
error_check_subtexture_dimensions() error check, which is not directly
related with this patch.
v5: Dropped the splitted patch because it was actually a work around 3
dEQP tests that are buggy:
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 5edd9961c1)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.
Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.
The function is moved as-is, without any modification.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit c6bf1cd146)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/main/teximage.c
src/mesa/main/teximage.h
The more specific GLES constrains should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.
v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 15ab968f62)
... with only ARB_shader_atomic_counters.
I expected to see interactions with ARB_tessellation_shader in the
ARB_shader_atomic_counters spec, but they do not exist. It seems that we
should unconditionally expose these variables in the presence of
ARB_shader_atomic_counters:
gl_MaxTessControlAtomicCounters
gl_MaxTessEvaluationAtomicCounters
This partially reverts commit da7adb99e8. The commit also affected
gl_MaxTessControlImageUniforms and gl_MaxTessEvaluationImageUniforms
similarly but the ARB_shader_image_load_store spec does list an
interaction with ARB_tessellation_shader.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92095
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d6bb46bbe8)
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses unusual stride and subreg_offset
combination breaks when one source is an attribute.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2ea16966ae)
The value passed in count previously was "vertex after the last vertex
to be processed." Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count. That's just silly.
If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.
NOTE: There is some strange formatting left by this patch. That was
done to make it more obvious that the before and after code is
equivalent. These will be fixed in the next patch.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
v2: Fix a remaining (count-start) in render_quad_strip_verts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d7bf7969b9)
From section 9.2. Binding and Managing Framebuffer Objects:
"Upon successful return from Get*FramebufferAttachmentParameteriv, if
pname is FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, then params will contain
one of NONE, FRAMEBUFFER_DEFAULT, TEXTURE, or RENDERBUFFER, identifying
the type of object which contains the attached image."
And then it clarifies further:
"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default framebuffer is
bound, attachment is DEPTH or STENCIL, and the number of depth or stencil
bits, respectively, is zero"
Currently, if the default framebuffer is bound, we always return
GL_FRAMEBUFFER_DEFAULT for FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, but
according to the spec, when GL_DEPTH or GL_STENCIL attachments are
the ones being queried, we should return GL_NONE if they don't exist.
Fixes the following dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cf439951b7)
res_ptr already contains the resource values. fmask_ptr needs to be
looked up relative to the start of the resource params.
Note that this only affects indirect loads of MS sampler arrays.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d5162bdc0)
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().
Fix regression caused by 81d2fd91a9 in:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.
v2: change check in _mesa_error_check_format_and_type to be explicit
for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
this function for both glReadPixels and glDrawPixels validation.
(No Piglit regressions seen with v2.)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit afa1efdc85)
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.
This fixes:
arb_direct_state_access-texture-buffer
arb_texture_buffer_object-data-sync
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 323c912506)
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.
The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order. However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.
This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer. This fixes
about 100 piglit test case failures on s390x for me.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
(cherry picked from commit bd016a2601)
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 545a3cbb01)
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)
This commit addresses the problem in two ways:
1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created
2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 4bf151e662)
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 79f1a7ae28)
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader. In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.
We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
(cherry picked from commit 8c8fc5f833)
When parsing an variable declaration qualified with the typename
keyword, clang attempted to declare a variable with the type of non
type member "enum type type" of module::argument (within the header
file clover/core/module.hpp) instead of the typed member of
module::argument "enum type".
Replaced "typename" with "enum" to force clang to declare the variable
marg_type with type "enum type" of module::argument.
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Albert Freeman <albertwdfreeman@gmail.com>
(cherry picked from commit 1691ead1b8)
Our old value of 16384 is the minimum value. DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise. AMD advertises MAX_INT.
Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit. Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders. While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.
We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger. For now, matching
nVidia and Intel/Windows seems like a good plan.
We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.
According to Tapani, there's an unreleased game that asserts on this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bf58a2c362)
It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 641eda0c79)
Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so
the mapping between source and the actual BB is by inbound edge order.
So when manipulating edges one has to be extremely careful. We were
insufficiently careful when splitting critical edges which resulted in
the phi nodes being confused as to where their sources were coming from.
This primarily manifests itself with the TXL-lowering logic on nv50,
when it is inside of a conditional. I've been unable to trigger the
issue anywhere else so far. This resolves rendering failures
in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2,
XCOM:Enemy Unknown, Stacking. It also improves the situation in
Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief.
However more work needs to be done there (splitting a lot more edges
solves it, so it's some other sort of RA-related issue).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a072ef8748)
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go in to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.
So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.
This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e50c01d5af)
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:
[ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
[ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
[ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.
To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3e9df0e3af)
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.
This commit divides the sources and dest surfaces in to tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ac066bf65c)
Otherwise the android build fails with
error : unable to find string literal operator ‘operator"" PRIx64’
There are several resources referring to the problem, which is related
to c++11, in our case used when building mesa for lollipop.
http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883
I've not investigated all the semantics, some people even suggested a
bug in the gcc compiler,
I just saw the building error was solved with one little space for
lollipop and no side effect when c+11 not used.
v2: [Emil Velikov] add an alternative commit message from Mauro.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e838d91b94)
There are a few bits this commit aims to resolve:
One can generalise the mkdir rule to a simple MKDIR_P $(@D) which will
expand appropriately for even if we change the subdir name, and/or add
new rules. We can also drop the explicit $(srcdir) prefix for the
dependency rules, they they are not strictly required, nor used
elsewhere in mesa.
Finally replace $< with explicit filename to be consistent through the
file, and honour PYTHON_FLAGS.
v2: Add comprehensive commit summary/message (Ian, Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 0d39279448)
Rather than folding one variable within the other only to unwrap them,
just use the ones we need.
v2: bring back LOCAL_PATH prefix for nir_constant_expressions,h
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
(cherry picked from commit a3b05e0492)
The glsl equivalent of "mesa: automake: rework the source generation
rules". Plus let's make things consistent and always explicitly provide
the header name.
v2: Rebase on top of reverted "remove custom AM_V_LEX/YACC" (Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 9e0594418d)
Same logic as previous commit applies.
Additionally remove the odd (set -e/mv/INDENT) from the rules.
The last one is the only one we remotely care about, if reading the
generated sources.
Upcoming work from DylanB which will replace the existing python
scripts with ones that produce more readable output anyway.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit fd913f47b7)
A handful of changes/cleanups paving the way to bmake support:
- Remove optional $(srcdir)/ prefix for files in the prereq list.
- Drop the space after the AM_V_GEN variable.
- Using $< in a non-suffix rule is a GNU make idiom.
- Use $(@D) over $(dir $@). The latter is a POSIX standard.
v2: Cosmetic tweaks in the commit summary.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
(cherry picked from commit d65bd7a7be)
This is the only place in mesa that uses this constuct which seems
to be GNUmake-ism. Attempting to build with POSIX make implementations
(bmake) would fail as below.
--- options.h ---
LOCALEDIR := .
sh: line 2: LOCALEDIR: command not found
*** [options.h] Error code 127
So let's keep things consistent and compatible by making the variable
non target specific.
v2:
- Bring back LOCALEDIR.
- Reword the commit message
- Change mesa-stable tag 10.6 > 11.0
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c8984a7a46)
Since 7a32652231
r600: Turn 'r600_shader_key' struct into union
we were accessing key fields that might be aliased in the union
with other fields, so we should check what shader type we are
compiling for before using key values from it.
v1.1: make it compile
v2: have caffeine, make it work - we don't set type
until later, so don't reference it until we've set it.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6d2ceb10cd)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/gallium/drivers/r600/r600_shader.c
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.
Fixes the following piglit on Fermi (Kepler is unaffected):
fbo-stencil blit GL_DEPTH32F_STENCIL8
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 39df725f73)
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever resulting
in it seeing a surface of e.g 1024x256 being outside of the input size
limit.
This commit fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 87073c69f3)
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e40f32d562)
The hardware only generates vertexid when vertices come from a VBO. This
fixes:
vertexid-drawelements
vertexid-drawarrays
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c830d193db)
The stride was being set to 0, which is illegal (and also non-sensical).
Also we must wait for the buffer to become available for reading as
otherwise a wrong value may be prefetched. Since we must wait for the
buffer anyways, and it's mapped and in GART, we may as well avoid the
annoyance of the indirect pushbuf submit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 75e34d1df8)
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is an undefined behavior. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.
This bug manifests itself in POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 4f2290d161)
Note this is not ideal. Since the sifm can only do source sizes upto
1024x1024 we end up using the blitter on nv4x, which is not that fast.
And on nv3x we end up using the cpu which is really slow.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 3c6c4d4f29)
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.
These constrains have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.
nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.
This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather then by the ddx, and the
wrong stride of these buffers was causing the garbled display.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 3329703eb1)
The tiled memcpy fast paths perform a simple blit (with only a couple of
trivial pixel conversion routines) and do not accommodate PixelTransfer
operations. Therefore if any are set, fallback to the regular routines.
Note that PixelTransfer only applies to TexImage and ReadPixels, not to
GetTexImage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 099f5b3a62)
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around". It didn't account for type conversion moves,
however. So it would happily turn this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:F, vgrf6:UD
into this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:D, -vgrf5:D
which erroneously drops the conversion to float.
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2ace64fd59)
This must be done before exporting a buffer as dmabuf fds, because
we lose track of who is using it and can't trust the reference counter.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 35d0f12797)
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.
So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes. Instead, we need to look at the
enabled slots.
This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots). We then
count normal attributes once, and count the double-size attributes
a second time.
Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.
No Piglit changes on llvmpipe (which actually supports dvecs).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c3294ca5a1)
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4. Both are illegal. That later also
overwrites the storage for the mat4 and causes bad things to happen.
Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7237c937af)
The lowered code reads from the destination, which isn't possible from
message registers.
Fixes the following dEQP tests on SNB:
dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 9390cb8459)
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.
This fixes all the core mesa problems with CTS packed_pixels
tests.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 32769ac016)
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.
align the rowlength to the pack alignment before comparing.
This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.
v1.1: add fxt1 fix (Iago)
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b4a70401f5)
We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.
This fixes some memory overruns under CTS
packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6a3e1fb958)
GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.
While playing with ogl conform one of the tests must
attempted all valid formats for GL3.3 and hits the
unreachable here.
This adds the first chunk of formats that hit the
assert.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8185a02316)
In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.
This adds a common routine to swap bytes a 2D image and uses this
code in:
texture storage
texture get
readpixels
swrast drawpixels.
[airlied: updated with Brian's nitpicks].
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0ad3a475ef)
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Apr 15 03:04:33 2015 -0700
i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.
which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.
Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d38a560106)
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.
This is totally my fault as well, maths fail 101.
This makes geom shaders on r600 not fail heavily.
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0de53ccc8c)
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier shader going off end
of memory problems we've stopped.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3063913f77)
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 58e24b4761)
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
(cherry picked from commit 5aaaaebf22)
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.
Add a couple of asserts, and print EOP bit if set in old asm printer.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a830225adb)
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.
zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2259b11100)
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.
This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit fee0c5af11)
Conflicts:
src/mesa/drivers/dri/i965/brw_fs.cpp
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f432ae899f)
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.
This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c1452983b4)
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of
commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Wed Apr 22 11:43:50 2015 -0700
i965/state: Emit pipeline select when changing pipelines
However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.
Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4e5752e2b7)
This reverts commit 567394112d.
It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a83c36b5c0)
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.
v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 3a1ab23480)
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.
v2: Also implement for BDW+
v3 [by Ben]: Remove 10.5 tag. Too late.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit fb02b4ec48)
The shader-cache isn't finished, so the configure checks are a bit
premature and will only stand to confuse users of Mesa 11.0.
This is a squash of the follow four reverts:
Revert "Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h"
Revert "configure: Add machinery for --enable-shader-cache (and --disable-shader-cache)"
Revert "sha1: Fix gcry_md_hd_t typo."
Revert "mesa: Add mesa SHA-1 functions"
Reviewed-by: Carl Worth <cworth@cworth.org>
The build/file was removed with an earlier commit while the EXTRA_DIST
was forgotten.
Fixes: 66d77cd71c (scons: don't build the kms-dri winsys)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The files are not referenced in any other place in whole of
mesa. They are likely remnants of the early development stage.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
vc4 conflicts with ilo, when build on x86 as it's build for emulation
purposes. In that mode a i965-like symbol is exported by vc4, which
conflicts with the ilo one in the gallium-dri megadriver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The nv_conditional_render piglits were sporadically failing. Moving
the control flush from the write and placing it just before the read
was sufficient to make the piglits pass a 1000/1000 times. The bspec
says that the flush enable bit "waits until all previous writes of
immediate data from post sync circles are complete before executing the
next command" - the operative word being previous!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90691
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Neil Roberts <neil@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I switched us to tracking whether the results *could* go to r4, but then
didn't make a separate register class for the class bits that included r4.
Switch the "any" class to actually be "any", and name the "any but r4"
class more appropriately.
total instructions in shared programs: 96798 -> 94680 (-2.19%)
instructions in affected programs: 62736 -> 60618 (-3.38%)
We had several reports of users hitting bugs
with the other path to upload constants,
and switching to the user constant buffer
path solves the bugs.
User constant buffers are expected to be slower
for Nvidia cards, so ideally this patch should be
reverted when the path is fixed.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Krzysztof Sobiecki <sobkas@gmail.com>
It is very common for d3d9 apps to set again the constants
they need before every draw call, even if nothing changed.
Since we are mostly gpu bound, it is better to check
for change, and upload constants again (and thus use
gpu bandwith) only if the constants changed.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The number of texture stages is 8.
'tex_stage' array was too big, and thus
the checks with 'Elements(state->ff.tex_stage)' were passing,
causing some invalid API calls to pass, and crash because of
out of bounds write since bumpmap_vars was just the correct size.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The CSO cache unbinds views that are not needed anymore,
which we don't do.
It checks for change before committing the views.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
nine_update_state called every draw call.
This patch attemps to change the order
of the checks to have better control flow
Signed-off-by: Axel Davy <axel.davy@ens.fr>
There were flags all sm3 cards do advertise,
and we weren't.
Some games can trigger buggy rendering path
if the caps are not what they expect.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Always use a user constant buffer for ff.
It means we have to:
. commit the user constant buffer for ff when we use it
. commit back the non-ff constant buffer when we stop using it
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We have two paths:
. One that uses a fixed constant buffer, and updates it when needed
. One that uses a user constant buffer, and uploads it when needed.
This patch separates the preparation of the constant buffer
and the commit.
It also removes NineDevice9_RestoreNonCSOState, which was
used to restore all states. Instead the commit of the constant
buffer is moved to nine_state, and the other field settings
moved to other functions where more appropriate.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
For now the path updated is only used by Amd drivers, but a later
patch will make it used by all drivers. Some drivers like llvmpipe
doesn't support the uploading of constants from user buffers, so improve
the path to work for all drivers
Inspired from the gl state tracker.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Instead of mixing state preparation (filling pipe_****)
and state commit (pipe->set_*****),
begin doing so in two separate functions.
This will allow to implement efficient Stateblocks,
and eventually lead to optimisation where the complete
pipe_*** structure is only partially updated.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Set all values to 0 after allocation. Found using valgrind.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
In case NineBaseTexture9_ctor returns an error
This->surfaces[l] might be NULL.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Allow more than two errors, and return D3DERR_INVALIDCALL
for failed display resolution changes.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Tests showed, that in case of errors, the pBits pointer is set to NULL.
The pBits field isn't set to NULL in case of an already locked object.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Return 0 for non MANAGED textures and surfaces.
Fixes failing wine d3d9 tests device.c test_resource_priority.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Move the assert to return error codes in the correct order.
Always set the pSizeOfData to the required buffer size.
Fixes failing wine test device.c test_private_data()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
This fixes wine test device.c test_lockrect_invalid()
Mimic WindowsXp behaviour and allow negative values in the rectangle passed.
Add comment to point out behaviour used.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
For the case of D3DPOOL_MANAGED textures, This->base.resource can be NULL
at the start of the function. In This case, UploadSelf will take care
of the defining. Assign resource after the UploadSelf call
to prevent NULL pointer exception.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
IT is better check if the surface was created with RT flag,
instead of checking capability (llvmpipe was complaining)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
EvictManagedResources is used by apps to free
the gpu memory of MANAGED textures (which have
a cpu memory backing)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
UpdateTexture is supposed to optimise by uploading only for the
dirty region of the source (d3d9 doc, wine tests).
This patch adds the behaviour for surfaces, but not entirely for
volumes.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Dirty regions should be tracked for both MANAGED
and SYSTEMMEM.
Until now we didn't bother to track for SYSTEMMEM,
because we hadn't implemented using the dirty region
to avoid some copies
Signed-off-by: Axel Davy <axel.davy@ens.fr>
If the texture is bound and dirty_mip is true,
BASETEX_REGISTER_UPDATE adds the texture to the list
of things to update before the next draw call.
Some calls to it were missing.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
NineSurface9_CopySurface was supporting more cases than what
we needed, and doing checks that were innapropriate for
some NineSurface9_CopySurface use cases.
This patch splits it into two for the two use cases, and moves
the checks to the caller.
This patch also adds a few checks to NineDevice9_UpdateSurface
Signed-off-by: Axel Davy <axel.davy@ens.fr>
For sw cursor we do not tell wine the cursor position (the app
tells us directly). We shouldn't use ID3DPresent_GetCursorPos.
device->cursor.pos already contains the coordinates the app
gave us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
We have either hardware cursor or software cursor.
When we use software cursor, we should hide the hardware
cursor.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
D3DRS_DITHERENABLE was assigned to the rasterizer state
group, but it was used for the blend group.
Assign it to the blend group.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When using D3DRS_POINTSIZE make sure the value is at least
D3DRS_POINTSIZE_MIN but not greater than D3DRS_POINTSIZE_MAX.
Fixes some Wine tests.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Align texture memory on 32 byte boundry to allow
SSE/AVX memcpy to work on locked rects.
This fixes some crashes with games using SSE.
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
We had red and green in the wrong channels
for the ATI2 format (RGTC2).
Found thanks to wine tests.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Add support for multiple cards and fill in Win
like card name, driver name and version info.
Use fallback for unknown vendors and unknown card names.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Nine code uses some C11 features, and this
leads to compile error on gcc <= 4.5
Another way would have been to use the
-fms-extensions CFLAG
Signed-off-by: David Heidelberg <david@ixit.cz>
Cc: "10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
This got missed because the piglit test only tested int images to avoid a
combinatiorial explosion of format, targets, stages and sizes which
takes more than 5 minutes to test on nvidia's driver.
This patch also drops the IMAGE_FUNCTION_AVAIL_ATOMIC which is not applicable
to the image_size codepath but was not hurting in any way.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
There is no MDOperand in llvm 3.5.
v2: Check if kernel metadata is present to avoid crash (EdB).
v3: Second attempt to avoid crash: switch off metadata query for llvm < 3.6.
Reviewed-by: Serge Martin (EdB) <edb+mesa@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We're generating rcps as part of backend lowering of the packed coordinate
in the CS, and we don't want to lower them in NIR because of the extra
newton-raphson steps in the common case. However, GLB2.7 is moving a
vertex attribute with a 1.0 W component to the position, and that makes us
produce some silly RCPs.
total instructions in shared programs: 97590 -> 97580 (-0.01%)
instructions in affected programs: 74 -> 64 (-13.51%)
I had QPU emit code to do it, but forgot to flag the register class.
total instructions in shared programs: 97974 -> 97590 (-0.39%)
instructions in affected programs: 25291 -> 24907 (-1.52%)
Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.
total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.
total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs: 2134 -> 2123 (-0.52%)
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations. This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers. Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
The constant folding pass can take a long time to complete
so rather than running through the entire pass each time
a new constant is propagated (and vice versa) interleave them.
This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 2 min -> 23 sec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Due to a quirk in how the nv50 opt passes run, the algebraic
optimization that looks for these BFE's happens before the constant
folding pass. Rearranging these passes isn't a great idea, but this is
easy enough to fix. Allows a following cvt to eliminate the bfe in
certain situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
See issue from the ARB_texture_query_lod spec for LOD vs Lod confusion:
(3) The core specification uses the "Lod" spelling, not "LOD". Should
this extension be modified to use "Lod"?
RESOLVED: The "Lod" spelling is the correct spelling for the core
specification and the preferred spelling for use. However, use of
"LOD" also exists, as the extension predated the core specification,
so this extension won't remove use of "LOD".
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This reverts commit ffe6c6ad5f.
_mesa_format_num_components() does not include the padding bits in mesa formats
containing 'X' channels. This could cause mipmap generation for certain
uncompressed formats to underestimate the number of channels in the source
image by 1.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This struct was getting a bit crowded, following the lead of
radeonsi, mirror the idea of having sub-structures for each
shader type. Turning 'r600_shader_key' into an union saves
some trivial memory and CPU cycles for the shader keys.
[airlied: drop as_ls, and reorder so larger fields at start.]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.
This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If build with C++11 standard, use std::unordered_set.
Otherwise if build on old Android version with stlport,
use std::tr1::unordered_set with a wrapper class.
Otherwise use std::tr1::unordered_set.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I pushed a half-baked version of "i965: handle nir_intrinsic_image_size" by
accident. Not having the Reviewed-by: tags on the last two commits should
have been a red flag but I somehow missed it after the QA check.
This patch should fix image-size for non-int images. I will add support to
the piglit test for all the other image types.
Sorry for the noise.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2, Review from Francisco Jerez:
- avoid the camelCase for the booleans
- init the booleans using the sampler type
- force the initialization of all the components of the output register
v3:
- Rename a variable from CubeMapArray to CubeArray to re-use GLSL's name (Ilia)
- Fix some indentation and drop parenthesis (Topi)
- Fix a signed/unsigned comparaison warning
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2, review from Francisco Jerez:
- make the destination variable as large as what the nir instrinsic
defines (4) instead of the size of the return variable of glsl. This
is still safe for the already existing code because all the intrinsics
affected returned the same amount of components as expected by glsl IR.
In the case of image_size, it is not possible to do so because the
returned number of component depends on the image type and this case
is not well handled by nir.
v3:
- Style fix
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The code is heavily inspired from Francisco Jerez's code supporting the
image_load_store extension.
Backends willing to support this builtin should handle
__intrinsic_image_size.
v2: Based on the review of Ilia Mirkin
- Enable the extension for GLES 3.1
- Fix indentation
- Fix the return type (float to int, number of components for CubeImages)
- Add a warning related to GLES 3.1
v3: Based on the review of Francisco Jerez
- Refactor the code to share both add_image_function and _image with the other
image-related functions
v4: Based on Topi Pohjolainen's comments
- Do not add parenthesis for the return value
v5: based on Francisco Jerez's comments:
- Fix a few indent issues
- Reduce the size of a condition by testing the dimension and array properties
instead of enumerating all the formats.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
I'll mark the OES_shader_image_atomic extension entry with this tag to
make sure that we don't expose it on earlier GLES API versions
accidentally, because according to the extension:
"OpenGL ES 3.1 and GLSL ES 3.10 are required."
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This includes the minimum required desktop/ES GLSL version in the
format qualifier table in anticipation of new GLSL versions extending
the set of supported image formats. According to section 4.4.7 of the
GLSL ES 3.1 spec:
"The format layout qualifier identifiers for image variable
declarations are:
[...]
rgba32f
rgba16f
r32f
rgba8
rgba8_snorm
[...]
rgba32i
rgba16i
rgba8i
r32i
[...]
rgba32ui
rgba16ui
rgba8ui
r32ui"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These seem to have been re-added at some point during the
ARB_tessellation_shader implementation work. AFAICT the second
(correct) definition of each constant would have had no effect because
the symbols were already defined.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These are a subset of the image types supported by desktop GL,
excluding 1D, 1D array, rectangle, buffer, cube array, 2D MS and 2D
MS array texture targets.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From the GLSL ES 3.1 spec, section 4.7.3:
"Any floating point, integer, opaque type declaration can have the
type preceded by one of these precision qualifiers: [...] highp
[...], mediump [...], lowp [...]."
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Note that this is slightly more permissive than the spec language
requires: "Any image variable must specify a format layout qualifier."
The GLSL ES spec seems really sketchy regarding format layout
qualifiers on function formal parameters -- On the one hand they are
required, but on the other hand it doesn't provide any syntax to
specify them (see section 6.1.1), they don't participate in parameter
type matching for overload resolution, and are in fact explictly
forbidden ("Layout qualifiers cannot be used on formal function
parameters"). Of course none of the image built-in functions defined
by the spec specify format layout qualifiers (and they probably
couldn't sensibly), to contradict its own requirement.
This probably qualifies for a spec bug, but in the meantime do the
sensible thing and require layout qualifiers on uniforms *only*.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Support for binding an image to an image unit explicitly in the shader
source is required by both GLSL 4.2 and GLSL ES 3.1, but not by the
original ARB_shader_image_load_store extension.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
The GLES 3.1 spec removed support for updating the image unit bound to
an image uniform using glUniform1i() calls.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There is no GL_R8 image format in GLES, according to the state table
20.32 of the GLES 3.1 spec the default value should be GL_R32UI. The
ES31-CTS.shader_image_load_store.basic-api-bind Khronos conformance
test checks that this is the case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The ES31-CTS.shader_image_load_store.basic-api-bind conformance test
expects the whole image unit state to be reset when the bound texture
object is deleted. The ARB_shader_image_load_store extension is
rather vague regarding what should happen with image unit state other
than the texture object in that case, but the GL 4.2 and GLES 3.1
specifications (section "Automatic Unbinding of Deleted Objects")
explicitly require it to be reset to the default values.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec requires that all layers of the image starting from the 0-th
are bound to the image unit regardless of the Layer parameter when
Layered is true, so I was setting gl_image_unit::Layer to zero in that
case for the convenience of the driver back-end. However the
ES31-CTS.shader_image_load_store.basic-api-bind conformance test
checks that the layer value returned by glGetInteger is the same that
was originally specified, regardless of the value of layered. Rename
Layer to _Layer as is usual for other derived state and keep track of
the original layer value as gl_image_unit::Layer.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The name of both the GLSL built-in variable and the glGetInteger param
with the same value changed in GLSL ES 3.1 and GL 4.5. Its semantics
also changed slightly, since the limit now also takes into account the
number of SSBs in use. Switch our internal data structures to the
up-to-date name.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Combine the adjacent cases which have the same GL type in the switch statemnt.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Add the classes of compressed formats as layouts. This allows the detection
of compressed formats belonging to a certain category of compressed formats.
v2. simplify layout name construction (Ilia).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
GL_ARB_compute_shader is limited for GLSL version 430.
This enables for GLSL ES version 310.
V2: Updated error string to also include GLSL 3.10
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to Open GL ES 3.1 specification, section 8.10.2
GL_IMAGE_FORMAT_COMPATIBILITY_TYPE should be supported by
glGetTexParameterfv.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This support should be removed in favor of something that actually works
in all the weird cases. However this is simple and is enough to allow
Bioshock Infinite to render properly on nvc0.
Since the functionality is not implemented correctly, the extension will
not appear in the extension string and mesa will still return
INVALID_OPERATION for any glCopyImageSubData calls. In order to make use
of this functionality, run with
MESA_EXTENSION_OVERRIDE=GL_ARB_copy_image
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patch separates array samplers from the texture_multisample check so that we
can enable only [iu]sampler2DMS, [iu]sampler2DMSArray are not supported.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
v2: code cleanup
v3: check only dimensions, samples is checked separately later
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This is done so that following patch can use it to verify dimensions
for multisample variants of glTex*Storage.
v2: move function to header, use bool instead GLboolean
v3: small changes, cleanup
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS
instruction is sent to create a sort of element to store the generated
values. The last instruction in this chunk of code looks like it was
trying to set the instancing state for the element using the
3DSTATE_VF_INSTANCING instruction. However it was sending
brw->vb.nr_buffers instead of the element index. This instruction is
supposed to take an element index and that is how it is used further
down in the function so the previous code looks wrong. Perhaps
previously the number of buffers coincidentally matched the number of
enabled elements so the value was generally correct anyway. In a
subsequent patch I want to change a bit how it chooses the SGVS
element index so this needs to be fixed.
v2 [by Ben]
Remove stable 10.5 stable tag (it's too late now)
Commit update as follows:
The number of vertex buffers emitted is always <= the number of vertex elements.
To maximize reuse (actually, to minimize relocations - according to the code
comments), a vertex buffer is only emitted once, even when we setup multiple
components (3DSTATE_VERTEX_ELEMENT) from that buffer. This meant that the
previous code would use the wrong indexed element for these reuse cases. This
patch by itself prevents hangs on BSW in the linked bug. It doesn't make the
test pass, the remaining patches are needed for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91610
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965. They are enabled
everywhere else so that they get the old behavior. This reduces the time
spent in eglInitialize() on BDW by around 10-15%.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of adding transitive conflicts as we go, we now add regular
conflicts and them make them all transitive at the end. This reduces
screen creation time substantially on BDW. The time spent in eglInitialize
is reduced from 27.78 ms/call to 9.92 ms/call in debug mode and from 13.15
ms/call to 4.54 ms/call in release mode (about 65% in either case).
Reviewed-by: Eric Anholt <eric@anholt.net>
They're used by glsl_to_nir.cpp, and I want to use them in TGSI-to-NIR as
well (our use of the var->index slot to store slot properties no longer
works since it got truncated).
The *_MAX defines are left in mtypes.h, because they depend on config.h.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This code was split out into a separate function to be used also
by GL_EXT_separate_shader_objects which has since been removed from
Mesa, so move it back.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To properly support the case of waiting on a fence with a 0 timeout, we
still need to call down to the kernel. Which requires the use of the
new fd_pipe_wait_timeout() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Don't take current timestamp/fence from current ring, as we might have
already rolled over to new rb.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.
Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs: 15785 -> 15585 (-1.27%)
helped: 36
HURT: 0
GAINED: 0
LOST: 0
Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26
Make that:
vec1 ssa_27 = fne ssa_21, 0.0f
This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
mesa/src/mesa/drivers/dri/i965/gen7_sol_state.c: In function 'gen7_upload_3dstate_so_decl_list':
mesa/src/mesa/drivers/dri/i965/gen7_sol_state.c:119:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < linked_xfb_info->NumOutputs; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c: In function 'gen6_upload_push_constants':
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:85:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_params; i++) {
^
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:92:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_params; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c: In function 'brw_upload_pull_constants':
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c:84:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_pull_params; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c:89:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ALIGN(prog_data->nr_pull_params, 4) / 4; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c: In function 'brw_upload_abo_surfaces':
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:961:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < prog->NumAtomicBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c: In function 'brw_upload_ubo_surfaces':
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:901:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < shader->NumUniformBlocks; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_tex_layout.c: In function 'brw_miptree_layout_texture_array':
mesa/src/mesa/drivers/dri/i965/brw_tex_layout.c:560:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int q = 0; q < mt->level[level].depth; q++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_state_cache.c: In function 'brw_try_upload_using_copy':
mesa/src/mesa/drivers/dri/i965/brw_state_cache.c:216:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < cache->size; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_primitive_restart.c: In function 'can_cut_index_handle_prims':
mesa/src/mesa/drivers/dri/i965/brw_primitive_restart.c:94:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < nr_prims; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c: In function 'brw_prepare_vertices':
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:434:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = j = 0; i < brw->vb.nr_enabled; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:557:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < nr_uploads; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:569:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < nr_uploads; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_draw_destroy':
mesa/src/mesa/drivers/dri/i965/brw_draw.c:630:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < brw->vb.nr_buffers; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw.c:636:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < brw->vb.nr_enabled; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'release_buffer':
mesa/src/egl/drivers/dri2/platform_drm.c:73:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'has_free_buffers':
mesa/src/egl/drivers/dri2/platform_drm.c:87:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++)
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'dri2_drm_destroy_surface':
mesa/src/egl/drivers/dri2/platform_drm.c:199:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'get_back_bo':
mesa/src/egl/drivers/dri2/platform_drm.c:224:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'dri2_drm_swap_buffers':
mesa/src/egl/drivers/dri2/platform_drm.c:425:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/gbm/main/backend.c: In function 'find_backend':
mesa/src/gbm/main/backend.c:70:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(backends); ++i) {
^
mesa/src/gbm/main/backend.c: In function '_gbm_create_device':
mesa/src/gbm/main/backend.c:95:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(backends) && dev == NULL; ++i) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/glx/dri_common_query_renderer.c: In function 'dri2_convert_glx_query_renderer_attribs':
mesa/src/glx/dri_common_query_renderer.c:61:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(query_renderer_map); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/glx/dri_common.c: In function 'scalarEqual':
mesa/src/glx/dri_common.c:259:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(attribMap); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_screen.c: In function 'intel_screen_make_configs':
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1222:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1259:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1291:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_validate_framebuffer':
mesa/src/mesa/drivers/dri/i965/intel_fbo.c:734:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(fb->Attachment); i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/common/utils.c: In function 'driGetConfigAttrib':
mesa/src/mesa/drivers/dri/common/utils.c:457:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(attribMap); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_screen.c: In function 'aub_dump_bmp':
mesa/src/mesa/drivers/dri/i965/intel_screen.c:125:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < fb->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_blit_framebuffer_with_blitter':
mesa/src/mesa/drivers/dri/i965/intel_fbo.c:836:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < drawFb->_NumColorDrawBuffers; i++) {
^
V2 (Thomas Helland):
-Use unsigned instead of GLuint (trivial)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_state.c: In function 'brw_color_buffer_write_enabled':
mesa/src/mesa/drivers/dri/i965/brw_wm_state.c:53:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_postdraw_set_buffers_need_resolve':
mesa/src/mesa/drivers/dri/i965/brw_draw.c:390:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < fb->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
In the DRI2 path this event is magically synthesized from the
corresponding DRI2 event, but with Present, the server sends us the
event itself. The DRI2 path fills in the serial number, send_event, and
display fields of the XEvent struct that the app sees, but the Present
path did not.
This is likely related to a class of crashes seen in gtk/clutter apps:
https://bugzilla.redhat.com/attachment.cgi?id=1032631
Note that the crashing instruction is looking up the lock_fns slot in
the Display *, and %rdi (holding the Display *) is 0x1.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tessellation control shaders write to outputs as OUT[ADDR[0].x][0], make
sure to parse the indirect dimension on outputs.
Also tess control inputs/outputs and tess eval input declarations need
to receive the same treatment as geometry shader inputs.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The previous patch replaces the other use case.
V2: remove the validation from it old location.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The GL 4.5 spec says its an GL_INVALID_VALUE error if samples equals 0 for
glTexImage*Multisample and an GL_INVALID_VALUE error if samples < 1 for
glTexStorage*Multisample.
The spec says its undefined what happens if glTexImage*Multisample is passed
a samples value < 0 but we currently already produced a GL_INVALID_VALUE error
in this case, this is also consistent with the Nvidia binary.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If an immediate is written to multiple channels, we can load it in a
single writemasked MOV.
total instructions in shared programs: 6285144 -> 6261991 (-0.37%)
instructions in affected programs: 718991 -> 695838 (-3.22%)
helped: 5762
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The hardware checks for multisampling being enabled, but does not have
the rule about cbuf0 being an integer format. Only enable
alpha-to-one/alpha-to-coverage if cbuf0 is not an integer format.
Fixes piglits
ext_framebuffer_multisample-int-draw-buffers-alpha-to-one
ext_framebuffer_multisample-int-draw-buffers-alpha-to-coverage
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to OpenGL version 4.5 and OpenGL ES 3.1 standards, section 7.3:
GL_INVALID_VALUE should be generated, if count is less than 0.
V2: Changed title, eased Open GL ES 3.1 restriction and added comments.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
According to OpenGL specification version 4.5 table 23.46
and OpenGL ES specification version 3.1 table 20.31:
ATOMIC_COUNTER_BUFFER_START and ATOMIC_COUNTER_BUFFER_SIZE
should have the initial value of zero.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
With non-dsa functions we need to do target error checking before
_mesa_get_current_tex_object which would just call _mesa_problem without
raising GL_INVALID_ENUM error. In other places of Mesa, target gets checked
before this call.
Fixes failures in:
ES31-CTS.texture_storage_multisample.APIGLGetTexLevelParameterifv.*
v2: do the target check also for dsa functions (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
GK110/GK208 have 256 registers, not 64. Find out the number of registers
from the target to avoid unnecessary iteration for pre-GK110.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The function glMemoryBarrierByRegion is part of OpenGL ES 3.1
and OpenGL 4.5 core and compatibility profiles.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There are separate line widths for smooth and aliased lines. The smooth
one is selected when multisampling is enabled even if line smoothing
isn't explicitly turned on.
Fixes the ext_framebuffer_multisample-line-smooth piglits
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Apparently this is necessary in order for tess factors to work in a tess
eval program without a tess control program bound. Probably because it
uses the fake program's shader header to work out the number of patch
constants.
Fixes vs-tes-tessinner-tessouter-inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's a lot of functionality duplicated in the gm107 lowering pass
from the nvc0 pass. As that one gets updated, the gm107 one falls
behind. Avoid this by sharing the code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes arb_get_texture_sub_image-get, and any situation where the 2d
engine was being used for multi-layer blits to a non-0 level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Do the same as in st_TexSubImage. This fixes
arb_get_texture_sub_image-get on llvmpipe when it is set to prefer
blits, and nouveau when it uses the 3d engine for blits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The address calculations are all different (e.g. see GP), there appear
to be sync's in programs, and probably a bunch of other differences.
Just disable it for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs: 455 -> 438 (-3.74%)
One in yofrankie, two in tropics. Apparently i965 had also optimized all
of these out anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same. By zeroing, we can avoid that fragility.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In order to move more of our lowering into NIR, we need the ability to
reference various pipeline state (like texture rectangle scaling factors
or blend colors), so we just set those up as a load_uniform with a big
offset to indicate that it's not within the shader's uniform storage and
is one of our state values.
Avoids regressions in vc4 when trying to do our blending in NIR.
v2: Add the other unpack ops I meant to when writing the original commit
message.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening. vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.
total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs: 37 -> 23 (-37.84%)
v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
over to src[0], too (by anholt).
Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
VCE dual instances are encoding in parallel, it needs two frames for
encoding with their own parameters in one IB. Master instance will check
the task info to find another frame, assign it to the slave instance
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
The config task has own task ID, extract the configuration functions
into config task.
v2 (chk): calculate offset automatically
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
v2: -make tonga use new h264 performance HW decoder;
-integrate it scaling buffer to msg_fb buffer
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: (leo) add checking for driver backend
v3: (leo) change variable name from use_amdgpu to use_vm
v4: rebase by Marek
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: (leo) add checking for driver backend
v3: (leo) change variable name from use_amdgpu to use_vm
v4: rebase by Marek
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: incorporate comments from Marek
v3: add missing fiji case in winsys init
use tonga raster config (double check this)
v4: rebase on harvest patch
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v3)
Reviewed-by: Christian König <christian.koenig@amd.com> (v3)
Reviewed-by: David Zhang <david1.zhang@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly calculate the PA_SC_RASTER_CONFIG[_1] settings
for harvest chips.
v2: - fix default raster config settings for CZ and KV
- Suggestions from Michel
v3: - handle multiple packers properly for CI+
- GRBM_GFX_INDEX is privileged on VI+
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Useful for debugging hangs with the read-register interface.
I checked that this adds the same register fields as the kernel driver.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This is an internal project that Catalyst uses and now open source will do
too.
v2: squashed these commits in:
- winsys/amdgpu: fix warnings in addrlib
- winsys/amdgpu: set PIPE_CONFIG and NUM_BANKS in tiling_flags
v2: - lots of changes according to Emil Velikov's comments
- implemented radeon_winsys::read_registers
v3: - a lot of new work, many of them adapt to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initilization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more
v4: require libdrm 2.4.63
zMin and zMax can't use _DepthMaxF, because the test is done in Z32_UNORM.
Probably a useless patch given how popular swrast is nowadays, but it helped
create and validate the piglit test.
v2: add an explicit cast to GLuint
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add compute mode flag to sampler state setup (Marek).
Drop branches which avoid reference counting (Marek).
Simplify unset branch condition (Marek).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
raw_svector_ostream::flush() is now unnecessary and forbidden:
CXX llvm/libclllvm_la-invocation.lo
../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp: In function 'clover::module {anonymous}::build_module_llvm(llvm::Module*, unsigned int (&)[7])':
../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:574:29: error: use of deleted function 'void llvm::raw_svector_ostream::flush()'
bitcode_ostream.flush();
^
In file included from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/VirtualFileSystem.h:22:0,
from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/FileManager.h:20,
from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/SourceManager.h:38,
from /home/daenzer/src/llvm-git/llvm/include/clang/Frontend/CompilerInstance.h:16,
from ../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:25:
/home/daenzer/src/llvm-git/llvm/include/llvm/Support/raw_ostream.h:512:8: note: declared here
void flush() = delete;
^
Makefile:862: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We do not want bug reports from this early stepping of SKL. Few if any were ever
shipped outside of Intel to early enabling partners, and none will be sold.
There is a functional change here. If you're using new mesa on an old
kernel/libdrm, the revid will be -1, and we'll use new SKL values instead of
early ones (a hopefully irrelevant improvement IMO).
v2: Remove hunk which warned before dying. Instead, default to normal SKL
support (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The EGL 1.4 spec states for eglCreateContext:
"attribute EGL_CONTEXT_CLIENT_VERSION is only valid when the current
rendering API is EGL_OPENGL_ES_API"
Additionally, if the EGL_KHR_create_context EGL extension is supported
(this is mandatory in EGL 1.5) then the EGL_CONTEXT_MAJOR_VERSION_KHR,
which is an alias for EGL_CONTEXT_CLIENT_VERSION, and
EGL_CONTEXT_MINOR_VERSION_KHR attributes are also accepted by
eglCreateContext with the extension spec stating:
"The values for attributes EGL_CONTEXT_MAJOR_VERSION_KHR and
EGL_CONTEXT_MINOR_VERSION_KHR specify the requested client API
version. They are only meaningful for OpenGL and OpenGL ES
contexts, and specifying them for other types of contexts will
generate an error."
Add the necessary checks against the extension and rendering APIs when
validating these attributes as part of eglCreateContext.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
[Emil Velikov: Add newline before the spec quote (Matt)]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When a buffer is provided to eglGetConfigs it's supposed to set the value
of the num_config parameter to the total number of configs that have been
copied into this buffer. For some reason the EGL spec doesn't consider it
to be an error to pass this function a buffer while specifying its size to
be less than 0. Given this, one would expect this combination to result in
the num_config parameter being set to 0 but this wasn't the case. This was
due to the buffer size being copied straight into num_configs without being
clamped to 0.
This was causing the following dEQP EGL test to fail:
dEQP-EGL.functional.query_config.get_configs.get_configs_bounds
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When calling either eglCreateWindowSurface or eglCreatePixmapSurface it
was possible for an application to be aborted as a result of it failing
to create a DRI2 drawable on the server. This could happen due to an
application passing in an invalid native drawable handle, for example.
v2: Handle the case where an error has been set on the connection
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Both eglCreatePixmapSurface and eglCreateWindowSurface were incorrectly
setting the EGL error to be EGL_BAD_ALLOC when an invalid native drawable
handle was being passed in. The EGL spec states the following for
eglCreatePixmapSurface:
"If pixmap is not a valid native pixmap handle, then an EGL_BAD_-
NATIVE_PIXMAP error should be generated."
(eglCreateWindowSurface has similar text)
Correctly set the EGL error value based on xcb_get_geometry_reply returning
an error structure containing something other than BadAlloc.
v2: Check for BadAlloc error and update commit message to reflect this
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 4ed23fd590 introduced some calls to _eglError inappropriately
passing it EGL_BAD_NATIVE_WINDOW. This was actually harmless in two of the
cases as _eglError gets called later on with a more appropriate error code
but (just to be safe) switch these to _eglLog calls instead.
The final case is a little trickier as it actually needs to set an error
of which the following are available (according to the EGL spec):
EGL_BAD_MATCH, EGL_BAD_CONFIG, EGL_BAD_NATIVE_(PIXMAP|WINDOW) and
EGL_BAD_ALLOC.
Of these, EGL_BAD_ALLOC seems to be the most appropriate given that
failure can occur either as a result of xcb_get_setup failing due to an
earlier error on the connection (where the most commonly occurring error
code is XCB_CONN_CLOSED_MEM_INSUFFICIENT) or as a result of the
xcb_screen_iterator_t 'rem' field being 0.
In addition to this, commit af2aea40d2 unconditionally set the error to
EGL_BAD_NATIVE_WINDOW when creating a window or pixmap surface with a NULL
native handle. Change this to correctly set the error based on surface
type.
v2: Updated patch description (Emil Velikov)
Return EGL_BAD_NATIVE_PIXMAP when eglCreatePixmapSurface is called
with a NULL native pixmap handle
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Items in the program cache consist of three things: key, the data
representing the instructions and auxiliary data representing
uniform storage. The data consisting of instructions is stored into
a drm buffer object while the key and the auxiliary data reside in
malloced section. Now the cache uploading is equipped with a check
that iterates over existing items and seeks to find a another item
using identical instruction data than the one being just uploaded.
If such is found there is no need to add another section into the
drm buffer object holding identical copy of the existing one. The
item just being uploaded should instead simply point to the same
offset in the underlying drm buffer object.
Unfortunately the check for the matching instruction data is
coupled with a check for matching auxiliary data also. This
effectively prevents the cache from ever containing two items
that could share a section in the drm buffer object.
The constraint for the instruction data and auxiliary data to
match is, fortunately, unnecessary strong. When items are stored
into the cache they will anyway contain their own copy of the
auxiliary data (even if they matched - which they in real world
never will). The only thing the items would be sharing is the
instruction data and hence we should only check for that to match
and nothing else.
No piglit regression in jenkins.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Current logic re-writes the same data when existing data is found.
Not that this actually matters at the moment in practice, the
contraint for finding matching data is too severe to ever allow
data to be shared between two items in the cache.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
and simplify the interface to take directly the size and to return
the offset. The routine does nothing more than allocate, it doesn't
upload anything.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Extension spec originally required 2^24 but 2^27 is the minimum value
required by OpenGL 4.5 and OpenGL ES 3.1 specifications.
Fixes:
ES31-CTS.shader_storage_buffer_object.basic-max
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
_mesa_get_program_resource_name has logic to append '[0]' in name
if variable is an array, this should be skipped for XFB varyings
that have array index already appended.
v2: fix comment, change also GL_NAME_LENGTH query to match
the behaviour
Fixes:
ES31-CTS.program_interface_query.transform-feedback-types
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Should fix piglit: gl-1.4-polygon-offset (formerly a glean test)
(untested, ported from radeonsi)
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Fixes piglit: gl-1.4-polygon-offset (formerly a glean test)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
It must be obtained from the VS.
The GS scenario A must be enabled for PrimID to be generated for the VS.
+ 4 piglits
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Basic texture buffer support. Should be straightforward to add first/
last_element support. And with a bit of work in ir3 emulate larger
texture buffer sizes. But this seems to be enough for stk gl31 render
paths.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This shows up with a glamor shader, which does a TXF and uses the result
for conditional kill. Before we wouldn't group the fanin (collect)
neighbors which need to be allocated adjacently at RA, resulting in
badness.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
a4xx needs similar treatment as 995f55a6
Also fixup a few point-size and vpsrepl issues and drop fix_blit_fp()
hack previously needed for mem2gmem.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Move a few things around to group stuff that is common to a3xx/a4xx
together. Also, introduce is_ir3() for things that are more specific to
the compiler / shader-ISA than to the gpu generation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These extensions allow reading depth/stencil for GLES contexts, which is
useful for tools like apitrace.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This function would always report that a dimension or size error occurred
in glTexImage even when it was called from glCompressedTexImage. Replace
the static string with the dynamically determined caller name.
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Because we build here an array format, we don't need to swap the
bytes for big endian.
If it isn't an array format, the bytes will be swapped in
_mesa_format_convert.
v2: remove temp variable
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Before, if we encountered an array format of 0 on a BE system, we would
flip all the channels even though it's an invalid format. This would
result in a mostly invalid format with a swizzle of yyyy or wwww. Instead,
we should just return 0 if the array format stashed in the format info is
invalid.
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
The swizzle defines where in the format you should look for any given
channel. When we flip the format around for BE targets, we need to change
the destinations of the swizzles, not the sources. For example, say the
format is an RGBX format with a swizzle of xyz1 on LE. Then it should be
wzy1 on BE; however, the code as it was before, would have made it 1zyx on
BE which is clearly wrong.
Reviewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
Cuts about 2k of .text.
text data bss dec hex filename
5017141 197160 27672 5241973 4ffc75 i965_dri.so before
5014981 197160 27672 5239813 4ff405 i965_dri.so after
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts about 1k of .text.
text data bss dec hex filename
5018165 197160 27672 5242997 500075 i965_dri.so before
5017141 197160 27672 5241973 4ffc75 i965_dri.so after
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
MESA_LLVM_VERSION_PATCH is undefined.
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Tested-by: Benjamin Bellec <b.bellec@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
r600 currently has 73 atoms and looping through their dirty flags has
become costly because checking each flag requires a pointer
dereference before the read. To avoid having to do that add additional
bitfield which can be checked really quickly thanks to tzcnt instruction.
id field was added to struct r600_atom but that doesn't affect memory
usage for both 32 and 64 bit CPUs because it was stuffed into padding.
The performance improvement is ~2% for benchmarks that can have FPS in
the thousands but is hardly measurable in "real" programs.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.
No functional changes.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This fixes the spec@arb_shader_image_load_store@invalid index bounds
piglit tests on IVB, which were causing a GPU hang and then a crash
due to the invalid binding table index result of the array index
calculation. Other generations seem to behave sensibly when an
invalid surface is provided so it doesn't look like we need to care.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Drop VEC4 suport.
v3: Rebase.
v4: Move array coordinate workaround into the surface builder.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define bitfield packing, unpacking and type conversion operations in
terms of which the image format conversion code will be implemented.
These don't directly know about image formats: The packing and
unpacking functions take a 4-tuple of bit shifts and a 4-tuple of bit
widths as arguments, determining the bitfield position of each
component. Most of the remaining functions perform integer, fixed
point normalized, and floating point type conversions, mapping between
a target type with per-component bit widths given by a parameter and a
matching native representation of the same type.
v2: Drop VEC4 suport.
v3: Rebase.
v4: Fix clamping of negative floats in the unsigned case of
emit_convert_to_scaled().
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define some utility functions to query the bitfield layout of a given
image format and whether it satisfies a number of more or less
hardware-specific properties.
v2: Drop VEC4 suport.
v3: Add SKL support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define a function to calculate the memory address of the image
location given by a vector of coordinates. This is required in cases
where we need to fall back to untyped surface access, which take a raw
memory offset and know nothing about surface coordinates, type
conversion or memory tiling and swizzling. They are still useful
because typed surface reads don't support any 64 or 128-bit formats on
IVB, and they don't support any 128-bit formats on HSW and BDW.
The tiling algorithm is implemented based on a number of parameters
which are passed in as uniforms and determine whether the surface
layout is X-tiled, Y-tiled or untiled. This allows binding surfaces
of different tiling layouts to the pipeline without recompiling the
program.
v2: Drop VEC4 suport.
v3: Rebase.
v4: Add plenty of comments (Jason).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These utility functions check whether an image access is valid.
According to the spec an invalid image access should have no effect on
the image and yield well-defined results. Typically the hardware
implements correct bounds and surface checking by itself, but in some
cases (typed atomics on IVB and untyped messages elsewhere) we need to
implement it in software to work around lacking hardware support.
v2: Drop VEC4 suport.
v3: Rebase.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Store early fragment test mode in brw_wm_prog_data instead of
getting it from core mesa data structures (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders with image uniforms may have side effects. Make sure that
fragment shader threads are dispatched if the shader has any image
uniforms.
v2: Use brw_stage_prog_data::nr_image_params to find out if the shader
has image uniforms instead of checking core mesa data structures
(Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to pass image meta-data to the shader when we cannot
use typed surface reads and writes. All entries except surface_idx
and size are otherwise unused and will get eliminated by the uniform
packing pass. size will be used for bounds checking with some image
formats and will be useful for ARB_shader_image_size too. surface_idx
is always used.
v2: Add CS support. Move the image_params array back to
brw_stage_prog_data.
v3: Improve documentation.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On BDW+, the negation source modifier on NOT, AND, OR, and XOR, is actually
a boolean negate and not an integer negate. However, NIR's soruce
modifiers are the integer version. We have to resolve it with a MOV prior
to emitting the actual instruction. This is basically the same thing we do
in the FS backend.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The analysis code was already there and running, we just weren't doing
anything with the result of it yet.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were explicitly listing every instruction that needs a
resolve. However, those instructions were precicely the ones that returned
booleans so there's no reason why we shouldn't just have that check. Also,
all of the reduction opcodes such as bany and ball were missing so it
didn't properly flag stuff on vec4. If an opcode gets added in the future
that returns a bool but doesn't need a resolve, we can special-case that.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This better ensures that the src_bits == dst_bits case gets optimized away.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch fixes a bug in big-endian treatment, where the previous
swizzle info wasn't cleared before a new swizzle info was inserted into
the format field using a bitwise-OR operation.
v2: use MESA_ARRAY_FORMAT_SWIZZLE_*_MASK instead of numeric constants
v3: align according to coding style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Neither MSVC nor MinGW defines LONG_BIT. For MSVC this was not a problem as
it doesn't define __x86_64__ macro (it's GCC specific.)
However on Windows long type is guaranteed to be 32bits.
Also add an #error, as GCC will just warn, not throw any error, when no
value is returned.
Trivial.
To avoid collission with windows.h's PURE macro.
We could consider eventually renaming to __pure, but that would require
further care, so it's left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
We don't need to free driverName string from dri2 reply, on the other
hand, the driver name acquired from loader doesn't need duplication.
Fixes: 45e110bad9 (egl/x11: trust our loader over the xserver for the
drivername)
Reported-by: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[Emil Velikov: use brackets for both branches of conditional]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This should have been a part of:
commit 7eaacc1678
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Jul 29 12:35:24 2015 -0700
i965/skl: Add production thread counts and URB size
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Since the introduction of new gl_shader_stages in
commit a2af956963
Author: Fabian Bieler <fabianbieler@fastmail.fm>
Date: Fri Mar 7 10:19:09 2014 +0100
mesa: add tessellation shader enums
the translation table for the stage into the HW binding table edit
command was broken, and so we used illegal commands. Fix the array
initialisation to be impervious to changes in the gl_shader_stages enum
and add the asserts that would have caught the issue earlier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was causing a failure to build on SCons due to a missing
-Isrc/egl. Instead of adding in that path, lets just -Isrc/
and include "utils/u_atomic.h".
Reviewed-by: Matt Turner <mattst88@gmail.com>
Identical to commit 60e9c35b3a0(egl/x11: bail out if we cannot fetch
the xcb connection) but for the swrast codepath.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
No real change, apart from keeping the calls to the underlying winsys
(x11) next to each other. Just like platform_wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the follow up commits we're about to further reshuffle things. Thus
we'll honour our our driver_name lookup (src/loader), and use the one
provided by xserver as a fall-back.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation of xcb_connection_has_error() does not mention
what will happen, if NULL is fed to the function.
Upon closer look (props to Matt), it seems that we'll crash as the
implementation dereferences conn.
This will also allow us to remove the dri2_dpy->conn checking with the
next commit.
v2: Reword commit message as per Matt's findings.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
As sugested by Tom a long time ago
and in order to be able to create Piglit tests
v2:
replace NOT_SUPPORTED_BY_CL_1_1 macro with an inline function
remove extra space in clLinkProgram arg
v3:
use __func__
v4:
back to a macro, it make more sense to use it with __func__
[ Francisco Jerez: Rename to CLOVER_NOT_SUPPORTED_UNTIL and pass the
minimum API version required by the entry point so the error
messages don't become stale when support for additional CL versions
is introduced. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When a program is compiled, but linking failed the sh->InfoLog
could be NULL. This is expoloited by OpenGL ES 3.1 conformance tests.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The dst is always written, in this case the predicate is only used to select
the value to write, so if we are spilling the dst we always want to write
whatever value we selected to scratch.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Stage ref cannot be queried for transform feedback.
Also simplify the build_stageref function by passing the
correct mode for uniforms.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Same idea as in libdrm_amdgpu.
A command stream can only be created for a specific context and it's always
submitted to that context.
This will mainly be used by amdgpu and it's required by the GPU reset status
query too.
(radeon only has a basic version of the query and thus doesn't need this)
Reviewed-by: Christian König <christian.koenig@amd.com>
Ben suggested that I rename MIPTREE_LAYOUT_ALLOC_ANY_TILED since it
needed to include no tiling at all, but the name
MIPTREE_LAYOUT_ALLOC_ANY is pretty nondescriptive. We can avoid
confusion by replacing "ALLOC" with "TILING" in the identifiers.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Regression since commit 3a31876600, when tiling modes were moved into
layout_flags.
The relevant enum values are
MIPTREE_LAYOUT_ALLOC_YTILED = 1 << 5
MIPTREE_LAYOUT_ALLOC_XTILED = 1 << 6
MIPTREE_LAYOUT_ALLOC_ANY_TILED = MIPTREE_LAYOUT_ALLOC_YTILED |
MIPTREE_LAYOUT_ALLOC_XTILED
MIPTREE_LAYOUT_ALLOC_LINEAR = 1 << 7
so the expression (layout_flags & MIPTREE_LAYOUT_ALLOC_ANY_TILED) can
never produce a value of MIPTREE_LAYOUT_ALLOC_LINEAR.
The enum this replaced was
enum intel_miptree_tiling_mode {
INTEL_MIPTREE_TILING_ANY,
INTEL_MIPTREE_TILING_Y,
INTEL_MIPTREE_TILING_NONE,
};
where "ANY" means "Y" or "NONE" (i.e., linear). As such, remove the
unused (and worse, unhandled) MIPTREE_LAYOUT_ALLOC_XTILED and redefine
MIPTREE_LAYOUT_ALLOC_ANY_TILED to mean what it did before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91513
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Luckily, there is a kernel query, so use the size from that.
It currently returns 256KB. It can be increased in the kernel.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This extends the SIMD lowering pass to enforce the hardware limitation
that no directly-addressed source may read more than 2 physical GRFs.
One can easily go over this limit when doing 64-bit arithmetic
(e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice
to be able to just emit an instruction of the intended execution size
from the visitor and let the lowering pass deal with this restriction
transparently.
Some hardware arithmetic instructions are not handled here, including
all instructions that use the accumulator implicitly (which the SIMD
lowering pass deliberately doesn't handle), instructions with
non-per-channel sources (e.g. LINE or PLANE) and SEND-like
instructions, which need special handling most likely as virtual
opcodes.
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise it would crash on Gen8 with scalar VS. The issue can easily
be reproduced with the following patch, but I don't see any reason why
it wouldn't be possible to end up with an ATTR argument here even
without it.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Translate MULH into the MUL/MACH sequence. This does roughly the same
thing that nir_emit_alu() used to do but we can now handle 16-wide by
taking advantage of the SIMD lowering pass. The force_sechalf
workaround near the bottom is required because the SIMD lowering pass
will emit instructions with non-zero quarter control and we need to
make sure we avoid that on integer arithmetic instructions with
implicit accumulator access due to a known hardware bug on IVB.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In order to make room for the code that will lower the MULH virtual
instruction. Also move the hardware generation and execution type
checks into the same branch, they are going to have to be different
for MULH.
Reviewed-by: Matt Turner <mattst88@gmail.com>
AFAIK BXT has the same annoying alignment limitation as CHV on the
source register regions of 32x32 bit MULs, give it the same treatment.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This instruction will translate to the MUL/MACH sequence that computes
the high 32-bits of the result of a 64-bit multiply. Before Gen8
integer operations that used the accumulator were limited to 8-wide,
but the SIMD lowering pass can easily be hooked up to sidestep this
limitation, we just need a virtual opcode to represent the MUL/MACH
sequence in the IR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is apparently a subtle difference in C++ between
F f;
and
F f();
The former will use the default constructor. If there is no default
constructor specified, the compiler provides one that simply invokes the
default constructor for each field. For built-in basic types, the
default constructor does nothing. The later will, according to
http://stackoverflow.com/questions/2417065/does-the-default-constructor-initialize-built-in-types)
perform value-initialization of the type. For built-in types this means
initializing to zero.
The per_vertex_accumulator constructor is:
per_vertex_accumulator::per_vertex_accumulator()
: fields(),
num_fields(0)
{
}
This is the second form of constructor, so the glsl_struct_field
objects were previously zero initialized. With the addition of an empty
default constructor in commit 7ac946e5, per_vertex_accumulator::fields
receive no initialization.
Fixes a bunch of random (mostly tessellation related) piglit failures
since commit 7ac946e5 ("glsl: Add constuctors for the common cases of
glsl_struct_field").
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Argument validation for glTexSubImageXD is missing a check of format and type
against texture object's internal format when profile is OpenGL-ES 3.0+.
This patch also groups together all format and type checks on GLES into a
new function texture_format_error_check_gles(), to factorize similar
code in texture_format_error_check().
Fixes 2 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.texsubimage2d
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Page 161 of the OpenGL-ES 3.1 (PDF) spec, and page 207 of the OpenGL 4.5 (PDF),
both on section '8.6. ALTERNATE TEXTURE IMAGE SPECIFICATION COMMANDS', states:
"An INVALID_ENUM error is generated if an invalid value is specified for
internalformat".
It is currently returning INVALID_OPERATION error because
_mesa_get_read_renderbuffer_for_format() is called before the internalformat
argument has been validated. To fix this, we move this call down the validation
process, after _mesa_base_tex_format() has been called. _mesa_base_tex_format()
effectively serves as a validator for the internal format.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.copyteximage2d_invalid_format
Fixes 1 piglit test:
* spec@oes_compressed_etc1_rgb8_texture@basic
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Currently, glTexSubImageXD attempt to resolve the texture object
(by calling _mesa_get_current_tex_object()) before validating the given
target. However, that method explicitly states that target must have been
validated before calling it, so it never returns a user error.
The target validation occurs later when texsubimage_error_check() is called.
This patch reorganizes target validation, taking it out from the error check
function and into a point before the texture object is resolved.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Page 68, section 7.2 'Shader Binaries" of the of the OpenGL ES 3.1,
and page 88 of the OpenGL 4.5 specs state:
"An INVALID_VALUE error is generated if count or length is negative.
An INVALID_ENUM error is generated if binaryformat is not a supported
format returned in SHADER_BINARY_FORMATS."
Currently, an INVALID_OPERATION error is returned for all cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.shader.shader_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Original purpose of these lines was to be more friendly against
GUI tools using the extension. However conformance suite explicitly
checks that buffers are not modified in error conditions.
Fixes:
ES31-CTS.program_interface_query.buff-length
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Currently stage reference mask is built using the variable name
only. However it can happen that input of one stage has same name
as output from another stage. Adding check of variable mode makes
sure we do not pick wrong variable.
Fixes some subcases from
ES31-CTS.program_interface_query.no-locations
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Calling eglQuerySurface on a window or pixmap with the EGL_LARGEST_PBUFFER
attribute resulted in the contents of the 'value' parameter being modified.
This is the wrong behaviour according to the EGL spec, which states:
"Querying EGL_LARGEST_PBUFFER for a pbuffer surface returns the
same attribute value specified when the surface was created with
eglCreatePbufferSurface. For a window or pixmap surface, the
contents of value are not modified."
Avoid this from happening by checking that the surface type is EGL_PBUFFER_BIT
before modifying the contents of the parameter.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Update the DRI image interface error codes to reflect the needs of the
EGL_EXT_image_dma_buf_import extension. This means updating the existing error
code documentation and adding a new __DRI_IMAGE_ERROR_BAD_ACCESS error code
so that drivers can correctly reject unsupported pitches and offsets. Hook
the new error code up in EGL to return EGL_BAD_ACCESS.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is useful to increase the CSE opportunities for a scalar backend. It
avoids regressions when dropping vc4's custom CSE implementation.
v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since we just pulled it out of the destination as 8-bit unorm, we know
it's in [0, 1] already.
shader-db:
total instructions in shared programs: 100040 -> 98208 (-1.83%)
instructions in affected programs: 14084 -> 12252 (-13.01%)
Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4. Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.
shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs: 42383 -> 41614 (-1.81%)
Both a3xx and a4xx need the same logic to decide if half-precision can
be used for blit shaders. So move it to core and simplify things a bit
with a helper that considers all render targets.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We hard-coded 4 or 8 as the max in various places. Switch it all to a
define since the limit will go up with a4xx (and maybe even again in the
future?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Functional change in which way half-way cases are rounded from towards
positive-infinity to even. The spec says "the passed value is rounded to
the nearest integer". Removes another case of bad half-up rounding.
gcc actually generates this for us now that we use -fno-math-errno
(which is weird, since lrintf()/lrint() don't set errno) but clang still
does not. Presumably helps MSVC as well.
Reduced .text size by 8.5k with gcc before -fno-math-errno.
text data bss dec hex filename
4935850 195136 26192 5157178 4eb13a i965_dri.so before
4927225 195128 26192 5148545 4e8f81 i965_dri.so after
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
vl/vl_mpeg12_bitstream.c: In function 'decode_slice':
vl/vl_mpeg12_bitstream.c:928:19: warning: unused variable 'extra' [-Wunused-variable]
unsigned extra = vl_vlc_get_uimsbf(&bs->vlc, 1);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
The util/hash_table was intended to be a fast hash table
replacement for the program/hash_table see 35fd61bd99 and
72e55bb688.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Fixes a giant pile of GCC warnings:
builtin_types.cpp:60:1: warning: missing initializer for member 'glsl_struct_field::stream' [-Wmissing-field-initializers]
I had to add a default constructor because a non-default constructor
was added. Otherwise the only constructor would be the one with
parameters, and all the plases like
glsl_struct_field foo;
would fail to compile.
I wanted to do this in two patches. All of the initializers of
glsl_struct_field structures had to be converted to use the
constructor because C++ apparently forces you to do one or the other:
builtin_types.cpp:61:1: error: could not convert '{glsl_type::float_type, "near", -1, 0, 0, 0, GLSL_MATRIX_LAYOUT_INHERITED, 0, -1}' from '<brace-enclosed initializer list>' to 'glsl_struct_field'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
When the NIR-vec4 pass is enabled, handles uniform and GRF array access
on ARB_vertex_program like it is done on vertex shaders.
When the old IR-vec4 pass is used, emit_program_code() emits pull constant
loads directly instead of using relative addressing, hence to call to
move_uniform_array_access_to_pull_constants() is not needed and it is enough
to call to split_uniform_registers().
The patch also calls to move_grf_array_access_to_scratch() like it is
done for shaders, however I suspect this is a no-op for vertex programs and
we could remove it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The implementation takes into account that on ARB_vertex_program
only a single nir variable is generated to support all the uniform data.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So the implementation is independent of GLSL IR and the visit methods of the
gen6 GS visitor. This way we will be able to reuse that implementation directly
from the NIR vec4 backend.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Outputs from the vertex shader become array inputs in the geomtry shader,
but the arrays are interleaved, so we need to map our inputs accordingly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So the implementation is independent of GLSL IR and the visit methods of the
vec4 visitor. This way we will be able to reuse that implementation directly
from the NIR vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Avoid copying an overwritten swizzle, use the original values.
Example:
Former swizzle[] = xyzw
src->swizzle[] = zyxx
The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Uses the nir structure to get all the info needed (sources,
dest reg, etc), and then it uses the common
vec4_visitor::emit_texture to emit the final code.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Splitted in two. The emission is moved to a new vec4_visitor
method, vec4_visitor::emit_texture, ir order to be reused
on the nir path.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is useful for the upcoming texture support in NIR->vec4 pass,
as we found several cases where the brw_type is available, but not
the glsl_type.
Without this new constructor, the alternative would be:
dst_reg reg(MRF, <reg>)
reg.type = <brw_type>
reg.writemask = <mask>
Adding a new constructor makes code easier to read.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of swizzle_result() to accept lower
level arguments. The purpose is to reuse it in the upcoming NIR->vec4
pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of gather_channel() to accept the gather
component directly instead of fetching it internally from ir_texture.
This will allow reuse in the upcoming NIR->vec4 pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of emit_mcs_fetch() to accept lower level
arguments. The purpose is to reuse it in the upcoming NIR->vec4 pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The is_high_sample() method is currently accessible only in the implementation of
vec4_visitor. Since we need to reuse it in the upcoming NIR->vec4 pass, lets make
it a method of the class instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This method returns the glsl_base_type corresponding to a nir_alu_type.
It will factorize code currently present in fs_nir, that can be reused
in vec4_nir on its upcoming emit_texture support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
NIR ALU operations:
* nir_op_fabs
* nir_op_iabs
* nir_op_fneg
* nir_op_ineg
* nir_op_fsat
should be lowered by lower_source mods
* nir_op_fdiv
should be lowered in the compiler by DIV_TO_MUL_RCP.
* nir_op_fmod
should be lowered in the compiler by MOD_TO_FLOOR.
* nir_op_fsub
* nir_op_isub
should be handled by ir_sub_to_add_neg.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Follows the vec4_visitor IR implementation but
sets the saturate value in addition.
Adds NIR ALU operations:
* nir_op_fsign
* nir_op_isign
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* Lowered floating-point pack and unpack operations are not valid in VS.
* Pack and unpack 2x16 operations should be handled by lower_packing_builtins.
* Adds NIR ALU operations:
* nir_op_pack_half_2x16
* nir_op_unpack_half_2x16
* nir_op_unpack_unorm_4x8
* nir_op_unpack_snorm_4x8
* nir_op_pack_unorm_4x8
* nir_op_pack_snorm_4x8
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Used the same implementation than the vec4_visitor NIR.
Adds NIR ALU operations:
* nir_op_b2i
* nir_op_b2f
* nir_op_f2b
* nir_op_i2b
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This method returns the brw_conditional_mod value used when emitting
comparative ALU operations.
It could be moved to brw_nir in the future to reuse it in fs_nir backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Implementation based on the vec4_visitor IR implementation
for the operations ir_binop_mul and ir_binop_imul_high.
Adds NIR ALU operations:
* nir_op_fmul
* nir_op_imul
* nir_op_imul_high
* nir_op_umul_high
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Disables nir_lower_alu_to_scalar when the shader stage being processed work
on vec4 vectors, like the upcoming NIR->vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch resolves and initializes the destination and the source
registers that are common to most ALU operations.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Based on the vec4_visitor IR implementation for the ir_binop_load_ubo
operation. Notice that unlike the vec4_visitor IR, adding the !=0
comparison for UBO bools is not needed here because that comparison is
already added by the nir_visitor when processing the ir_binop_load_ubo
(in UBOs "true" is any value different from zero, but for us is ~0).
Adds NIR instrinsics:
* nir_intrinsic_load_ubo_indirect
* nir_intrinsic_load_ubo
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For the indirect case we need to take the index delivered by
NIR and compute the parent uniform that we are accessing (the one
that we uploaded to a surface) and the constant offset into that
surface.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These include:
nir_intrinsic_load_vertex_id_zero_base
nir_intrinsic_load_base_vertex
nir_intrinsic_load_instance_id
The source register is fetched from the nir_system_values map initialized
during nir_setup_system_values stage.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This implementation is based on the current URB setup in vec4_visitor, which
requires the output register to be stored in the output_reg array at variable's
original shader location index. But since nir_lower_io() pass uses the value
in var->data.driver_location, we need to put there var->data.location instead,
prior to calling nir_lower_io(), so that we end up with the correct index
in const_index[0].
The driver_location is not used at all, so this patch also disables the
nir_assign_var_locations pass on non-scalar shaders.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of relying on backends (currently vec4_visitor and soon NIR-vec4) to
store registers in output_reg with the correct type, this patch makes sure
that the common code in emit_urb_slot() always emit MOVs from output registers
using the same type on source and destination.
Since the actual type is not important, only that they match, we default to
float.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The same we do in the FS NIR backend, only that here we need to consider
the number of components in the condition and adjust the swizzle
accordingly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These methods are essential for the implementation of the NIR->vec4 pass. They
work similar to their fs_nir counter-parts.
When processing instructions, these methods are invoked to resolve the
brw registers (source or destination) corresponding to the NIR sources
or destination. It uses the map of NIR register index to brw register for
all registers locally allocated in a block.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to fs_nir backend, a nir_local_values map will be filled with
newly allocated registers as the load_const instrinsic instructions are
processed. Later, get_nir_src() will fetch the registers from this map
for sources that are ssa.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
New method brw_writemask_for_size() will return a writemask with the first
'size' components activated.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In the vec4 backend we want uniform locations to be assigned consecutively
since that way the offsets produced by nir_lower_io are exactly what we
need to implement nir_intrinsic_load_uniform. Otherwise we would need a
mapping to match the output of nir_lower_io to the actual uniform registers
we need to use.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.
This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The upcoming introduction of NIR->vec4 pass will require that some NIR
lowering passes are enabled/disabled depending on the type of shader
(scalar vs. vector).
With this patch we pass a 'is_scalar' variable to the process of
constructing the NIR, to let an external context decide how the shader
should be handled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It basically allocates registers local to a function in a nir_locals map,
then emits all its control-flow blocks.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to other variable setups, system values will initialize the
corresponding register inside a 'nir_system_values' map, which will then
be queried later when processing the different system value intrinsics
for the appropriate register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The new virtual method is more flexible, it has a signature:
dst_reg *make_reg_for_system_value(int location, const glsl_type *type);
v2 (Jason Ekstrand):
Use the new version in unit tests so make check passes again
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This implementation sets up a map of input variable offsets to source registers
that are already initialized with the corresponding register offset.
This map will then be queried when processing load_input intrinsic operations,
to obtain the correct register source from which the input data will be loaded.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The type_size() method is currently accessible only in the implementation
of vec4_visitor. Since we need to reuse it in the upcoming NIR->vec4 pass,
lets make it a method of the class instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR->vec4 pass will be activated if both the following conditions are met:
* INTEL_USE_NIR environment variable is defined and is positive (1 or true)
* The stage is vertex shader (support for geometry shaders and
ARB_vertex_program will be added later).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch will add a brw_vec4_nir.cpp file filled with entry point methods to
the main functionality, following a structure similar to brw_fs_nir.cpp.
Subsequent patches in this series will be adding the implementations for these
methods, incrementally.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I'm not sure what the true meaning of "The rounding mode may vary." is,
but it is the case that the IROUND() path rounds differently than the
other paths (and does it wrong, at that).
Like _mesa_roundeven{f,}(), just add an use _mesa_lroundeven{f,}() that
has known semantics.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cuts about 1k of .text size.
text data bss dec hex filename
4983676 197808 26328 5207812 4f7704 i965_dri.so before
4982522 197800 26328 5206650 4f727a i965_dri.so after
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cuts about 9k of .text size.
text data bss dec hex filename
4992804 197808 26328 5216940 4f9aac i965_dri.so before
4983676 197808 26328 5207812 4f7704 i965_dri.so after
Also, Darwin's libm does not ever set errno, so if we care about those
systems we shouldn't rely on errno anyway.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To circumvent a problem occuring when LINEAR_ALIGNED array mode is
selected on a TEXTURE_2D RAT.
This configuration causes MEM_RAT STORE_TYPED to write to incorrect
locations.
The only values allowed are 0 and 1, and the value is checked before
assigning.
This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Read-only and write-only image arguments are recognized and
distinguished.
Attributes of the image arguments are passed to the kernel as implicit
arguments.
parse_program_resource_name returns -1 when the index is invalid this needs to
be tested before assigning the value to the unsigned array_index.
In link_varyings.cpp (the other place parse_program_resource_name is used) after
the -1 check is done the value is just assigned to an unsigned variable so it
seems long is just used so we can return the -1 rather than actually expecting
index values to be ridiculously large.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
GLES 3.1 should be able to bind a texture with the target
GL_TEXTURE_2D_MULTISAMPLE.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
All three of GLX_NV_float_buffer, GLX_EXT_texture_from_pixmap and
GLX_MESA_query_renderer have been in glxext.h for a while now.
As such we can drop this workaround/hack from the header.
v2: Remove the comment about GLX_NV_float_buffer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Recently a few drivers have grown OpenGL 4+ support so we might as
well go all the way to... 11 ;-)
v2: Don't forget to update the version file (Ilia)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Earlier commit added an extra dup(fd) to fix a ZaphodHeads issue.
Although it did not consider the (very unlikely) case where we might end
up with the valid fd == 0.
Fixes: 28dda47ae4d(winsys/radeon: Use dup fd as key in drm-winsys hash
table to fix ZaphodHeads.)
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
... and update the documentation to reflect reality.
null and gdi are gone, and surfaceless is a recent addition.
v2: s/platforms/platform/ (spotted by Thomas)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Without this this extension basically can't work in indirect contexts,
TexImage2D will compute the image size as 0 and we'll send no image data
to the server.
v2: Add EXT_texture_integer to the client extension list too (Ian)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Disabling the FP16 mode didn't help.
If needed, we can use this trick for blits too, but not for scaled blits.
+ 4 piglits
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
There are 2 reasons for this:
- LLVM optimization passes can work with floor
- there are patterns to select v_fract from floor anyway
There is no change in the generated code.
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
CP DMA doesn't serve as a middle man to update them.
This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- hopefully, some of the corruption issues with SI cards will go away.
If not, we'll know the issue is not here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
With num_direct_uniforms == 0 there's no space allocated in the
param_size array for the one block of direct uniforms -- On the FS
stage this would be a harmless no-op because it would simply re-set
one of the param_size entries allocated for the sampler units to zero,
but on the VS stage it has been reported to cause memory corruption
followed by a crash -- Surprising how a full piglit run on Gen8 didn't
catch it.
Reported-and-reviewed-by: "Lofstedt, Marta" <marta.lofstedt@intel.com>
For SKL: These are the production values.
For BXT: These are low estimates to enable platforms.
This patch was originally part of
i965/skl: Add production thread counts and URB size
but was split out at Jordan's request (which I found to be reasonable).
Note on stable inclusion: 10.6 does not care about hs, and ds. It does care
about cs, but since Jordan was the one that asked me to extract it, I'll leave
it up to him to deal with a backport to stable is required.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since we really do not know what may occur in the future, pick a more
conservative value for thread counts until we know better what values are
correct. As far as I can tell, the old values will work fine, but some of the
registers seem to indicate that going even lower is possible and the purpose of
having early support is to enable as many configurations that can possibly
exist (we can trim things down after platforms begin shipping later).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adjusts the SKL values to the best known values we have.
v2: Remove HS/DS/CS fields. Adding this makes most sense to add to the
GEN9_FEATURES macro, however, doing that would require updating BXT values, and
Jordan requested I not do that. Conveniently, this request makes a lot of sense
wrt to stable backport as HS, and DS do not even exist there.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For now, this just splits up store_output intrinsics to be scalars, and
drops unused outputs in the coordinate shader. My goal is to be able to
drop a bunch of my VC4-specific optimization by letting NIR handle it.
if we get a request to take the count from feedback, but there
is no buffer to take it from, just draw as if we got 0 vertices
so nothing.
This fixes this assert killing the ogl conform, and a piglit
test I've sent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the need for multiple functions designed to validate an array
subscript and replaces them with a call to a single function.
The change also means that validation is now only done once and the index
is retrived at the same time, as a result the getUniformLocation code can
be simplified saving an extra hash table lookup (and yet another
validation call).
This chage also fixes some tests in:
ES31-CTS.program_interface_query.uniform
V3: rebase on subroutines, and move the resource index array == 0
check into _mesa_GetProgramResourceIndex() to simplify things further
V2: Fix bounds checks for program input/output, split unrelated comment fix
and _mesa_get_uniform_location() removal into their own patch.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously it could end up using the “SKL early” device on BXT
depending on the revision number. This would probably break things
because for example has_llc would be wrong.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes the remaining failing tests in:
ES31-CTS.program_interface_query.uniform-types
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This enables GL4.1 for radeonsi, and updates the
docs in the correct places.
v2: enable only for llvm 3.7 which has fixes in place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the final piece for ARB_gpu_shader5,
The code is based on the r600 code from Glenn Kennard,
and myself.
While developing this, I'm not 100% sure of all the calculations
made in the GS registers, this is why the max_stream is worked
out there and used to limit the changes in registers. Otherwise
my initial attempts either regressed GS texelFetch tests
or primitive-id-restart. The current code has no regressions
in piglit.
This commit doesn't enable ARB_gpu_shader5, since that just
bumps the glsl level to 4.00, so I'll just do a separate patch
for 4.10.
v1.1: fix bug introduced in rebase.
v2: Address Marek's review comments,
remove my llvm stream code for simpler C,
move gsvs_ring and gs_next_vertex to arrays.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes following compiler warning:
brw_cs.cpp:386:27: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is easily accomplished by moving simd16 3src to GEN9_FEATURES.
v2: small cleanup to make it more similar to GEN8_FEATURES
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are a couple of unrelated changes in t_vb_lighttmp.h that I hope
you'll excuse -- there's a block of code that's duplicated modulo a few
trivial differences that I took the liberty of fixing.
Literals without an f/F suffix are of type double, and implicit
conversion rules specify that the float in (float op double) be
converted to a double before the operation is performed. I believe float
execution was intended (in nearly all cases) or is sufficient (in the
case of gen7_urb.c).
Removes a lot of float <-> double conversion instructions and replaces
many double instructions with float instructions which are cheaper.
text data bss dec hex filename
4928659 195160 26192 5150011 4e953b i965_dri.so before
4928315 195152 26192 5149659 4e93db i965_dri.so after
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
ARB_viewport_array specifies that DEPTH_RANGE consists of double-
precision parameters (corresponding commit d4dc35987), and a preparatory
commit (6340e609a) added _mesa_get_viewport_xform() which returned
double-precision scale[3] and translate[3] vectors, even though X, Y,
Width, and Height were still floats.
All users of _mesa_get_viewport_xform() immediately convert the double
scale and translation vectors into floats (which were floats originally,
but were converted to doubles in _mesa_get_viewport_xform(), sigh).
i965 at least cannot consume doubles (see SF_CLIP_VIEWPORT). If we want
to pass doubles to hardware, we should have a different function that
does that.
Acked-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Not a typo. Replace the default builder with one of bogus width to
catch cases in which optimization passes assume that the default
dispatch width is good enough. The execution controls of instructions
emitted during optimization should in general match the original code
that is being manipulated. Many of the problems fixed in this series
were caught by the assertions introduced in this patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This could have led to somewhat increased bandwidth usage for lowered
texturing instructions on Gen4 (which is the only case in which
lower_width may be greater than inst->exec_size). After the previous
patches the invariant mentioned in the comment should no longer be
assumed by any of the other optimization and lowering passes, so the
exec_all() call shouldn't be necessary anymore.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of relying on the default one. This shouldn't lead to any
functional changes because DEP_RESOLVE_MOV overrides the execution
size of the instruction anyway and other execution controls are
irrelevant.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The global CSE pass stinks and is unable to pull this out. Easy enough
to handle it here and avoid generating unnecessary special register
loads (which can allegedly be quite slow).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
PFETCH retrieves the address for incoming vertices, not output vertices
in TCS. For output vertices, we must use the laneid as a base.
Fixes barrier piglit test, which was failing for entirely non-barrier
reasons, but rather that it was (a) trying to draw multiple patches and
(b) the incoming patch size was not the same as the outgoing patch size.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This reverts commit a27ec5dc46. It
breaks the intended behaviour of pipe_loader_probe() with ndev==0 as
relied upon by clover to query the number of devices available to the
pipe loader in the system.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
opt_sampler_eot() was relying on the default builder to have the same
width as the sampler and FB write opcodes it was eliminating, the
channel selects didn't matter because the builder was only being used
to allocate registers, no new instructions were being emitted with it.
A future commit will change the width of the default builder what will
break this assumption, so initialize it explicitly here.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This wasn't taking into account the execution controls of the original
instruction, but it was most likely not a bug because control flow
instructions are typically full width.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Emit the SELs and MOVs with the same execution controls as the
original MOVs, and the CMP with the same execution controls as the IF.
Also explicitly check that the execution controls of any pair of MOVs
being folded into a SEL are compatible (which is almost always going
to be the case), since otherwise it would seem wrong to initialize the
builder object below from the then_mov instruction only.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
lower_integer_multiplication() was ignoring the execution controls of
the original MUL instruction. Fix it by using the new fs_builder
constructor.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
demote_pull_constants() was ignoring the execution size and channel
selects of the instruction that wanted the constant, which doesn't
matter for uniform pull constant loads because all channels get the
same scalar value, but it might for varying pull constant loads. Fix
it by using the new fs_builder() constructor that takes care of
setting execution controls compatible with the instruction passed as
argument.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The execution size was being left equal to the default of 8/16, which
AFAICT would have overwritten components other than the one we wanted
to initialize and could potentially have corrupted other registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We have a number of optimization passes that repeat the same pattern
before inserting new instructions into the program based on some
previous instruction: They point the default builder at the original
instruction, then call exec_all() and group() to select the same
execution controls the original instruction had, and then maybe call
annotate() to clone the debug annotation from the original
instruction.
In fact an optimization pass missing any of these steps is likely to
be broken if the intention was to emit new code based on a preexisting
instruction, so let's make it easy for passes to do the right thing by
having an fs_builder constructor that automates the task of setting up
a builder to emit a given instruction provided as argument.
The following patches fix all cases I've found in which we weren't
explicitly initializing the execution controls of the emitted
instructions, and clean-up optimization passes which were already
doing the right thing to use the new constructor.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Images take up zero uniform slots in the nir_shader::num_uniforms
calculation, but nir_setup_uniforms needs to be executed even if the
program has no non-image uniforms so the driver-specific image
parameters are uploaded. nir_setup_uniforms is a no-op if there are
really no uniforms, so checking the num_uniform count is useless in
any case.
The nir_setup_inputs and _outputs changes shouldn't lead to any
functional change, they are just meant to preserve the symmetry
between them and nir_setup_uniforms.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Image variables need to allocate additional uniform slots over
nir_shader::num_uniforms. nir_setup_uniforms() overwrites the values
imported from the SIMD8 visitor and then exits early before entering
the nir_shader::uniforms loop, so image uniforms are never re-created.
Instead leave the imported values alone, they *must* be the same for
the uniform layout of both runs to be compatible.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rewrite the NIR atomic counter intrinsics translation code making use
of the recently introduced surface builder. This will allow the
removal of some of the functionality duplicated between the visitor
and surface builder.
v2: Drop VEC4 suport.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit easily.
v2: Drop VEC4 suport.
v3: Reimplement in terms of logical send opcodes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be handy to avoid some ugly ternary operators in the next
patch, like:
fs_reg reg = (size == 0 ? null_reg_ud() : vgrf(..., size));
Because a zero-size register allocation is guaranteed not to ever be
read or written we can just return the null register. Another
possibility would be to actually allocate a zero-size VGRF what would
involve defining a zero-size register class in the register allocator
and a considerable amount of churn.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
separately as individual sources, like:
typed_surface_write_logical null, coordinates, source, surface,
num_coordinates, num_components
This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
documentation for their source registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This cleans up the VEC4 implementation of setup_uniform_values()
somewhat and will avoid duplication of the image uniform upload code
by having a common interface to upload a vector of uniforms on either
back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should match the set of cases in which we currently call fail()
or no16() from the emit_texture_*() methods and the ones in which
emit_texture_gen4() enables the SIMD16 workaround.
Hint for reviewers: It's not a big deal if I happen to have missed
some case here, it will just lead to an assertion failure down the
road which is easily fixable, however being stricter than necessary
won't cause any visible breakage, it would just decrease performance
silently due to the unnecessary message splitting, so feel free to
double-check that all cases listed here already cause a SIMD8/16
fall-back with the current texturing code -- You may want to skip over
the Gen5-6 cases though if you don't have pencil and paper at hand.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike its Gen5 and Gen7 counterparts this patch isn't a plain
refactor of the previous Gen4 texturing code, it's more of a rewrite
largely based on emit_texture_gen4_simd16(). The reason is that on
the one hand the original emit_texture_gen4() code didn't seem easily
fixable to be SIMD width-invariant and had plenty of clutter to
support SIMD-width workarounds which are no longer required. On the
other hand emit_texture_gen4_simd16() was missing a number of
SIMD8-only opcodes. This should generalize both and roughly match
their current behaviour where there is overlap.
Incidentally this will fix the following piglits on Gen4:
arb_shader_texture_lod.execution.arb_shader_texture_lod-texgrad
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 3d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d_projvec4
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 3d
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be largely equivalent to emit_texture_gen5() except for
slight codestyle changes and the use i965 opcodes instead of the
ir_texture_opcode enum, see "i965/fs: Implement lowering of logical
texturing opcodes on Gen7+." for the mapping between them.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These weren't being handled by emit_texture_gen7() but we can easily
lower them here for consistency with other texturing opcodes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be largely equivalent to emit_texture_gen7() except that
we now get i965 sampling opcodes directly rather than
ir_texture_opcode enum values. The mapping is as follows:
- ir_tex -> SHADER_OPCODE_TEX
- ir_txb -> FS_OPCODE_TXB
- ir_txl -> SHADER_OPCODE_TXL
- ir_txd -> SHADER_OPCODE_TXD
- ir_txf -> SHADER_OPCODE_TXF
- ir_txf_ms -> SHADER_OPCODE_TXF_CMS
- ir_txs -> SHADER_OPCODE_TXS
- ir_query_levels -> SHADER_OPCODE_TXS too, the visitor will make
sure that the provided lod value is zero in this
case.
- ir_lod -> SHADER_OPCODE_LOD
- ir_tg4 -> SHADER_OPCODE_TG4_OFFSET if the offset value is not
immediate, SHADER_OPCODE_TG4 otherwise.
Other than that there are only minor changes and style fixes like the
implementation now being factored out in static functions to improve
encapsulation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This hasn't been overallocating space for the header for a long time.
It still leaves the header uninitialized though until the generator
fixes it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So that it's left uninitialized by LOAD_PAYLOAD, we only need to
reserve space for it in the message since it will be initialized
implicitly by the generator.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
dispatch_width is global for a single compilation and doesn't
necessarily match the desired execution width if we had to lower the
original full-width instruction due to hardware limitations. These
were all inside a Gen4-specific branch so this patch shouldn't have
any effect on more recent hardware.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects the arguments
separately as individual sources, like:
tex_logical dst, coordinates, shadow_c, lod, lod2,
sample_index, mcs, sampler, offset,
num_coordinate_components, num_grad_components
This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mostly as
documentation for their source registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only non-trivial thing it still has to do is figure out where to
take the src/dst depth values from and predicate the instruction if
discard is in use. The manual SIMD unrolling logic in the dual-source
case goes away because this is now handled transparently by the SIMD
lowering pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This does essentially the same thing as
fs_visitor::emit_single_fb_write(), with some slight differences:
- We don't have to worry about exec_size and use_2nd_half anymore,
16-wide sources have already been lowered to 8-wide thanks to the
previous commit and the manual argument unzipping is no longer
required.
- The src/dst_depth and sample_mask values are now explicit sources
of the instruction instead of being taken from the visitor state
directly. The same goes for the kill-pixel mask that will be
passed to the instruction explicitly as predicate.
- Everything is now done in static functions to improve
encapsulation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This shouldn't have any effect because we don't emit logical
framebuffer writes yet.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no need to initialize the wrong half of oMask in the payload
when we're doing an 8-wide framebuffer write because it will be
ignored by the hardware anyway. By doing it this way we can let the
SIMD lowering pass split the sample_mask source as a regular
per-channel source, otherwise we would have to introduce some sort of
per-instruction source query or use fs_inst::header_size for the
lowering pass to be able to find out whether some source is
header-like, and leave the source untouched in that case.
As a bonus this achieves the same purpose as the previous code without
making use of the SET_OMASK pseudo-instruction, which will be removed
in a future commit.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flatten the if ladder to match the way that the ordering of these
fields is specified in the hardware documentation a bit more closely.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where the color0 argument wasn't being provided,
emit_single_fb_writes() would take the alpha channel directly from the
visitor state instead of taking it from its arguments. This sort of
hack didn't fit nicely into the logical send-message approach because
all parameters of the instruction have to be visible to the SIMD
lowering pass for it to be able to split them into halves at all.
Fix it by using LOAD_PAYLOAD in fs_visitor::emit_fb_writes() to
provide an actual color0 vector with undefined contents except for the
alpha component to match the previous behavior when no color buffers
are enabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's surprising that we weren't checking for this already. A future
patch will cause code like the following to be emitted:
MOV(16) tmp<1>:uw, src
MOV(8) dst<1>:ud, tmp<8,8,1>:ud
The second MOV comes from the expansion of a LOAD_PAYLOAD header copy,
so I don't have control over its types. Copy propagation will happily
turn this into:
MOV(8) dst<1>:ud, src
Which has different semantics. Fix it by preventing propagation in
cases where a single channel of the instruction would span several
channels of the copy (this requirement could in fact be relaxed if the
copy is just a trivial memcpy, but this case is unusual enough that I
don't think it matters in practice).
I'm deliberately only checking if the type of the instruction is
larger than the original, because the converse case seems to be
handled correctly already in the code below.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were previously guessing the half based on the EOT flag which seems
rather gross.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
that make up the payload separately as individual sources, like:
fb_write_logical null, color0, color1, src0_alpha,
src_depth, dst_depth, sample_mask, num_components
This patch defines the opcode and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
self-documentation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lowering pass implements an algorithm to expand SIMDN
instructions into a sequence of SIMDM instructions in cases where the
hardware doesn't support the original execution size natively for some
particular instruction. The most important use-cases are:
- Lowering send message instructions that don't support SIMD16
natively into SIMD8 (several texturing, framebuffer write and typed
surface operations).
- Lowering messages that don't support SIMD8 natively into SIMD16
(*cough*gen4*cough*).
- 64-bit precision operations (e.g. FP64 and 64-bit integer
multiplication).
- SIMD32.
The algorithm works by splitting the sources of the original
instruction into chunks of width appropriate for the lowered
instructions, and then interleaving the results component-wise into
the destination of the original instruction. The pass is controlled
by the get_lowered_simd_width() function that currently just returns
the original execution size making the whole pass a no-op for the
moment until some user is introduced.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Reverse order of the source transformations and split_inst emit
call to make the code a bit easier to understand.
Typically BAD_FILE sources are used to mark a source as not present
what implies that no registers are read. This will become much more
frequent with logical send opcodes which have a large number of
sources, many of them optionally used and marked as BAD_FILE when they
aren't applicable. It will prove to be useful to be able to rely on
the value of regs_read() regardless of whether a source is present or
not.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
And start using it in fs_builder::LOAD_PAYLOAD(). This will be used
to emit logical send message opcodes which have an unusually large
number of arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This pass will house ad-hoc lowering code for several send
message-like virtual opcodes that will represent their logically
independent arguments as separate instruction sources rather than as a
single payload blob. This pass will basically just take the separate
arguments that are supposed to be part of the payload and concatenate
them to construct a message in the form required by the hardware.
Virtual instructions in separate-source form will eventually allow
some simplification of the visitor code and make several
transformations easier like lowering SIMD16 instructions to SIMD8
algorithmically in cases where the hardware doesn't support the former
natively.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This cleans up fs_inst::regs_read() slightly by disentangling the
calculation of "components" from the handling of message payload
arguments. This will also simplify the SIMD lowering and logical send
message lowering passes, because it will avoid expressions like
'regs_read * REG_SIZE / component_size' which are not only ugly, they
may be inaccurate because regs_read rounds up the result to the
closest register multiple so they could give incorrect results when
the component size is lower than one register (e.g. uniforms). This
didn't seem to be a problem right now because all such expressions
happen to be dealing with per-channel GRFs only currently, but that's
by no means obvious so better be safe than sorry.
v2: Split PIXEL_X/Y and LINTERP into separate case blocks.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For some reason the loop that rewrites all occurrences of the
coalesced register was iterating over all possible offsets until it
would find one that compares equal to the offset of a source or
destination of any instruction in the program. Since the mapping
between old and new offsets is already available in the regs_to_offset
array and we know that the whole register has been coalesced we can
just look it up.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The register coalesce pass wasn't rewriting the destination and
sources of instructions that accessed the second half of a coalesced
register previously copied with a 16-wide MOV instruction. E.g.:
| ADD (16) vgrf0:f, vgrf0:f, 1.0:f
| MOV (16) vgrf1:f, vgrf0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
would get incorrectly register-coalesced into:
| ADD (16) vgrf1:f, vgrf1:f, 1.0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
The reason is that the mov[i] pointer was being left equal to NULL for
every other register. The fact that we've made it to the rewrite loop
implies that the whole register will be coalesced, so it doesn't seem
right not to update something that uses it depending on whether mov[i]
is NULL or not. Fixes an amount of texturing and image_load_store
piglit tests on my SIMD-lowering branch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
register_coalesce() was considering the exec_size of the MOV
instruction alone to decide whether the register at offset+1 of the
source VGRF was being copied to inst->dst.reg_offset+1 of the
destination VGRF, which is only a valid assumption if the move has a
32-bit execution type. Use regs_read() instead to find out the number
of registers copied by the instruction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This adds to the common radeon streamout code, support
for multiple streams.
It updates radeonsi/r600 to set the enabled mask up.
v2: update for changes in previous patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will be used here later.
v2: update atom sizes
add check for old vs new enabled mask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Page 497 of the PDF, section '17.4.3.1 Clearing Individual Buffers' of the
OpenGL 4.5 spec states:
"An INVALID_ENUM error is generated by ClearBufferiv and
ClearNamedFramebufferiv if buffer is not COLOR or STENCIL."
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.clear_bufferiv
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A simple shader such as
vec4 color;
color.xy.x = 1.0;
would cause ir_assignment::set_lhs() to generate bogus IR:
(swiz xy (swiz x (constant float (1.0))))
We were setting the number of components of each new RHS swizzle based
on the highest channel used in the LHS swizzle. So, .xy.y would
generate (swiz xy (swiz xx ...)), while .xy.x would break.
Our existing Piglit test happened to use .xzy.z, which worked, since
'z' is the third component, resulting in an xxx swizzle.
This patch sets the number of swizzle components based on the size of
the LHS swizzle's inner value, so we always have the correct number
at each step.
Fixes new Piglit tests glsl-vs-swizzle-swizzle-lhs-[23].
Fixes ir_validate assertions in in Metro 2033 Redux.
v2: Move num_components updating completely out of update_rhs_swizzle
(suggested by Timothy Arceri). Simplify.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Same check is made for glBindFragDataLocationIndexed but it was missing
when using layout qualifiers.
Fixes following Piglit test:
arb_blend_func_extended-output-location
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Change function to get all gl_constants for inspection, this is used
by follow-up patch.
v2: rebase, update function documentation
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
It's a bunch of work for us to emit it (and its uniforms), more work for
the kernel to validate it, and additional work for the CLE to read
it. Improves es2gears framerate by about 50%.
Signed-off-by: Eric Anholt <eric@anholt.net>
Since the conversion to keeping validated shaders around for the BO's
lifetime, we haven't been checking that rendering doesn't happen to
shaders. Make vc4_use_bo check that always, and just don't use it for the
VC4_MODE_SHADER case (so now modes are unused)
We don't want anything to appear after we've kicked off the render (and
thus job flush), since that might then get written out to the tile
allocation state.
Signed-off-by: Eric Anholt <eric@anholt.net>
Glamor asks GBM for the handle of the BO, then flinks it itself. We
were marking the bo non-private in the flink and dmabuf (DRI3) paths,
but not the GEM handle path. As a result, non-pageflipping DRI2
swapbuffers (EGL apps, in particular) were never updating the texture.
The meta CopyImageSubData path uses BlitFramebuffers to do the actual copy.
The only thing that can affect BlitFramebuffers other than the currently
bound framebuffers is the scissor so we need to save that off and reset it.
If we don't do this, applications that use a scissor together with
CopyImageSubData will get accidentally scissored copies.
Tested-by: Markus Wick <markus at selfnet.de>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This adds support for queries against the non-0 vertex streams.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It only checks fragment textures and ignores other shaders, which makes it
incomplete, and textures are already finalized in update_single_texture.
There are no piglit regressions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes piglit:
spec@glsl-1.30@execution@fs-texture-sampler2dshadow-10
spec@glsl-1.30@execution@fs-texture-sampler2dshadow-11
v2: use st_shader_stage_to_ptarget
Reviewed-by: Brian Paul <brianp@vmware.com>
By using 'Tobias Klausmann' piglit test-suite patch. We obtain
a full 12/12 passes using this patch. By 'faking' to claim
support for this extension we obtain 7 fails and 5 passes.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Tested-by: Furkan Alaca <falaca@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is part of ARB_gpu_shader5, and this passes
all the piglit tests currently available.
v2: use macros from the fine derivs commit.
add comments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
EGL_EXT_image_dma_buf_import now supports those formats.
Tests:
- Tested by Piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88.
- Tested by Peter in Kodi/XBMC to obtain 60fps NV12 transcode at 4K.
Tested-by: Peter Frühberger <peter.fruehberger@gmail.com>
Signed-off-by: Chad Versace <chad.versace@intel.com>
The Kodi/XBMC developers want to transcode NV12 to RGB with OpenGL shaders,
importing the two source planes through EGL_EXT_image_dma_buf_import. That
requires importing the Y plane as an R8 EGLImage and the UV plane as either an
RG88 or GR88 EGLImage.
This patch teaches the driver-independent part of EGL about the new
formats. Real driver support is left for follow-up patches.
The new formats landed in airlied's kernel branch 'drm-next' on July 24.
Tested-by: Peter Frühberger <peter.fruehberger@gmail.com>
Signed-off-by: Chad Versace <chad.versace@intel.com>
It seems like they're never necessary, and actively cause harm. This
fixes some of the barrier-related piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Gallium exposes it unconditionally, so do our best to support it. It
fails on the negative index cases, but those seem unlikely to be used in
the wild.
Previously we had a fixed array to track kills, since they don't
generate an SSA value, and then cheated by stuffing them in the
outputs array before sending things through depth/sched/etc. But
store instructions will need similar treatment. So convert this
over to a more general array of instructions that must be kept
and fix up the places that were previously relying on kills being
in the output array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For store instructions, the "dst" register is a read register, not a
written register. (Ie. it is the address to store to.) Lets not
confuse register allocation, scheduling, etc, with these details.
Instead just leave a dummy instr->regs[0], and take "dst" from
instr->regs[1] and srcs following.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add 'enum ir3_driver_param' to track driver-param slots, and a
create_driver_param() helper to avoid having the knowledge about
where driver params are placed in const regs spread throughout
the code as we add additional driver-params.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With stream-out (transform-feedback) we have the case where resources
are *written* by the gpu, which needs basically the same tracking to
figure out when rendering must be flushed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Details of the cmdstream packets are different between a3xx and a4xx,
but the logic about the layout of const registers is the same, as that
is dictated by the ir3 shader compiler. So rather than duplicating
logic that is tightly coupled to ir3 between a3xx and a4xx, move this
into ir3 and use per-generation callbacks for to build the cmdstream
packets.
This should make it easier to pass additional const regs (such as for
transform feedback). And it also keeps the layout internal to ir3 in
case we want to make the layout more dynamic some day.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since for transform-feedback, we'll need more than just the TGSI
tokens from the state object, just pass the entire state object to
ir3_shader_create(). This also cleans things up a bit for some
day in the future when we could take shader either as TGSI or
directly NIR (for ex, glsl2nir or spirv2nir paths). In the same
spirit, drop extra args from ir3_compile_shader_nir() (since it
can anyways get what it needs from the ir3_shader_variant).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This adds support for fine derivatives and enables
ARB_derivative_control on radeonsi.
(just fell out of my working out interpolation)
v2: cleanup some bits, write a comment
v2.1: take Michel's comment from the mailing list
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is required as part of ARB_gpu_shader5.
no backend changes are required for this, or if
any are, it's the same ones as for samplers.
v2: use get_indirect_index (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the frontend support, however the llvm
backend produces the wrong pattern, however
we can conditionalise enabling ARB_gpu_shader5
on whatever version of llvm we fix this in.
v2: drop unneeded sampler_indirect checks (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is prep work for using it in the interpolation code
later.
Also add storage for the input interpolation mode so we
can pick it up later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is prep work for reusing this in the interpolation
code later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The 420pack extension enables various GLSL rules that need to be applied
to any GLSL 4.20+ shader even if the extension is not explicitly
enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 17f714836 (mesa: rearrange texture error checking order) moved
the width/height/depth == 0 allowance before checking if the image was
there. This was in part due to depth having to be == 1 for 2D images and
width having to be == 1 for 1D images. Instead relax the height/depth
checks to also accept 0 as valid.
With this change,
bin/arb_direct_state_access-get-textures
starts passing again.
Fixes: 17f714836 (mesa: rearrange texture error checking order)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows us to handle cases when texImage->_BaseFormat doesn't match
_mesa_format_get_base_format(texImage->Format). _BaseFormat is what we
care about in this function.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
After recent addition of pbo testing in piglit test getteximage-luminance,
it fails on i965. This patch makes a sub test pass.
This patch adds a clear color operation to meta pbo path, which I think is
better than falling back to software path.
V2: Fix color mask for GL_LUMINANCE_ALPHA
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Replace a call to mesa_base_tex_format() that handles only internal
formats with a call to the new _mesa_unpack_format_to_base_format()
function that handles allowed unpack formats and does not care for
internal formats at all.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is an optimization which avoids setting pixel transfer operations
when not required. _mesa_ReadPixels falls back to slower path if
transfer operations are set.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
_mesa_meta_pbo_GetTexSubImage() uses _mesa_meta_BlitFrameBuffer(),
which will do fragment clamping if enabled. But fragment clamping
doesn't affect ReadPixels and GetTexImage.
Without this patch, piglit test arb_color_buffer_float-clear fails,
when forced to use the meta pbo path.
v2: Apply this fix to both glReadPixels and glGetTexImage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Meta pbo path for ReadPixels rely on BlitFramebuffer which doesn't support
signed to unsigned integer conversions and vice versa.
Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when
forced to use the meta pbo path.
v2: Make need_signed_unsigned_int_conversion() a static function. (Iago)
Bump up the comment and the commit message. (Jason)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
Currently used ctx->_ImageTransferState check is not sufficient
because it doesn't include the read color clamping enabled with
GL_CLAMP_READ_COLOR. So, use the helper function
_mesa_get_readpixels_transfer_ops().
Also, transfer operations don't affect glGetTexImage(). So, do
the check only for glReadPixles.
Without this patch, arb_color_buffer_float-readpixels test fails, when
forced to use meta pbo path.
V2: Add a comment and bump up the commit message.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I was mistaken, I thought we already had fixed this in the kernel a
couple of years ago. We had not, and the broken read (the hardware
shifts the register output on 64bit kernels, but not on 32bit kernels) is
now enshrined into the ABI. I also had the buggy architecture reversed,
believing it to be 32bit that had the shifted results. On the basis of
those mistakes, I wrote
commit c8d3ebaffc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 29 13:32:38 2015 +0100
i965: Query whether we have kernel support for the TIMESTAMP register once
Now that we do have an extended register read interface for always
reporting the full 36bit TIMESTAMP (irrespective of whether the hardware
is buggy or not), make use of it and in the process fix my reversed
detection of the buggy reads for unpatched kernels.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Tested-and-acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
There's no need to attempt to avoid overlapping generic i/o with patch
i/o. By the same token, we can't merge patch and non-patch loads/stores.
This fixes at least the
tes-both-input-array-*-index-rd
tessellation variable-indexing tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's a special AL2P instruction (called AFETCH in nv50 ir) which
computes a "physical" value to be used with indirect addressing with ALD.
Fixes
tcs-input-array-*-index-rd
tcs-output-array-*-index-wr
varying-indexing tessellation tests on Kepler.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
radeon_fbo.c: In function 'radeon_map_renderbuffer_s8z24':
radeon_fbo.c:162:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_map_renderbuffer_z16':
radeon_fbo.c:200:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_map_renderbuffer':
radeon_fbo.c:242:8: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_unmap_renderbuffer':
radeon_fbo.c:419:14: warning: variable 'ok' set but not used [-Wunused-but-set-variable]
GLboolean ok;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fix many instances of:
main/shaderapi.c: In function '_mesa_GetSubroutineUniformLocation':
main/shaderapi.c:2176:7: warning: format not a string literal and no format arguments [-Wformat-security]
_mesa_error(ctx, GL_INVALID_OPERATION, api_name);
^
Ideally, many of these error messages should be improved to indicate
which argument is incorrect as we do in other parts of Mesa.
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
When we're error-checking the target, we also need to check if the
corresponding extension is supported.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The previous fix added GL_TEXTURE_CUBE_MAP_ARRAY but we also need
to support GL_TEXTURE_CUBE_MAP (via DSA).
So in the end, GL_TEXTURE_3D is the only (legal) target for
glCompressedTex*SubImage3D() which needs additional compression
format checking. GL_TEXTURE_2D_ARRAY, GL_TEXTURE_CUBE_MAP_ARRAY
and GL_TEXTURE_CUBE_MAP are basically 2D images which support all
compressed formats.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This doesn't provide much value since it's all done. The qbo interaction
is fairly trivial.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This extension is about setting expectation on GL4.1 implementations
rather than actually enforcing things. So once you support GLSL 410
then you support this in theory.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the width/height/depth == 0 check to the front and avoids
doing any other checking when that is the case.
Also moves the dimensions check after the format/type checks so that we
don't bail out with success on a width/height/depth == 0 request when
the format/type don't match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91425
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The current message makes it seem like the zoffset is invalid.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Basically, two different target error checks are chained consecutively, and the
second one is executed regardless the result of the first one. This produces an
incorrect error if the first check fails but is overrided by the second.
This patch conditions the execution of the second check to a successful pass of
the first one.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
All LLVM API calls that require an ostream object have been removed from
the disassemble() function, so we don't need to use this class to wrap
_debug_printf() we can just call this function directly.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Apparently a multi-word load can potentially overwrite the indirect
sources, so make sure that RA picks different registers for those.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uncomment the various functionality that was already there and add in
obvious missing bits that parallel vp/gp/fp functionality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
since this touches drivers, only enable it on gallium
for now for drivers reporting GLSL 1.30 or above.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just add support for the subroutine type to the
glsl->tgsi convertor.
v1.1: add subroutine to int support.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fleshes out the APIs, using the program resource
APIs where they should match.
It also sets the default values to valid subroutines.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for the subroutine uniform type ir->mesa.cpp
v1.1: add subroutine to int to switch
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fleshes out the ARB_program_query support for the
APIs that ARB_shader_subroutine introduces, leaving
some TODOs for later addition.
v2: reworked for lots of the ARB_program_interface_query
entry points and tests
v3: use common function to test for subroutine support
v3.1: add tess, fix missing breaks
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds linker support for subroutine uniforms, they
have some subtle differences from real uniforms, we also hide
them and they are given internal uniform names.
This also adds the subroutine locations and subroutine uniforms
to the program resource tracking for later use.
v1.1: drop is_subroutine_def
v2: handle explicit location properly, ARB_explicit_location
has a lot of language for subroutine shaders.
Calculate a link time the number of compatible subroutines
for a uniform, to make program resource easier later.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the necessary storage for subroutine info to gl_shader.
v2: add comments, rename one member
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lowers the enhanced ir_call using the lookaside table
of subroutines into an if ladder. This initially was done
at the AST level but it caused some ordering issues so a separate
pass was required.
v2: clone return value derefs.
v2.1: update for subroutine->int convert.
v2.2: add a clone for the array index
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the guts of the GLSL parser and AST support for
shader subroutines.
The code creates a subroutine type in the parser, and
uses that there to validate the identifiers. The parser
also distinguishes between subroutine types/function prototypes
/uniforms and subroutine defintions for functions.
Then in the AST conversion it recreates the types, and
stores the subroutine definition info or subroutine info
into the ir_function along with a side lookup table in
the parser state. It also converts subroutine calls into
the enhanced ir_call.
v2: move to handling method calls in
function handling not in field selection.
v3: merge Chris's previous parser patches in here, to
make it clearer what's changed in one place.
v3.1: add more documentation, drop unused include
v3.2: drop is_subroutine_def
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a ir_variable which contains the subroutine uniform
and an array rvalue for the deref of that uniform, these
are stored in the ir_call and lowered later.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to store two sets of info into the ir_function,
if this is a function definition with a subroutine list
(subroutine_def) or if it a subroutine prototype.
v1.1: add some more documentation.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This checks if core profile and shader subroutine extension
is enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles converting the shader stages to the internal
prefix along with the program resource interfaces.
v2: add tess support
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This stops dead code from removing subroutines types,
we need these for the queries to work properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tests: fix warnings
Add the shader subroutine to the core only API list,
and fixup dispatch_sanity to suit.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.
The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.
There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
emit_store will be reimplemented for tessellation control shader outputs
where only radeon_llvm_saturate will be used, but radeonsi will want to
fall back to radeon_llvm_emit_store for other register types.
This exposes both functions.
The idea is to allow 32 normal varyings and 32 patch varyings,
a total of 64. Previously, only a total of 32 was allowed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the exception of always-taken switch cases (which are
indistinguishable from straight line code in our IR), this
disallows use of the builtin barrier() function in all the
places it may not appear.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tessellation control outputs can be read in directly without first
having been written. Accessing these will require some special logic
anyways, so just let them through.
V2: Never lower tess control output reads, whether patch or not -- both
can be read back by other threads.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is technically not needed, but it makes the compiler return a better
error message if tessellation is used with GLSL < 1.50.
Instead of:
error: syntax error, unexpected NEW_IDENTIFIER, expecting $end
It returns:
error: #version 150 layout qualifier `triangles' used
And the tessellation spec says:
OpenGL 3.2 and GLSL 1.50 are required.
So it makes perfect sense.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is no way to lower them, because the array sizes are unknown
at compile time.
Based on a patch from: Fabian Bieler <fabianbieler@fastmail.fm>
v2: add comments
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is to prevent a name conflict in tessellation shaders built-in interface
blocks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Similar to gl_ClipDistance -> gl_ClipDistanceMESA
v2: - renamed is_mesa_var to lowered_builtin_array_variable
- moved LowerTessLevel into gl_constants
- cosmetic changes in lower_tess_level.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek: require a tess eval shader if a tess control shader is present
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tessellation dependencies added by Marek.
v2: require tessellation in addition to atomics/images for some glGet queries
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This has been a no-op due to performance concerns. From now on, drivers
should decide when they don't want to unmap, not the winsys.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
buffer_unmap is currently a no-op on radeon and done correctly on amdgpu.
I plan to fix it for radeon, but before that, all occurences of buffer_unmap
that can negatively affect performance in the future must be removed.
There are 2 reasons for removing buffer_unmap calls:
- There is a likelihood that buffer_map will be called again, so we don't
want to unmap yet.
- The buffer is being released, which automatically unmaps it.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
None of the draw states are used here.
This fixes a crash in piglit: ext_framebuffer_blit/blit-early
Calling st_manager_validate_framebuffers is the minimum requirement here.
Cc: mesa-stable@lists.freedesktop.org
One of the plugins I use with vim "helpfully" added an underscore to the
front of mode for kicks.
Obviously this isn't a feature used very often because it's been broken
since d986cb7c70 (since May 20th), and no one has noticed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
The scons equivalent of the previous commit - just fold the almost
identical driver + main Sconscripts.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Simplify things by merging the two makefiles. This way we can combine
the duplicated HAVE_PLATFORM_ checks, and build the library without
having a separate static library.
v2: use $() when referencing variables, use correct define (Matt)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only user of it (libgbm.la) immediately links it. Just build it
directly into the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Support for Windows has been removed for a while now, and virtually
every POSIX compliant system provides strcasecmp, strdup and snprintf.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is simply not possible to use the dri backend without shared glapi,
as the alternative provider (libGL) is not always present. We have fixed
the build for a while now, so we can rip this out.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It has been broken since 2011 with commit c98ea26e16b(egl: Make
egl_dri2 and egl_glx built-in drivers.). When the backends got merged
into the main library each entry point was guarded by a
_EGL_BUILT_IN_DRIVER_* define.
As the define was missing, the linker kindly removed the whole of the
dri2 backend, thus we did not notice any errors due to the unresolved
link to xcb and friends.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
As of last commit the only user of it (radeon/r200) no longer uses it.
As such let's remove it and cleanup the nasty hacks that we had in place
to support this.
v2: Leave LIBDRM_CFLAGS around.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The original code only half considered hyperz as an option. As per
previous commit "major != 2 cannot occur" we can simply things, and
allow users to set the option if they choose to do so.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As mentioned by Michel Dänzer
"FWIW though, any code which is specific to radeon DRM major version 1
can be removed, because that's the UMS major version."
and Marek Olšák
"major != 2" can't occur. You don't have to check the major version at
all and you can just assume it's always 2."
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Same as previous commit - unused (gbm is not a thing outside the
autotools build).
v2: Remove trailing HAVE_LIBDRM.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
GBM (the only user of kms-dri) is currently not available under Android.
Considering we have no way of testing/using this let's not bother
building it for now.
Cc: Chih-Wei Huang <cwhuang@linux.org.tw>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Double negatives in English language are normally avoided, plus the
former seems cleaner and more consistent.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the dri_interface.h clean of the macro, we can remove the final
only st/dri specific use of the very same.
Seemingly it was incorrectly used, as the build-time presence of dri2 is
not libdrm specific. At run-time, the code is already limited to dri2
use-cases plus returning true, when the extension is not present (or too
old) will likely lead to a crash as one tries to use it shortly after
the dri_with_format() call.
As a side effect this gives us a nice cleanup the builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With the follow up commit we'll remove the __NOT_HAVE_DRM_H macro. As
requested by Ian HAVE_LIBDRM will be used instead, which will lead to
swrast including drm.h when libdrm package is available, even though we
don't need/make use of the header.
As the define is added after the AM_CFLAGS we cannnot use -UHAVE_LIBDRM,
but instead let's just add LIBDRM_CFLAGS. The latter of which will
expand to NULL when the libdrm package is not around.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
These conditionals are used to guard both dri modules and loader(s).
Currently if we try to build the gallium swrast dri module (without glx)
on a system that's missing libdrm the build will fail.
v2: Make sure we assign prior to checking the have_libdrm variable.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The type stored in gl_uniform_storage is the type of a single array
element not the array type so size was always 1.
V2: Dont validate sampler units pointing to 0
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This adds the new glGetTextureSubImage() and
glGetCompressedTextureSubImage() functions. Also update the
dispatch sanity test program.
v2: remove stray brace, move xi:include line in gl_API.xml, fix extension
number typo, s/program/texture/ in xml file.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
1. Reorganize the error checking code.
2. Lay groundwork for getting sub images by passing image offset and
dimensions to the error checking code.
3. Implement _mesa_GetnTexImageARB(), _mesa_GetTexImage() and
_mesa_GetTextureImage() all in terms of get_texture_image().
v2: pass offset/width/height/depth arguments to the error checking
function, avoid using magic width/height/depth values.
v3: remove unused bufSize param to get_texture_image()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Needed for GL_ARB_get_texture_sub_image. But at this point, the
offsets are always zero and the sizes match the whole texture image.
v2: Fixes, suggestions from Laura Ekstrand:
* Fix calls to ctx->Driver.UnmapTextureImage() to pass the correct
slice value.
* Added comments and assertions to check zoffset+depth<=tex->Depth before
the 'img' loops.
* Added a new zoffset==0 assert in get_tex_memcpy().
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The new driver hook has x/y/zoffset and width/height/depth parameters
for the new glGetTextureSubImage() function.
The meta code and gallium state tracker are updated to handle the
new parameters.
Callers to Driver.GetTexSubImage() pass in offsets=0 and sizes equal
to the whole texture size.
v2: update i965 driver code, s/GLint/GLsizei/ in GetTexSubImage hook
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since s3tc works for cube maps and 2D arrays, it should also work for
cube arrays. NVIDIA's driver supports this too. Seems like the spec
should say this.
This is a minor follow-on fix for the commit "mesa: fix up some texture
error checks".
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Include stdarg.h for va_list. Unbreaks the build on OpenBSD:
In file included from mesa/program/dummy_errors.c:24:
../src/mesa/main/errors.h:85: error: expected declaration specifiers or '...' before 'va_list'
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
This fixes the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
gcc says:
sb/sb_sched.cpp: In member function 'bool r600_sb::alu_group_tracker::try_reserve(r600_sb::alu_node*)':
sb/sb_sched.cpp:492:7: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
if (!trans & fbs)
It happens to be harmless; if fbs is ever non-zero, it will be VEC_210,
which is 5, so (!trans & 5) == 1 and the branch works as expected. But
logical AND is clearly what was meant.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Commit c9dbdc0 introduced some dead code which is supposed to be used
once we have Yf/Ys tiling working and performing better. Ken reported
the issue that static analysis tool now shows warnings due to the dead
code. To fix these warnings, this patch reverts the changes made in
commit c9dbdc0.
It'll be better to add the Yf/Ys tiling selection code later, when we
are ready to use it.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is essentially the same problem fixed in an earlier patch for
immediates. Setting the stride to zero will be particularly useful
for my future SIMD lowering pass, because we will be able to just
check whether the stride of a source register is zero and skip
emitting the copies required to unzip it in that case.
Instead of setting stride to zero in every caller of emit_uniformize()
I've changed the function to return the result as its return value
(previously it was being written into a caller-provided destination
register), because this way we can enforce that the result is used with
the correct regioning from the function itself.
The changes to the prototype of its VEC4 counterpart are mainly for
the sake of symmetry, VEC4 registers don't have stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This fixes essentially the same problem as for immediates. Registers
of the UNIFORM file are typically accessed according to the formula:
read_uniform(r, channel_index, array_index) =
read_element(r, channel_index * 0 + array_index * 1)
Which matches the general direct addressing formula for stride=0:
read_direct(r, channel_index, array_index) =
read_element(r, channel_index * stride +
array_index * max{1, stride * width})
In either case if reladdr is present the access will be according to
the composition of two register regions, the first one determining the
per-channel array_index used for the second, like:
read_indirect(r, channel_index, array_index) =
read_direct(r, channel_index,
read(r.reladdr, channel_index, array_index))
where:
read(r, channel_index, array_index) = if r.reladdr == NULL
then read_direct(r, channel_index, array_index)
else read_indirect(r, channel_index, array_index)
In conclusion we can handle uniforms consistently with the other
register files if we set stride to zero. After lowering to a GRF
using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the
stride of the source is set to one again because the result of
VARYING_PULL_CONSTANT_LOAD is generally non-uniform.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
When the width field was removed from fs_reg the BROADCAST handling
code in opt_algebraic() started to miss a number of trivial
optimization cases resulting in the ugly indirect-addressing sequence
to be emitted unnecessarily for some variable-indexed texturing and
UBO loads regardless of one of the sources of BROADCAST being
immediate. Apparently the reason was that we were setting the stride
field to one for immediates even though they are typically uniform.
Width used to be set to one too which is why this optimization used to
work previously until the "reg.width == 1" check was removed.
The stride field of vector immediates is intentionally left equal to
one, because they are strictly speaking not uniform. The assertion in
fs_generator makes sure that immediates have the expected stride as
consistency check.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We only consider a vgrf defined by a given block if the block writes to it
unconditionally. So far we have been checking this by testing that the
instruction is not predicated, however, in the case of BRW_OPCODE_SEL,
the predication is used to select the value to write, not to decide if
the write is actually done. The consequence of this was increased life
spans for affected vgrfs, which could lead to additional register pressure.
Since NIR generates selects for conditional writes this was causing massive
register pressure in a handful of piglit and dEQP tests that had a large
number of select operations with the NIR-vec4 backend.
Fixes the following piglit tests with the NIR-vec4 backend:
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec4-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec2-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec3-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/vs-output-array-float-index-wr-before-gs
Fixes 80 dEQP tests with the NIR-vec4 backend in the following category:
dEQP-GLES3.functional.ubo.*
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests
on radeonsi that use the draw module.
This is just a temporary solution. The correct solution will require
creating a TargetMachine during gallivm initialization and pulling the
DataLayout from there. This will be a somewhat invasive change, and it
will need to be validatated on multiple LLVM versions.
https://llvm.org/bugs/show_bug.cgi?id=24172
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes a compilation warning introduced in commit 05a12c5
(gallium: add interface for writable shader images).
While we are at it, fix indentation and rename parameters according to
the gallium interface.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Only constbufs must be aligned to 0x100, but since all buffers can be
rebinded as constant buffers they must be also aligned.
This patch prevents this behaviour by aligning everything to 256-byte
increments at buffer creation.
This fixes dmesg fails for the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer -auto -fbo
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't
want to do anything with NV50_3D_BIND_TSC2, just limit the maximum
number of samplers to 16 like for nvc0.
This fixes dmesg fails with the following piglit test:
max-samplers
But the test still fails.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
As functions are inlined, and nir_lower_global_vars_to_local gets
run, all global variables are lowered to local variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that the G80 did not have support for the sampler view
first/last clamping. Put the view's last level in the place of the
texture's so that it doesn't go past what the sampler view allows.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
There's no need to deal with samplers for texture size queries. That
code also was accidentally setting an invalid sIndirectSrc position, but
it can now just be removed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Switch off hardware-generated binding tables and gather push
constants in the blorp. Blorp requires only a minimal set of
simple constants. There is no need for the extra complexity
to program a gather table entry into the pipeline.
Cc: kenneth@whitecape.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When hardware-generated binding tables are enabled, use the hw-generated
binding table format when uploading binding table state.
Normally, the CS will will just consume the binding table pointer commands
as pipelined state. When the RS is enabled however, the RS flushes whatever
edited surface state entries of our on-chip binding table to the binding
table pool before passing the command on to the CS.
Note that the the binding table pointer offset is relative to the binding table
pool base address when resource streamer instead of the surface state base address.
v2: Fix possible buffer overflow when allocating a chunk out of the
hw-binding table pool (Ken).
v3: Remove extra newline and add missing brace around if-statement (Matt).
v4: Fix broken INTEL_DEBUG=shader_time for hw-generated binding tables.
Document PRM WaStateBindingTableOverfetch workaround.
Cc: kenneth@whitecape.org
Cc: mattst88@gmail.com
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Unlike normal software binding tables where the driver has to manually
generate and fill a binding table array which are then uploaded to the
hardware, the resource streamer instead presents the driver with an option
to fill out slots for individual binding table indices. The hardware
accumulates the state for these combined edits which it then automatically
flushes to a binding table pool when the binding table pointer state
command is invoked.
v2: Clarify binding table edit bit aligment (Topi).
v3: Make comments and function names more clearer (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This patch implements the binding table enable command which is also
used to allocate a binding table pool where where hardware-generated
binding table entries are flushed into. Each binding table offset in
the binding table pool is unique per each shader stage that are
enabled within a batch.
Also insert the required brw_tracked_state objects to enable
hw-generated binding tables in normal render path.
v2: - Use MOCS in binding table pool alloc for GEN8
- Fix spurious offset when allocating binding table pool entry
and start from zero instead.
v3: - Include GEN8 fix for spurious offset above.
v4: - Fixup wrong packet length in enable/disable hw-binding table
for GEN8 (Ville).
- Don't invoke HW-binding table disable command when we dont
have resource streamer (Chris).
v5: - Reorder the state cache invalidate flush so it happens in-between
enabling hw-generated binding tables and the previous sw-binding
table GPU state (Chris).
v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables().
- Adhere to coding guidelines and make comments more informative.
Cc: kenneth@whitecape.org
Cc: syrjala@sci.fi
Cc: chris@chris-wilson.co.uk
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Check first if the hardware and kernel supports resource streamer. If this
is allowed, tell the kernel to enable the resource streamer enable bit on
MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER
execbuffer flags.
v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
supports RS (Ken).
- Add brw_device_info::has_resource_streamer and toggle it for
Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
v4: - Always inspect the getparam.value (Chris Wilson).
v5: - Fold redundant devinfo->has_resource_streamer check in context create
into init screen.
Cc: kenneth@whitecape.org
Cc: chris@chris-wilson.co.uk
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This gives the kernel a chance to validate and lock down the data,
without having to deal with mmap zapping.
With this, GLBenchmark stops on a texture relocations, because we'd
recycled a shader BO as another shader and failed to revalidate, since we
weren't clearing the cached validation state on mmap faults.
In particular, we were incorrectly accepting s3tc (and lots of others)
for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d
targets. At this time, the only allowed formats for these calls are the
bptc ones, since none of the specific extensions allow it (astc hdr would).
Also, fix up a bug in _mesa_target_can_be_compressed - 3d target needs to
be allowed for bptc formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
On a release build, this makes the rest of vc4_qpu_validate.c go away
(the compiler didn't know that our qpu helper function calls had no
side effects).
These are really useful hints to the compiler in the absence of link-time
optimization, and I'm going to use them in VC4.
I've made the const attribute be ATTRIBUTE_CONST unlike other function
attributes, because we have other things in the tree #defining CONST for
their own unrelated purposes.
v2: Alphabetize.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, we were setting payload_last_use_ip for unused payload
registers to 0, which made them interfere with whatever the first
instruction wrote to due to the workaround for SIMD16 uniform arguments.
Just use -1 to mean "unused" instead, and then skip setting any
interferences for unused payload registers.
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1
LOST: 0
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
regs_read() will handle LINTERP for us since the previous commit. In
addition, we were being too conservative, since it will only read 2
registers on SIMD8.
instructions in affected programs: 9061 -> 8893 (-1.85%)
helped: 10
HURT: 0
GAINED: 0
LOST: 0
All of the changes were due to spills being eliminated, mostly in KSP
shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
If a source register in the push constant registers uses more than one
register, then we wouldn't update payload_last_use_ip for subsequent
registers.
Unlike most uniform data pushed into registers, the CS gl_LocalInvocationID
data varies per execution channel. Therefore for SIMD16 mode, we have vec16
data in the payload. In this case we then need to mark 2 registers in
payload_last_use_ip as last used by the instruction. There's a similar
situation for the z and w coordinates of gl_FragCoord for fragment shaders,
where it had only happened to work before because of some bogus interferences
which the next commit removes.
(Connor: added bit about gl_FragCoord to commit message)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
This prevents an assertion failure in brw_fs_live_variables.cpp,
fs_live_variables::setup_one_write: Assertion `var < num_vars' failed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This prevents an assertion failure in brw_fs_live_variables.cpp,
fs_live_variables::setup_one_read: Assertion `var < num_vars' failed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
A fragment program from "Pixel Piracy" contains redundant OPTION
directives:
!!ARBfp1.0
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
...
We already allow redundant ARB_precision_hint_fastest directives, but
disallow the redundant (yet consistent) ARB_fog_exp2 directives, failing
to compile the program.
The specification seems to contradict itself - the main text says that
only one fog application option may be specified, but then backpedals,
indicating the intent is to disallow /contradictory/ flags. One of the
issues suggests that specifying contradictory ones is stupid, but
allowed, and only the last one should take effect.
Accepting multiple redundant (but consistent) directives seems harmless,
and like a reasonable interpretation of the specification. It also
fixes a fragment program found in the wild.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the last few patches a way was provided to influence lower layer miptree
layout and allocation decisions via flags (replacing bools). For simplicity, I
chose not to touch the tiling requests because the change was slightly less
mechanical than replacing the bools.
The goal is to organize the code so we can continue to add new parameters and
tiling types while minimizing risk to the existing code, and not having to
constantly add new function parameters.
v2: Rebased on Anuj's recent Yf/Ys changes
Fix non-msrt MCS allocation (was only happening in gen8 case before)
v3: small fix in assertion requested by Chad
v4: Use parens to get the order right from v3.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With the last few patches a way was provided to influence lower layer miptree
layout and allocation decisions via flags (replacing bools). For simplicity, I
chose not to touch the tiling requests because the change was slightly less
mechanical than replacing the bools.
The goal is to organize the code so we can continue to add new parameters and
tiling types while minimizing risk to the existing code, and not having to
constantly add new function parameters.
v2: Rebased on Anuj's recent Yf/Ys changes
Fix non-msrt MCS allocation (was only happening in gen8 case before)
v3: small fix in assertion requested by Chad
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v2)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v2)
Reviewed-by: Chad Versace <chad.versace@intel.com> (v2)
This in principle simple calculation was being open-coded in a number
of places (in a series I haven't yet sent for review there will be a
couple more), all of them were subtly broken in one way or another:
None of them were handling the HW_REG case correctly as pointed out by
Connor, and fs_inst::regs_read() was handling the stride=0 case rather
naively. This patch solves both problems and factors out the
calculation as a new fs_reg method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This gets rid of two no16() fall-backs and should allow better
scheduling of the generated IR. There are no uses of usubBorrow() or
uaddCarry() in shader-db so no changes are expected. However the
"arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
"arb_gpu_shader5/execution/built-in-functions/fs-uaddCarry" piglit
tests go from 40 to 28 instructions. The reason is that the plain ADD
instruction can easily be CSE'ed with the original addition, and the
b2i negation can easily be propagated into the source modifier of
another instruction, so effectively both operations are performed with
just one instruction.
v2: Rely on carry_to_arith() and borrow_to_arith() to lower these
(Ilia Mirkin).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Booleans are represented as 0/-1 on modern hardware which means we can
just negate them to convert them into a numeric type. Negation has
the benefit that it can be implemented using a source modifier which
can easily be propagated into some other instruction. shader-db
results on HSW:
total instructions in shared programs: 6349082 -> 6346693 (-0.04%)
instructions in affected programs: 40948 -> 38559 (-5.83%)
helped: 123
HURT: 1
GAINED: 1
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
PIPE_CAPs and TGSI support will be added later. The TGSI support should be
straightforward. We only need to split TGSI_FILE_RESOURCE into TGSI_FILE_IMAGE
and TGSI_FILE_BUFFER, though duplicating all opcodes shouldn't be necessary.
The idea is:
* ARB_shader_image_load_store should use set_shader_images.
* ARB_shader_storage_buffer_object should use set_shader_buffers(slots 0..M-1)
if M shader storage buffers are supported.
* ARB_shader_atomic_counters should use set_shader_buffers(slots M..N)
if N-M+1 atomic counter buffers are supported.
PIPE_CAPs can describe various constraints for early DX11 hardware.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of relying on hardware defaults the i915 kernel driver is
going program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to override the PTE controls making renderbuffers
always WT on LLC regardless of the kernel's setting. Instead use an
entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
L3CC=WB.
The "WB" entry previously used for anything other than renderbuffers
has moved to a different index in the new MOCS tables but it should
have the same caching semantics as the old entry.
Even though the corresponding kernel change ("drm/i915: Added
Programming of the MOCS") is in a way an ABI break it doesn't seem
necessary to check that the kernel is recent enough because the change
should only affect Gen9 which is still unreleased hardware.
v2: Update MOCS values for the new Android-incompatible tables
introduced in v7 of the kernel patch.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071080.html
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
s/build_error/compile_error in order to match the stored OpenCL status code.
Make program::build catch and log every OpenCL error.
Make tgsi error triggering uniform with the llvm one.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is done by returning an rvalue of type void in the
ast_function_expression::hir function instead of a void expression.
This produces (in the case of the ternary) an hir with a call
to the void returning function and an assignment of a void variable
which will be optimized out (the assignment) during the optimization
pass.
This fix results in having a valid subexpression in the many
different cases where the subexpressions are functions whose
return values are void.
Thus preventing to dereference NULL in the following cases:
* binary operator
* unary operators
* ternary operator
* comparison operators (except equal and nequal operator)
Equal and nequal had to be handled as a special case because
instead of segfaulting on a forbidden syntax it was now accepting
expressions with a void return value on either (or both) side of
the expression.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252
Signed-off-by: Renaud Gaubert <renaud@lse.epita.fr>
Reviewed-by: Gabriel Laskar <gabriel@lse.epita.fr>
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the actual colorbuffer setup did not.
And the blitter did not take that into account neither.
Untested (what could possibly go wrong...).
Same as for r100.
Acked-by: Marek Olšák <marek.olsak@amd.com>
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the actual colorbuffer setup did not.
And the blitter did not take that into account neither.
Untested (what could possibly go wrong...).
Acked-by: Marek Olšák <marek.olsak@amd.com>
Blit submits lots of packets which are usually handled by state atoms, so
these must be dirtied.
Not sure if this fixes anything, but it was a concern raised by bug 51658
(with this all issues there seen as actual bugs should be fixed, with the
exception of the patch to upload non-used texenv state atoms which I just
don't understand).
Acked-by: Marek Olšák <marek.olsak@amd.com>
It is rather unfortunate that we don't know if a texture is going to be used
as a rt later, and we lack the means to do something about a format chosen
which we can't render to directly, so disable this and always chose renderable
format for rgba8 textures.
This addresses an issue raised on (old) bug,
https://bugs.freedesktop.org/show_bug.cgi?id=51658 with gnome-shell, don't
know if that's still applicable but it might fix other things as well.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Along with fixing the type of pitch parameter, patch also changes
the types of few local variables and function return type.
Warnings fixed are:
intel_mipmap_tree.c:671:7: warning: passing argument 3 of
'intel_get_yf_ys_bo_size' from incompatible pointer type
intel_mipmap_tree.c:563:1: note: expected 'uint64_t *' but
argument is of type 'long unsigned int *'
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously OUT_BATCH was just a macro around an inline function which
does
brw->batch.map[brw->batch.used++] = dword;
When making consecutive calls to intel_batchbuffer_emit_dword() the
compiler isn't able to recognize that we're writing consecutive memory
locations or that it doesn't need to write batch.used back to memory
each time.
We can avoid both of these problems by making a local pointer to the
next location in the batch in BEGIN_BATCH().
Cuts 18k from the .text size.
text data bss dec hex filename
4946956 195152 26192 5168300 4edcac i965_dri.so before
4928956 195152 26192 5150300 4e965c i965_dri.so after
This series (including commit c0433948) improves performance of Synmark
OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
So that everything writing to the batch between BEGIN_BATCH() and
ADVANCE_BATCH() goes through OUT_BATCH.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)"
respectively to allow declaring local variables used by intervening
OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be
in the same control flow.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This field should always be set for gen8. In the bdw PRM, Volume 2d:
Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD
6, Bits 9:0, Number of Threads in GPGPU Thread Group:
"This field should not be set to 0 even if the barrier is disabled,
since an accurate value is needed for proper pre-emption."
In the HSW PRM, the it doesn't mention that it must always be set, but
it should not hurt.
Reported-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
I needed to rewrite this a bit for safety checking in the next commit.
Despite being a static inline of the same thing that was being done, we
lose 36 bytes of code for some reason.
Now that RCL generation is in the kernel, we don't have any other
callers. Oddly, the compiler generates another 8 bytes of code for
this, but the simplification is worth it.
Now that we don't resize the CL as we build (it's set up at the top by
vc4_start_draw()), we can store the pointers instead of offsets from
the base. Saves a bit of math in emitting relocs (about 60 bytes of
code).
Extend the existing lower_ubo_reference pass to also detect SSBO loads
and lower them to __intrinsic_load_ssbo intrinsics.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Extend the existing lower_ubo_reference pass to also detect SSBO writes
and lower them to __intrinsic_store_ssbo intrinsics.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If we kill dead assignments we lose the buffer writes.
Also, we never kill UBO declarations even if they are never referenced
by the shader, they are always considered active. Although the spec
does not seem say this specifically for SSBOs, it is probably implied
since SSBOs are pretty much the same as UBOs, only that you can write
to them.
v2:
- Fix the comment (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Due to GL_ARB_shader_storage_buffer_object extension, shader storage blocks
have the same limitations as uniform blocks.
This patch fixes the corresponding error messages.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Section 4.3.7 "Buffer Variables", GLSL 4.30 spec:
"Buffer variables may only be declared inside interface blocks
(section 4.3.9 “Interface Blocks”), which are then referred to as
shader storage blocks. It is a compile-time error to declare buffer
variables at global scope (outside a block)."
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
See GLSL 4.30 spec, section 4.4.5 "Uniform and Shader Storage Block
Layout Qualifiers".
v2:
- Add whitespace in an error message. Delete period '.' at the end of that
error message (Jordan).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This includes the array of bindings, the current buffer bound to the
GL_SHADER_STORAGE_BUFFER target and a set of general limits and default
values for shader storage buffers.
v2:
- Use spec values for the new defined constants (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is used to identify shader storage buffer interface blocks where
buffer variables are declared.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will be used to identify buffer variables inside shader storage
buffer objects, which are very similar to uniforms except for a few
differences, most important of which is that they are writable.
Since buffer variables are so similar to uniforms, we will almost always
want them to go through the same paths as uniforms.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The util/hash_table was intended to be a fast hash table
replacement for the program/hash_table see 35fd61bd99 and 72e55bb688.
This change replaces some more uses of the old hash table.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Most of the data stored(duplicated) was unused, and for the one that is
follow the approach set by other drivers.
This eliminates the use of legacy (dri1) types.
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was only useful for st/egl, although I've never got to merging the
pipe-loader and inline-helpers before it was removed. There are no users
for it ATM.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Do not iterate and (attempt to) open the render device, if we're over
the requested number of devices.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Render nodes have been around for quite some time. Removing support via
the master/primary node allows us to clean up the conditional
compilation and simplify the build greatly.
For example currently we the pipe-loader, which explicitly links against
xcb and friends (for X auth) if found at compile-time. That
would cause problems as one will be forced to use X/xcb, even if it's a
headless system that is used for opencl.
v2: Clarify the linking topic in the commit message.
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds the translation from TGSI to AMDGPU llvm backend, for the
64-bit opcodes. The backend pretty much handles everything for us
fine. There is one patch required for SI DFRAC support, that I know
off.
[airlied: fixed missing comma, updated relnotes]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
temp_reg needs to be last, as we increment things
away from it, otherwise on cayman some tests were overwriting
the index regs.
Fixes 2 piglit with ARB_gpu_shader5 forced on cayman.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs a different method to upload the CF IDX0/1
This fixes 31 piglits when ARB_gpu_shader5 is forced on
with cayman.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When binding a layered texture, the layer is already 0. There's no need
to special case this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
from commit 77d55ef481.
No Piglit changes on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Color clears can be performed via two separate shaders - one is the
generic "meta clear" shader (in meta.c); the other is the i965 specific
"repclear" shader (in brw_meta_fast_clear.c).
Giving them separate names makes them distinguishable when reading
INTEL_DEBUG=shader_time output.
v2: Call it "meta repclear", as suggested by Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Paul's original code had emit_control_data_bits() skip the URB write if
vertex_count was 0. This meant wrapping every control data write in a
conditional write.
We accumulate control data bits in a single UD (32-bit) register. For
simple shaders that don't emit many vertices, the control data header
will be <= 32-bits long, so we only need to write it once at the end of
the shader.
For shaders with larger headers, we write out batches of control data
bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0.
On the first EmitVertex() call, the above expression will evaluate to
true simply because vertex_count == 0. But we want to avoid emitting
the control data bits, because we haven't accumulated 32-bits worth yet.
In other words, the vertex_count != 0 check is really only necessary in
the EmitVertex() batching case, not the end-of-thread case.
This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or
multiple streams. The only downside is that a shader which emits no
vertices at all will execute an additional URB write---but such shaders
are pointless and not worth optimizing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When the new hash table implementation was added to Mesa it claimed to be much
faster, see commits 35fd61bd99 and 72e55bb688.
The set implementation follows the same implementation strategy so this should
be faster and there was no need to store a data field.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If we split addr/pred, the original instruction could have originated
from a different block. If we don't fixup the block ptr we hit asserts
later (in debug builds).
NOTE: perhaps we don't want to try to preserve addr/pred reg's across
block boundaries.. this at least needs some thought in case addr/pred
writes end up inside a conditional block..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The address and predicate register are special, they don't get assigned
in RA. So do a better job of ignoring them rather than hitting later
asserts.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
XA was never unref'ing last_fence in the various call paths to
pipe->flush(). Add this to xa_context_flush() and update the other
open-coded calls to pipe->flush() to use xa_context_flush() instead.
This fixes a memory leak reported with xf86-video-freedreno.
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct. It also eliminates a
pointless memcpy() of the key.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer. We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.
Commit db9c915abc for the vec4 backend.
(The empty class will be going away soon.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication. It'll also allow for a bit more tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This patch makes us only issue the performance warning about register
spilling if we actually spilled registers. We also use scratch space
for indirect addressing and the like.
This is basically commit c51163b0cf for
the vec4 backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.
This patch passes brw in as intended.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Adding new shader stages to a switch statement is less confusing than an
if-else-if ladder where all but the first case are fragment shader
specific (but don't claim to be).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more
importantly prevents a larger code size regression in the next commit
when the .used field is replaced and calculated on demand.
text data bss dec hex filename
4945468 195152 26192 5166812 4ed6dc i965_dri.so before
4943740 195152 26192 5165084 4ed01c i965_dri.so after
And surround the emit and total fields with #ifdef DEBUG to prevent
such mistakes from happening again.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This patch can cause an infinite recursion if the previous patch titled, "i965:
Track finished batch state" isn't present (backporters take notice).
v2: Sent out the wrong patch originally. This patches switches the order of
flushes, doing the generic flush before the CC_STATE, and the required
workaround flush afterwards
v3: Only perform workaround for render ring
Add text to the BATCH_RESERVE comments
v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two
blocks.
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to check what the 3D pipe is able to handle for the mixer, not what
the decoder is able to decode. This fixes output of resolutions like 720x1280.
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: mesa-stable@lists.freedesktop.org
Before validating vertex arrays we need to check if a VBO is present.
Checking if vb->buffer is not NULL fixes the issue.
Fixes the following piglit test:
gl-3.1-vao-broken-attrib
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to nv50, this should be src->ms_y instead of src->ms_x. This
code is here since 2012, so it's probably a typo error which has never
been detected since a long time. I didn't do a full piglit run to check
if it fixes some other weird issues.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I also created an bug in Khronos 's bugzilla as you suggested:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1356
I'll let you know if I get feedback from this bug or else where.
Patch with updated error messages:
[PATCH] eglplatform: treat __APPLE__ the same way as __unix__ to handle X11 types
CC eglapi.lo
./egldisplay.h:258:19: error: unknown type name 'Display'
_eglGetX11Display(Display *native_display, const EGLint *attrib_list);
eglapi.c:290:4: error: array size is negative
STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
eglapi.c:291:25: warning: cast to 'void *' from smaller integer type
'EGLNativeDisplayType' (aka 'int') [-Wint-to-void-pointer-cast]
native_display_ptr = (void*) nativeDisplay;
eglapi.c:307:32: error: use of undeclared identifier 'Display'
dpy = _eglGetX11Display((Display*) native_display, attrib_list);
eglapi.c:776:35: error: use of undeclared identifier 'Window'
native_window = (void*) (* (Window*) native_window);
eglapi.c:847:35: error: use of undeclared identifier 'Pixmap'
native_pixmap = (void*) (* (Pixmap*) native_pixmap);
Bugzilla Mesa: https://bugs.freedesktop.org/show_bug.cgi?id=90249
Bugzilla Khronos: https://www.khronos.org/bugzilla/show_bug.cgi?id=1356
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This patch and its description are inspired from Jose Fonseca
explanations and suggestions.
With this patch the following logic applies and only if __APPLE__:
When building mesa, GLhandleARB is defined as unsigned long and
at some point casted to GLuint in gl fuction implementations.
These exact points are where these errors and warnings appear.
When building an application GLhandleARB is defined as void*.
Later when calling a gl function, for example glBindAttribLocationARB,
it will be dispatched to _mesa_BindAttribLocation. So internally
void* will be treated as unsigned long which has the same size.
So the same truncation happens when casting it to GLuint.
Same when GLhandleARB appears as return value.
For mesa it will be GLuint -> unsigned long.
For an application it will be GLuint -> unsigned long -> void*.
Note that the value will be preserved when casting back to GLuint.
When GLhandleARB appears as a pointer there are also separate
entry-points, i.e. _mesa_FuncNameARB. So the same logic can
be applied.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66346
Signed-off-by: Julien Isorce <julien.isorce@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
With the exception of gen8, the sole user of the workaround bo are for
emitting pipe controls. Move it out of the purview of the batchbuffer
and into the pipecontrol.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Since there was an ABI break and linking twice against libudev.so.0 and
libudev.so.1 causes the application to quickly crash, we first check if
the application is currently linked against libudev before dlopening a
local handle. However for backwards/forwards compatability, we need to
inspect the application for current linkage against all known versions
first. Not doing so causes a crash when both libraries are present and
so mesa chooses libudev.so.1 but the application was linked against
libudev.so.0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Emil Velikov:
I'm ever so slightly conserned that RTLD_NOLOAD is not part of the POSIX
standard, thus it's missing on some platforms (*BSD seems ok, while
Solaris, MacOS are not).
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.
On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low 32bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing every 80ns as expected and not stuck on zero (as would be
the case with the buggy kernel/hw.).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for. Scalar NIR expects them to be packed,
however.
Fixes 1256 dEQP tests on Broadwell.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This allows drivers to report queries in units of microseconds and
have the HUD display "us" (microseconds), "ms" (milliseconds) or "s"
(seconds) on the graph.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of using a boolean 'is bytes' value, use the pipe_driver_query_type
enum type. This will let is add support for time values in the next patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec. The documentation for most control flow
instructions mentions among other restrictions:
"Instruction compression is not allowed."
This however doesn't have any implications on 16 wide not being
supported, because none of the control flow instructions have
multi-register operands (control flow instructions are not compressed
on more recent hardware either, except maybe SNB's IF with inline
compare). In fact Gen4-5 had 16-wide control flow masks and stacks,
and the spec mentions in several places that control flow instructions
push and pop 16 channels worth of data -- Otherwise there doesn't seem
to be any indication that it shouldn't work.
Causes no piglit regressions, and gives the following shader-db
results on ILK:
total instructions in shared programs: 4711384 -> 4711384 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1215
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the hardware docs for the DO instruction:
"Execution size is ignored for this instruction."
My observation on ILK hardware contradicts the spec though, channels
over the execution size of a DO instruction won't enter the loop, and
channels over the execution size of a WHILE instruction will exit the
loop after the first iteration -- The latter is consistent with the
spec though, there's no claim about the execution size being ignored
for the WHILE instruction so it's not completely unexpected that it
has an influence on the evaluation of EMask.
The execute_size argument of brw_DO() shouldn't have any effect on
Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the
execution size from the matching DO, so this patch should fix them
too. The execution size of BREAK and CONT instructions was already
being set correctly.
Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
conditional and loop instructions 16-wide,
e.g. shaders/glsl-fs-continue-inside-do-while.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware docs don't mention explicitly what these fields should
be, but I've verified experimentally on ILK that using a GRF as
destination causes the register to be corrupted when the execution
size of an ENDIF instruction is higher than 8 -- and because the
destination we were using was g0, eventually a hang.
Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This eliminates the error prone logic in si_shader_vs recalculating this
value.
It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
counted for VS exports. They need to be counted because they are passed
to the pixel shader as parameters as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The expansion should always be to the same width as the input arguments
no matter what, since these functions should work with any bit width of
the arguments (the sext is a no-op on any sane simd architecture).
Thus, fix the caller expecting differently.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=91222
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In the kernel, this is called __must_check; all our attribute macros in
Mesa appear to be uppercase, so I went with that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In this bit of code point_five can be NULL if the expression is not a
constant. This fixes it to match the pattern of the rest of the chunk
of code so that it checks for NULLs.
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is a piece of code that is trying to match expressions of the
form (mul (floor (add (abs x) 0.5) (sign x))). However the check for
the add expression wasn't checking whether it had the expected
operation. It looks like this was just an oversight because it doesn't
match the pattern for the rest of the code snippet. The existing line
to check whether add_expr!=NULL was added as part of a coverity fix in
3384179f.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91226
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's
actually 5 DWords on Gen6-7. We've been reserving insufficient space
for performance monitoring on Sandybridge, which means it would likely
break if you used that functionality. (Thankfully, no one does...)
Also, the existing number of 146 was the result of me flubbing up the
arithmetic: it should have actually been 140.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
Absolute timeouts are used with the amdgpu kernel driver.
It also makes waiting for several variables and fences at the same time
easier (the timeout doesn't have to be recalculated after every wait call).
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This will be used by radeon and amdgpu winsyses.
Copied from the amdgpu winsys.
v2: use volatile and p_atomic_read
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Assigns a new array type based on the max access of
unsized array members. This is to support arrays of arrays.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The first argument to UCMP needs to be compared against 0, but the
latter arguments are treated as float and need to be able to properly
apply neg/abs arguments. Adjust the inferSrcType function accordingly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
After c61bc6e ("util: port _mesa_strto[df] to C"), "make check"
fails due to a missing _mesa_locale_init. Fixup this oversight,
by moving the stand-alone compiler initializer inside
initialize_context_to_defaults().
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Same problem and fix as for nouveau's ZaphodHeads trouble.
See patch ...
"nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads."
... for reference.
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If an instruction using address register value gets eliminated, we need
to remove it from the indirects list, otherwise it causes mayhem in
sched for scheduling address register usage.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A handful of fixes and cleanups:
1) If we split addr/pred, we need the newly created instruction to
end up in the unscheduled_list
2) Avoid scheduling a write to the address register if there is no
instruction using the address register that is otherwise ready
to schedule. Note that I currently don't bother with the same
logic for predicate register, since the only instructions using
predicate (br/kill) don't take any other src registers, so this
situation should not arise.
3) few other cosmetic cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to distinguish a shader that has separate writes to each MRT
from one which is supposed to write the data from MRT 0 to all the MRTs.
In TGSI this is done with a property. NIR doesn't have that, so encode
it as a funny location and decode on the other end.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
There was a comment saying that in SIMD16 mode the pixel interpolator
returns coords interleaved 8 channels at a time and that this requires
extra work to support. However, this interleaved format is exactly
what the PLN instruction requires so I don't think anything needs to
be done to support it apart from removing the line to disable it and
to ensure that the message lengths for the send message are correct.
I am more convinced that this is correct because as it says in the
comment this interleaved output is identical to what is given in the
thread payload. The code generated to apply the plane equation to
these coordinates is identical on SIMD16 and SIMD8 except that the
dispatch width is larger which implies no special unmangling is
needed.
Perhaps the confusion stems from the fact that the description of the
PLN instruction in the IVB PRM seems to imply that the src1 inputs are
not interleaved so it wouldn't work. However, in the HSW and BDW PRMs,
the pseudo-code is different and looks like it expects the interleaved
format. Mesa doesn't seem to generate different code on IVB to
uninterleave the payload registers and everything is working so I can
only assume that the PRM is wrong.
I tested the interpolateAt tests on HSW and did a full Piglit run on
IVB on there were no regressions.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When there are no color buffer render targets, gen6 and gen7 still
use the first BLEND_STATE element to determine alpha test.
gen6_upload_blend_state was allocating zero elements when
ctx->Color.AlphaEnabled was false.
That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
pointing to random data from some previous brw_state_batch().
That sometimes suppressed depth rendering when those bits
happened to mean COMPAREFUNC_NEVER.
This produced flickering shadows for dota2 reborn.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80500
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These checks were in Mesa prior to commit fbba25bba, but they were
not necessary for the purpose that Mesa intended (check if we could
resolve ReadPixels via memcpy), so that commit took them away.
Unfortunately, it seems that some Gallium drivers rely on these
checks to make the decision of whether they should fallback to Mesa's
implementation of ReadPixels correctly. Michel Dänzer reported that
the following piglit test would fail on radeonsi after commit
fbba25bba:
spec@ext_texture_integer@fbo_integer_readpixels_sint_uint
This patch puts the checks back in Gallium, where they are needed.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
In the immediate form, src2 == dst, so it does not need to be emitted.
Otherwise it overlaps with the immediate value's low bits.
Fixes: 09ee907266 (nv50/ir: Fold IMM into MAD)
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Prefer blit-based texture transfers only if the chip has dedicated VRAM
since it would translate to a copy into the same memory on shared-memory
chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is required on non-coherent architectures to ensure the value of
the fence is correct at all times. Failure to do this results in the
display freezing for a few seconds every now and then on Tegra.
The NOUVEAU_BO_COHERENT is a no-op for coherent architectures, so behavior
on x86 should not be affected by this patch.
Also bump the required libdrm version to 2.4.62, which introduced this
flag.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Although the horizontal and vertical alignment fields are ignored here,
0 is a reserved value for them and may cause undefined behavior. Change
the default value to an abitrary valid one.
v2: add comment about chosen value (Topi).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Now that we can create builders with a bigger width than their parent as
long as it's exec_all, we don't need to create the instruction manually.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This assertion was meant to catch code inadvertently escaping the
control flow jail determined by the group of channel enable signals
selected by some caller, however it seems useful to be able to
increase the default execution size as long as force_writemask_all is
enabled, because force_writemask_all is an explicit indication that
there is no longer a one-to-one correspondence between channels and
SIMD components so the restriction doesn't apply.
In addition reorder the calls to fs_builder::group and ::exec_all in a
couple of places to make sure that we don't temporarily break this
invariant in the future for instructions with exec_size higher than
the dispatch width.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We want to require different versions for nouveau and nouveau_vieux.
autoconf will only check for NOUVEAU once if both drivers are enabled,
meaning both version checks don't get executed. Rename the nouveau_vieux
one to NVVIEUX to avoid the issue.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Martin Peres <martin.peres@free.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of using symbol table, build mask by inspecting IR. This
change is required by further patches to move resource list creation
to happen later when symbol table does not exist anymore.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The current implementation only moves the joinAt when splitting after
the given instruction, not before it. So if you have a BB with
foo
instr
bar
joinat
and thus with joinAt set, we end up first splitting before instr, at
which point the instr's bb is updated to the new bb. Since that bb
doesn't have a joinAt set (despite containing one), when splitting after
the instr, there is nothing to copy over. Since the joinat will be in
the "split" bb irrespective of whether we're splitting before or after
the instruction, move it over in either case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This adds support for ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit to
llvmpipe.
Two things that don't mix well are SoA and doubles, see
emit_fetch_double, and emit_store_double_chan in this.
I've also had to split emit_data.chan, to add src_chan,
which can be different for doubles.
It handles indirect double fetches from temps, inputs, constants
and immediates. It doesn't handle double stores to indirects,
however it appears the mesa/st doesn't currently emit these,
it always does UARL/MOV combos, which will work fine.
tested with piglit, no regressions, all the fp64 tests seem to pass.
v2:
switch to using shuffles for fetch/store (Roland)
assert on indirect double stores - mesa/st never emits these (it uses MOV)
fix indirect temp/input/constant/immediates (Roland)
typos/formatting fixes (Roland)
v2.1:
cleanup some long lines, emit_store_double_chan cleanups.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths. With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel. Instead, we should be using exec_size.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Now that all of the non-explicit constructors are gone, we don't need to
guess anymore.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, we were just depending on register widths to ensure that
various things were exec_size of 1 etc. Now, we do so explicitly using the
builder.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Shortly, offset() will depend on the builder so we need it moved to some
place where it has access to that.
Reviewed-by: Iago Toral Quiroga <itoral@igali.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Soon we will start using the builder to explicitly set all the execution
sizes. We could make a 32-wide builder, but the builder asserts that we
never grow it which is usually a reasonable assumption. Since this one
instruction is a bit of an odd-ball, we just set the exec_size explicitly.
v2: Explicitly new the fs_inst instead of using the builder and setting
exec_size after the fact.
v3: Set force_writemask_all with the builder instead of directly. The
builder over-writes it if we set it manually. Also, if we don't have
force_writemask_all in the builder it will assert-fail on SIMD32.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, fs_inst::regs_read() fell back to depending on the register
width for the second source. This isn't really correct since it isn't a
SIMD8 value at all, but a SIMD4x2 value. This commit changes it to
explicitly be always one register.
v2: Use mlen for determining the number of registers read
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, we were allocating the payload with different sizes per gen and
then figuring out the mlen in the generator based on gen. This meant,
among other things, that the higher level passes knew nothing about it.
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes things a little simpler, more efficient, and quite a bit more
readable.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Before, we would lazily emit a MOV whenever we encountered a use of a
constant. Now that we have a dedicated file for SSA values, we can
instead only emit the MOV's once, which is more consistent and prevents
us from relying on CSE to re-combine the constants when they aren't
absorbed into the instruction.
total instructions in shared programs: 6078991 -> 6073118 (-0.10%)
instructions in affected programs: 402221 -> 396348 (-1.46%)
helped: 1527
HURT: 0
GAINED: 8
LOST: 2
v2: split this out from the previous commit (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before, we would use registers, but set a magical "parent_instr" field
to indicate that it was actually purely an SSA value (i.e., it wasn't
involved in any phi nodes). Instead, just use SSA values directly, which
lets us get rid of the hack and reduces memory usage since we're not
allocating a nir_register for every value. It also makes our handling of
load_const more consistent compared to the other instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
0 is not used as a valid drawable id, as such there is no point in
attempting to query its geometry. Just bail out early and provide the
more meaningful EGL_BAD_NATIVE_WINDOW to the user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Raise EGL_BAD_NATIVE_WINDOW instead of crashing.
v2: s/Rise/Raise/ (spotted by Michel)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Free the memory for dri2_surf in the unlikely case that one provides
NULL for native_window. Also set the relevant EGL_ERROR to provide
feedback to the user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we were unconditionally doing ttn_get_src() even for
instructions with no src's. Which created a lot of unnecessary
load_const instructions. These were mostly harmless since NIR opt
passes would strip them back out. But for an ENDIF following a
BRK, it would result in load_const instructions created after the
NIR break instruction. Which nir_validate dislikes.
But we can actually just dtrt by using NumSrcRegs instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It isn't quite yet practical to enable TGSI_ANY_INOUT_DECL_RANGE shader
cap yet, at least not in drivers that need lower_to_scalar pass (which
right now is all of the ttn users), since the register arrays do not get
converted to SSA, which angers nir_lower_alu_to_scalar.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions). Cache this information so we only
figure it out once.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The fanin source could be grouped, for example with shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL SVIEW[0], 2D, FLOAT
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 2D
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
The second arg to the isaml is IN[1].w, so we need to look at the fanin
source to get the correct offset.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split out most of dump_info() from ir3_cmdline compiler into a function
that can be used both by cmdline compiler and also for the disasm debug
option. This way, for FD_MESA_DEBUG=disasm we also get to see intput/
output registers, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some piglit tests, like arb_fragment_program-sparse-samplers, result in
having a null samp#0 but valid samp#1.
TODO: a3xx probably needs similar fix
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We can't rely on what we get from the assembler if we have indirect
addressing of constant file, since the assembler doesn't know the array
index. This got lost in the transition to NIR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Desktop GLSL < 130 and GLSL ES < 300 allow sampler array indexing where
index can contain a loop induction variable. This extra check will warn
during linking if some of the indexes could not be turned in to constant
expressions.
v2: warning instead of error for backends that did not enable
EmitNoIndirectSampler option (have dynamic indexing)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Patch provides new compiler option for backend to force unroll loops
that have non-constant expression indexing on sampler arrays.
This makes sure that we can never end up with a shader that uses loop
induction variable as sampler array index but does not unroll because
of having too much instructions. This would not work without dynamic
indexing support.
v2: change option name as EmitNoIndirectSampler
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Dynamic indexing of sampler arrays is prohibited by GLSL ES 3.00.
Earlier versions allow 'constant-index-expression' indexing, where
index can contain a loop induction variable.
Patch allows dynamic indexing for sampler arrays when GLSL ES < 3.00.
This change makes 'sampler-array-index.frag' parser test in Piglit
pass + fishgl.com works when running Chrome on OpenGL ES 2.0 backend
v2: small change and some more commit message (Tapani)
v3: refactor checks to make it more readable (Ian Romanick)
v4: change warning comment in GLSL ES case (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84225
From the "apparently I don't know C" files...GCC apparently supports:
x ?: y
which is equivalent to
x ? x : y
except that it doesn't cause side-effects to occur twice. See:
https://gcc.gnu.org/onlinedocs/gcc/Conditionals.html#Conditionals
This was confusing and looked like a typo. It doesn't really buy us
anything, so just write the obvious code in normal C.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
A clear will do a partial validate, which will in turn reference all the
buffers in the bufctx again. However the fragprog last validated might
have already been deleted. So reset the bufctx when updating state.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch enables using XY_FAST_COPY_BLT only for Yf/Ys tiled buffers.
It can be later turned on for other tiling patterns (X,Y) too.
V3: Flush in between sequential fast copy blits.
Fix src/dst alignment requirements.
Make can_fast_copy_blit() helper.
Use ffs(), is_power_of_two()
Move overlap computation inside intel_miptree_blit().
V4: Use _mesa_regions_overlap() function.
Add check for src_buffer == dst_buffer.
Simplify horizontal and vertical alignment computations.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm
using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers
libdrm need not know about the tiling format because these buffers
don't have hardware support to be tiled or detiled through a fenced
region. libdrm still need to know buffer alignment value for its use
in kernel when resolving the relocation.
Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
satisfy both the above conditions.
V2: Delete min/max buffer size restrictions not valid for i965+.
Remove redundant align to tile size statements.
Remove some redundant code now when there are no min/max buffer size.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Buffers with Yf/Ys tiling end up using meta upload / download
paths or the blitter for cases where they used tiled_memcpy paths
in case of Y tiling. This has exposed some bugs in meta path. To
avoid any piglit regressions on SKL this patch keeps the Yf/Ys
tiling disabled at the moment.
V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
Few cosmetic changes.
V4: Get rid of brw_miptree_choose_tr_mode().
Take care of all tile resource modes {Yf, Ys, none} for all
generations at one place.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
In order to save a small leak if mesa is continously loaded and
unloaded, let's free the locale when the shared object is unloaded.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_strtod and _mesa_strtof are only used from the GLSL compiler and
the ARB_[vertex|fragment]_program code, meaning that the locale doesn't
need to be initialized before the first OpenGL context gets initialized.
So let's use explicit initialization from the one-time init code instead
of depending on a C++ compiler to initialize at image-load time.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function only gets called while mesa is unloading, so there's
no potential of racing or multiple calls at the same time. So let's
just get rid of the locking.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There's no point in calling _mesa_destroy_shader_compiler multiple
times on exit; the resources will only be released once anyway.
So let's move the atexit-call into the part that is only called
once.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function is for deleting per-screen resources, and the shader
compiler resources are not of such nature. Besides, dri shouldn't
need to even know about the presence of a shader compiler.
These resources will already be released when mesa gets unloaded,
and that should be sufficient.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
All of these enums are now in use around in the code, so there's no need
to explicitly use them here any more.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.
It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Since commit 104c8fc2c2 the GLSL IR will be freed if NIR is
being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
This patch just makes it avoid dumping the GLSL IR in that case.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reverts commit c2ff3485b3.
Ilia and I noticed a memory leak caused by this patch: at least with
fixed-function programs, we clone things using ProgramResourceList as
the context before reralloc makes it non-NULL.
I believe Tapani found other bugs with these patches, so I'm just going
to revert them for now and let him pursue them further.
Still appears to have issues with negative indices less than -1M, but
that's a corner case of a corner case.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
turning those into the modern gl_ClipDistance equivalents.
This is unnecessary in Core Profile: if user clipping is enabled, but
the shader doesn't write the corresponding gl_ClipDistance entry,
results are undefined. Hence, it is also unnecessary for geometry
shaders.
This patch moves the call up to run_vs(). This is equivalent for VS,
but removes the need to pass clip distances into emit_urb_writes().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the "URB SIMD8 Write > Write Data Payload" documentation,
"The write data payload can be between 1 and 8 message phases long."
Apparently, the simulator considers it an error if you issue an URB
SIMD8 message with only a header and no actual data to write.
v2: Try to put in a better PRM citation, now that the Broadwell docs
actually exist (requested by Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The HUD doesn't check if query_create() fails and it calls other
pipe_query functions with NULL pointer instead of a valid query object.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The dup'ed fd owned by the nouveau_screen for a device node
must also be used as key for the winsys hash table, instead
of using the original fd passed in for a screen, to make
multi-x-screen ZaphodHeads configurations work on nouveau.
The original fd's lifetime differs from that of the nouveau_screen stored
in the hash. The hash key is the fd, and in order to compare hash entries
we fstat them, so the fd must be around for as long as the screen is.
This is an extension of the fix in commit a59f2bb1 (nouveau: dup fd
before passing it to device).
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The meta code was setting a default depth range for all viewports
and 'restoring' all viewports to depth range values saved from viewport 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't pretty and I'd suggest it the pm4 interface builder
could be tweaked to do this more efficently, but I'd need
guidance on how that would look.
This seems to pass the few piglit tests I threw at it.
v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
brw_miptree_layout_2d tries to ensure that mt->total_width is a
multiple of the compressed block size, presumably because it wouldn't
be possible to make an image that has a fraction of a block. However
it was doing this by aligning mt->total_width to align_w. Previously
align_w has been used as a shortcut for getting the block width
because before Gen9 the block width was always equal to the alignment.
Commit 4ab8d59a2 tried to fix these cases to use the block width
instead of the alignment but it missed this case.
I think in practice this probably won't make any difference because
the buffer for the texture will be allocated to be large enough to
contain the entire pitch and libdrm aligns the pitch to the tile width
anyway. However I think the patch is worth having to make the
intention clearer.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Since the addition of ilo_vma, it was used only to pad a bo for sampling
engine surfaces. Replace it entirely with these functions
ilo_state_surface_buffer_size()
ilo_state_vertex_buffer_size()
ilo_state_index_buffer_size()
ilo_state_sol_buffer_size()
readpixels_can_use_memcpy will later call _mesa_format_matches_format_and_type
which does much tighter checks than these to decide if we can use
memcpy for readpixels.
Also, the checks do not seem to be extensive enough anyway, since we are
checking for signed/unsigned conversion only when the framebuffer has integers,
but the same checks could be done for other types anyway, since as long as
there is a signed/unsigned conversion we can't memcpy.
No regressions observed on i965/llvmpipe.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From Muchnick's Advanced Compiler Design and Implementation:
"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"
Previously, we were walking the blocks forwards and updating the livein and
then the liveout. However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks. The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors. This works out to an
O(n^2) computation where n is the number of blocks. If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.
In b2c6ba0c4b, we made this same change in
the FS backend to great effect. Might as well keep it consistent and make
the same change for vec4. Also, this took the time to run the test:
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
from 6:49.62 to 3:31.40 on Timothy Arceri's machine.
Reviewed-by: Matt Turner <mattst88@gmail.com>
gen8 had some special restrictions which don't seem to carry over to gen9.
Quoting the spec for SKL:
"The Z_Height and Z_Width values must equal those present in
3DSTATE_DEPTH_BUFFER incremented by one."
This fixes nothing in piglit (and regresses nothing).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is always 0 - only brw_workaround_depthstencil_alignment ever sets
it, and that doesn't run on Gen6+. My initial Broadwell depth state
commit had this mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).
shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped: 64
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The thread counts and URB information are all speculative numbers that were
based on some CHV numbers at the time.
v2:
Originally this patch had PCI IDs. I've moved that to a new patch at the end of
the series.
Remove is_cherryview hack.
Add PCI ids. These match the ones defined in the kernel. The only one tested by
us is 0x0a84.
Capitalize the hex string (Mark)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: "Lecluse, Philippe" <Philippe.Lecluse@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Commit b765119c changed the default value of all the counter bits to
64. However, older hardware only has 32 counter bits.
This has only been build-tested. We don't have any tests that verify
the advertised value against implementation behavior, so I don't know
what additional testing could be done.
NOTE: It appears that many Gallium drivers (at least r300 and i915g)
have the same problem, but I don't see a way for the state-tracker to
determine the counter size. Marek says, "For Gallium, a new PIPE_CAP or
new get_xxx_param function will be needed."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
From Muchnick's Advanced Compiler Design and Implementation:
"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"
Previously, we were walking the blocks forwards and updating the livein and
then the liveout. However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks. The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors. This works out to an
O(n^2) computation where n is the number of blocks. If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.
On my HSW desktop, one particular shadertoy test gets a 20% improvement in
compile times:
N Min Max Median Avg Stddev
x 10 15.965 16.884 16.026 16.1822 0.34736846
+ 10 12.813 13.052 12.876 12.8891 0.06913666
Difference at 95.0% confidence
-3.2931 +/- 0.235316
-20.3501% +/- 1.45417%
(Student's t, pooled s = 0.250444)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of IR.
Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
This increases memory pressure during linking but makes it easier
for backend to free IR after it is not needed anymore.
v2: use resource list as ralloc context in case of relink (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Start trimming the fat from intel_batchbuffer.c. First by moving the set
of routines for emitting PIPE_CONTROLS (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This caused us to always free the pipe_surface for the renderbuffer.
The subsequent call to st_update_renderbuffer_surface() would typically
just recreate it. Remove the call to pipe_surface_release() and let
st_update_renderbuffer_surface() take care of freeing the old surface
if it needs to be replaced (because of change to mipmap level, etc).
This can save quite a few calls to pipe_context::create_surface() and
surface_destroy().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes the following build issue, when building without libudev.
CCLD libGL.la
./.libs/libglx.a(dri2_glx.o): In function `dri2CreateScreen':
src/glx/dri2_glx.c:1186: undefined reference to `loader_open_device'
collect2: ld returned 1 exit status
CCLD libEGL.la
Undefined symbols for architecture x86_64:
"_loader_open_device", referenced from:
_dri2_initialize_x11_dri2 in libegl_dri2.a(platform_x11.o)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91077
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
1000 ms is an extreme value for typical interactive loads. A large
cache has some disadvantages. Search for reusable BOs can take a long
time and memory might get exhausted.
Let's be rather conservative and use half of the old value,
500ms. This is beneficial to some loads on my test system and there
are no regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the basic granularity for BO allocations. The alignment also
helps with BO reuse by the cached bufmgr.
This results in a huge 45% speedup in Metro 2033 Redux on my test
system. The game relies on buffer orphaning with very small buffers
(hundreds of bytes in size) and that did not work efficiently
before. This change may also affect other applications and games.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Matt, Jason, and I haven't found this useful in a long time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It returns a new value for each sample in the TLB. We've already avoided
trying to get the same index's color multiple times at the vc4_program.c
level, so we're not losing anything by doing this.
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
</ul>
<h2>Changes</h2>
<p>Anuj Phogat (4):</p>
<ul>
<li>mesa: Handle integer formats in need_rgb_to_luminance_conversion()</li>
<li>mesa: Use helper function need_rgb_to_luminance_conversion()</li>
<li>mesa: Turn need_rgb_to_luminance_conversion() in to a global function</li>
<li>meta: Abort meta path if ReadPixels need rgb to luminance conversion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>glx: Fix __glXWireToEvent for BufferSwapComplete</li>
</ul>
<p>Alex Deucher (2):</p>
<ul>
<li>radeonsi: add new OLAND pci id</li>
<li>radeonsi: properly set the raster_config for KV</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.4</li>
<li>vc4: add missing nir include, to fix the build</li>
<li>Revert "radeonsi: properly set the raster_config for KV"</li>
<li>Update version to 10.6.5</li>
</ul>
<p>Frank Binns (1):</p>
<ul>
<li>egl/x11: don't abort when creating a DRI2 drawable fails</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nouveau: no need to do tnl wakeup, state updates are always hooked up</li>
<li>gm107/ir: indirect handle goes first on maxwell also</li>
<li>nv50,nvc0: take level into account when doing eng2d multi-layer blits</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>meta/copy_image: Stash off the scissor</li>
<li>mesa/formats: Only do byteswapping for packed formats</li>
<li>mesa/formats: Fix swizzle flipping for big-endian targets</li>
<li>mesa/formats: Don't flip channels of null array formats</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66346">Bug 66346</a> - shader_query.cpp:49: error: invalid conversion from 'void*' to 'GLuint'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73512">Bug 73512</a> - [clover] mesa.icd. should contain full path</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74329">Bug 74329</a> - Please expose OES_texture_float and OES_texture_half_float on the ES3 context</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90073">Bug 90073</a> - Leaks in xcb_dri3_open_reply_fds() and get_render_node_from_id_path_tag</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90249">Bug 90249</a> - Fails to build egl_dri2 on osx</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error ndefined reference to `nir_metadata_preserve'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90797">Bug 90797</a> - [ALL bisected] Mesa change cause performance case manhattan fail.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90817">Bug 90817</a> - swrast fails to load with certain remote X servers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90895">Bug 90895</a> - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex & ALU and many SynMark cases performance reduced by 10-23%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91077">Bug 91077</a> - dri2_glx.c:1186: undefined reference to `loader_open_device'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91222">Bug 91222</a> - lp_test_format regression on CentOS 7</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91226">Bug 91226</a> - Crash in glLinkProgram (NEW)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91231">Bug 91231</a> - [NV92] Psychonauts (native) segfaults on start when DRI3 enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91461">Bug 91461</a> - gl_TessLevel* writes have no effect for all but the last TCS invocation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91513">Bug 91513</a> - [IVB/HSW/BDW/SKL Bisected] Lightsmark performance reduced by 7%-10%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91544">Bug 91544</a> - [i965, regression, bisected] regression of several tests in 93977d3a151675946c03e</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91591">Bug 91591</a> - rounding.h:102:2: error: #error "Unsupported or undefined LONG_BIT"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeon/uvd: don't expose HEVC on old UVD hw (v3)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl: Add GT4 PCI IDs</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.4</li>
<li>cherry-ignore: ignore a possible wrong nomination</li>
<li>Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move"</li>
<li>Update version to 11.0.5</li>
</ul>
<p>Emmanuel Gil Peyrot (1):</p>
<ul>
<li>gbm.h: Add a missing stddef.h include for size_t.</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: When the create ioctl fails, free our cache and try again.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>i965: Fix is-renderable check in intel_image_target_renderbuffer_storage</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: respect edgeflag attribute width</li>
<li>nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments</li>
<li>nouveau: relax fence emit space assert</li>
</ul>
<p>Ivan Kalvachev (1):</p>
<ul>
<li>r600g: Fix special negative immediate constants when using ABS modifier.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir/lower_vec_to_movs: Pass the shader around directly</li>
<li>nir: Report progress from lower_vec_to_movs().</li>
</ul>
<p>Jose Fonseca (2):</p>
<ul>
<li>gallivm: Translate all util_cpu_caps bits to LLVM attributes.</li>
<li>gallivm: Explicitly disable unsupported CPU features.</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>st/va: pass picture desc to begin and decode</li>
<li>nvc0: fix crash when nv50_miptree_from_handle fails</li>
<li>st/va: do not destroy old buffer when new one failed</li>
<li>st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer</li>
</ul>
<p>Kenneth Graunke (6):</p>
<ul>
<li>i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.</li>
<li>nir: Report progress from nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_opt_copy_prop().</li>
<li>nir: Properly invalidate metadata in nir_lower_vec_to_movs().</li>
<li>nir: Properly invalidate metadata in nir_opt_remove_phis().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: add register definitions for Stoney</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/glformats: Undo code changes from _mesa_base_tex_format() move</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>st/mesa: fix mipmap generation for immutable textures with incomplete pyramids</li>
</ul>
<p>Nigel Stewart (1):</p>
<ul>
<li>osmesa: Expose GL entry points for Windows build via DEF file.</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>gallivm: disable f16c when not using AVX</li>
</ul>
<p>Samuel Li (2):</p>
<ul>
<li>radeonsi: add support for Stoney asics (v3)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: enable optimal raster config setting for fiji (v2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93110">Bug 93110</a> - [NVE4] textureSize() and textureQueryLevels() uses a texture bound during the previous draw call</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/uvd: uv pitch separation for stoney</li>
</ul>
<p>Dave Airlie (9):</p>
<ul>
<li>r600: do SQ flush ES ring rolling workaround</li>
<li>r600: SMX returns CONTEXT_DONE early workaround</li>
<li>r600/shader: split address get out to a function.</li>
<li>r600/shader: add utility functions to do single slot arithmatic</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.