There are three types of fast clears:
a. fast depth clears
b. fast singlesample color clears
c. fast multisample color clears
Function intel_miptree_is_fast_clear_capable() checks if a miptree
supports fast clears of type (b).
Rename the function to disambiguate what it does:
old: intel_miptree_is_fast_clear_capable
new: intel_miptree_supports_non_msrt_fast_clear
The functionally accidentally rejected multisampled color surfaces
because it thought they were singlesample array surfaces. Fix that by
explicitly rejecting surfaces with samples > 1.
This fix would have been needed before we enabled layered fast
singlesample color clears (introduced in gen8), which we want to do
eventually. For now, though, this patch changes no behavior; it just
fixes how the driver chooses its behavior.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
intel_tiling_supports_non_msrt_mcs() and
intel_miptree_is_fast_clear_capable() are not used outside of
intel_mipmap_tree.c.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We need a virtual destructor when at least one of the class' methods is virtual.
Failure to do so might lead to undefined behavior when destructing derived classes.
Fixes the following warning:
brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
delete gs;
Curro: This shouldn't be causing any actual bugs at the moment because
gen6_gs_visitor is the only subclass of vec4_visitor destroyed through
a pointer of a base class (vec4_gs_visitor *) and its destructor is
basically the same as its parent's. Anyway it seems sensible to change
this so it doesn't bite us in the future.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fixes following Piglit test:
global-scope-binding-qualifier.frag
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In theory we can't break this assertion since the compiler frontend checks
that we don't exceed any of the individual limits, but it does not hurt to
be extra safe.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These share the space with UBO surfaces but we need to make sure we
allocate enough space for both sets (12 of each)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The thing you want to do with the output files is diff them, which is
made more difficult by line numbers changing.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It seems like things are either coming in slighly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The uniform will only be of a single type so store the data for
opaque types in a single array.
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number. While GLSL makes
this look like a normal array, it can be very different behind the
scenes.
On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct. This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.
NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset. We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.
This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case. They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.
v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
get_io_offset() already walks the dereference chain and discovers
whether or not we have an indirect; we can just return that rather than
computing it a second time via deref_has_indirect(). This means moving
the call a bit earlier.
By returning a nir_ssa_def *, we can pass back both an existence flag
(via NULL checking the pointer) and the value in one parameter. It
also simplifies the code somewhat. nir_lower_samplers works in a
similar fashion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is only a half of the work. The next patch will handle
gl_SampleID/SamplePos, which is the other half of ARB_sample_shading.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Roland Scheidegger <sroland@vmware.com>
We will only do depth-only or stencil-only decompress blits, whichever is
needed by textures, instead of always doing both.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Now that everything comes in through NIR, we can pick this directly out of
the shader source and don't need to reference the gl_fragment_program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a common enough operation that it's nice to not have to think about
the arguments to foreach_list_typed every time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers and state trackers that use LLVM for generating code, must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.
When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multihreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).
The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.
This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.
v2:
- Use call_once and remove mutexes and static initializations.
- Replace gallivm_init_llvm_{begin,end}() with
gallivm_init_llvm_targets().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transfom feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip. Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The texunit variable we create and assign in nir_emit_texture gets passed
through two more layers of function calls before it gets to its sole use in
rescale_texcoord. The best part is that we already pass the sampler into
rescale_texcoord so we can just look it up there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor. This makes uniform setup more of a
language/API thing than a backend compiler thing. This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data. This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The way we deal with GLSL uniforms and builtins is basically the same in
both the vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The way we deal with ARB program uniforms is basically the same in both the
vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we were counting up uniforms as we set them up. However, this
count should be exactly identical to shader->num_uniforms provided by
nir_assign_var_locations. (If it's not, we're in trouble anyway because
that means that locations don't match up.) This matches what the fs
backend is already doing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I tried to do this once before but Curro pointed out that having it in
backend_shader meant it could use the setup_vec4_uniform_values helper
which did different things in vec4 and fs. Now the setup_uniform_values
function differs only by an assert in the two backends so there's no real
good reason to be using it anymore.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms. However, we have to
walk through the uniforms and figure out liveness anyway, so we migh as
well record the number of channels used as we go. This may also allow us
to pack things tighter in a few cases.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
That way, if we do the usual thing of multiplying vector_elements by
matrix_columns we get the actual number of components in the type as per
component_slots().
While we're at it, we also switch to using the actual C++ field
initializers for vector_elements and matrix_columns.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we had a bunch of code in each stage to figure out how many
slots we needed in stage_prog_data.param. This code was mostly identical
across the stages and had been copied and pasted around. Unfortunately,
this meant that any time you did something special, you had to add code for
it to each of these places. In particular, none of the stages took
subroutines into account; they were working entirely by accident. By
taking this data from the NIR shader, we know the exact number of entries
we need and everything goes a bit smoother.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases. It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
GLSL IR vs. NIR shader-db results for vec4 programs on i965:
total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
instructions in affected programs: 1245199 -> 1134225 (-8.91%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on G4x:
total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
instructions in affected programs: 1205599 -> 1094625 (-9.20%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:
total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
instructions in affected programs: 1205503 -> 1094531 (-9.21%)
helped: 7468
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:
total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
instructions in affected programs: 1850547 -> 1621331 (-12.39%)
helped: 14856
HURT: 1481
GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:
Test master-glsl master-nir diff
bench_OglGeomPoint 461.556 463.006 1.450
bench_OglTerrainFlyInst 184.484 187.574 3.090
bench_OglTerrainPanInst 132.412 136.307 3.895
bench_OglTexFilterAniso 19.653 19.645 -0.008
bench_OglTexFilterTri 58.333 58.009 -0.324
bench_OglVSInstancing 65.049 65.327 0.278
bench_trexoff 69.474 69.694 0.220
bench_valley 40.708 41.125 0.417
v2 (Jason Ekstrand):
- Remove more uses of NirOptions as a switch
- New shader-db numbers
- Added benchmark numbers
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 0a1adaf11d (nir: Report progress
from nir_lower_system_values().) introduced a bug caught by Valgrind:
==823== Conditional jump or move depends on uninitialised value(s)
==823== at 0xB09020C: convert_block (nir_lower_system_values.c:68)
==823== by 0xB079FB8: foreach_cf_node (nir.c:1310)
==823== by 0xB07A0AF: nir_foreach_block (nir.c:1336)
==823== by 0xB09026B: convert_impl (nir_lower_system_values.c:79)
...
==823== Uninitialised value was created by a stack allocation
==823== at 0xB090249: convert_impl (nir_lower_system_values.c:76)
which is trivially fixed by initializing progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a macro GL_LIB_NAME to hold the filename that configure comes up with
based on the --with-gl-lib-name and --enable-mangling options.
In driOpenDriver, use the GL_LIB_NAME macro instead of hard-coding
"libGL.so.1".
v2: Add an #ifndef/#define for GL_LIB_NAME so that non-autoconf builds will
work.
v3: Fix the library filename in the Makefile.
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.
This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes following Piglit test:
member-invalid-binding-qualifier.frag
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
When writing to a column of a row-major matrix, each component of the
vector is stored to non-consecutive memory addresses, so we generate
one instruction per component.
This patch skips the disabled components in the writemask, saving some
store instructions plus avoid storing wrong data on each disabled
component.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Fixes a failing subtest in:
ES31-CTS.shader_storage_buffer_object.negative-glsl-compileTime
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reading this output was really confusing. reg represents attribute
slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The code for input lowering is going to get significantly more
complicated shortly, so I wanted to pull it out. Vertex shader inputs
are handled nearly identically regardless of vec4/scalar mode, so I
opted to not split that.
I thought about having each function actually do the lowering, but one
pass through nir_lower_io that handles all types (which weren't handled
earlier) is probably more efficient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We may want to use different type_size functions for (e.g.) inputs
vs. uniforms. Passing in -1 for mode ignores this, handling all
modes as before.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the texture object exists, but the Name field is zero, it means
the object was created but never bound to a target. Trying to bind it
in _mesa_BindTextureUnit() should generate GL_INVALID_OPERATION.
Fixes piglit's arb_direct_state_access-bind-texture-unit test.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This helper was only called from _mesa_BindTextureUnit(). It's simpler
to just inline it.
The error check / code / message in the helper was incorrect. It was
written for glBindTextures(), not glBindTextureUnit(). The correct
error for a bad texture unit number is GL_INVALID_VALUE. The error
message now reports the unit number rather than a GL_TEXTUREi enum.
Fixes a failure in piglit's arb_direct_state_access-bind-texture-unit test.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before, we were doing the actual _mesa_reference_texobj() call and
ctx->Driver.BindTexture() and misc housekeeping in three different
places. This consolidates the common code in a new bind_texture()
function.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since we store both in UniformBlocks, we can't just compute the index by
subtracting the array address start, we need to count the number of
buffers of the approriate type.
v2:
- Just fall back to calc_resource_index (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The old code had some significant problems with respect to
sampler2DArray textures. The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2. The resulting shader would not even compile.
Since there were not tests for this path, nobody noticed.
The input to the fragment shader is always treated as a vec3. If the
source data is only vec2, the vertex puller will supply 0 for the .z
component. The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer. If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.
Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.
Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Add comments that link the driver's miptree structures to the hardware
structures documented in the PRM. This provides sorely needed
orientation to developers new to the miptree code. And for miptree
veterans, this clarifies some of the more obscure miptree data.
For each driver struct field that closely corresponds to a
hardware struct field, add a PRM reference to that hardware field's
name. For example,
struct intel_mipmap_tree {
...
/**
* @brief One of GL_TEXTURE_2D, GL_TEXTURE_2D_ARRAY, etc.
*
* @see RENDER_SURFACE_STATE.SurfaceType
* @see RENDER_SURFACE_STATE.SurfaceArray
* @see 3DSTATE_DEPTH_BUFFER.SurfaceType
*/
GLenum target;
...
};
Also annotate the INTEL_MSAA_LAYOUT_* enums with the name of the PRM
sections that documents the layout.
v2: Replace "2D subimage" with "slice", and define what a "slice" is.
For Ben.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
The values of intel_mipmap_tree::align_w and ::align_h correspond to the
hardware enums HALIGN_* and VALIGN_*.
See the confusion?
align_h != HALIGN
align_h == VALIGN
Reduce the confusion by renaming the variables to match the hardware
enum names:
git ls-files |
xargs sed -i -e 's/align_w/halign/g' \
-e 's/align_h/valign/g'
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Because that's what it is. It's an untiled, *linear* miptree.
v2:
- Add space after /*.
- Use one comment per function argument.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
The comment for intel_miptree_map::mode claimed that it was a bitmask of
GL_MAP_{READ,WRITE,INVALIDATE}_BIT. In reality, the bitmask may include
any of {GL,BRW}_MAP_*_BIT.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Clarify that this bit extends the set of GL_MAP_*_BIT enums.
Also fix typo of "temporary".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Jason open coded this in 60befc63 when cleaning up some ugly code;
using our existing helper tidies it up a bit more.
v2: Drop inline (suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
i915 fragment programs utilize the texture coordinate registers
for both texture coordinates and varyings. Unfortunately the
code doesn't check if the same index might be in use for both.
It just naively uses the index to pick a texture unit, which
could lead to collisions.
Add an extra mapping step to allocate non conflicting texture
units for both uses.
The issue can be reproduced with a pair of simple shaders like
these:
attribute vec4 in_mod;
varying vec4 mod;
void main() {
mod = in_mod;
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
varying vec4 mod;
uniform sampler2D tex;
void main() {
gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod;
}
Fixes many piglit tests on i915:
glsl-link-varyings-2
glsl-orangebook-ch06-bump
interpolation-none-gl_frontcolor-smooth-fixed
interpolation-none-gl_frontcolor-smooth-none
interpolation-none-gl_frontcolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-fixed
interpolation-none-gl_frontsecondarycolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-none
interpolation-none-other-flat-fixed
interpolation-none-other-flat-none
interpolation-none-other-flat-vertex
interpolation-none-other-smooth-fixed
interpolation-none-other-smooth-none
interpolation-none-other-smooth-vertex
v2 [idr]: Minor formatting tweaks.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy
the same bit. Move the texture bits upwards a bit to make room for
I830_UPLOAD_RASTER_RULES.
Now the driver will actually upload the raster rules which is rather
important to get the provoking vertex right. Fixes the appearance
of glxgears teeth on gen2.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Commit 1665d29ee3 introduced an incorrect
format specifier that operates on GLintptr indirect within the function
_mesa_DispatchComputeIndirect().
This patch mitigates the introduced GCC warning:
src/mesa/main/compute.c: In function '_mesa_DispatchComputeIndirect':
src/mesa/main/compute.c:53:7: warning: format '%d' expects argument of type 'int', but argument 3 has type 'GLintptr' [-Wformat=]
_mesa_debug(ctx, "glDispatchComputeIndirect(%d)\n", indirect);
^
v2: Amend for Boyan Ding <boyan.j.ding@gmail.com> feedback.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
From GLSL 1.50 spec, section 4.1.8 "Structures":
"Structures must have at least one member declaration."
So the base_alignment should be higher than zero.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the string being copied is not NULL-terminated the result of
strlen() is undefined.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fix OpenGL ES 3.1 conformance tests: advanced-readWrite-case1-vsfs
and advanced-matrix-vsfs.
v2:
- Fix SHADER_OPCODE_MEMORY_FENCE emission and the allocation of 'tmp'
(Francisco).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From ARB_program_interface_query:
"For an active shader storage block member declared as an array, an
entry will be generated only for the first array element, regardless
of its type. For arrays of aggregate types, the enumeration rules are
applied recursively for the single enumerated array element."
v2:
- Simplify 'if' conditions and return true if it is not a buffer
variable, because these rules only apply to buffer variables (Timothy).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This matches the function signature created in
lower_ubo_reference_visitor::ssbo_store which has a void return.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
At least on Intel hardware, gl_PrimitiveIDIn comes in as a special part
of the payload rather than a normal input. This is typically what we
use system values for. Dave and Ilia also agree that a system value
would be nicer.
At some point, we should change it at the GLSL IR level as well. But
that requires changing most of the drivers. For now, let's at least
make NIR do the right thing, which is easy.
v2: Add a comment about not creating a temporary (suggested by Iago).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag
to try to get a hardware format which also supports rendering (for FBO
textures). Do the same thing for floating point formats.
This allows the Redway3D Flat demo to run.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I've temporarily added code like this many times. Wrap it in a
conditional that can be enabled when needed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This code also sets cs_prog_data->uses_num_work_groups which is later
used by state setup to indicate that the gl_NumWorkGroups surface
needs to be setup.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This will only be setup when the prog_data uses_num_work_groups
boolean is set.
At this point nothing will set uses_num_work_groups, but soon code
will set it when emitting code for the intrinsic that loads
gl_NumWorkGroups.
We can't emit this surface information earlier at the start of the
DispatchCompute* call because we may not have generated the program
yet. Until we generate the program, we don't know if the
gl_NumWorkGroups variable is accessed.
We also can't emit the surface as part of the brw_cs_state atom,
because we might not need the surface if gl_NumWorkGroups is not used
by the program.
Lastly, we cannot emit the surface later (after state upload) in the
DispatchCompute* call, because it needs to be run before the
brw_cs_state atom is emitted, since it changes the surface state.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If glDispatchComputeIndirect is used, then the value for this variable
must be read from the indirect BO.
To allow the same generated code to support indirect and
glDispatchCompute, we will also setup a BO for the number of work
groups using the intel_upload_data mechanism. This will only be
required if the gl_NumWorkGroups variable is accessed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We will need this in an atom to setup a surface to read the
gl_NumWorkGroups values from.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unlike rendering (BINDING_TABLE_POINTERS_*S), compute doesn't have a
binding table pointers command. Instead it is part of the
MEDIA_INTERFACE_DESCRIPTOR structure loaded by the brw_cs_state atom.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We need to re-emit push constansts when a new batch is started since
the push constants are stored in the batch. We also need to re-emit
the MEDIA_INTERFACE_DESCRIPTOR (in brw_cs_state) since it is stored in
the batch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Patch also refactors name length queries which were using array size
in computation, this has to be done in same time to avoid regression in
arb_program_interface_query-resource-query Piglit test.
Fixes rest of the failures with
ES31-CTS.program_interface_query.no-locations
v2: make additional check only for GS inputs
v3: create helper function for resource name length
so that it gets calculated only in one place
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The comment says that it should be impossible for decl_type to be NULL
here, so don't try to handle the case where it is, simply add an assert.
>>> CID 1324977: Null pointer dereferences (FORWARD_NULL)
>>> Comparing "decl_type" to null implies that "decl_type" might be null.
No piglit regressions observed.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Add an assert on the result of as_dereference() not being NULL:
>>> CID 1324978: Null pointer dereferences (NULL_RETURNS)
>>> Dereferencing a null pointer "deref_record->record->as_dereference()".
Since we are introducing a new variable to hold the result of
as_dereference(), take the opportunity to rename deref_record_type to
interface_type and just name the new variable interface_deref, which is
less confusing.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We don't use param in this part of the code, so no point in advancing
the pointer forward:
>>> CID 1324983: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value from "param->get_next()" to "param" here, but that stored value is overwritten before it can be used.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add strndup.h to Makefile.sources (Emil)
- Use calloc instead of malloc (Emil).
- Check if allocation fails (Emil, Jose)
- Add '#pragma once' and include stdlib.h to strndup.h (Jose)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92124
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Because it counts shader storage blocks too.
v2:
- Use NumBufferInterfaceBlocks instead (Jordan).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
NumUniformBlocks also counts shader storage blocks.
NumUniformBlocks variable will be renamed in a later patch to avoid
misunderstandings.
v2:
- Modify the condition to use !IsShaderStorage and the list of
uniform blocks (Timothy)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This condition restricts the use of fast copy blit to cases
where starting pixel of src and dst is oword (16 byte) aligned.
Many piglit tests (if using fast copy blit in Mesa) failed earlier
because I missed adding this condition.Fast copy blit is currently
enabled for use only with Yf/Ys tiling.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The bo will often come from a slab in which case it doesn't matter. But
for larger allocations this will be in its own bo, and we have to make
sure to wait until it's no longer used in order for it to be freed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
If there is an unflushed fence on the bo, then the resource may still be
used in commands built up in the local pushbuf. Flushing can cause all
sorts of unwanted effects, so just free the bo when the relevant fence
is hit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
This function isn't specific to miptrees. So, drop the "miptree"
from function name.
V3: Add a comment explaining how the 1D Array texture height and
depth is interpreted by Intel hardware.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
I misinterpreted the alignmnet restriction in XY_FAST_COPY_BLT earlier.
Instead of checking pitch for 64KB alignmnet we need to check it for
tile widh alignment.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Current code checks the alignment restrictions only for Y tiling.
From Broadwell PRM vol 10:
"pitch is of 512Byte granularity for Tile-X: This means the tiled-x
surface pitch can be (512, 1024, 1536, 2048...)/4 (in Dwords)."
This patch adds the restriction for X tiling as well.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
It takes care of using the correct tile width if we later use other
tiling patterns for aux miptree.
V2: Remove the comment about using Yf for aux miptree.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This will require change in the parameters passed to
intel_miptree_get_tile_masks().
V2: Rearrange the order of parameters. (Ben)
Change the name to intel_get_tile_masks(). (Topi)
V3: Use temporary variables in intel_get_tile_masks()
for clarity. Fix mask_y computation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
V2:
- Do the tile width/height computations in the new helper
function and use it later in intel_miptree_get_tile_masks().
- Change the name to intel_get_tile_dims().
V3: Return the tile_h in number of rows in place of bytes.
Document the units of tile_w, tile_h parameters.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:
"if internalformat is a base internal format, the effective internal
format is a sized internal format that is derived from the format and
type for internal use by the GL. Table 3.12 specifies the mapping of
format and type to effective internal formats. The effective internal
format is used by the GL for purposes such as texture completeness or
type checks for CopyTex* commands. In these cases, the GL is required
to operate as if the effective internal format was used as the
internalformat when specifying the texture data."
v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsize format
instead.
v4: * Improved comments in
_mesa_es3_effective_internal_format_for_format_and_type().
* Splitted patch to separate chunk about reordering of
error_check_subtexture_dimensions() error check, which is not directly
related with this patch.
v5: Dropped the splitted patch because it was actually a work around 3
dEQP tests that are buggy:
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.
Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.
The function is moved as-is, without any modification.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The more specific GLES constrains should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.
v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
IVB and VLV hang sporadically when an untyped surface read or write
message is used to access a surface of format other than RAW, as may
happen when there is a mismatch between the format qualifier of the
image uniform and the format of the actual image bound to the
pipeline. According to the spec this condition gives undefined
results but may not lead to program termination (which is one of the
possible outcomes of the hang). Fix it by checking at runtime whether
the surface is of the right type.
Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
subtest.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to share the same logic between buffer and image
creation.
v2: Make memory flag set constants local to validate_flags. (Serge
Martin)
This reverts commit 586142658e.
The specs are not explicit about any restrictions related to the types allowed
on buffer variables, however, the description of opaque types (like atomic
counters) is in conclict with the purpose of buffer variables:
"The opaque types declare variables that are effectively opaque
handles to other objects. These objects are
accessed through built-in functions, not through direct reading or
writing of the declared variable.
(...)
Opaque variables cannot be treated as l-values;(...)"
Also, Mesa is already disallowing opaque types in interface blocks anyway, so
that commit was not really achieving anything.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With static vertex counts, the final EOT write doesn't actually write
any data - it's just there to end the thread. Typically, the last
thing before ending the thread will be an EmitVertex() call, resulting
in a URB write. We can just set EOT on that.
Note that this isn't always possible - there might be an intervening
SSBO write/image store, or the URB write may have been in a loop.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3173 -> 3149 (-0.76%)
instructions in affected programs: 176 -> 152 (-13.64%)
helped: 8
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special
strides. We can easily make the generator handle constant src[0]
arguments by instead generating a MOV with the product of both operands.
This isn't necessarily a win in and of itself - instead of a MUL, we
generate a MOV, which should be basically the same cost. However, we
can probably avoid the earlier MOV to put src[0] into a register.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3207 -> 3173 (-1.06%)
instructions in affected programs: 3207 -> 3173 (-1.06%)
helped: 11
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization. Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly. However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.
Thanks to the new NIR GS intrinsics, this is easy. We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.
Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs: 242 -> 222 (-8.26%)
helped: 10
v2: Don't break non-NIR paths (just skip this optimization).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The visitor was setting a mlen that was wrong for Broadwell, but the
generator was ignoring it and doing the right thing regardless. We may
as well move the logic fully into the visitor. This will be useful in
the next commit as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some hardware (such as Broadwell) can run geometry shaders more
efficiently when the number of vertices emitted is statically known.
This pass provides a way to obtain the constant vertex count, or
-1 indicating that the vertex count is unknown/non-constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The old code was disasterously complex - spread across multiple atoms
which may not even run, inspecting the dirty bits to try and decide
whether it was necessary to do checks...storing VS information in
brw_context...extra flagging...
This code tripped me and Carl up very badly when working on the
shader cache code. It's very fragile and hard to maintain.
Now that geometry shaders only depend on their inputs and don't have
to worry about the VS VUE map, we can dramatically simplify this:
just compute the VUE map coming out of the geometry shader stage
in brw_upload_programs. If it changes, flag it. Done.
v2: Also check vue_map.separable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because we only support geometry shaders in core profile, we can safely
ignore any driver-extending of VS outputs.
Those are:
- Legacy userclipping (doesn't exist in core profile)
- Edgeflag copying (Gen4-5 only, no GS support)
- Point coord replacement (Gen4-5 only, no GS support)
- front/back color hacks (Gen4-5 only, no GS support)
v2: Rebase; leave a comment about why SSO works.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, our VUE map code always assigned slots to varyings
sequentially, in one contiguous block.
This was a bad fit for separate shaders - the GS input layout depended
or the VS output layout, so if we swapped out vertex shaders, we might
have to recompile the GS on the fly - which rather defeats the point of
using separate shader objects. (Tessellation would suffer from this
as well - we could have to recompile the HS, DS, and GS.)
Instead, this patch makes the VUE map for separate shaders use a fixed
layout, based on the input/output variable's location field. (This is
either specified by layout(location = ...) or assigned by the linker.)
Corresponding inputs/outputs will match up by location; if there's a
mismatch, we're allowed to have undefined behavior.
This may be less efficient - depending what locations were chosen, we
may have empty padding slots in the VUE. But applications presumably
use small consecutive integers for locations, so it hopefully won't be
much worse in practice.
3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions.
This seems like a small price to pay for avoiding recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Our plan of assigning consecutive slots doesn't work properly for
separate shader objects - at least, if we want to avoid recompiling them
whenever the interface changes.
As a first step, make assign_vue_map take an explicit slot parameter,
rather than implicitly incrementing it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Nothing actually relies on unused slots being initialized to
BRW_VARYING_SLOT_COUNT. Soon, we're going to have VUE maps with holes
in them, at which point pre-filling with BRW_VARYING_SLOT_PAD make a lot
more sense.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We can't just break for padding slots. Instead, treat them like
unwritten output variables, so we handle flushing and incrementing
urb_offset correctly.
Paul introduced the concept of padding slots back in 2011, but we've
never actually used them for anything. So it's unsurprising that the
scalar VS backend didn't handle them quite right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Passing NULL to C11 threads functions isn't safe, so there's no need for
our implementation to handle it. Cuts about 1k of .text.
text data bss dec hex filename
5009514 198440 26328 5234282 4fde6a i965_dri.so before
5008346 198440 26328 5233114 4fd9da i965_dri.so after
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140' blocks
or block members) considered as active 'shared' and 'std140' uniform
blocks and uniform block arrays, but did not include the block array
elements. Because of that, it was possible to have an active uniform
block array without any elements marked as used, making the assertion
((b->num_array_elements > 0) == b->type->is_array())
in link_uniform_blocks() fail.
Fixes the following 5 dEQP tests:
* dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18
* dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24
* dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19
* dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49
* dEQP-GLES3.functional.ubo.random.all_shared_buffer.36
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83508
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add tessellation shader constants support
v3:
- Add GLES 3.1 support.
v4:
- Move the getters to the proper place
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE queries.
v2:
- Use std430_array_stride() to get top level array stride following std430's rules.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The error location won't be right, but fixing that would require to check
for this as we process each type of AST node that can involve a variable
read.
v2:
- Limit the check to buffer variables, image variables have different
semantics involved.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Merge the error check for the readonly qualifier with the already
existing check for variables flagged as readonly (Timothy).
- Limit the check to buffer variables, image variables have different
semantics involved (Curro).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Memory qualifiers on shader storage buffer objects do not come in the form
of layout qualifiers, they are block-level qualifiers.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Save memory qualifier info in the top level members of a shader
storage block.
- Add a checks to record_compare() which is used when comparing
shader storage buffer declarations in different shaders.
- Always report an error for incompatible readonly/writeonly
definitions, whether they are present at block or field level.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to ARB_uniform_buffer_object spec:
"If the parameter (starting offset or size) was not specified when the
buffer object was bound (e.g. if bound with BindBufferBase), or if no
buffer object is bound to <index>, zero is returned."
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These handle querying the buffer name attached to a giving binding point
as well as the start offset and size of that buffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Defined in ARB_shader_storage_buffer_object extension.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add ssbo_in the names of the static functions so it is clear that this
is specific to SSBO atomics.
v3:
- Move the check after the loop (Kristian Høgsberg)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The original GLSL IR intrinsics have been lowered to an internal
version that accepts a block index and an offset instead of a
SSBO reference.
v2 (Connor):
- Document the sources used by the atomic intrinsics.
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The first argument to SSBO atomics is a reference to a SSBO buffer variable
so we want to compute its block index and offset and provide these values
to an internal version of the intrinsic that takes them instead of the
buffer variable reference.
v2:
- Support single components of integer vectors to be passed in as arguments.
- Get interface packing information from interface's type.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In a later commit we will need to handle ir_swizzle nodes too, which are
not an ir_dereference. That can happen, for example, when we pass a
component of an integer vector as argument to any of the SSBO atomic
functions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Shader Storage Buffer Object will add new atomic functions that are not
associated with counters, so better have atomic counter-specific functions
explicitly include the word "counter" in their names.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This patch moves nir_instr_insert_after_cf_list call into each case
in the intrinsics switch at nir_visitor::visit(ir_call *ir) and
define a nir_dest variable which will be used when handling
ir->return_deref after the switch.
This patch simplifies the code for nir_intrinsic_load_ssbo
implementation changes we are going to do next.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2 (Connor):
- Make the STORE() macro take arguments for the extra sources (and their
size) and any extra indices required.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit.
v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These functions handle the conversion of a vec4 into the form expected
by the dataport unit in message and message return payloads. The
conversion is not always trivial because some messages don't support
SIMD4x2 for some generations, in which case a strided copy may be
necessary.
v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
See "i965/fs: Introduce FS IR builder." for the rationale.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags. Rename "instr"
variable. Initialize cursor to NULL by default and add method to
explicitly point the builder at the end of the program.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Notice that we should differentiate between shader storage blocks and
uniform blocks, since they have different limits.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Otherwise, generate a link time error as per the
ARB_shader_storage_buffer_object spec.
v2:
- Fix error message (Jordan)
v3:
- Move std140_size() changes to its own patch (Kristian)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Get interface packing information from interface's type, not the
variable type.
- Simplify is_std430 condition in emit_access() for readability (Jordan)
- Add a commment explaing why array of three-component vector case is
different in std430 than the rest of cases.
- Add calls to std430_array_stride().
v3:
- Simplify size_mul change for std430's case (Jordan)
- Fix commit log lines length (Jordan)
- Pass 'packing' instead of 'is_std430' to emit_access() (Kristian)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They are used to calculate the offset, array stride of uniform/shader
storage buffer variables. Take into account this info to get the right
value for std430.
v2:
- Fix commit log line length and indention. (Jordan)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Fix a missing check in has_layout()
v3:
- Mention shader storage block in error message for layout qualifiers
(Kristian).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They are used to calculate size, base alignment and array stride values
for a glsl_type following std430 rules.
v2:
- Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This kind of definitions:
layout(xxx) buffer;
was not supported by commit 84fc5fece0.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Also if GL_ARB_shading_language_420pack extension is enabled.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The returned drm buffer object has a size multiple of 4096 but that should not
be exposed to the API user, which is working with a different size.
As far as I can see this problem is only visible in the calculation of the
length of unsized arrays used in SSBOs, as the implementation of this needs
to query the underlying buffer size via a message.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Otherwise we can expect odd things to happen if, for example, we ask
for the size of the attached buffer from shader code, since that
might query this value from the surface we uploaded and get random
results.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Remove inst->regs_written assignment as the instruction only
writes to one register.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This is how backends provide the buffer size required to compute
the size of unsized arrays in the previous patch
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Reduce the number of lines over 80 character line width
limit. (Thomas Hellan)
v3:
- Inject the formula to compute the array length in the IR, backends
only need to provide the buffer size (Curro)
- Create an auxiliary function to simplify code (Jordan Justen)
- Rename variables (Jordan Justen)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The unsized array length is computed with the following formula:
array.length() =
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
Of these, only the buffer size needs to be provided by the backends, the
frontend already knows the values of the two other variables.
This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.
It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.
v2:
- Do not define a triop that will force backends to implement the
entire formula, they should only need to provide the buffer size
since the other values are known by the frontend (Curro).
v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
of using state->ARB_shader_storage_buffer_object_enable (Tapani).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They only can be defined in the last position of the shader
storage blocks.
When an unsized array is used in different shaders, it might be
converted in different sized arrays, avoid get a linker error
in that case.
v2:
- Rework error condition and error messages (Timothy Arceri)
v3:
- Move OpenGL ES check to its own patch.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Buffer variables are the same as uniforms, only that read/write, so we want
the same treatment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Since these are a special kind of UBOs we emit them together reusing the
same infrastructure, however, we use a RAW surface so we can reuse
existing untyped read/write/atomic messages which include a pixel mask
header that we need to set to obtain correct behavior with helper
invocations of the fragment shader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Set it after the driver's MaxShaderStorageBuffers value assignment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We use the same dirty state for SSBOs and UBOs because they share the
same infrastructure.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This should be a cacheline (64 bytes) so that we can safely have the
CPU and GPU writing the same SSBO on non-cachecoherent systems (our
Atom CPUs). With UBOs, the GPU never writes, so there's no
problem. For an SSBO, the GPU and the CPU can be updating disjoint
regions of the buffer simultaneously and that will break if the
regions overlap the same cacheline.
v2:
- Use cacheline size (64 bytes) instead of 16 bytes (Kristian).
- Update commit log and add a comment in the code explaining
why we use cacheline size (Ben).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes sure that user is still able to query properties about
variables that have gotten packed by lower_packed_varyings pass.
Fixes following OpenGL ES 3.1 test:
ES31-CTS.program_interface_query.separate-programs-vertex
v2: fix 'name included in packed list' check (Ilia Mirkin)
v3: iterate over instances of name using strtok_r (Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This is required to store information about packed varyings, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
If the immutable compressed texture didn't have the full mip pyramid,
this didn't work, because it tried to generate mip levels for non-existing
levels. _mesa_prepare_mipmap_level() would correctly handle this by returning
FALSE if the mip level didn't exist, however we actually created the
non-existing mip level right before that because we used _mesa_get_tex_image()
before calling _mesa_prepare_mipmap_level(). It would then proceed to crash
(we allocated the mip level, which is a bad idea on an immutable texture,
but didn't initialize the values, leading to assertion failures or segfaults).
Fix this by using _mesa_select_tex_image() instead and call it after
_mesa_prepare_mipmap_level(), as that function will allocate missing mip levels
for non-immutable textures already.
This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu
packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip
chains - I believe this app not doing it is actually unintentional, always one
level less than full mip chain...).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
... with only ARB_shader_atomic_counters.
I expected to see interactions with ARB_tessellation_shader in the
ARB_shader_atomic_counters spec, but they do not exist. It seems that we
should unconditionally expose these variables in the presence of
ARB_shader_atomic_counters:
gl_MaxTessControlAtomicCounters
gl_MaxTessEvaluationAtomicCounters
This partially reverts commit da7adb99e8. The commit also affected
gl_MaxTessControlImageUniforms and gl_MaxTessEvaluationImageUniforms
similarly but the ARB_shader_image_load_store spec does list an
interaction with ARB_tessellation_shader.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92095
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.
For example, this is what vec4_visitor::fix_3src_operand() is already
doing:
if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
return src;
Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped: 7930
HURT: 0
Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped: 2533
HURT: 0
v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
Ekstrand suggested
- Clean up the condition
v3:
- Move the check back to the original place, to keep things
tidy, as suggested by Jason Ekstrand
v4:
- Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses unusual stride and subreg_offset
combination breaks when one source is an attribute.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures
to wrap renderbuffer sources/destinations. This caused a bit of a mess in
the Mesa/gallium state tracker because we had to basically undo that
wrapping.
Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer
and gl_texture_image src/dst pointers (one being null, the other non-null)
so the driver can handle renderbuffer vs. texture as needed.
For the i965 driver, we basically moved the code that wrapped textures
around renderbuffers from copyimage.c down into the met and driver code.
The old code in copyimage.c also made some questionable calls to
_mesa_BindTexture(), etc. which weren't undone at the end.
v2 (Jason Ekstrand): Rework the intel bits
v3 (Brian Paul): Update the temporary st_CopyImageSubData() function.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Check for PIPE_FORMAT_R8_UNORM when setting up the copy shader.
Also re-enable the dest alpha blending with A8 destination that
actually turned out to be correct.
Verified using rendercheck that the composite operators
overreverse, in, out, atop, atopreverse and xor seem to work fine
with a8 destiation.
v2: Fix a copy-paste error.
Reported-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It doesn't matter whether a write is saturated or not, in another
implementation it might even have been a separate opcode. This code was
most likely copied from the copy-propagation pass (where one does have
to distinguish saturation).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I left a bunch of code indented a level in the previous patch to make
the diff easier to read. But now we should fix that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:
total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs: 3491 -> 3259 (-6.65%)
helped: 11
HURT: 0
Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.
This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones. See the comments above that for the
rationale behind the new intrinsics.
This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.
v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
Ekstrand - I'd mistakenly used nir_after_block when rebasing this
code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
Michael Schellenberger Costa).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on. Preserving
information about block dominance is hard (and probably not worthwhile).
This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.
We then make the dead control flow pass compute dominance information
right before it uses it. This is necessary because earlier work by the
pass may have invalidated it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successor[0]. So the successors[1] != NULL check will
always fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Consider the case of "while (...) { break }". Or in NIR:
block block_0 (0x7ab640):
...
/* succs: block_1 */
loop {
block block_1:
/* preds: block_0 */
break
/* succs: block_2 */
}
block block_2:
Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.
Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set. This is illegal: a block cannot specify the
same successor twice. In particular, adding the predecessor would have
no effect, as it was already present in the set.
We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set. It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.
The fix appears to be simple: simply unlink the block's successors and
recreate them to point at the correct blocks first. Then, add the fake
link. In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), so adding
the fake link won't cause a duplicate successor.
v2: Add comments (requested by Connor Abbott) and fix commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges. In order to
fix that, I need everything in one routine.
However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Consider the following NIR:
block block_0;
/* succs: block_1 block_2 */
if (...) {
block block_1;
...
} else {
block block_2;
}
Calling split_block_beginning() on block_1 would break block_0's
successors: link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succ: new_block */. This is invalid: the
block before an if statement must have two successors.
Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successor[1] to successor[0], it may not preserve the original
order. NIR maintains a convention that successor[0] must point to the
"then" block, while successor[1] points to the "else" block, so we need
to take care to preserve this ordering.
This patch creates a new function that swaps out one successor for
another, preserving the ordering. It then uses this to fix the issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It's possible that, if a vecN operation is involved in a phi node, that we
could end up moving from a register to itself. If swizzling is involved,
we need to emit the move but. However, if there is no swizzling, then the
mov is a no-op and we might as well not bother emitting it.
Shader-db results on Haswell:
total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
instructions in affected programs: 184780 -> 181802 (-1.61%)
helped: 838
HURT: 0
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't know of any piglit tests that are currently broken. However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports ELTs.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support triangle fans.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Fix '- nr' typo noticed by Marius.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Two drivers use this file, and both support triangle strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support triangles.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support line strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support lines.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports quads.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports quad strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The value passed in count previously was "vertex after the last vertex
to be processed." Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count. That's just silly.
If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.
NOTE: There is some strange formatting left by this patch. That was
done to make it more obvious that the before and after code is
equivalent. These will be fixed in the next patch.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
v2: Fix a remaining (count-start) in render_quad_strip_verts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
From section 9.2. Binding and Managing Framebuffer Objects:
"Upon successful return from Get*FramebufferAttachmentParameteriv, if
pname is FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, then params will contain
one of NONE, FRAMEBUFFER_DEFAULT, TEXTURE, or RENDERBUFFER, identifying
the type of object which contains the attached image."
And then it clarifies further:
"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default framebuffer is
bound, attachment is DEPTH or STENCIL, and the number of depth or stencil
bits, respectively, is zero"
Currently, if the default framebuffer is bound, we always return
GL_FRAMEBUFFER_DEFAULT for FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, but
according to the spec, when GL_DEPTH or GL_STENCIL attachments are
the ones being queried, we should return GL_NONE if they don't exist.
Fixes the following dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Patch fixes a crash in conformance test that tries out different
invalid arguments for glShaderSource and glGetShaderSource:
ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle
This is a regression from commit:
04e201d0c0
Additions in v2 also fix following failing deqp test:
dEQP-GLES[2|3].functional.negative_api.shader.shader_source
v2: cleanup function, do check earlier (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We don't use any of the code after the switch anyway. Since we check for
num_components == 1 and early-return, it doesn't get executed so
everything's ok. However, it makes it much clearer what's going on if we
simply do an early return.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken):
- Squash together commits for HS, DS, and TE, as well as fixes.
- Add INTEL_MASK variants so we can use SET_FIELD if we want.
- Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match
the documentation.
- Add some more fields from the PRMs.
- Add Broadwell variants.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
"r600g: apply disable workaround on all scissors" forgot to update
num_dw, fix it.
Fixes: fbb423b433 "r600g: apply disable workaround on all scissors"
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:
1) Build up the value we are propagating as if it were the source of a
single MOV:
2) Check that we can propagate that value
3) Build the final value
Previously everything was somewhat messed up, making the
implementation on some specific cases, like knowing if you can
propagate from a previous instruction even with type mismatches, even
messier (for example, with the need of maintaining more of one
has_source_modifiers). The refactoring clears stuff, and gives
support to this mentioned use case without doing anything extra
(for example, only one has_source_modifiers is used).
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs: 739837 -> 725032 (-2.00%)
helped: 6237
HURT: 0
v2: using 'arg' index to get the from inst was wrong
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
only set the source on a direct copy
v5: change the approach for a refactoring
v6: tweaked comments
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer
is null, Mesa won't try to use it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to OpenGL ES 3.1 specification 10.3.1:
"An INVALID_OPERATION error is generated if buffer is not zero
or a name returned from a previous call to GenBuffers,
or if such a name has since been deleted with DeleteBuffers."
This error check was previously limited to OpenGL Core.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
The only functional change here is that we now set EmitNoIndirectOutput and
EmitNoIndirectTemp for compute shaders. Compute shaders don't have outputs
per-se and we should have been setting EmitNoIndirectTemp all along.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All SKL SKUs except the lowest one which has half the L3 size actually have 384K
of URB per slice.
For once, I can explain how this mistake was made and how it was missed in
review... Historically when we enable a platform and put the production sizes,
you can simply look at the "smallest" SKU and see what its URB size is (and we
assumed it was the 1 slice variant). Since on newer platforms the URB sizes are
scaled automatically by HW, this was sufficient. On SKL, this is a bit different
as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not
GT1) variant and it has 384K.
There are no Jenkins tests fixed (or regressions) and we don't expect any fixes
here because you can always run with less URB size.
Thanks to Sarah for bringing this to my attention.
Cc: Sarah Sharp <sarah.a.sharp@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Designated initializers are not allowed in C++ (not even C++11). Since
nir_lower_samplers is now using nir_builder, and nir_lower_samplers is in
C++, this breaks the build on some compilers. Aparently, GCC 5 allows it
in some limited extent because mesa still builds on my system without this
patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92052
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the only C++ function having its own wrapper we can 'demote' this
file to a normal C one. This allows us to get rid of extern C { #include
<foo.h> } 'hacks'. Plus some of the headers may use C99 initializers,
which are not supported by the ISO standard.
This may cause build issue on incremental builds. If so run the
following:
sed -i -e 's|samplers\.cpp|samplers.c|' src/glsl/nir/.deps/nir_lower_samplers.Plo
Fixes: ef8eebc6ad5(nir: support indirect indexing samplers in struct arrays)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Gottfried Haider <gottfried.haider@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
compr4 is represented by setting the high bit on the MRF number.
We need to mask it out before sanity checking the register number.
Fixes ~8000 assert fails on Ironlake and G45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
res_ptr already contains the resource values. fmask_ptr needs to be
looked up relative to the start of the resource params.
Note that this only affects indirect loads of MS sampler arrays.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469https://bugs.freedesktop.org/show_bug.cgi?id=90631
Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).
Notice that the hardware docs are not clear about this fact:
SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"
However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."
Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:
Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461
Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461
This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.
Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow to use MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.
Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not something actually hit in real life (now state is never non-null,
but only case state->syms is null is if nir_print_instr() path). But it
was something I overlooked the first time, so might as well fix it.
*** CID 1324642: Null pointer dereferences (REVERSE_INULL)
/src/glsl/nir/nir_print.c: 299 in print_var_decl()
293
294 fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
295 }
296
297 fprintf(fp, "\n");
298
>>> CID 1324642: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
299 if (state) {
300 _mesa_set_add(state->syms, name);
301 _mesa_hash_table_insert(state->ht, var, name);
302 }
303 }
304
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so lets get rid of them all.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reuse utility functions instead of reimplementing the same logic.
* _mesa_is_compressed_format() performs the required checking to
determine format support in the current context.
* _mesa_gl_compressed_format_base_format() returns the base format.
As a side effect, we now check that we're in a desktop context when
determining support for the FXT1 and RGTC formats. This is in agreement
with our extension table and the glext headers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Instead of case statements, use _mesa_get_format_layout() to
determine if a GL format is part of a family of compressed formats.
v2. restrict LATC formats to API_OPENGL_COMPAT (Ilia).
rename the variable mFormat to m_format.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This enables us to predicate statments on a compressed format being
a type of LATC format. Also, remove the comment that lists the enum
(it was getting a tad long).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
With this, we completely switch over to nir lowering passes instead of
tgsi_lowering. So one step closer to supporting direct glsl or spirv to
nir support for freedreno a3xx/a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP. This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.
v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types. In this case we want more fine grained control over
which texture projectors get lowered.
v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. So those kind of instruction could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.
This commit adds support for copy propagation using current instruction
as reference.
Equivalent to commit 472ef9 but for vec4.
v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
solve (among others) the following deqp regression with v2:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().
Fix regression caused by 81d2fd91a9 in:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.
v2: change check in _mesa_error_check_format_and_type to be explicit
for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
this function for both glReadPixels and glDrawPixels validation.
(No Piglit regressions seen with v2.)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes:
In file included from nir/nir_lower_samplers.cpp:27:0:
nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
unsigned swizzle[4] = {c, c, c, c};
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use nir_lower_clip pass for adding the VS/FS instructions to handle
user-clip-planes and CLIPDIST. Wire up support for load_user_clip_plane
intrinsic to fetch ucp[plane] values as driver-params (passed as const's
to the shader).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).
Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them). But it's better than no UCP
support at all for drivers that don't have this in hw.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Used internally in freedreno/ir3 to calc stream-out position. Seems
like a generic enough way to implement stream-out (using str instrs),
plus it avoids compiler warnings by sneaking in a non-enum value in
switch statements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.
This fixes:
arb_direct_state_access-texture-buffer
arb_texture_buffer_object-data-sync
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.
total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs: 11688 -> 10939 (-6.41%)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The idea here is not that it gives register coalescing a little bit of a
helping hand. It doesn't actually fix the coalescing problems, but it
seems to help a good bit.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
instructions in affected programs: 1259166 -> 1196845 (-4.95%)
helped: 11363
HURT: 148
v2 (Jason Ekstrand):
- Run nir_move_vec_src_uses_to_dest after going out of SSA
- New shader-db numbers
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The provided indices have the very nice property that if A dominates B then
A->index <= B->index. We should document that somewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.
The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order. However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.
This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer. This fixes
about 100 piglit test case failures on s390x for me.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
XA has been using L8_UNORM for a8 and yuv component surfaces.
This commit instead makes XA prefer R8_UNORM since it's assumed to have a
higher availability.
Also neither of these formats are suitable as destination formats using
destination alpha blending, so reject those operations.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From OpenGL 4.5 Core spec (7.13):
"If pipeline is a name that has been generated (without subsequent
deletion) by GenProgramPipelines, but refers to a program pipeline
object that has not been previously bound, the GL first creates a
new state vector in the same manner as when BindProgramPipeline
creates a new program pipeline object."
I interpret this as "If GetProgramPipelineiv gets called without a
bound (but valid) pipeline object, the state should reflect initial
state of a new pipeline object." This is also expected behaviour by
ES31-CTS.sepshaderobjs.PipelineApi conformance test.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
From OpenGL ES 3.1 spec (7.12):
"Most properties set within program objects are specified not to
take effect until the next call to LinkProgram or ProgramBinary.
Some properties further require a successful call to either of
these commands before taking effect. GetProgramiv returns the
properties currently in effect for program, which may differ from
the properties set within program since the most recent call to
LinkProgram or ProgramBinary, which have not yet taken effect. If
there has been no such call putting changes to pname into effect,
initial values are returned."
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
As a bonus we get indirect support for arrays of arrays for free.
V5: couple of small clean-ups suggested by Jason.
V4: fix struct member location caclulation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason
V3: Use nir_instr_rewrite_src() with empty src rather then clearing
the use_link list directly for the old indirects as suggested by Jason
V2: Fixed validation error in debug build
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.
V3: remove line wrap change from this patch
V2: store slot number for all non-UBO uniforms to make code more
consitent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is required so that the next patch can safely assign the slot id
to the var.
The ids are now assigned in the order we want before allocating storage
so there is no need to sort the storage array and move things around.
V2: rename variable to make code easier to follow as suggested by Jason
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This allows the correct offset to be easily calculated for indirect
indexing when a struct array contains multiple samplers, or any crazy
nesting.
The indices for the folling struct will now look like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 1 Name: s[1].tex
Sampler index: 2 Name: s[0].si.tex
Sampler index: 3 Name: s[1].si.tex
Sampler index: 4 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
Before this change it looked like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 3 Name: s[1].tex
Sampler index: 1 Name: s[0].si.tex
Sampler index: 4 Name: s[1].si.tex
Sampler index: 2 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
struct S_inner {
sampler2D tex;
sampler2D tex2;
};
struct S {
sampler2D tex;
S_inner si;
};
uniform S s[2];
V3: Update comments with suggestions from Jason
V2: rename struct array counter to have better name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reverts commit 48961fa3ba.
glamor/Xwayland use this, the spec saying something when it
was written, and the fact that the comment says Mesa relies on it
hasn't changed.
I also don't have a copy of this patch in my mail archive, which
seems wierd, did it get posted to mesa-dev?
Signed-off-by: Dave Airlie <airlied@redhat.com>
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(originally part of previous patch, split out to separate patch by Rob)
v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.
v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rename print_var_state to print_state, and stuff FILE ptr into the state
object. This avoids passing around an extra parameter everywhere.
v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Similar to fee0686c21, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.
Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.
fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.
These issues were flagged by:
commit cadd7dd384
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Thu Jul 2 15:41:02 2015 -0700
i965/fs: Add a very basic validation pass
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
C++ gets cranky if we take references of temporaries. This isn't a problem
yet in master because nir_builder is never used from C++. However, it will
be in the future so we should fix it now.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions. We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction. With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
instructions in affected programs: 143274 -> 141801 (-1.03%)
helped: 667
HURT: 0
It turns out that dot-products matter...
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed. This adds code to try and
coalesce into the destination of the instruction providing the value.
Shader-db results for vec4 shaders on Haswell:
total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
instructions in affected programs: 231230 -> 224563 (-2.88%)
helped: 1017
HURT: 2
This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>. Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable. I think this was a relic
of the way that GLSL IR implements writemasks. In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing. In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone. We can simplify things a good deal (and
make them correct) by removing this "compacting" step.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
v2 (Jason Ekstrand):
- Use nir_instr_rewrite_dest
- Pass the impl directly into lower_vec_to_movs_block
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In certain conditions, we have to do bounds-checking in the shader for
image_load_store. The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds. However, we were hard-coding 4 components which may not be
correct. Instead, we should be using size which is the number of
components read.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Move the fcntl(dupfd_cloexec) to the else branch where it belongs.
Otherwise it's not immediately obvious that the code is hit, only when
an existing device is used.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
v2: [Emil Velikov]
Rework the error path to a common goto, close only if we own the fd.
v3; [Emil Velikov]
Always close the fd (we either opened the device or dup'd) (Boyan, Ian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)
This commit addresses the problem in two ways:
1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created
2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:
brw_vs.c, brw_wm.c brw_gs.c:
code to drive code generation and implement precompiling and
cache search.
genX_<stage>_state.c
gen specific implementation of the state emission for the shader
stage.
The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).
To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We lower gl_LocalInvocationIndex based on the extension spec formula:
gl_LocalInvocationIndex =
gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
gl_LocalInvocationID.y * gl_WorkGroupSize.x +
gl_LocalInvocationID.x;
https://www.opengl.org/registry/specs/ARB/compute_shader.txt
We need to set this variable in main(), even if gl_LocalInvocationIndex
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate it as a dead variable.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also rename to _mesa_get_main_function_signature.
We will call it near the end of compilation to insert some code into
main for initializing some compute shader global variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
We lower gl_GlobalInvocationID based on the extension spec formula:
gl_GlobalInvocationID =
gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
https://www.opengl.org/registry/specs/ARB/compute_shader.txt
We need to set this variable in main(), even if gl_GlobalInvocationID
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate these as dead variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is to avoid needless float<->int conversions, since all
face-related computations are made on integers. Spotted by Emil
Velikov.
Reviewed-by: Brian Paul <brianp@vmware.com>
New enum to add to switch so compiler doesn't complain.
commit 1807a08e4f
Author: Ilia Mirkin <imirkin@alum.mit.edu>
AuthorDate: Thu Aug 27 23:05:03 2015 -0400
Commit: Ilia Mirkin <imirkin@alum.mit.edu>
CommitDate: Thu Sep 10 17:38:33 2015 -0400
nir: add nir_texop_texture_samples and convert from glsl
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
know. And pretty trivial for scan to figure this out for us.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cypress/Cayman/Aruba, earlier r6xx/r7xx chips only support a subset
of the needed fp64 ops, and don't do GL4 anyway.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Only for Cypress/Cayman/Aruba, older chips have only partial fp64 support.
Uses float intermediate values so only accurate for int24 range, which
matches what the blob does.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I'm going to want a driver constant buffer for tess to coordinate
LDS storage, so before I go tackling that I decided to merge the
clip/samplepos and texture info buffers into one. So I can steal
the spare one.
This creates a single constant buffer between the two, with
clip/samplepos taking up a reserved 128 bytes at the start.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
V2: -Change to "not started" for most entries
-Add status for multisample_2d_array
-Change shader_multisample_interpolation to "not_stared"
V3 (idr): Move the GLES 3.2 section after the "Additional functions"
section from GLES 3.1. Note that GL_KHR_texture_compression_astc_hdr is
done for i965 on gen9+ hardware. Note that GL_OES_shader_io_blocks is
based on some features from GLSL 1.50.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v2]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This commit makes a lot of variables constant - this is basically done
by moving the computation to variable definition. Some of them are
moved into lower scopes (like in img_filter_2d_ewa).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Add a small inline function doing the casting - this is to make sure
we don't do a cast from some completely unrelated type. This commit
does not make tgsi_sampler parameters const in vfuncs themselves for
now - probably llvmpipe would need looking at before making such a
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Those functions actually could always take them as constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Those functions actually could always take them as constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
A followup from previous commit - since all functions called by
query_lod take pointers to const sp_sampler_view and const sp_sampler,
which are taken from tgsi_sampler subclass, we can the tgsi_sampler as
const itself now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is to prepare for making tgsi_sampler parameter in query_lod a
const too. These functions do not modify anything in either sampler or
view anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With that, sp_sampler_view instances are not abused anymore as a local
storage, so we can later make them constant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader. In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.
We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
We copy the output, make the old output the temporary, and give the
temporary a new name. The copy keeps the pointer to the old name. This
works just fine up until the point where we lower things to SSA and delete
the old variable and, with it, the name. Instead, we should re-parent to
the copy.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0
v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it possible for NIR shaders to know the number of output
vertices and the number of invocations. Drivers could also access
these directly without going through gl_program.
We should probably add InputType and OutputType here too, but currently
those are stored as GL_* enums, and I wanted to avoid using those in
NIR, as I suspect Vulkan/SPIR-V will use different enums. (We should
probably make our own.)
We could add VerticesIn, but it's easily computable from the input
topology, so I'm not sure whether it's worth it. It's also currently
not stored in gl_shader (only gl_shader_program), which would require
changes to the glsl_to_nir interface or require us to store it there.
This is a bit of duplication of data...ideally, we would factor these
substructs out of gl_program, gl_shader_program, and nir_shader, creating
a gl_geometry_info class...but it would need to go in a new place (in
src/glsl?) that isn't mtypes.h nor nir.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These provide a convenient way to do simple variable loads and stores.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously the result of the complicated clamp() expression just dropped
on the floor: clamp does not modify any of its parameters. Looking at
the surrounding code, I believe this is supposed to modify the value of
tex_coord.
This change (along with a change to avoid the use of
brw_blorp_framebuffer) does not affect any existing piglit tests. I'm
not sure what this clamp is trying to accomplish, so I'm not sure how to
write a test to exercise this path.
I also noticed another bug in this code. There is no way the array
texture case could possibly work. This will generate code for the
TEXEL_FETCH macro like:
#define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord), sample_map[int(2 * fract(coord.x))]);
Since the coord parameter of this macro is a vec2 at all invocations, no
expansion of this macro will even compile.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
These only occurred in release builds, but they occurred in every file
that included intel_batchbuffer.h. Lots of spam. :(
intel_batchbuffer.h: In function 'intel_batchbuffer_advance':
intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter]
intel_batchbuffer_advance(struct brw_context *brw)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The for_bo parameter of intel_miptree_create_layout appears to be unused
since 27eedca when Eric removed some Gen5 code (after the i915 and i965
drivers parted ways).
intel_mipmap_tree.c: In function 'old_intel_miptree_create_layout':
intel_mipmap_tree.c:77:35: warning: unused parameter 'for_bo' [-Wunused-parameter]
bool for_bo)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Hasn't existed in the i915 source since the i915 and i965 drivers parted
ways.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This hasn't been used outside intel_mipmap_tree.c since d5d4ba9 started
using meta instead of the blitter for PBO TexSubImage. While we're
here, remove the unused brw parameter from the function formerly known
as intel_miptree_unmap_raw.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These only occurred in release builds, but they occurred in every file
that included intel_mipmap_tree.h. Lots of spam. :(
intel_mipmap_tree.h: In function 'intel_miptree_check_level_layer':
intel_mipmap_tree.h:595:59: warning: unused parameter 'mt' [-Wunused-parameter]
intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
^
intel_mipmap_tree.h:596:42: warning: unused parameter 'level' [-Wunused-parameter]
uint32_t level,
^
intel_mipmap_tree.h:597:42: warning: unused parameter 'layer' [-Wunused-parameter]
uint32_t layer)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The target parameter of compute_msaa_layout appears to be unused since
83b83fb when support for CMS textures was added for Gen7.
The brw parameter of intel_get_non_msrt_mcs_alignment appears to be
unused since e92fbdc when the GEN check (along with the "can we fast
clear" decision) was moved to a different function.
intel_mipmap_tree.c: In function 'compute_msaa_layout':
intel_mipmap_tree.c:62:73: warning: unused parameter 'target' [-Wunused-parameter]
compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
^
intel_mipmap_tree.c: In function 'intel_get_non_msrt_mcs_alignment':
intel_mipmap_tree.c:143:54: warning: unused parameter 'brw' [-Wunused-parameter]
intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
If we skip a vbuffer we need to make sure we NULL out
the contents, otherwise when it gets passed to the driver
it will get confused.
This was hit by:
GL41-CTS.gpu_shader_fp64.varyings
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
barrier() GLSL function.
On Ivy Bridge and Haswell, this allows the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass. On gen8, this enables a similar test with a local group size
of 896 to pass.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
first_non_payload_grf may be updated in assign_urb_setup for FS or
assign_vs_urb_setup for VS.
We need to set this in assign_curb_setup for compute shaders since cs
does not have an assign_cs_urb_setup like assign_urb_setup (fs) or
assign_vs_urb_setup (vs).
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
grep -lr 'sub license' | while read f; do \
sed --in-place -e 's/sub license/sublicense/' $f ;\
done
grep -lr 'NON-INFRINGEMENT' | while read f; do \
sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\
done
As noted by Matt, both of these changes match the MIT license text found
at http://opensource.org/licenses/MIT.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Passes the shader piglit tests and introduces no regressions.
This commit finally makes use of the refactoring in previous
commits.
v2:
- adapted the code to changes in previous commits (renames,
need_cube_convert stuff)
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
This introduces new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.
v2:
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
These functions will be used by textureQueryLod.
v2:
- renamed mip_level_* funcs to mip_rel_level_* to indicate that
these functions return mip level relative to base level and
documented them
- renamed a level member in sp_filter_funcs struct to relative_level
- changed mip_rel_level_none and mip_rel_level_nearest to return mip
level relative to base level, mip_rel_level_linear already did
that
- documented clamp_lod function
Reviewed-by: Brian Paul <brianp@vmware.com>
This is to avoid tying the conversion to the sampling -
textureQueryLod will need to do the conversion too, but it does not do
any sampling.
So instead of a "get_samples" vfunc, there is just a bool saying
whether the conversion is needed or not. This solution keeps a nice
property of not adding any overhead for the common case (2D textures).
v2:
- replaced the "convert_coords" vfunc with a "need_cube_convert"
boolean to avoid overhead of copying arrays in common case
- removed an unused typedef
- splitted too long lines in convert_cube
- const fixes in convert_cube
Reviewed-by: Brian Paul <brianp@vmware.com>
This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.
v2:
- adapted to changes in previous commit (renames)
- simplified conditions a bit
- updated docs
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning relative mip level.
v2:
- renamed sp_mip struct to sp_filter_funcs
- renamed sp_filter_funcs instances from mip_foo to funcs_foo
- splitted too long lines
- sp_sampler now holds a pointer to sp_filter_funcs instead of an
instance of it
- some const fixes
Reviewed-by: Brian Paul <brianp@vmware.com>
textureQueryLod returns a vec2 with a mipmap information and a
LOD. The latter needs to be not clamped.
v2:
- changed the "not_clamped" part to "unclamped"
- corrected "clamp into" to "clamp to"
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
The level-of-detail bias wasn't simply added in the explicit LOD case.
This case seems to be tested only in piglit's
fs-texturequerylod-nearest-biased test, which is currently skipped, as
softpipe does not support textureQueryLod at the moment.
Reviewed-by: Brian Paul <brianp@vmware.com>
Basically, do the same thing as for buffer_unmap, but use the explicit range
instead. It's for apps which want to map a whole buffer and mark touched
ranges explicitly.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When checking for LLVM shared libraries, use IMP_LIB_EXT for the extension for
shared libraries appropriate to the target, rather than hardcoding '.so'
Also add some comments to explain why we have this circus of pain.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_control_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:805:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_source_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:839:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c: In function 'dump_sampler_state':
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c:382:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size / 16; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c: In function 'brw_pipeline_state_finished':
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:801:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (i != pipeline) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen7_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1544:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen8_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1638:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_hiz':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1771:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1775:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int layer = 0; layer < mt->level[level].depth; ++layer) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/mesa/program/prog_to_nir.c: In function 'setup_registers_and_variables':
/mesa/src/mesa/program/prog_to_nir.c:1059:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < c->prog->NumTemporaries; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:63:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < tex->num_srcs; i++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:114:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = proj_index + 1; i < tex->num_srcs; i++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:53:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (proj_index = 0; proj_index < tex->num_srcs; proj_index++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c:57:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (proj_index == tex->num_srcs)
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < num_components; ++i)
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:110:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < num_components; ++i) {
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:139:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (i < num_components)
^
mesa/src/glsl/nir/nir_opt_peephole_ffma.c: In function 'get_mul_for_src':
mesa/src/glsl/nir/nir_opt_peephole_ffma.c:130:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (unsigned i = 0; i < num_components; i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Resolve a series of missing field initializer warnings within get_hash_params.py
Of the form:
In file included from mesa/src/mesa/main/get.c:495:0:
mesa/src/mesa/main/get_hash.h:180:5: warning: missing initializer for field
'extra' of 'const struct value_desc' [-Wmissing-field-initializers]
{ GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES, LOC_CUSTOM, TYPE_INT, 0 },
^
mesa/src/mesa/main/get.c:165:15: note: 'extra' declared here
const int *extra;
^
This patch addresses some likely code rot around the *extra field, where the
initialization is via C code generated indirectly from a Python script.
It resolves a number of warnings reported by GCC when configured to be pedantic.
$ gcc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
No piglit regressions on Ironlake.
v2:
- Squash series into a single patch.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When parsing an variable declaration qualified with the typename
keyword, clang attempted to declare a variable with the type of non
type member "enum type type" of module::argument (within the header
file clover/core/module.hpp) instead of the typed member of
module::argument "enum type".
Replaced "typename" with "enum" to force clang to declare the variable
marg_type with type "enum type" of module::argument.
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Albert Freeman <albertwdfreeman@gmail.com>
Our old value of 16384 is the minimum value. DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise. AMD advertises MAX_INT.
Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit. Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders. While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.
We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger. For now, matching
nVidia and Intel/Windows seems like a good plan.
We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.
According to Tapani, there's an unreleased game that asserts on this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so
the mapping between source and the actual BB is by inbound edge order.
So when manipulating edges one has to be extremely careful. We were
insufficiently careful when splitting critical edges which resulted in
the phi nodes being confused as to where their sources were coming from.
This primarily manifests itself with the TXL-lowering logic on nv50,
when it is inside of a conditional. I've been unable to trigger the
issue anywhere else so far. This resolves rendering failures
in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2,
XCOM:Enemy Unknown, Stacking. It also improves the situation in
Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief.
However more work needs to be done there (splitting a lot more edges
solves it, so it's some other sort of RA-related issue).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
The purpose of the macro was to create the name_as_gs_input from name.
The previous commit removed the name_as_gs_input from add_varying, so
the macro is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
After inserting instructions the cursor.option becomes _after_instr
(even if it started life as an _after_block). So we cannot simply stash
the current cursor on the if/loop_stack. Otherwise we end up inserting
instructions after the endif/endloop in the block preceeding the if/
loop.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go in to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.
So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.
This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:
[ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
[ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
[ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.
To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.
This commit divides the sources and dest surfaces in to tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Since debugging issues w/ fd's close()d at the wrong time can be quite
fun, this should probably be made more explicit in the docs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch is necessary to avoid building error on android,
due to missing sid_tables.h generated sources
v2:[Emil Velikov] Correctly split the lists.
Fixes: fbbebeae10f(radeonsi: inline si_cmd_context_control)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Otherwise the android build fails with
error : unable to find string literal operator ‘operator"" PRIx64’
There are several resources referring to the problem, which is related
to c++11, in our case used when building mesa for lollipop.
http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883
I've not investigated all the semantics, some people even suggested a
bug in the gcc compiler,
I just saw the building error was solved with one little space for
lollipop and no side effect when c+11 not used.
v2: [Emil Velikov] add an alternative commit message from Mauro.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
There are a few bits this commit aims to resolve:
One can generalise the mkdir rule to a simple MKDIR_P $(@D) which will
expand appropriately for even if we change the subdir name, and/or add
new rules. We can also drop the explicit $(srcdir) prefix for the
dependency rules, they they are not strictly required, nor used
elsewhere in mesa.
Finally replace $< with explicit filename to be consistent through the
file, and honour PYTHON_FLAGS.
v2: Add comprehensive commit summary/message (Ian, Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than folding one variable within the other only to unwrap them,
just use the ones we need.
v2: bring back LOCAL_PATH prefix for nir_constant_expressions,h
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
The glsl equivalent of "mesa: automake: rework the source generation
rules". Plus let's make things consistent and always explicitly provide
the header name.
v2: Rebase on top of reverted "remove custom AM_V_LEX/YACC" (Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Same logic as previous commit applies.
Additionally remove the odd (set -e/mv/INDENT) from the rules.
The last one is the only one we remotely care about, if reading the
generated sources.
Upcoming work from DylanB which will replace the existing python
scripts with ones that produce more readable output anyway.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A handful of changes/cleanups paving the way to bmake support:
- Remove optional $(srcdir)/ prefix for files in the prereq list.
- Drop the space after the AM_V_GEN variable.
- Using $< in a non-suffix rule is a GNU make idiom.
- Use $(@D) over $(dir $@). The latter is a POSIX standard.
v2: Cosmetic tweaks in the commit summary.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
This is the only place in mesa that uses this constuct which seems
to be GNUmake-ism. Attempting to build with POSIX make implementations
(bmake) would fail as below.
--- options.h ---
LOCALEDIR := .
sh: line 2: LOCALEDIR: command not found
*** [options.h] Error code 127
So let's keep things consistent and compatible by making the variable
non target specific.
v2:
- Bring back LOCALEDIR.
- Reword the commit message
- Change mesa-stable tag 10.6 > 11.0
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
According to OpenGL ES 3.1 specification table : 20.2 and
OpenGL specification 4.4 table 23.4. The glGetIntegeri_v
functions should report the name of the buffer bound
when called with GL_VERTEX_BINDING_BUFFER.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This code is all pretty much identical. We just needed the translation
from one enum value to the other.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
TRIFAN_NOSTIPPLE has always been 0x16 - 0x15 is marked "Reserved" on all
platforms. See the 965 PRM, Volume 2, Table 3-1, "3D Primitive Topology
Type Encoding" for a list.
We don't currently use this, and I don't expect we will, but we may as
well not leave the bogus value around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Docs suggest this is no longer required starting with Gen8.
Perf (no regressions in n=20)
OglMultithread 0.67%
OglTerrainPanInst 0.12%
trex 0.45%
warsow 0.64%
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Since 7a32652231
r600: Turn 'r600_shader_key' struct into union
we were accessing key fields that might be aliased in the union
with other fields, so we should check what shader type we are
compiling for before using key values from it.
v1.1: make it compile
v2: have caffeine, make it work - we don't set type
until later, so don't reference it until we've set it.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I meant to do this here, but it was in the wrong place:
commit c1151b18f2
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Jun 24 20:07:54 2015 -0700
i965/skl: Use more compact hiz dimensions
NOTE: Jordan did go back and look at the original mailing list post. I mailed
the right thing, and pushed the wrong one.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.
Fixes the following piglit on Fermi (Kepler is unaffected):
fbo-stencil blit GL_DEPTH32F_STENCIL8
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Commit 2126c68e5c killed the array elements parameter on load/store
intrinsics that was stored in const_index[1]. It looks like that
patch missed to remove this assignment in the UBO path.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever resulting
in it seeing a surface of e.g 1024x256 being outside of the input size
limit.
This commit fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
X11_CFLAGS is never defined. Path to X11 headers is not needed here, so
just remove.
Future work: Using AM_CFLAGS here looks wrong, as this Makefile only builds
C++ files
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Make sure errors are correcly propagated.
Also don't flush during state emission if emission fails.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Like xa_surface_from_handle(), but takes a handle type, rather than
hard-coding 'shared' handle. This is needed to fix bugs seen with
xf86-video-freedreno with xrandr rotation, for example. The root issue
is that doing a GEM_OPEN ioctl on a bo that already has a GEM handle
associated with the drm_file will result in two unique handles for the
same bo. Which causes all sorts of follow-on fail.
v2:
- Add support for for fd handles.
- Avoid duplicating code.
- Bump xa version minor.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
My earlier attempt to fix this missed the fact that there was a #else
clause that assumes that you have openssh. This moves the whole thing
under #ifdef HAVE_SHA1 which should avoid this issue.
Fixes: 13bfa5201 (util: always include sha1 into the build)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91898
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
SHA1 is now used in all builds when HAVE_SHA1 is defined. Adjust src to
do the same thing, rather than predicating on shader cache.
Fixes: 04e201d0c0 ("mesa: change 'SHADER_SUBST' facility to work with env variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
vbuf is never null. We want to make sure that a resource was allocated
for the vbuf, which is *vbuf.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The hardware only generates vertexid when vertices come from a VBO. This
fixes:
vertexid-drawelements
vertexid-drawarrays
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
The stride was being set to 0, which is illegal (and also non-sensical).
Also we must wait for the buffer to become available for reading as
otherwise a wrong value may be prefetched. Since we must wait for the
buffer anyways, and it's mapped and in GART, we may as well avoid the
annoyance of the indirect pushbuf submit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Gen9 changes the meaning of this to coarse LOD quality mode. Although that's a
desirable thing to be setting, it doesn't match the gen8 behavior and this was
unintentional. More importantly, we don't ever use this field. So instead of
getting it "wrong" drop it entirely.
This is a respin of a patch which only [incorrectly] tried to address gen9.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is an undefined behavior. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.
This bug manifests itself in POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Note this is not ideal. Since the sifm can only do source sizes upto
1024x1024 we end up using the blitter on nv4x, which is not that fast.
And on nv3x we end up using the cpu which is really slow.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.
These constrains have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.
nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.
This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather then by the ddx, and the
wrong stride of these buffers was causing the garbled display.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The tiled memcpy fast paths perform a simple blit (with only a couple of
trivial pixel conversion routines) and do not accommodate PixelTransfer
operations. Therefore if any are set, fallback to the regular routines.
Note that PixelTransfer only applies to TexImage and ReadPixels, not to
GetTexImage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
If we have spilled/unspilled a register in the current instruction, avoid
emitting unspills for the same register in the same instruction or consecutive
instructions following the current one as long as they keep reading the spilled
register. This should allow us to avoid emitting costy unspills that come with
little benefit to register allocation.
v2:
- Apply the same logic when evaluating spilling costs (Curro).
v3:
- Abstract the logic that decides if a register can be reused in a function.
that can be used from both spill_reg and evaluate_spill_costs (Curro).
v4:
- Do not disallow reusing scratch_reg in predicated reads (Curro).
- Track if previous sources in the same instruction read scratch_reg (Curro).
- Return prev_inst_read_scratch_reg at the end (Curro).
- No need to explicitily skip scratch read/write opcodes in spill_reg (Curro).
- Fix the comments explaining what happens when we hit an instruction that
does not read or write scratch_reg (Curro)
- Return true early when the current or previous instructions read
scratch_reg with a compatible mask.
v5:
- Do not return true early, the loop should not be expensive anyway
and this adds more complexity (Curro).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We now print out the name of the message instead of its numerical
value, and label the message control and surface numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The entire VUE map is computed based on the slots_valid bitfield;
calling brw_compute_vue_map on the same bitfield will return the
same result. So we can simply compare those.
struct brw_vue_map is 136 bytes; doing a single 8-byte comparison is
much cheaper and should work just as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
These were only for legacy userclipping, which we no longer support
in geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the GLSL 1.50 specification, page 76:
"The shader must also set all values in gl_ClipDistance that have been
enabled via the OpenGL API, or results are undefined."
With this patch, we only enable clip distance writes when the shader
actually writes them. We no longer force a value to be written when
clip planes are enabled in the API. This could mean the first varying
slot would be used as clip distances - I believe it should be the safe
kind of undefined behavior.
Empirically, it doesn't seem to cause a problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys. So there's no need for a "VUE" key base class.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This avoids a downcast of key, which won't exist in the base class soon.
I'm not a huge fan of this patch, but given that we're currently using
inheritance, this seems like the "right" way to do it. The alternative
is to make key a void pointer in the parent class and continue
downcasting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
I'm about to remove the base class for VS/GS/HS/DS program keys, at
which point we won't be able to use key->tex anymore. Instead, we'll
need to store a direct pointer (like we do in the FS backend).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.
Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter. This eliminates the bogus NULL parameter in the
GS case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
There are two uses of this flag.
The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances. In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value. Checking if it's
> 0 works for detecting this case.
Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We only support geometry shaders in core profiles, where gl_ClipVertex
doesn't exist. Presumably the even older behavior of clipping to
gl_Position isn't supported either. In fact, GLSL 1.50 page 76 claims:
"The shader must also set all values in gl_ClipDistance that have been
enabled via the OpenGL API, or results are undefined."
So we don't need to handle legacy clipping in geometry shaders. I think
Paul added this back when we were considering supporting the old
GL_ARB_geometry_shader4 extension.
This removes a non-orthagonal state dependency on GS compilation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This living in brw_fs.{h,cpp} is a historical artifact of us supporting
texturing for fragment shaders before any other stages. It's kind of
awkward given that we use it for all stages.
This avoids having to include brw_fs.h in geometry shader code in order
to access this function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Patch modifies existing shader source and replace functionality to work
with environment variables rather than enable dumping on compile time.
Also instead of _mesa_str_checksum, _mesa_sha1_compute is used to avoid
collisions.
Functionality is controlled via two environment variables:
MESA_SHADER_DUMP_PATH - path where shader sources are dumped
MESA_SHADER_READ_PATH - path where replacement shaders are read
v2: cleanups, add strerror if fopen fails, put all functionality
inside HAVE_SHA1 since sha1 is required
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around". It didn't account for type conversion moves,
however. So it would happily turn this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:F, vgrf6:UD
into this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:D, -vgrf5:D
which erroneously drops the conversion to float.
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As far as I can tell, the behavior is preserved from the previous generations.
Before we set a single bit to tell the FS whether or not we'll be using an input
coverage mask. Now we have some options which are implementing various
extensions. These bits are used for the various conservative rasterization
mechanisms (for collision detection, binning, and whatever else).
I believe that the behavior is preserved because the problem which conservative
rasterization is attempting to fix would go away with the "NORMAL" mode (at the
cost of performance, I believe).
This patch serves as documentation of the change by creating the enums, as well
as giving some of the history with the links here so that the next person who
comes along and looks at it doesn't spend as long as I had to in order to
determine if there is an issue or not.
Previously, this algorithm had been done in software, and this can still be used
as long as we don't export an extension stating otherwise.
References: https://www.opengl.org/registry/specs/NV/conservative_raster.txt
References: https://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This must be done before exporting a buffer as dmabuf fds, because
we lose track of who is using it and can't trust the reference counter.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is probably the most called util function. It does almost nothing,
yet it can consume 10% of the CPU on the profile. This drops it down to 5%.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't seem any reason to start from 4.
Start from 1 instead (0 is left reserved to catch uninitialized atoms).
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Similarly to scissor states, we can use single atom to track all viewport
states. This will allow to simplify dirty atom handling later.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
During review of the "r600g: make all scissor states use single atom" patch
Marek Olšák noticed that scissor disable workaround should be applied on
all scissor states and not just first one, so let's do so.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
As suggested by Marek Olšák, we can use single atom to track all scissor
states. This will allow to simplify dirty atom handling later.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It's legal to call glTexSubImage with zero values for the width,
height or depth. Previously this was breaking the PBO access
validation because it tries to work out the last pixel accessed by
getting the pixel at height-1 and depth-1 which would end up with
bogus values.
This was causing GL errors to be generated during the Piglit
texsubimage test, although the test was passing anyway.
v2: Also check for width == 0. Don't validate the start pointer if any
of the dimensions are zero.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.
So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes. Instead, we need to look at the
enabled slots.
This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots). We then
count normal attributes once, and count the double-size attributes
a second time.
Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.
No Piglit changes on llvmpipe (which actually supports dvecs).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4. Both are illegal. That later also
overwrites the storage for the mat4 and causes bad things to happen.
Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Because the compiler already has enough things to complain about.
grep -rl 'const static' src/ | while read f
do
sed --in-place -e 's/const static/static const/g' $f
done
brw_eu_emit.c: In function 'brw_reg_type_to_hw_type':
brw_eu_emit.c:98:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int imm_hw_types[] = {
^
brw_eu_emit.c:120:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int hw_types[] = {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_upload_cs_push_constants was based on gen6_upload_push_constants.
v2:
* Add FINISHME comments about more efficient ways to push uniforms
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Check for a valid framebuffer cbuf pointer before accessing its
associated surface.
Fix piglit test fbo-drawbuffers-none.
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture(). This implies a surface can now be reused for a
render buffer. Currently, when we render to a texture, we mark the
surface as dirty. But in svga_mark_surface_dirty(), if the surface
is already marked as dirty, it does not increment the texture age.
Any view to this texture might not be updated properly then.
With this patch, the texture age is incremented regardless of whether
the surface is already marked as dirty or not.
Fix bug 1499181.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture() and exposes a bug in the backed surface view
creation. Currently a backed surface view for a conflicted surface view
is created at framebuffer emit time. But if shader sampler views are changed
but framebuffer surface views remain unchanged, emit_framebuffer() will not
be called and conflicted surface views will not be detected.
To fix this, also check for conflicted surface views when setting sampler
views. If there is any conflicted surface views, enable the
framebuffer dirty bit so that the framebuffer emit code has a chance to
create a backed surface view for the conflicted surface view.
Fix cinebench-r11-test regression.
Reviewed-by: Brian Paul <brianp@vmware.com>
The lowered code reads from the destination, which isn't possible from
message registers.
Fixes the following dEQP tests on SNB:
dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is a squash commit of roughly two years of development work.
Authors include:
Brian Paul
Charmaine Lee
Thomas Hellstrom
Jakob Bornecrantz
Sinclair Yeh
Mingcheng Chen
Kai Ninomiya
MengLin Wu
The driver supports OpenGL 3.3.
Signed-off-by: Brian Paul <brianp@vmware.com>
The VMware svga driver doesn't directly support pipe_screen::get_timestamp()
but we can do a work-around. However, we need a gallium context to do so.
This patch adds a new pipe_context::get_timestamp() function that will only
be called if the pipe_screen::get_timestamp() function is NULL.
Signed-off-by: Brian Paul <brianp@vmware.com>
This involves a few driver modifications to keep things building.
The driver may not actually run properly at this point.
Signed-off-by: Brian Paul <brianp@vmware.com>
If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:
==28136== Invalid read of size 8
==28136== at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136== at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
Fixes regression from commit 7f396189f0
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Mon Jan 5 18:17:04 2015 -0800
meta: Add a BlitFramebuffers-based implementation of TexSubImage
v2: However, the teximage we create does need to be width x full_height x 1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by Neil Roberts <neil@linux.intel.com>
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only a subset of AMD GPUs supported by r600g support doubles,
CAYMAN and CYPRESS are probably all we'll try and support, however
I don't have a CYPRESS so ignore that for now.
This disables SB support for doubles, as we think we need to
make the scheduler smarter to introduce delay slots.
[airlied: pushing this to avoid pain of rebasing, it mostly
works on cayman only so far, Glenn has some ideas about
delay slot issues we need to look into. turned off by
default for now]
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch is taken from work by Glenn and myself,
and I've spent some time making it all work here.
This adds support for the multiple streams part of
ARB_gpu_shader5 to r600g.
It doesn't enable ARB_gpu_shader5 yet.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.
This fixes all the core mesa problems with CTS packed_pixels
tests.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.
align the rowlength to the pack alignment before comparing.
This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.
v1.1: add fxt1 fix (Iago)
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.
This fixes some memory overruns under CTS
packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.
While playing with ogl conform one of the tests must
attempted all valid formats for GL3.3 and hits the
unreachable here.
This adds the first chunk of formats that hit the
assert.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.
This adds a common routine to swap bytes a 2D image and uses this
code in:
texture storage
texture get
readpixels
swrast drawpixels.
[airlied: updated with Brian's nitpicks].
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Let it be defined externally instead, allowing setting mechanisms other
than environment variables.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
The first function translates prim restart indexes to be 0xffff or
0xffffffff.
The second splits indexed primitives with restart indexes into sub-
primitives without restart indexes.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This adds a tgsi utility tgsi_add_aa_point to transform a fragment shader
to support anti-aliased wide point by computing the fragment distance from
the point center. This utility assumes the geometry shader is emitting
an extra generic output with point coord data. The semantic index of
this generic output is passed to the tgsi_add_aa_point utility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds a tgsi utility tgsi_add_point_sprite to transform a geometry
shader to emulate wide points by drawing quads. This utility adds an
extra output for the original point position if the point position is
to be written to a stream output buffer. It also assumes the driver will
add a constant for inverse viewport scale after the user defined constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
This could be used by any driver where the device doesn't directly
support two-sided lighting. This code modifies a fragment shader
to accecpt back-face colors and choose between the front/back colors
depending on the triangle's front-face sign.
These functions deal with inclusive coordinates, hence a 0/0/0/0 rect
returned when there's no intersection doesn't actually represent an empty
rectangle. Hence return 0/-1/0/-1 instead.
This fixes some problems in llvmpipe with empty scissor rects (which up
to now didn't really matter because while the intersect test returned the
wrong result all pixels were scissored away later anyway).
It isn't really obvious if intersection test should take into account empty
rectangles or if the caller should do it. But it looks like most callers
actually verified one of the rects but not the other, but since correctly
returning an empty rect that other rect could actually be empty leading to
more bugs. Hence just verify both rects for emptyness in the intersection
test itself which makes the code easier in the caller (though it will be
slower if the caller knows the rectangles are non-empty).
Reviewed-by: Zack Rusin <zackr@vmware.com>
This patch adds some more helper functions such as
. tgsi_transform_temps_decl
. tgsi_transform_output_decl
. tgsi_transform_dst_reg
. tgsi_transform_src_reg
Reviewed-by: Brian Paul <brianp@vmware.com>
The border colors are uploaded only once when the state is created.
This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.
It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.
Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This mainly removes the cache misses when checking the dirty flags.
Not much else though.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
I need to initialize more atom IDs.
This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type of the emit functions though)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Apr 15 03:04:33 2015 -0700
i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.
which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.
Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
checking for values live after the loop to catch const_load
instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were already doing this internally for iterating over a function
implementation, so just expose it directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.
This is totally my fault as well, maths fail 101.
This makes geom shaders on r600 not fail heavily.
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:
"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."
Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node. Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node. By just resetting to after_instr, we
avoid all these problems.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier shader going off end
of memory problems we've stopped.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.
See commit 3b7f683f.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.
This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/
v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
surface state open coded because it seems to go better with the existing code.
v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).
Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.
Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
The glsl->tgsi convertor does some temporary register reduction
however in profiling shader-db this shows up quite highly,
so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.
Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of looking this up lots, lets just cache it in the instruction
translation up front. I just noticed this function what high in a profile
of shader-db on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said propertices in
the selector as read-only within r600_shader_from_tgsi().
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.
The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.
With that fixed, we can calculate interference more accurately.
total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs: 346282 -> 342880 (-0.98%)
helped: 1552
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
equals() returns false for registers with different types, using it
isn't appropriate to determine whether an is overwriting a register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed when debugging things that lead to the next patch.
On G45 (and presumably ILK) this helps register coalescing:
total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Zero sized uniforms can exist in the list, but they don't get get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them. This was previously fixed in:
commit: 781dc7c0e1.
However,
commit: 259f7291de
removed the fix.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should returned. None of the other binding point
queries check for deletion in another context.
Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.
Add a couple of asserts, and print EOP bit if set in old asm printer.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.
zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
The NIR cursor API is exactly what we want for the builder's insertion
point. This simplifies the API, the implementation, and is actually
more flexible as well.
This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.
v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point. It then reworks the existing API to
simply be a wrapper around that for compatibility.
This largely involves moving the existing code into a new function.
Suggested by Connor Abbott.
v2: Make the legacy functions static inline in nir.h (requested by
Connor Abbott).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
We want to use this for normal instruction insertion too, not just
control flow. Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.
Previously, we only checked this when the new instruction being inserted
was a jump. This is a red herring - inserting *any* kind of instruction
after a jump is illegal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices. But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.
Before, we failed an assertion:
state_tracker/st_glsl_to_tgsi.cpp:4900:
Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.
This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things. In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().
This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.
We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
v2: use ARB_texture_multisample enable bit
Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.
v2: reintroduce brw.
don't include functional changes.
don't adjust function parameters or create a new function.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.
Because the original definition was bytes per block divided by block width, all
references to the mipmap width must be divided the block width. This keeps the
address calculation formulas consistent. For example, the units for miptree_level
x_offset and miptree total_width has changed from pixels to blocks.
v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into seperate commit.
simplify cpp assignment in copy_image_with_blitter().
update miptree width and offset variables in: intel_miptree_copy_slice(),
intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.
v2: add newlines to separate declarations, statments, and comments.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.
v2: handle texture arrays and LDR-only systems.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.
v2: add greater than 0 assertion (Anuj).
convert the macro to a static inline function.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
ALIGN and ROUND_DOWN_TO both require that the alignment value passed
into the macro be a power of two in the comments. Using software assertions
verifies this to be the case.
v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indendation (Brian).
v4: add greater than zero requirement (Anuj).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.
ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.
v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This is necesary to initialize the gl_texture_image struct.
From the KHR_texture_compression_astc_ldr spec:
"Added to Section 3.8.6, Compressed Texture Images
Add the tokens specified above to Table 3.16, Compressed Internal Formats.
In all cases, the base internal format will be RGBA. The encoding allows
images to be encoded with fewer channels, but this is always presented as
RGBA to the sampler."
v2. use _mesa_is_astc_format().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
The ASTC spec was revised as follows:
Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
commands accepting ASTC format tokens in the New Tokens section [...].
Support only exists in the HDR submode:
Add a second new column "3D Tex." which is empty for all non-ASTC
formats. If only the LDR profile is supported by the implementation,
this column is also empty for all ASTC formats. If both the LDR and HDR
profiles are supported only, this column is checked for all ASTC
formats.
LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.
v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.
v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
v2: correct the spelling of the sRGB variants.
remove spaces around "=" when setting the enum value.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().
v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
only used for generating mipmaps - something ASTC formats do not support
due to lack of online compression.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]
This may print too much information that we might not understand yet,
but some of the bits are very useful.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This makes writing a good IB parser a lot easier.
It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Instruction encoding isn't needed in Mesa.
The border color address registers were duplicated.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
v2: lots of improvements
This is like identity or trace, but simpler. It doesn't wrap most states.
Run with:
GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.
If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
/home/$username/dd_dumps/$processname_$pid_$index.
Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.
You can also do:
GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.
Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.
v2: drop pipe-loader changes, drop SConscript
rename dd.h -> dd_pipe.h
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.
v2: remove extra is_gles31() checks (suggested by Ilia)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE, is supported by the
glGetProgramiv function.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
According to the OpenGL ES 3.1 specification chapter 17, the
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
is available for glGetIntegeri_v.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date: Fri Jul 31 10:26:36 2015 -0700
mesa/formats: remove compressed formats from matching function
caused a regression in my CTS testing, this looks like a clear
thinko.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
sSigned-off-by: Dave Airlie <airlied@redhat.com>
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that keeping the double math for now (float x + 0.5), as the results may
otherwise differ.
Acked-by: Marek Olšák <marek.olsak@amd.com>
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.
This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Enables _mesa_target_can_be_compressed to return the appropriate GL error
depending on it's inputs. Use the parameter to return the appropriate GL error
for ETC2 formats on GLES3.
Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.
v2. Since the exhaustive switch is removed, add a gtest to ensure
all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
fix formatting and misc improvements (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhuastive switch for
_mesa_uncompressed_format_to_type_and_comps().
v2. indentation and conditional inclusion fixes (Chad).
allow tests to continue running if any format fails
and display the failing format name.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.
Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assiging locations of pull constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect. This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform. This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate. This should allow us
to more easily make decisions about what to push.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path. In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load. Previously, it was completely flattened by
nir_lower_io which made things difficult. This adds more information to
the intrinsic to make this easier for us.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.
In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play. The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.
This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer. This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.
v2 (Jason Ekstrand):
- One side-effect of passing in a function pointer is that nir_lower_io is
now aware of and properly allocates space for image uniforms, allowing
us to drop hacks in the backend
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.
Should avoid i965 backend assertions in future patches Jason and I are
working on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The commit:
commit b49371b8ed
Author: Connor Abbott <cwabbott0@gmail.com>
AuthorDate: Tue Jul 21 19:54:18 2015 -0700
nir: move control flow modification to its own file
split out some control flow related APIs into a separate header, but did
not update drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The commit:
commit 8e0d4ef341
Author: Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Thu Aug 6 18:18:40 2015 -0700
nir: Delete the nir_function_impl::start_block field.
removed the start_block field without fixing up drivers..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Connor introduced this helper recently; we should use it here too.
I had to move the function earlier in the file for it to be available.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These will help us do a number of things, including:
- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:
if (foo) {
...
}
if (!foo) {
...
}
with:
if (foo) {
...
} else {
...
}
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a helper that will be shared between the new control flow
insertion and modification code.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().
In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly. Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.
Thanks to Connor Abbott for suggesting this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.
v2. include compressed format cases to avoid gcc warnings (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of
commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Wed Apr 22 11:43:50 2015 -0700
i965/state: Emit pipeline select when changing pipelines
However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.
Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This reverts commit 567394112d.
It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.
v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.
v2: Also implement for BDW+
v3 [by Ben]: Remove 10.5 tag. Too late.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
GL_OES_copy_image not started (based on GL_ARB_copy_image, which is done for some drivers)
GL_OES_draw_buffers_indexed not started
GL_OES_draw_elements_base_vertex not started (based on GL_ARB_draw_elements_base_vertex, which is done for all drivers)
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - "high fidelity": Segfault in _mesa_program_resource_find_name</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
</ul>
<h2>Changes</h2>
<p>Chris Wilson (2):</p>
<ul>
<li>i965: Prevent coordinate overflow in intel_emit_linear_blit</li>
<li>i965: Always re-emit the pipeline select during invariant state emission</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: add missing queries for ARB_direct_state_access</li>
</ul>
<p>Dave Airlie (8):</p>
<ul>
<li>mesa/arb_gpu_shader_fp64: add support for glGetUniformdv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Antia Puentes (1):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.7</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
</ul>
<p>Hans de Goede (4):</p>
<ul>
<li>nv30: Fix creation of scanout buffers</li>
<li>nv30: Implement color resolve for msaa</li>
<li>nv30: Fix max width / height checks in nv30 sifm code</li>
<li>nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type</li>
<li>mesa: Don't allow wrong type setters for matrix uniforms</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: don't fall back to 16F when 32F is requested</li>
<li>nvc0: always emit a full shader colormask</li>
<li>nvc0: remove BGRA4 format support</li>
<li>st/mesa: avoid integer overflows with buffers >= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66346">Bug 66346</a> - shader_query.cpp:49: error: invalid conversion from 'void*' to 'GLuint'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73512">Bug 73512</a> - [clover] mesa.icd. should contain full path</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74329">Bug 74329</a> - Please expose OES_texture_float and OES_texture_half_float on the ES3 context</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90073">Bug 90073</a> - Leaks in xcb_dri3_open_reply_fds() and get_render_node_from_id_path_tag</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90249">Bug 90249</a> - Fails to build egl_dri2 on osx</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error ndefined reference to `nir_metadata_preserve'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90797">Bug 90797</a> - [ALL bisected] Mesa change cause performance case manhattan fail.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90817">Bug 90817</a> - swrast fails to load with certain remote X servers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90895">Bug 90895</a> - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex & ALU and many SynMark cases performance reduced by 10-23%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91077">Bug 91077</a> - dri2_glx.c:1186: undefined reference to `loader_open_device'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91222">Bug 91222</a> - lp_test_format regression on CentOS 7</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91226">Bug 91226</a> - Crash in glLinkProgram (NEW)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91231">Bug 91231</a> - [NV92] Psychonauts (native) segfaults on start when DRI3 enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91461">Bug 91461</a> - gl_TessLevel* writes have no effect for all but the last TCS invocation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91513">Bug 91513</a> - [IVB/HSW/BDW/SKL Bisected] Lightsmark performance reduced by 7%-10%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91544">Bug 91544</a> - [i965, regression, bisected] regression of several tests in 93977d3a151675946c03e</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91591">Bug 91591</a> - rounding.h:102:2: error: #error "Unsupported or undefined LONG_BIT"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.