The two paths are really similar, and the extra conditionals will be
dwarfed by the cost of the actual upload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 30259856a8 moved the state packets to
table generation time, but forgot to make this change. Apparently the
performance win there was about not reemitting the table pointers on
unrelated state changes.
No performance difference on cairo on glamor (n=118).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have the stage state coming into our setup of sampler states,
it's easy to drop an identifier into it of which stage the stage_state is,
and then look up which packet to emit in a little table.
No performance difference on cairo on glamor (n=492).
v2: Don't forget to do the workaround flush on IVB.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should have happend around the time of commit 4680d23, but Keith's
DRI3 patches and my GLX_MESA_query_renderer patches crossed in the mail.
I don't have a working DRI3 setup, so I haven't been able to actually
verify this. I'm hoping that someone can piglit this for me on DRI3...
It's also unfortunate the DRI2 and DRI3 can't share more code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Use the ones provided by the compiler instead.
NOTE: External trees should be updated to not include '#include/c99'
directory directly, but rather rely on scons/gallium.py to do the right
thing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Silence insignificant warnings so significant warnings have a chance to
stand out.
The only abundant warning that's not silenced here is "C4018:
signed/unsigned mismatch", as it could hide security issues, so it's better
to actually fix the code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Drop the version/name tag from the script as it was never
meant to be there. Add swrast_create_screen as it is used
when loading swrast. Rename the file to pipe.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Both llvm and clang polute the exported symbol table, as soon
as we try to link with either one. Other than those two
everything else looks good (clean).
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Namely drop the version/name tag of the exported symbol, and
rename the filename to egl.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
radeon_drm_winsys_create is not needed, avoid exporting it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
In the presence of LLVM the final library exports every symbol from
the llvm namespace. Resolve this by using a version script (w/o the
version/name tag).
Considering that there are only ~25 symbols, explicitly list them
to minimize the chances of rogue symbols sneaking in.
Drop the *winsys_create functions as they were only meant for
gl-vdpau interop.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The symbol is not meant to be exported, and its presence was
only a side effect due to the missing visibility flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The symbol is used for hardware only drivers. For swrast the
loader uses swrast_create_screen. Add VISIBILITY_CFLAGS while
we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This can be called from locations that don't have a context pointer
handy. This patch also adds enough infrastructure so that the unit
tests for the GLSL compiler and the stand-alone compiler will build and
function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows them to be moved to .rodata, and allow us to be sure that they
will not be modified.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Using the existing driver hooks made for AMD_performance_monitor, implement
INTEL_performance_query functions.
v2: Whitespace changes.
v3: Whitespace changes, add a _mesa_warning()
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Like AMD_performance_monitor, this extension provides an interface for
applications (and OpenGL-based tools) to access GPU performance
counters. Since the exact performance counters available vary between
vendors and hardware generations, the extension provides an API the
application can use to get the names, types, and minimum/maximum
values of all available counters.
Applications create performance queries based on available query
types, and begin/end measurement collection. Multiple queries can be
measuring simultaneously.
v2: Whitespace changes
v3: src/mapi/glapi/gen/gl_API.xml: Also expose the functions to GLES2.
v4: Whitespace changes, static_dispatch="false" for all functions, fix
dispatch_sanity test for GLES2 functions
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was a work-around to allow linking a program with only a fragment
shader in a GLES context. Now that we have GL_EXT_separate_shader_objects
in GLES contexts, we can just use that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB, OES, then everything else. If there's ever a KHR shading language
extension, it should go between ARB and OES.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
I don't know of any applications that actually use it. Now that Mesa
supports GL_ARB_separate_shader_objects in all drivers, this extension
is just cruft.
The entrypoints for the extension remain in the XML. This is done so
that a new libGL will continue to provide dispatch support for old
drivers that try to expose this extension.
Future patches will add OpenGL ES GL_EXT_separate_shader_objects, but
that's a different thing.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With this patch, the piglit arb_separate_shader_object-dlist test
passes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used for GL_ARB_separate_shader_objects. That extension
not only allows separable shaders to rendezvous by location, but it also
allows traditionally linked shaders to rendezvous by location. The spec
says:
36. How does the behavior of input/output interface matching differ
between separable programs and non-separable programs?
RESOLVED: The rules for matching individual variables or block
members between stages are identical for separable and
non-separable programs, with one exception -- matching variables
of different type with the same location, as discussed in issue
34, applies only to separable programs.
However, the ability to enforce matching requirements differs
between program types. In non-separable programs, both sides of
an interface are contained in the same linked program. In this
case, if the linker detects a mismatch, it will generate a link
error.
v2: Make sure consumer_inputs_with_locations is initialized when
consumer is NULL. Noticed by Chia-I.
v3: Rebase on removal of ir_variable::user_location.
v4: Replace a (stale) FINISHME with some good explanation comments from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When linking a separable program that contains only a fragment shader,
the producer will be NULL. Similar cases will exist with geometry
shaders and, eventually, tessellation shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
While writing the link_varyings::single_interface_input test, I
discovered that populate_consumer_input_sets assumes that all shader
interface blocks have been lowered to discrete variables. Since there
is a pass that does this, it is a reasonable assumption. It was,
however, non-obvious. Make the code fail when it encounters such a
thing, and add a test to verify that behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Four initial tests:
* Create an IR list with a single input variable and verify that
variable is the only thing in the hash tables.
* Same as the previous test, but use a built-in variable
(gl_ClipDistance) with an explicit location set.
* Create an IR list with a single input variable from an interface block
and verify that variable is the only thing in the hash tables.
* Create an IR list with a single input variable and a single input
variable from an interface block. Verify that each is the only thing
in the proper hash tables.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
I want to make some changes to this code, but first I want to make some
unit tests for it... so that I can capture the pre- and
post-invariants. Pulling the code out into its own function in a
non-anonymous namespace enables that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was broken in some odd ways before. Too much state was being
saved, it was being restored in the wrong order, and in the wrong way.
The biggest problem was that the pipeline object was restored before
restoring the programs attached to the default pipeline.
Fixes a regression in the glean texgen test.
v3: Fairly significant re-write. I think it's much cleaner now, and it
avoids a bug with some meta ops that use shaders (reported by Chia-I).
v4: Check Pipeline.Current against NULL instead of Pipeline.Default.
Suggested by Chia-I.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Pull most of the guts out of _mesa_BindPipeline into a new utility
function that can be use elsewhere (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't do anything with variables that have explicitly assigned
locations. This is also how built-in varyings are handled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In February 2013 Paul unified the values used for shader stage outputs
and shader stage inputs. See commits 8a076c5f0^..eed6baf76. Since that
time, the location_base parameters are always VARYING_SLOT_VAR0.
Instead of passing that around, just hard code it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we are uisng the OpenCL 1.2 headers, applications expect all
the OpenCL 1.2 functions to be implemented.
This fixes linking errors with the piglit CL tests.
v2:
- Use c++ features
- Fix error code handling
v3:
- Move <iostream> into api/util.hpp
- Fix indentation
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Make set_ubo_binding() just update the binding, and move the code
that does validation, flushes the vertices etc. into a new
bind_uniform_buffer() function.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Document the difference between _mesa_lookup_bufferobj() and
_mesa_multi_bind_lookup_bufferobj().
v3: Don't create the buffer objects when they don't exist.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Make set_atomic_buffer_binding() just update the binding, and move
the code that does validation, flushes the vertices etc. into a new
bind_atomic_buffer() function.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is for glBindTextures(), since it doesn't change the active
texture unit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds functions for locking/unlocking the mutex, along with
_mesa_HashLookupLocked() and _mesa_HashInsertLocked()
that do lookups and insertions without locking the mutex.
These functions will be used by the ARB_multi_bind entry points to
avoid locking/unlocking the mutex for each binding point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The texture can only be bound to the index that corresponds to its
target, so there is no need to loop over all possible indices
for every unit and checking if the texture is bound to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() when unbinding textures,
to avoid having to loop over all the targets.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() so we don't have to look it up
for each texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We had the EGLimage structure laying around in intel_regions.h, but now
it's the only thing left in the file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Fix bad pointer on unreference (caught by Chad)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I slipped this in in the region->pitch change from pixels to bytes, but I
don't see any reason for it any more -- the libdrm code doesn't appear to
divide pitch by a cpp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If the stride wasn't width*cpp, we wouldn't track how much of the src is
busy, and allow a subdata into the end to proceed unsynchronized.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Note: region->width/height used to reflect the total_width/height padding
of separate stencil, though mt->total_width didn't. region->width/height
was being used in EGL images, where the padded value would have been the
wrong one, so I converted them to use rb->Width/Height.
v2: Drop debug printf that slipped in (caught by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Regions aren't refcounted safely for multithreaded applications, and
they're not terribly useful wrappers of a BO, so I'm trying to remove
them.
Even the stride I added here could probably be reduced to use of an
existing field in the __DRIimageRec, but I want this to be as mechanical
of a change as possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I want to do this to get the region removed from DRI images. However, it
does mean that we won't share the intel_region between the rb and the
texture for texture_from_pixmap. I think that's fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I noticed that we were doing this while changing the DRI3 path to not use
regions, which involved changing the signature of
intel_update_winsys_renderbuffer_miptree() this way.
v2: Replace my comment with Chad's version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The drm function to get the tiling is just a getter storing the two
pointers, so we don't need to go out of our way to avoid it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
All the consumers are doing it on a miptree.
v2: fix a silly duplicated dereference (review by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
In practice this makes the depth buffer Z24 even if the visual has
depthBits==16.
The incorrect depth buffer format doesn't seem to cause any actual
problems in i965, but it seems like we should fix it anyway. I see
Z16 has been more or less deprecated in the driver except the for
the depthBits==16 case. But if we want to use Z24 even in that
case (not sure it's really legal?) it would look better if the
code made that decision explicitly rather than relying on the
format to get magically overwritten by the renderbuffer code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
Long time ago things worked by accident because
_mesa_choose_tex_format() checked for ARB_depth_texture
and thus returned MESA_FORMAT_NONE on gen2 hardware. Somehow
that ended up working when depthBits==16 because the driver
would then pick DEPTH_FRMT_16_FIXED. Not sure how, but things
also seemed to work with depthBits==24.
Things started to go more sideways at:
commit 6ae473221a
Author: Eric Anholt <eric@anholt.net>
Date: Mon Apr 22 16:04:25 2013 -0700
intel: Fold the one last function intel_tex_format.c into the caller.
since that caused intel_miptree_create_layout() to divide by zero
when encoutering MESA_FORMAT_NONE (bw==0). So after this
commit things were broken enough that many applications wouldn't even
run.
Things got a bit better at:
commit c245efe7e8
Author: Eric Anholt <eric@anholt.net>
Date: Thu Mar 21 09:50:45 2013 -0700
mesa: Remove extension checking from ChooseTexFormat.
since now _mesa_choose_tex_format() would return MESA_FORMAT_X8_Z24
for GL_DEPTH_COMPONENT due to i915 erroneosly claiming that
MESA_FORMAT_X8_S24 (and others) are supported texture formats even
on gen2 hardware. So now the the div-by-zero was gone, but now the
driver would pick DEPTH_FRMT_24_FIXED_8_OTHER even when
depthBits==16 which caused rendering problems.
If we prevent rb->Format from getting clobbered for the depth buffer
things work much better. This makes the spinning title text visible
again in chromium-bsu at 16bpp, for example.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
V2: Follow the new naming convention for unpack functions.
Use double precision for converting Z24 to a float.
V3: Unpack stencil value to most significant byte.
Use 'struct z32f_x24s8' type.
V4: Unpack stencil value to least significant byte.
Add a comment to clarify stencil packing.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch contains non-functional changes. Assertion checks made
earlier in the functions make the if checks redundant. So, remove
the if checks and unindent the code in if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Depth-stencil teture targets are allowed to use source data of type
GL_UNSIGNED_INT_24_8_EXT and GL_FLOAT_32_UNSIGNED_INT_24_8_REV.
Fixes few crashes in Khronos OpenGL CTS packed_pixels tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Link error conditions added in previous patch are equally applicable
to GL_ARB_fragment_coord_conventions implementation. Extension's spec
says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that program
that have a static use of gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must have
the same set of qualifiers."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch causes the shader link to fail if we have multiple fragment
shaders with conflicting layout qualifiers for gl_FragCoord.
V2: Restructure the code and add conditions to correctly handle the
following case:
fragment shader 1:
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
foo();
gl_FragColor = gl_FragData;
}
fragment shader 2:
layout(pixel_center_integer) in vec4 gl_FragCoord;
void foo()
{
}
V3:
Allow linking in the following case:
fragment shader 1:
void main()
{
foo();
gl_FragColor = gl_FragCoord;
}
fragment shader 2:
in vec4 gl_FragCoord;
void foo()
{
...
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.8.1, page 39 of GLSL 1.50 spec says:
"Within any shader, the first redeclarations of gl_FragCoord
must appear before any use of gl_FragCoord."
GLSL compiler should generate an error in following case:
vec4 p = gl_FragCoord;
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch makes the glsl compiler to generate an error if we have a
fragment shader defined with conflicting layout qualifier declarations
for gl_FragCoord. For example:
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
layout(pixel_center_integer) in vec4 gl_FragCoord;
void main()
{
}
V2: Some code refactoring for better readability.
Add compiler error conditions for redeclarations like:
layout(origin_upper_left) in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
and
in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
V3: Simplify function is_conflicting_fragcoord_redeclaration()
V4: Check for null pointer before doing strcmp(var->name, "gl_FragCoord").
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In OpenGL 3.1 attribute 0 becomes non-magic, just like in
OpenGL ES 2.0. Earlier versions of OpenGL used attribute 0
exclusively for vertex position.
V2: Add a utility function _mesa_attr_zero_aliases_vertex() in
varray.h
Fixes 4 Khronos OpenGL CTS failures:
glGetVertexAttrib
depth24_basic
depth24_precision
rgb8_rgba8_rgb
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch makes changes to the behavior of glGetAttribLocation(),
glGetFragDataLocation() and glGetFragDataIndex() functions.
Code changes handle a case described in following example:
shader program:
layout(location = 1)in vec4[4] a;
void main()
{
}
Currently, glGetAttribLocation("a") returns 1.
glGetAttribLocation("a[i]"), where i = {0, 1, 2, 3}, returns -1.
But the expected locations for array elements are: 1, 2, 3 and 4
respectively.
This clarification came up with the addition of
ARB_program_interface_query to OpenGL 4.3.
From Page 326 (page 347 of the PDF) of OpenGL 4.3 spec:
"Otherwise, the command is equivalent to
GetProgramResourceLocation(program, PROGRAM_INPUT, name);"
And, From Page 101 (page 122 of the PDF) of OpenGL 4.3 spec:
"A string provided to GetProgramResourceLocation or
GetProgramResourceLocationIndex is considered to match an active
variable if
• the string exactly matches the name of the active variable;
• if the string identifies the base name of an active array, where
the string would exactly match the name of the variable if the
suffix "[0]" were appended to the string; or
• if the string identifies an active element of the array, where
the string ends with the concatenation of the "[" character, an
integer (with no "+" sign, extra leading zeroes, or whitespace)
identifying an array element, and the "]" character, the integer
is less than the number of active elements of the array variable,
and where the string would exactly match the enumerated name of
the array if the decimal integer were replaced with zero."
V2: Simplify get_matching_index() function.
Add relevant text from OpenGL spec in commit message.
Fixes failures in Khronos OpenGL CTS tests:
explicit_attrib_location_room
draw_instanced_max_vertex_attribs
Proprietary linux drivers of NVIDIA (331.49) matches the behavior
expected by OpenGL 4.3 spec.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently overlapping locations of input variables are not allowed for all
the shader types in OpenGL and OpenGL ES.
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location is referred
to as aliasing, and is not permitted in OpenGL ES Shading Language
3.00 vertex shaders. LinkProgram will fail when this condition exists.
However, aliasing is possible in OpenGL ES Shading Language 1.00 vertex
shaders."
Taking in to account what different versions of OpenGL and OpenGL ES specs
say about aliasing:
- It is allowed only on vertex shader input attributes in OpenGL (2.0 and
above) and OpenGL ES 2.0.
- It is explictly disallowed in OpenGL ES 3.0.
Fixes Khronos CTS failing test:
explicit_attrib_location_vertex_input_aliased.test
See more details about this at below mentioned khronos bug.
V2: Fix the case where location exceeds the maximum allowed attribute
location.
V3: Simplify the condition added in V2.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: Khronos #9609
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
don't leak the MCSubtargetInfo (not really big, was already fixed with
llvm master) and TargetMachine (big). While this is only used for debugging
the leak is large enough to get you into trouble in some cases.
Tested with llvm 3.1 and master.
Before (llvm 3.1), GALLIVM_DEBUG=asm glxgears:
==14152== LEAK SUMMARY:
==14152== definitely lost: 105,228 bytes in 20 blocks
==14152== indirectly lost: 347,252 bytes in 261 blocks
==14152== possibly lost: 866,625 bytes in 1,453 blocks
==14152== still reachable: 7,344,677 bytes in 6,494 blocks
==14152== suppressed: 0 bytes in 0 blocks
After:
==13799== LEAK SUMMARY:
==13799== definitely lost: 3,108 bytes in 6 blocks
==13799== indirectly lost: 0 bytes in 0 blocks
==13799== possibly lost: 804,143 bytes in 1,429 blocks
==13799== still reachable: 7,314,267 bytes in 6,473 blocks
==13799== suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
The brw_eu_emit.c code manually forces the header present bit when
used in align1 (scalar) mode. So, this has no effect currently.
However, it is nice to have fs_inst::header_present reflect reality.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is similar to what Eric did for Gen7 a little while ago; it also
has support for untyped surface reads.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Francisco made brw_mark_surface_used a freestanding function in
commit a32817f3c2. We should use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This must not have existed when I wrote the original code. The atomic
operation header setup code uses this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For platforms using hardware contexts (currently Gen6+), we failed to
emit PIPELINE_SELECT and 3DSTATE_VF_STATISTICS, instead emitting MI_NOOP
for both.
During one of the context initialization reordering patches, we
accidentally moved brw_init_state before we set brw->CMD_PIPELINE_SELECT
and brw->CMD_VF_STATISTICS. So, when brw_init_state uploaded initial
GPU state (brw_init_state -> brw_upload_initial_gpu_state ->
brw_upload_invariant_state), these would be 0 (MI_NOOP).
Storing the commands in the context is not worthwhile. We have many
generation checks in our state upload code, and for platforms with
hardware contexts, this only gets called once per GL context anyway.
The cost is negligable, and it's easy to botch context creation
ordering.
This may fix hangs on Gen6+ when using the media pipeline.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
arekm reported that using Chrome with GPU acceleration enabled on GM45
triggered the hw_ctx != NULL assertion in brw_get_graphics_reset_status.
We definitely do not want to advertise reset notification support on
Gen4-5 systems, since it needs hardware contexts, and we never even
request a hardware context on those systems.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75723
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This was introduced with the comment and code below it, though the code
only touches prog_data (CACHE_NEW_WM_PROG).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from having to emit the nonpipelined state packet on every
FBO binding.
-4.42003% +/- 1.09961% effect on cairo-perf-trace runtime on glamor (n=110).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No more walking 96*6 pointers looking to see if they're the current
texture, when we only use the first 2 out of 96 units. -6.26002% +/-
1.87817% effect on cairo runtime on no-fbo-cache glamor (n=36).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of walking 6 shader stages for each of the 96 combined texture
image units, now we just walk the samplers used in each shader stage.
With cairo-perf-trace on Xephyr with glamor, I'm seeing a -6.50518% +/-
2.55601% effect on runtime (n=22) since the "drop _EnabledUnits" change.
No significant performance difference on an apitrace of minecraft (n=442).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to avoid walking the entire long array texture image units, but the
obvious way to do so means walking program samplers, and thus hitting the
units in a random order.
This change replaces the previous behavior of only setting up the fallback
texture for a fragment shader with setting up the fallback texture for any
shader that's missing a complete texture of the right target in its unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is kind of ugly, but I think it's worth it to finish off the last
consumers of _ReallyEnabled.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The _MaxEnabledTexImageUnit check assures us that Unit[0].Current != NULL.
This is the last consumer of _ReallyEnabled outside of the radeons.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm probably not the only person that has tried to kill _ReallyEnabled.
This does the mechanical part of the work, and cleans _ReallyEnabled from
i965.
I think that using _Current makes texture management clearer: You can't
have multiple targets in use in the same texture image unit at the same
time, because there's just that one pointer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to try to delete _ReallyEnabled, which is this weird bitfield
with either 0 or 1 bits set with just the reference to _Current.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The field wasn't really valid, since we've got more than 32 units now. It
turns out it was mostly just used for checking != 0, or checking for fixed
function coordinates, though.
v2: Fix mis-conversion in xm_line.c (caught by Ken).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_EnabledUnits is all of the first 32 image units that are used by fixed
function or programs, while _EnabledCoordUnits is just which fixed function
fragment shader texcoords need to be generated. This is a theoretical bugfix
in the case of a vertex shader texturing from large texture image unit number
(we'd end up flagging something other than a VARYING_SLOT_TEXn as needing to
be generated), but it's actually just motivated by trying to kill
_EnabledUnits.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original GL_EXT_texture_swizzle extensions said GL_INVALID_OPERATION
was to be generated when the an invalid swizzle was passed to
glTexParameter(). But in OpenGL 3.3 and later, the error should be
GL_INVALID_ENUM.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It is possible that there are multiple input buffers but only one is
relevant for translation. Then there will be only a single translation
group, which might need to source data from a buffer index != 0.
Fixes wrong vertex shader inputs as observed while debugging with an
application and driver combination that requires translation of a
vertex attribute in a non-trivial set of attributes and input buffers.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Igor Gnatenko:
v2: PIPE_COMPUTE_CAP_MAX_CLOCK_FREQUENCY instead of
PIPE_COMPUTE_MAX_CLOCK_FREQUENCY
Bruno Jiménez:
v3: Drivers report clock in Mhz
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Igor Gnatenko:
v2: in define RADEON_INFO_MAX_SCLK use 0x1a instead of 0x19 (upstream changes)
Bruno Jiménez:
v3: Convert the frequency to MHz from kHz after getting it in
'do_winsys_init'
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
We intended to set these 64-bit addresses to 0, and set the enable bit.
But, I accidentally placed the DWord with the high bits first, when it
should have been second.
This generally worked out, by luck - presumably General State Base
Address is initially zero, and ends up remaining that way in our
contexts since we bungled the "modify enable" bit.
v2: Fix MOCS shift on GSBA. It should be 4, and I had 2.
(Caught by Ben Widawsky.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The implementation is basically a NOP but it conforms with OpenCL 1.2.
[ Francisco Jerez: Initialize property return buffer for
CL_DEVICE_PARTITION_PROPERTIES, CL_DEVICE_PARTITION_TYPE,
CL_DEVICE_PARTITION_AFFINITY_DOMAIN, and make the latter a scalar
rather than a vector. Some clean-up and code style fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: use a new variable for aligned size
add comment
make both vars const
only use the aligned value in argument constructors
fix comment typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The C++ headers are *not* updated because they rely on CL 1.2 APIs
that we do not implement yet when the core CL 1.2 headers are present.
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Fixes textureGatherOffset when used with a shadow sampler. Also verified
against blob compiler with textureLodOffset manually (no piglit tests
for texture[Lod]Offset + shadow samplers).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds support for:
IBFE, UBFE, BFI, LSB, IMSB, UMSB, BREV, POPC
Which are all required for ARB_gs5 support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add bitwise reversing and signed MSB helpers for software implementation
of the new TGSI opcodes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also pipe through [IU]MUL_HI, MAD, and lower ldexp. This provides
coverage of all new ARB_gpu_shader5 functions except uaddCarry,
usubBorrow and interpolateAt*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use designated initialisers, and store the extensions pointers as const.
The loader extensions __DRIdri2LoaderExtension and __DRIswrastLoaderExtension
are setup by the platform backends so they should not be constified.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use designated initialisers, store all extension pointers as const and use
a const __DRIextensions array over assigning each element individually.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Explicitly set the version that is implemented, as that may differ from
the one defined in dri_interface.h. The remaining __DRI*Extensions are
treated as constants, so got ahead and declare them as such.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Use a const array with the extensions, rather than assigning each
one to a fixed size array at runtime.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that the DRI*Extensions report the version of the interface
implemented over the listed in the headers. While both are currently
the same, this may change in the future.
v2: Keep loader extensions handling as is.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Explicitly set the version that is implemented, as that may differ
from the one defined in dri_interface.h. Use designated initialisers
and constify whereever possible.
Note: __DRIimageExtension should not be made const as it's modified
at runtime. This patch should have no side effects on compilers that
do not support designated initialisers, as the existing code in
dri/common already uses them.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than keeping a separate and unused copy of the screen extensions
within the radeon screen, use a constant array that can be used directly
with __DRIscreen.
[Kristian Høgsberg]
The copy in the radeon screen isn't unused, that's where the array is
built and stored, the dri screen just points to that. The pattern
here was used for cases where the extensions exported by a dri driver
could vary at runtime, for example depending on chipset. In this
case, it's known at compile time, so it makes sense to use a static
const array instead.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Uniformly use the typecasted extension name, constify extension instances
and use designated initialisers. Set the implemented version of the
extension, over the one defined in dri_infertace.h. Patch covers the
following extensions:
__DRItexBufferExtension
__DRIimageExtension
__DRIrobustnessExtension
__DRI2rendererQueryExtension
__DRIdri2LoaderExtension
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
With commit e59fa4c46c8("dri2: release texture image.") we updated the
extension without bumping the version number. The patch itself added an
interface required to enable texture_from_pixmap on certain platforms.
The new code was effectively never build, as it depended on
__DRI_TEX_BUFFER_VERSION >= 3, which never came to be in upstream mesa.
This commit bumps the version number, drops the __DRI_TEX_BUFFER_VERSION
checks and resolves all the build conflicts. Additionally it add a version
check as egl and dri3, as require version 2 of the extension which does
not have the releaseTexBuffer hook.
Cc: Juan Zhao <juan.j.zhao@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unfortunately, Cygwin defines RTLD_DEFAULT (for glibc compatibility), but can't
provide dladdr(), so add a check for dladdr()
Since I don't think scons is ever used to build for Cygwin, just set HAVE_DLADDR
in SConscript, assuming that if we have RTLD_DEFAULT, we have dladdr().
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
The old python code used sys.is_big_endian to select between little-endian
and big-endian formats, which meant that the build and host endiannesses
needed to be the same. This patch instead generates both big- and little-
endian layouts, using PIPE_ARCH_BIG_ENDIAN to select between them.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Splits out the code that parses the channel list, so that we
can have different lists for little and big endian.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Rather than iterate over format.channels and format.swizzles directly,
use Python subfunctions that take the channel and swizzle lists as
arguments. This allow the channel and swizzle lists to depend on
endianness.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With the big-endian changes, there can be two swizzle orders for each format.
This patch turns Format.inv_swizzle() into a global function that takes the
swizzle list as a parameter.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The main aim is to reduce the number of places that access channels[0],
swizzles[0] and swizzles[1] directly.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This can happen with glamor, which uses EGL_KHR_surfaceless_context and
only explicitly binds GL_READ_FRAMEBUFFER for glReadPixels.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
relnotes weren't updated this whole time, so I went through all the
GL3.txt changes and picked out the nouveau ones since 10.1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_HashTable is not well-suited for us: it locks a mutex unnecessarily and
it does not accept 0 as the key (and have branches to handle 1 specially).
What we really need is a sparse array. Whether it should be implemented as a
hash table, a list, or a bsearch()-able array requires investigations of the
use models.
We choose to implement it as a list for now, assuming it is common to have a
short list of IDs in each (source, type) namespace. The code is simpler, and
the memory footprint is lower. This also fixes several corner cases such as
making messages to have different states at different severities.
v2: use GLbitfield for State/DefaultState, and add a comment
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Do not copy the debug group until it is about to be written. One likely
scenario of using glPushDebugGroup/glPopDebugGroup is to enclose a sequence of
GL commands and give them a human-readable description. There is no message
control change in this scenario, and thus no need to copy.
This also reduces the initial size of gl_debug_state from 306KB to 7KB.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add functions to provide these operations on a struct gl_debug_namespace:
init(): initialize the namespace
copy(): copy all elements from one namespace to another
clear(): clear all elements (to free the memories)
set(): set the value of an element
set_all(): set the value of all elements
get(): get the value of an element
A debug namespace is like a sparse array. The length of the array is huge,
2^sizeof(GLuint), but most of the elements assume the same value sepcified by
set_all().
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add struct gl_debug_group to hold all namespaces of a debug group. Replace
the 3-dimensional array, Namespaces, in struct gl_debug_state by a
1-dimensional array of type struct gl_debug_groups.
Turn the 4-dimensional array, Defaults, in struct gl_debug_state to a
1-dimensional array in struct gl_debug_namespace.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove NextMsgLength, and move members of struct gl_debug_state that belong to
the message log to a new struct, gl_debug_log. Rename gl_debug_msg to
gl_debug_message.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When GL_DEBUG_OUTPUT_SYNCHRONOUS is GL_TRUE, drivers are allowed to log debug
messages from other threads. That requires gl_debug_state to be protected by
a mutex, even when it is a context state. While we do not spawn threads in
Mesa yet, this commit makes it easier to do when we want to.
Since the definition of struct gl_debug_state is no longer needed by the rest
of the driver, move it to main/errors.c. This should make it even harder to
use the struct incorrectly.
v2: add comments for the accessors
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add validate_length, and call it together with log_msg directly instead of
message_insert. No functional change.
v2: make sure length is non-negative (i.e., known) before calling
validate_length, noted by Timothy Arceri
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In both call sites, it could be easily replaced by direct
debug_is_message_enabled calls. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Merge control_app_messages with the only caller. Eliminate set_message_state
and control_messages too as they are unused. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace free_errors_data by debug_clear_group. Add debug_pop_group and
debug_destroy for use in _mesa_PopDebugGroup and _mesa_free_errors_data
respectively. No funcitonal change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move group copying to debug_push_group. Save the group message before pushing
instead of after, since we will need it after popping. No functional change
otherwise.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move most of the code to debug_set_message_enable_all. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message fetching to debug_fetch_message and message deletion to
debug_delete_messages. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message logging to debug_log_message. Replace store_message_details by
debug_message_store. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message state update to debug_set_message_enable. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the message filtering logic to debug_is_message_enabled. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move gl_debug_state allocation to a new function, debug_create. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitfieldInsert takes scalar integers for its last two arguments. Since
bitfieldInsert is lowered on i965 to two instructions that have more
flexible arguments, I didn't notice when I wrote this.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
It looks like this bit of code is trying to disable the prime capability if
the driver doesn't support createImageFromFds. However the logic looks a bit
broken and what it would actually do is disable all other capabilities apart
from prime. This patch fixes it to actually disable prime.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This should give the caller some information of what called the error.
For the gbm_bo_import() case, for instance, it is possible to know if
the import is not supported or the error was caused by an invalid
parameter.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In all fairness we allow the gallium tests to be build with --disable-dri
which will result in the approapriate winsys to not be build, thus the
build will fail.
./configure --disable-dri --with-gallium-drivers=svga --enable-gallium-tests
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than defining our own set of variables, use NEED_WINSYS_XLIB
and based on it include the sw/xlib winsys.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The function relies on the sw/dri winsys which is build only when --enable-dri
is set. Fixes build issues with the following config
./configure --disable-dri --with-gallium-drivers=svga --enable-xa
Issue can be reproduced with any hw gallium driver + st that uses the pipe-loader.
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
GL (3.0) allows you to clear individual color buffers in a fb. In fact
for fbs containing both int and float/normalized color buffers this is
required (because the clearing values are otherwise undefined if applied
to all buffers). The gallium interface was changed a while ago, but llvmpipe
ignored it (hence doing such individual clears always resulted in clearing
all buffers, plus some assorted asserts due to the mixed fbs).
So change the clear command to indicate the buffer to be cleared. Also, because
indicating the buffer to be cleared would have made lp_rast_arg_cmd larger
which is unacceptable (we're trying to shrink it some day) allocate the clear
value in the scene and just pass a pointer.
There's several advantages and disadvantages here:
+ clearing individual buffers works (we could also actually bin such clears now
if they'd come through clear_render_target() if the surface is in the current
fb, though we didn't do this before for the single rb case and still don't try).
+ since there's one clear per rb, we do the format conversion in setup rather
than per bin. Aside from the (drop in the ocean...) performance advantage this
means that clearing to very small values (that is, denormal when converted to
the format) should work for small float (fp16 etc.) formats, as the util code
couldn't handle it correctly before (because cpu denorms are disabled when
executing the bin commands, screwing up the magic conversion and flushing
the values to 0, though this was not verified).
- there's some overhead for traditional old-style clear-all MRT cases, since
there's one rast clear command per rb instead of one for all rbs.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976.
v2: get rid of the ugly manual memcpy stuff and just use union util_color.
This is 32 bytes instead of 16 but as the allocation is per scene we can live
with those additional 16 bytes (and the additional 128 bytes in the setup
context), which makes the code much more obvious. Suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
util_color often merely represents a collection of bytes, however it is
inconvenient if those bytes can only be accessed as floats/doubles for int
formats exceeding 32bits.
(Note that since rgba8 formats use one uint, not 4 bytes, hence the byte and
short member were left as is.)
The current version checking is wrongly refusing to create 3.3 contexts;
unsupported version are checked elsewhere; and the DRI path doesn't do
this sort of checking neither.
This enables piglit glsl 3.30 tests to run without skipping.
Reviewed-by: Brian Paul <brianp@vmware.com>
According to Roland all TGSI support is there in theory.
In practice there are a few piglit failures and crashes, as this hadn't
been tested before.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GLX_ARB_create_context_profile spec says:
"If version 3.1 is requested, the context returned may implement
any of the following versions:
* Version 3.1. The GL_ARB_compatibility extension may or may not
be implemented, as determined by the implementation.
* The core profile of version 3.2 or greater."
Mesa does not support GL_ARB_compatibility, and there are no plans to
ever support it, therefore the only chance to honour a 3.1 context is
through core profile, i.e, the 2nd alternative from the spec.
This change does that. And with it piglit tests that require 3.1
contexts no longer skip.
Assuming there is no objection with this change, src/glx/dri_common.c
and src/gallium/state_trackers/wgl/stw_context.c should also be updated
accordingly, given they have the same logic.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Lets make draw_get_option_use_llvm function available unconditionally
and use it to avoid useless allocations when LLVM paths are active.
TGSI machine is never used when we're using LLVM.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes a segmentation fault in conform divzero.c test.
This happens when glTexImage(level, width=0, height=0) is called. We
don't allocate texture memory in that case so the ImageSlices array
was never allocated.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This prevents buffer overflow w/ llvmpipe when running piglit
bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto
v2: Compute the framebuffer size as the minimum size, as pointed out by
Brian; compacted code; ran piglit quick test list (with no
regressions.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In cases where varying fetches are optimized away (just pass-through in
vertex shader, but unused in fragment shader) we need to calculate the
correct TOTALATTROVS based on the actual number of varyings fetched,
otherwise lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
HiZ operations make the depth/render caches out of sync with the sampler
caches. We need to arrange for a TC flush to happen before the target
buffer is used by the sampler. Calling brw_render_cache_set_add_bo
makes that happen.
On previous generations, brw_blorp_exec took care of flushing the
texture cache by calling intel_batchbuffer_emit_mi_flush after doing
any rendering. If we were to use the normal drawing path, then
brw_postdraw_set_buffers_need_resolve would handle this.
On Broadwell, we don't use BLORP, and we don't emit a rectangle
primitive via the normal drawing path. The 3DSTATE_WM_HZ_OP and
PIPE_CONTROL implicitly make drawing happen. So, none of our existing
code makes this flush happen - we need to do it directly.
Fixes 11 Piglit copyteximage subtests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77223
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The way cik_num_banks() was calculating the index only makes sense for
the CIK specific macrotile mode array. For SI, we need to use the tile
mode index directly.
This happened to work most of the time because most of the SI tiling
modes use the same number of banks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously this was special-cased for VS and FS; it never got updated
when geometry shaders came along. Generalize using is_varying_var() so
this won't be broken again with tessellation.
Note that there are two copies of the logic for `invariant`: It can be
present as part of a new declaration, and also as a redeclaration of an
existing variable or block member.
Fixes the four new piglits:
spec/glsl-1.50/compiler/invariant-qualifier-*.geom
Note for stable: This won't quite pick cleanly due to whitespace and
state->target -> state->stage renames. Should be straightforward
adjustments though.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.1, page 220, of OpenGL 3.3 specification explains
the error conditions for glreadPixels():
"If the format is DEPTH_STENCIL, then values are taken from
both the depth buffer and the stencil buffer. If there is
no depth buffer or if there is no stencil buffer, then the
error INVALID_OPERATION occurs. If the type parameter is
not UNSIGNED_INT_24_8 or FLOAT_32_UNSIGNED_INT_24_8_REV,
then the error INVALID_ENUM occurs."
Fixes failing Khronos CTS test packed_depth_stencil_error.test
V2: Avoid code duplication
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the OpenGL 4.4 spec page 275:
"If pname is FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE, param will
contain the format of components of the specified attachment,
one of FLOAT, INT, UNSIGNED_INT, SIGNED_NORMALIZED, or
UNSIGNED_NORMALIZED for floating-point, signed integer,
unsigned integer, signed normalized fixedpoint, or unsigned
normalized fixed-point components respectively. If no data
storage or texture image has been specified for the attachment,
param will contain NONE. This query cannot be performed for a
combined depth+stencil attachment, since it does not have a
single format."
Fixes Khronos CTS test: packed_depth_stencil_parameters.test
Khronos Bug# 9170
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The '**used' pointer was pointing into the stObj->sampler_views array.
If 'free' was null, we'd realloc that array, thus making the 'used'
pointer invalid. This soon led to memory errors.
Just change the pointer to be '*used' so it points directly at the
pipe_sampler_view.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Avoid looping over 32/48/96 (!!) tex image units every draw, most of
which we don't care about.
Improves performance on everyone's favorite not-a-benchmark by 2.9% on
Haswell.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives us a better bound for some hot loops in the drivers than
MAX_COMBINED_TEXTURE_IMAGE_UNITS, which is ridiculously large on modern
hardware, and only getting worse as more shader stages are added.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also rework things so that if someone were to add an opcode without
adjusting the values in these arrays, there will be a compilation error.
This fixes a few quadop-related piglit regressions since commit
d5faf8e786.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to center the sample. The old code may have been correct given
the limited values of ms_x/y, but the new logic should be more
intuitive. Note that ms_x can only be 1/2 and ms_y can only be 0/1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The 2D engine should be usable in more cases, but this fixes MS blits
between textures with the same MS settings. Otherwise a single sample is
selected to be the target texel value.
This allows other tests to work that render to a RB and then blit that
to a texture for input into a shader that uses sampler2DMS to verify it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Back when I originally wrote this code, force_sechalf was only used for
Gen4 code, so I didn't bother hooking it up. However, it's used more
generally these days. In particular, we use it for computing
gl_SamplePosition.
Fixes Piglit's spec/ARB_sample_shading/builtin-gl-sample-position tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77222
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously only allowed coalescing registers that interfere (i.e.,
whose live ranges overlap) if the destination register's live range was
entirely inside the source's live range. This is unnecessary -- we only
need to check for interfering writes in the intersection of their live
ranges.
total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
instructions in affected programs: 84751 -> 83734 (-1.20%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than any old control flow. Muchnick's algorithm just checks for
interfering writes between the MOV and the end of the program. Handling
this when you have backward branches is hard, so don't, but there's no
reason to bail if you see forward branches.
instructions in affected programs: 4270 -> 4248 (-0.52%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were starting at the beginning of the instruction list, rather than
with the MOV instruction itself. This allows us to coalesce after
control flow.
Excluding the shaders from an unreleased title, the shader-db results:
total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
instructions in affected programs: 678772 -> 669196 (-1.41%)
GAINED: 5
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And avoid rewriting other instructions unnecessarily. Removes a few
self-moves we weren't able to handle because they were components of a
large VGRF.
instructions in affected programs: 830 -> 826 (-0.48%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's a few 3-component vertex attribute formats that have no
equivalent SVGA3D_DECLTYPE_x format. Previously, we had to use
the swtnl code to handle them. This patch lets us use hwtnl for
more vertex attribute types by fetching 3-component attributes as
4-component attributes and explicitly setting the W component to 1.
This lets us handle PIPE_FORMAT_R16G16B16_SNORM/UNORM and
PIPE_FORMAT_R8G8B8_UNORM vertex attribs without using the swtnl path.
Fixes piglit normal3b3s GL_SHORT test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
There's no SVGA3D_DECLTYPE that directly corresponds to
PIPE_FORMAT_R8G8B8_SNORM. Previously, we used the swtnl fallback
path to handle this but that's slow and causes invariance issues.
Now we fetch the attribute as SVGA3D_DECLTYPE_UBYTE4N and insert
some extra VS instructions to remap the attributes from the range
[0,1] to the range[-1,1].
Fixes Sauerbraten sw fallback.
Fixes piglit normal3b3s-invariance test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Now only translate the formats once in svga_create_vertex_elements_state().
And rename the array and use the proper SVGA3dDeclType type.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit c875d6e57a.
Conflicts:
src/gallium/drivers/svga/svga_context.c
This work-around will no longer be needed after the next patch
which properly supports signed-byte vertex attributes.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This sets up the proper execution mask for sends in SIMD16 mode.
Fixes Piglit's glsl-fs-normalmatrix, glsl-fs-uniform-array-2,
glsl-fs-uniform-array-6, and glsl-fs-uniform-array-7 on Ironlake,
which regressed when I enabled SIMD16 pull parameter support in
commit b207e88b25.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_ViewportIndex doesn't get its own varying slot. It is stored
in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_ViewportIndex'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gl_Layer doesn't get its own varying slot. It is stored in
VARYING_SLOT_PSIZ.y. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_Layer'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that _debug_assert_fail() has the noreturn attribute, it is better
that execution truly never returns. Not just for sake of silencing the
warning, but because the code at the return IP address may be invalid or
lead to inconsistent results.
This removes support for the GALLIUM_ABORT_ON_ASSERT debugging
environment variable, but between the usefulness of
GALLIUM_ABORT_ON_ASSERT and better static code analysis I think better
static code analysis wins.
Reviewed-by: Brian Paul <brianp@vmware.com>
Same intent as commit a45a50a482,
but this the C compiler is detected via C-preprocessor macros,
similar to how autotools do it, as that seems to be the most
reliable method.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows the following shader code to work without a weird crash:
struct Foo {
int value[1];
};
int actual_value = Foo[2](Foo(int[1](100)), Foo(int[1](200)))[i].value[0];
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
nouveau_vp3_inter_sizes requires sliec_count as argument just
as the other places that call it from h264 code do. Hopefully
fixes something.
Fix the status_vp code to allow status == 0 too, when processing
hasn't started yet.
set h264->second_field correctly.
Otherwise it will trick the gallium driver into thinking that the render
target has actually changed (due to different pipe_surface pointing to
same underlying pipe_resource). This is really badness for tiling GPUs
like adreno.
This also appears to fix a rendering error with Motif on vmwgfx.
Why that is is still under investigation.
Based on an idea by Rob Clark.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Keep track of the maximal bounds of all the operations and set scissor
accordingly. For tiling GPU's this can be a big win by reducing the
memory bandwidth spent moving pixels from system memory to tile buffer
and back.
You could imagine being more sophisticated and splitting up disjoint
operations. But this simplistic approach is good enough for the common
cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Once the relevant branch has been identified do not iterate over the
instructions in the branch, do a linked list insertion instead to avoid the
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
(i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
since this test falls off the blorp path now due to format conversion)
V2: Use scissor instead of just mangling the rects, to avoid texcoord
rounding problems. (Thanks Marek)
V3: Rebase on Eric's CTSI meta changes; re-add _mesa_update_state in the
CTSI path so that _mesa_clip_blit sees the correct bounds.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the spec:
<renderbuffertarget> must be RENDERBUFFER and <renderbuffer>
should be set to the name of the renderbuffer object to be
attached to the framebuffer. <renderbuffer> must be either
zero or the name of an existing renderbuffer object of type
<renderbuffertarget>, otherwise an INVALID_OPERATION error is
generated.
This patch changes the previous returned GL_INVALID_VALUE to
GL_INVALID_OPERATION.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76894
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.
This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates a
n ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.
Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.
To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
OpenGL 4.0 spec, page 306 suggests an INVALID_OPERATION in glGetTexImage
if :
"format is one of the integer formats in table 3.3 and the internal
format of the texture image is not integer, or format is not one of
the integer formats in table 3.3 and the internal format is integer."
V2: Use helper function _mesa_is_format_integer()
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa currently returns 4 when GL_VERTEX_ATTRIB_ARRAY_SIZE is queried
for a vertex array initially set up with size=GL_BGRA. This patch
makes changes to return size=GL_BGRA as required by the spec.
Fixes Khronos OpenGL CTS test: vertex_array_bgra_basic.test
V2: Use array->Format instead of adding a new variable
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
For graphics, the LLVM compiler backend currently has many shortcomings
compared to the non-LLVM one. E.g. it can't handle geometry shaders yet,
but that's just the tip of the iceberg.
So building Mesa with --enable-r600-llvm-compiler is currently not
recommended for anyone who doesn't want to work on fixing those issues.
However, for protection of users who end up enabling it anyway for some
reason, let's disable the LLVM backend at runtime by default. It can be
enabled with the environment variable R600_DEBUG=llvm.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This reverts commit a45a50a482.
Unfortunately gcc dumps argv[0] as the first word of --version, so it is
unreliable for detecting gcc.
In particular `cc --version` and `i686-w64-mingw32-gcc --version` give
wrong results.
A better solution needs to be found -- most likely using C-preprocessing
like autotools does. Revert for now.
For Clang static code analyzer, the scan-build script will produce more
comprehensive output. Nevertheless you can invoke it as
CC=clang CXX=clang++ scons analyze=1
For MSVC this is the best way to use its static code analysis. Simply
invoke as
scons analyze=1
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we can have name space collisions between blocks that define the same
fields. For example:
in block
{
vec4 Color;
} In[];
out block
{
vec4 Color;
} Out;
These two blocks will assign the same interface name (block.Color) to the Color
field in flatten_named_interface_blocks_declarations.cpp, leading to havoc.
This was breaking badly the gl-320-primitive-shading test from ogl-samples.
The patch uses the block instance name to avoid collisions, producing names
like block.In.Color and block.Out.Color to avoid the name clash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76394
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The transfer staging texture is always freshly allocated, so for write-only
transfers we don't need to explicitly wait for the BO to become idle.
Squeezes a few hundered MB/s more out of x11perf -shmput500 with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This manifested as rendering failures or sometimes GPU hangs in
compositors when they accidentally got MSAA visuals due to a bug in the X
Server. Today we decided that the problem in compositors was equivalent
to a corruption bug we'd noticed recently in resizing MSAA-visual
glxgears, and debugging got a lot easier.
When we allocate our MCS MT, libdrm takes the size we request, aligns it
to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to it.
When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
data gets stored to the first 20 or so scanlines of each of the 3 tiled
pages in that row, even though only 2 of those pages were allocated by
libdrm. In the glxgears case, the missing 3rd page happened to
consistently be the static VBO that got mapped right after the first MCS
allocation, so corruption only appeared once window resize made us throw
out the old MCS and then allocate the same BO to back the new MCS.
Instead, just memset the amount of data we actually asked libdrm to
allocate for, which will be smaller (more efficient) and not overrun.
Thanks go to Kenneth for doing most of the hard debugging to eliminate a
lot of the search space for the bug.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have any piglit tests for this currently.
v2: Use vec3s for the texcoords so it has some hope of working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You'll note from the previous commits that there's something of a loop
here: You call CTSI, which calls BlitFB, then if things go wrong that
falls back to CTSI. As a result, meta CTSI reaches over into blitfb to
tell it "no, don't try that fallback".
v2: Drop the _mesa_update_state(), which was only necessary due to use of
_mesa_clip_blit() in _mesa_meta_BlitFramebuffer() in another patch
series.
v3: Drop an _EXT suffix I copy-and-pasted.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I added support to bind_fbo_image in the process of building meta
CopyTexSubImage, and found that it broke generatemipmap because previously
we would just throw a GL error there and then end up with an incomplete
FBO and fallback.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I need to do the same code again for CopyTexSubImage().
v2: Drop incorrect, not-terribly-useful comment (review by Ken)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids a ReadPixels() if there's accelerated CopyTexImage present.
It now requires GLSL as opposed to just fragment programs, but we don't
have any drivers that do ARB_fp but not GLSL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There shouldn't be anything special about copying out a subset of the src
rb to a temp before texturing from it, so just do it when we're figuring
out our src texture binding.
This drops Anuj's change to copy an extra border of 1 pixel around the src
area. I can't see how that change could be valid, and presumably if
there's some filtering problem at edges we just need to set the right
wrap mode.
v2: Don't fall back to swrast on non-2D/RECT/2D_MS textures when we can
still CopyTexSubImage. Fixes a segfault regression on i965 with
gl-3.2-layered-rendering-blit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
I think we can assert that renderbuffer size is <= maximum 2D texture
size. Our source coordinates should have already been clipped to the src
renderbuffer size, but haven't actually (so we could potentially have
trouble if there's scaling, and we're in the CopyTexImage path that tries
to use src size). However, this texture size dependency was blocking the
next refactors, so I'm not sure if we want to go ahead with this series
before we get the clipping sorted out or not.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Putting NoDDClr and NoDDChk dependency control on instruction
sequences that include math opcodes can cause corruption of channels.
Treat math opcodes like send opcodes and suppress dependency hinting.
Signed-off-by: Mike Stroyan <mike@LunarG.com>
Tested-by: Tony Bertapelli <anthony.p.bertapelli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 1653399 -> 1651790 (-0.10%)
instructions in affected programs: 92157 -> 90548 (-1.75%)
GAINED: 2
LOST: 2
Also significantly reduces the number of optimization loop iterations:
total loop iterations in shared programs: 39724 -> 31651 (-20.32%)
loop iterations in affected programs: 21617 -> 13544 (-37.35%)
Including some great pathological cases, like 29 -> 3 in Strike Suit
Zero and 24 -> 3 in Dota2.
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously stopped searching for unread writes after encountering
control flow, but we can instead just search backwards until we hit
control flow.
instructions in affected programs: 22854 -> 22194 (-2.89%)
The generator uses its destination as a source implicitly, which breaks
some assumptions in dead code elimination. Giving the instruction a
source allows us to reason about it better.
We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map
exactly to Z16. However, we misread the specification, thanks in part
to LaTeX reordering the tables in the PDF.
Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says:
"[...] memory allocation per texture component is assigned by the GL to
match the allocations listed in tables 3.16-3.18 as closely as possible.
[...]
Required Texture Formats
[...]
In addition, implementations are required to support the following sized
internal formats. Requesting one of these internal formats for any
texture type will allocate exactly the internal component sizes and
types shown for that format in tables 3.16-3.17:"
Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
or table 3.17. It appears in table 3.18, where the "exact" rule doesn't
apply, and it falls back to the "closely as possible" rule.
The confusing part is that the ordering of the tables in the PDF is:
Table 3.16 (pages 182-184)
Table 3.18 (bottom of page 184 to top of 185)
Table 3.17 (page 185)
Presumably, people saw table 3.16, then saw the table immediately
following with DEPTH_COMPONENT* formats, and assumed it was 3.17.
Based on a patch by Chia-I Wu, but without the driconf option to force
Z16 to be used. It's not required, and there's apparently no benefit
to actually using it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
We've learned a few things since we originally disabled Z16; this attempts
to summarize the issue. I am no expert on this subject, though, so the
comment may not be totally accurate.
I did some benchmarking on GM45 and Ironlake, and discovered that for
GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and
4.5% slower on Ironlake (n=95). So, we can drop the "on Ivybridge"
aspect of the comment - it's always slower.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Significantly reduces BO allocation / destruction overhead for transfers,
e.g. measurable via x11perf -shm{ge,pu}t* with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The rules are about to get a bit more complex to account for
gl_InstanceID and gl_VertexID, which are system values.
Extracting this first avoids introducing duplication.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glClearBuffer() is currently clearing all active draw color buffers (all
buffers that have not been set to GL_NONE when calling glDrawBuffers) instead
of only clearing the one it receives as parameter. Altough brw_clear()
receives a bit mask indicating the color buffers that should be cleared,
this mask is ignored when calling brw_blorp_clear_color().
This was breaking the 'fbo-drawbuffers-none glClearBuffer' piglit test.
The patch provides the bit mask to brw_blorp_clear_color() so it can limit
clearing to the color buffers present in the mask.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76832
Reviewed-by: Eric Anholt <eric@anholt.net>
This always looks crazy when I stumble across it, until I remember
what the hardware is doing. Describing it ought to short-circuit
that process next time :)
V2: Fix indents to 6 spaces, not 7.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Many shaders use a pattern such as:
for (int i = 0; i < NUM_LIGHTS; i++) {
...access a uniform array, or shader input/output array...
}
where NUM_LIGHTS is a small constant (such as 2, 4, or 8).
The expectation is that the compiler will unroll those loops, turning
the array access into constant indexing, which is more efficient, and
which may enable array splitting and other optimizations.
In many cases, our heuristic fails - either there's another tiny nested
loop inside, or the estimated number of instructions is just barely
beyond the threshold. So, we fail to unroll the loop, leaving the
variable indexing in place.
Drivers which don't support the particular flavor of variable indexing
will call lower_variable_index_to_cond_assign(), which generates piles
and piles of immensely inefficient code. We'd like to avoid generating
that.
This patch detects unsupported forms of variable-indexing in loops, where
the array index is a loop induction variable. In that case, it bypasses
the loop-too-large heuristic and forces unrolling.
Improves performance in various microbenchmarks: Gl32PSBump8 by 47%,
Gl32ShMapVsm by 80%, and Gl32ShMapPcf by 27%. No changes in shader-db.
v2: Check ir->array for being an array or matrix, rather than the
ir_dereference_array itself.
v3: Fix and expand statistics in commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The "fail" flag is set if loop_unroll_count encounters a nested loop;
calling the flag "nested_loop" is a bit clearer.
The original reasoning was that count is inaccurate (too small) if there
are nested loops, as we don't do any sort of analysis on the inner loop.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Loop unrolling will need to know a few more options in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we pass in gl_shader_compiler_options, it makes sense to just
use options->MaxUnrollIterations, rather than passing a separate
parameter.
Half of the invocations already passed options->MaxUnrollIterations,
while the other half passed in a hardcoded value of 32.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will prevent the two from getting out of sync again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were out of sync with the flags used to control
lower_variable_index_to_cond_assign in brw_shader.cpp.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Not setting this would prevented coalescing after a failed attempt if
the sources for both MOVs were the same.
total instructions in shared programs: 1654531 -> 1650224 (-0.26%)
instructions in affected programs: 423167 -> 418860 (-1.02%)
GAINED: 2
LOST: 0
Reviewed-by: Eric Anholt <eric@anholt.net>
It's just the array index, so we can just go look at the array and see
which element we are.
No significant performance difference (n=140)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering assignment expressions like:
v.x += u.x;
v.x += u.x;
the vectorizer would incorrectly keep going, attempting to find more
instructions to vectorize. It would overwrite the saved assignment
to point at the second one, and increment channels a second time,
resulting in try_vectorize thinking the expression was a vec2 instead of
a float.
Instead, if we see a repeated assignment to a channel, just try to
vectorize everything we've found so far. This clears the saved state
so it will start over.
Fixes Piglit's repeated-channel-assignments.vert.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For blocks, gl_shader_program::UniformStorage isn't very useful. The
names stored there are the names of the elements of the block, so
finding blocks with an instance name is hard. There is also only one
entry in ::UniformStorage for each element of a block array, and that is
a deal breaker.
Using ::UniformBlocks is what _mesa_GetUniformBlockIndex does. I
contemplated sharing code between set_block_binding and
_mesa_GetUniformBlockIndex, but building the stand-alone compiler and
the unit tests make this hard. I plan to return to this effort shortly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
In the next patch, we'll see that using
gl_shader_program::UniformStorage is not correct for uniform blocks.
That means we can't use ::UniformStorage to select between the sampler
path and the block path. Instead we want to just use the type of the
variable. That's never passed to set_uniform_binding, and it's easier
to just remove the function (especially for later patches in the series)
than to add another parameter.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
And remove nonsensical approximation of linear interpolation behavior
for shadow samplers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
These were missed/typo'd in the previous patch series:
s/R8G8B8A/R8G8B8A8/
s/rgba_16/RGBA_UNORM16/
s/rgba_uint/RGBA_UINT/
s/rgba_int/RGBA_SINT/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The commit 3b0b44f7de introduced a build
error:
error: dereferencing pointer to incomplete type
This patch fixes this issue in all the affected files.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Keep tasks as linked list, this way we can associate
more than one encoding task with each buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
It was always getting set to -8/7 unconditionally. Use the
driver-reported value instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Defaults to providing the same offsets as MIN/MAX_TEXEL_OFFSET. For
nvc0, the offset can be -32/31.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Create the screen in the winsys while the mutex is locked.
This also results in a nice code cleanup!
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This also hides the reference count from drivers.
v2: update the reference count while the mutex is locked in winsys_create
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This replaces u_gen_mipmap with an extremely simple implementation based
on pipe->blit. st/mesa is also cleaned up.
Pros:
- less code
- correct mipmap generation for NPOT 3D textures (u_blitter uses a better
formula)
- queries are not affected by mipmap generation if drivers disable them
v2: add "first_layer", "last_layer" parameters, drop "face"
v2.1: add format
v2.2: document the format parameter
This is needed by _mesa_generate_mipmap.
This adds an array of pipe_transfers to st_texture_image. Each transfer is
for mapping a single layer.
v2: allocate the array of transfers on demand
No longer used anywhere. These also caused trouble in the Gallium
state tracker code where we include both core Mesa and Gallium util
headers (and the macros were defined differently in each world.)
Removing these macros should help avoid macro mix-ups in the future.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We moved away from MALLOC/FREE in the rest of core Mesa a while ago.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were using REALLOC() from u_memory.h but FREE() from imports.h.
This mismatch caused us to trash the heap on Windows after we
deleted a texture object.
This fixes a regression from commit 6c59be7776.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This extension is a huge grab-bag of "stuff that's in DX11". Break it
apart to make it clear what still needs to be done.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the effective layer count, for clears etc. This differs from the
depth of the miptree level when views are involved.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We'll still avoid MinLayer here since the fast path doesn't understand
arrays at all, but it's straightforward to do levels.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This allows core mesa's TexSubImage paths etc to work correctly
with views which have nonzero MinLevel or MinLayer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is the actual mesa_format to use. In non-view cases this is always
the same as the mt's format.
V4: Comment style
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We need to wire the original texture's mt into the view. All the hard
work of setting up an appropriate tree of gl_texture_image structures
has already been done by core mesa.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
If we were to relayout the miptree, we'd break any views that are
sharing it.
(Simplified based on suggestions from Eric)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
These formats can be cast to others (with different component types or
sizes) via ARB_texture_view or ARB_shader_image_load_store. We want
them to be laid out consistently so that we can just reinterpret the
memory with a different format.
In V1, this was done conditionally on a 'prefer_no_swizzle' flag which
was set in TexStorage/TextureView paths, but we need the same behavior
for ARB_shader_image_load_store (which also works with images created
via TexImage, so we don't want it to be conditional.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
The sampler can handle R8G8B8X8 (and substitute 1.0 for the fourth
component) but we can't use it as a render target.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
None of the other 3-component 16bpc formats are directly supported, so
they get promoted to XRGB equivalents. *Not* promoting RGB16F the same
way makes texture views much more fiddly -- we don't want to have to do
crazy copying behind the scenes.
(with my other master + my experimental ARB_texture_view support) fixes
the piglit test: `spec/ARB_texture_view/view compare 48bit formats`
No regressions in gpu.tests on Haswell.
V4: Don't alter the formats table -- just don't match it to a mesa_format. [Kenneth]
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is supported by all generations, and is required for memory layout
consistency for texture_view.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Previously, we would unpack the texels to floats using *_TO_FLOAT_TEX,
and then pack them into the desired format using FLOAT_TO_*. Unfortunately,
this isn't quite the inverse operation, and so some texel values would
end up off-by-one.
This fixes the GL_RGB8_SNORM and GL_RGB16_SNORM subcases in piglit's
arb_texture_view-format-consistency-get test on i965. The similar 1-, 2-
and 4-component cases already worked because they took the memcpy path
rather than repacking.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
While linux uses .so as a default extension for shared libraries that is
not the case for other platforms. The loader in libGL (and others) assumes
that the dri module will always have a .so extension, thus it will fail
to load on the affected platforms.
Spotted-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Filenames passed to dlopen() don't need to use the platform's default extension
for shared libraries.
Using the '.so' extension when dlopen()ing DRI drivers is hardcoded into mesa
and the X server, so it should be hardcoded here in the Makefile as well.
A similar fix is probably also needed for gallium DRI drivers.
(Consider that if we were starting from scratch, perhaps we would use a custom
extension like .dri instead)
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 61bedc3d6b.
As the header is the one defining the API/ABI and is distributed
during installation, we should be using it rather than re-defining
the XA version in configure.ac.
Bump the version in the header to 2.2.0, to reflect what was the
original intent of commit 42158926c6.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
With commit 1f1928db001(glx: Drop _Xglobal_lock while we create and
initialize glx display) we've split the big _Xglobal_lock handling in
a more fine grained manner.
Unfortunatelly we forgot to drop the unlock_mutex on the error paths,
leading to undefined behaviour as the mutex is already unlocked.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We hit this assert with some piglit tests. Which appears to be a bug
outside of freedreno. Previously we were relying on assert() being
redefined to debug_assert() so that we didn't crash in release builds.
Somehow that stopped working. So just use debug_assert() directly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes a crash in svga_context_flush_buffers() if we use the 'draw' module
for AA lines (when the device doesn't support that feature). We need to
initialize this list before we setup the swtnl pieces.
Found/fixed by Charmaine Lee.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The "new" fragment shader backend has never supported the necessary
color conversion code for this to work. We began using the new backend
in Mesa 7.10 for GLSL (commit a81d423d93, October 2010),
and for ARB_fragment_program in Mesa 9.1 (commit 97615b2d8c,
August 2012).
I haven't heard any complaints, so I don't think anyone will miss this
feature. I believe mplayer used it at one point, but these days
defaults to other paths anyway.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
create_mov() was fixed up to handle neg/abs properly for interal mov's,
using absneg.f, but forgot to fix it for TGSI MOV's. The problem with
using add.f to handle negated mov's is that we can only take a single
const reg src. So:
MOV TEMP[n], -CONST[m]
would turn into:
add.f Rdst, (neg)CONST[m], 0.0
which would not work. Anyways, just remove the extra code and always
use create_mov() which DTRT.
This fixes piglit vs-op-neg-int test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Applications frequently clear to colors other than 0.0 or 1.0, which
prevents us from doing fast color clears. In that case, we issue this
performance warning on basically every glClear call, resulting in so
much spam that it's nearly impossible to see any other messages.
Plus, I don't think it's useful. We aren't suggesting a better way to
do what the application developers want---we're just telling them it
would be faster to do something they don't want.
Driver developers have no control over the clear color, so this message
is totally useless to them.
A better alternative to get this sort of information is to use
INTEL_DEBUG=blorp, which tells you whether color clears were fast,
simd16 repdata, or slow.
v2: Rebase on has_color_component changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Keep track of whether we actually have any sam instructions in the
resulting shader, rather than using TGSI SAMP declarations. If the sam
instruction is optimized out, because the result is not used, we don't
want to emit texture state, etc. In fact emitting sampler state and/or
setting PIXLODENABLE bit when there are no texture fetches seems to
cause lockup.
In theory this should never happen for a "normal" shader, unless the
state tracker is wonky. But it is a very real possibility for binning
pass shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Decompressing ETC2 textures was causing intermitent segfault
by copying resulting 4x4 texel block to the destination texture
regardless of the size of the destination texture. Issue found
via application crash in GLBenchmark 3.0's Manhattan test.
v2: add more detail comment. Compute limit outside inner loops.
v3: add bugzilla reference
v4: Correct cc syntax in commit log
v5: really grab the right patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74988
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1, suggested v2-3]
v2: move allocation to a function as first step
to clean vid_enc_EncodeFrame
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
For TEX instructions, the set of samplers and sampler views should
be consistent. The XA state tracker sometimes passes an inconsistent
set of samplers and sampler views. Rather than assert and die, issue
a warning.
v2: add debugging code to detect inconsistent state.
v3: also check for null sampler in svga_state_tss.c
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Given
mov vgrf7, vgrf9.xyxz
add vgrf9.xyz, vgrf4.xyzw, vgrf5.xyzw
add vgrf10.x, vgrf6.xyzw, vgrf7.wwww
the last instruction would be wrongly changed to
add vgrf10.x, vgrf6.xyzw, vgrf9.zzzz
during copy propagation.
The issue is that when deciding if a record should be cleared, the old code
checked for
inst->dst.writemask & (1 << ch)
instead of
inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Cc: Jordan Justen <jljusten@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romainck <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@freedesktop.org>
Nothing bad came of this because they weren't used after visitor running,
but leaving them in a bad state seems like a recipe for pain later.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
When you've got a simple solid-color shader that doesn't generate
pixel_x/y interpolation, we were deciding that the first vgrf was both the
undefined pixel_x and pixel_y, and extending its live interval to avoid
the stride problem. That tricked other optimization that tries to see if
a particular instruction is the last use of a variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stopped doing variable index lowering for uniforms in
a64c1eb9b1, 5 months after the comment was
added.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we wish our optimization passes could identify all the cases where
we can coalesce our variables, we miss out on a lot of opportunities.
total instructions in shared programs: 1673849 -> 1673166 (-0.04%)
instructions in affected programs: 299521 -> 298838 (-0.23%)
GAINED: 7
LOST: 0
Note that many programs are "hurt". The notable ones are where we produce
unrolling in cases we didn't before (presumably just because of the lower
instruction count). But there are also some cases where pushing things
right into the variables prevents copy propagation and tree grafting,
since we don't split our variable usage webs apart.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_power_of_two() is now provided by mesa so its definition must be removed
from the i915 driver code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will introduce an optimization that only works when
integers are not represented as floating point values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The next few patches will introduce an optimization that only works when
integers are not represented as floating point values.
v2: Re-word-wrap a line, as requested by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_binop_ubo_load takes unsigned integer operands. However, the array
index used to compute these offsets may be a signed integer. (For
example, see Piglit's spec/glsl-1.40/uniform_buffer/fs-bvec-array).
For some reason, we were missing an ir_binop_i2u cast, and ir_validator
was failing to catch that.
Without this change, ir_builder's type inference code broke for me when
writing a new optimization pass.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The vector backend already implemented this optimization, but
surprisingly, we never bothered to implement it in the scalar backend.
In addition to saving two instructions, this eliminates a use of the
accumulator as an explicit source, which is unsupported in SIMD16 mode
on Gen7+, which could help us gain SIMD16 programs.
Cuts 19.23% of the instructions in dolphin/efb2ram.shader_test.
v2: Rebase on is_16bit_integer_constant -> is_uint16_constant rename.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The i965 MUL instruction doesn't natively support 32-bit by 32-bit
integer multiplication; additional instructions (MACH/MOV) are required.
However, we can avoid those if we know one of the operands can be
represented in 16 bits or less. The vector backend's is_16bit_constant
static helper function checks for this.
We want to be able to use it in the scalar backend as well, which means
moving the function to a more generally-usable location. Since it isn't
i965 specific, I decided to make it an ir_constant method, in case it
ends up being useful to other people as well.
v2: Rename from is_16bit_integer_constant to is_uint16_constant, as
suggested by Ilia Mirkin. Update comments to clarify that it does
apply to both int and uint types, as long as the value is
non-negative and fits in 16-bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Performance warnings are logged via KHR_debug in addition to when the
INTEL_DEBUG=perf environment variable is set. Without this, messages in
debug contexts would have "(null)" for the reason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These are clearly needed---the comments in the function are even present
for each one of them. I originally had two separate state atoms for
3DSTATE_SBE and 3DSTATE_SBE_SWIZ. When I combined the functions, I must
have forgotten to add the atoms for 3DSTATE_SBE_SWIZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing actually uses this---we handle rasterizer discard in the
clipper in order for statistics counters to work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
renderer_copy_prepare was setting the first sampler but never telling
the cso code how many samplers were actually used. Fix this.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Binding a new destination may cause the svga driver to emit draw calls
while propagating the surface. Make sure this doesn't happen in the middle
of sampler state setup where state may be incosistent.
In practice, surface propagation should never happen here and even if it did,
it wouldn't be a valid reason for the svga driver to emit partially set up
state, but to avoid future uncertainties, make sure this doesn't happen
anyway.
Found while auditing the state tracker for inconsistent sampler state /
sampler view setup.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
I think this was used for coalescing out partly dead large virtual
registers, but the patch that enabled that caused regressions and didn't
make it upstream.
Reviewed-by: Eric Anholt <eric@anholt.net>
It's more likely that we won't find writes to all channels than one will
interfere, and calculating interference is more expensive. This change
will also help prepare for coalescing load_payload instructions'
operands.
Also update the live intervals for all channels, and not just the last
that we saw.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intent in 9b6b084eb7 was
for urb .size and .min_vs_entries fields to use the values
from the GEN8_FEATURES macro.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rename functions to match format names.
sed commands:
s/signed_rgba8888_rev/R8G8B8A8_SNORM/g
s/signed_rgba8888/A8B8G8R8_SNORM/g
s/f_rgba8888_rev/R8G8B8A_UNORM/g
s/f_rgba8888/A8B8G8R8_UNORM/g
s/f_rgbx8888_rev/R8G8B8X8_UNORM/g
s/f_rgbx8888/X8B8G8R8_UNORM/g
s/f_argb8888_rev/A8R8G8B8_UNORM/g
s/f_argb8888/B8G8R8A8_UNORM/g
s/f_xrgb8888_rev/X8R8G8B8_UNORM/g
s/f_xrgb8888/B8G8R8X8_UNORM/g
s/signed_rgbx8888/X8B8G8R8_SNORM/g
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Implement guest-backed surface sharing using prime fds. Previously only
legacy surfaces could use this functionality. Also use the vmwgfx 2.6
single-ioctl prime fd reference if available.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This opcode provide support for GL_ARB_texture_query_lod,
Signed-off-by: Dave Airlie <airlied@redhat.com>
[imirkin: rebase, docs update]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cuts a small handful of instructions in Serious Sam 3:
instructions in affected programs: 4692 -> 4666 (-0.55%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Which would lead to translating
mad vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov.sat vgrf7:F, -vgrf9:F
into
mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov vgrf7:F, -vgrf9:F
Fixes some lighting effects in Dota2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ip needs to be initialized to start_ip - 1, since the first thing in the
main loop is ip++. Otherwise we would incorrectly propagate the saturate
from the mov to the mad:
mad a, b, c, d
mov.sat x, a
add y, z, a
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we'd ignore the sources of instructions with non-GRF
destinations when calculating calculating the dead channels. This would
lead to us incorrectly removing the first instruction in this sequence:
mov vgrf11, ...
cmp.ne.f0 null, vgrf11, 1.0
mov vgrf11, ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616
Or else poor programmers might mistakenly use the temporary mem_ctx,
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.
Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
Otherwise calling dump_instructions() after declaring a new fs_reg would
segfault when calculate_register_pressure()'s loop over reg walked off
the end of the virtual_grf_start[] array that calculate_live_intervals()
would have reallocated for you, if it had known there was a new
register.
Don't hardcode /dev/dri/card0 but instead use the drm
macros which allows the correct /dev/drm0 device to be
opened on OpenBSD.
v2: use snprintf and fallback to /dev/dri/card0
v3: check for snprintf truncation
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After the loader changes libudev is no longer required for
gbm or the egl drm/wayland platforms. Lets these build/run
on OpenBSD.
v2: preserve the libudev requirement for Linux as suggested
by Emil Velikov.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
After the loader changes libudev is no longer required to
build gbm or the egl drm/wayland platforms.
Remove a libudev ifdef which allows the the drm egl driver
to be loaded on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
OpenBSD does not have DT_NEEDED entries for libc by design,
over concerns how the symbols would be referenced after
changing the major version of the library.
So avoid -no-undefined checks on OpenBSD as they will fail.
v2: don't include the -no-undefined libtool option in the variable
and change -Wl,--no-undefined references in Automake.inc as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76856
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system will use g++ to link the static library due to the
dummy.cpp source(s). Thus one does not need the explicit link against
stdc++.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system does not know that the static library is C++.
Mention the cpp file to trigger generation of the proper variable
and drop the hacky stdc++ linking.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
With recent commit we started de-duplicating all of the compiler/
linker flags moving their handling inside Automake.inc.
This did not take into consideration that the above variable was set
at configure time, leading to issues on certain build combinations.
Move the variable to where it's used/handled thus cleaning up
configure.ac.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76848
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The pkg-config module was called "EXPAT" instead of "expat" in
PKG_CHECK_EXISTS. This seems to have been wrong because the wrong
argument was copied from PKG_CHECK_MODULES.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
_GNU_SOURCE is only set/required for linux*|*-gnu*|gnu*) and as the
functionality is available on other systems check for RTLD_DEFAULT instead.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Platforms that lack libudev (OpenBSD and possibly others) need
this change in order to load the correct dri driver.
Under linux we unconditionally require libudev, thus this code
will never get build.
v2: Add commit message (Emil Velikov)
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 96e8b916a7.
In the case of VCE encoding with raw YUV file, CPU load directly
to VRAM is faster than combination of CPU writing to GTT and
then blit to VRAM with GPU.
Reviewed-by: Christian König <christian.koenig@amd.com>
Keep a dynamically increasing array of all the views
created for a texture instead of just the last one.
v2: add comments, fix array size calculation,
release only the first sampler view found
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The xa version number had to be set in two places. In configure.ac and in
xa_tracker.h. Furthermore, xa_tracker.h is an installed header so we can't
use mesa internal defines. So therefore, at configure time, modify the
xa_tracker.h header to use the version given by configure.ac
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
/me puts a paper bag on his head and sits in the corner.
This was supposed to be included in 5a68f731, which added
glPointSizePointerOES back to the list of functions exposed by
libGLESv1_CM. It looks like it was an uncommitted change in my tree
when I sent the patch out.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are checking for no-ops in the CSO module for both of these items
so there's no reason to do it in the driver.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As we do for sampler states in single_sampler_done() and many other
CSO functions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously GLX_EXT_buffer_age has always been advertised as supported because
both client_glx_support and client_glx_only where set. So it did not matter
that direct_support is only set when running dri3 and we ended up always
advertising it.
Fix that by not setting client_glx_only for buffer_age in known_glx_extensions.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to call pipe->set_sampler_views() with count being the
maximum of the old number of sampler views and the new number.
This makes sure we null-out any old sampler views.
We already do the same thing for sampler states in single_sampler_done().
Fixes some assertions seen in the VMware driver with XA tracker.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ages ago Chia-I added an ES compatibility flag to several of the various
generator scripts. The intention was to bridge differences between ES
and desktop in Mesa builds without ES. It doesn't appear that it has
ever been used. Recent changes to static_dispatch status of several ES1
functions caused problems in desktop-only, non-shared-glapi builds.
Enabling the ES compatibility mode appears to fix these build problems.
This is kind of a duct tape solution to this problem. As I mentioned in
the cover letter for the series that triggered the build problem, I
would like to make some major changes to the generator architecture and
the XML. The whole point of the proposed architecture changes is to
better handle the differences between desktop GL and ES. I think duct
tape is okay for now.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76869
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Chia-I Wu <olv@lunarg.com>
Commit fb78fa58 made the GL_ARB_debug_output functions aliases of the
GL_KHR_debug output functions. As a result, the function names in
struct _glapi_table also changed. The table in check_table.cpp used the
ARB names.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
C89 has a fairly short minimum-maximum string length. To support
compilers limited by the C89 limits, this script had a mode where it
would generate a character array instead of a giant string. These were
functionally the same, but the code generated for the character array is
HUGE and difficult to read.
As far as I can tell, nothing in Mesa uses '-m short' any more. The
generated files used to be tracked in revision control, but I think we
stopped using '-m short' when we stopped tracking the generated files.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Nested for loops running through tables against which they
finally do an assert were ran also with optimized builds.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
% operator could return negative value which would cause
indexing before perm table. Change %256 to &0xff
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is to avoid running out of query buffer space due to winsys
limitations. Instead of a fixed size per screen pool of query buffers,
use a slab allocator that allocates a new slab if we run out of space
in the first one.
v2: Correct email addresses.
v3: s/8192/VMW_QUERY_POOL_SIZE/. Improve documentation and log message.
Reported-and-tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds a gallium cap that allows us to fake GL3.0 by
not exposing MSAA on sw rendering.
It also forces the extra extensions needed for GL3.2.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit f6e290f80c.
To fix the broken build. The DRI-enabled build seems OK after reverting.
Th non-DRI/gallium build is still suffering from an unrelated issue in
the pipe-loader code.
Currently, we raise an error when doing this which breaks a conformance
test from the OpenGL samples pack. Even if this is a bit silly it is not
an error.
From http://www.opengl.org/wiki/Rectangle_Texture:
"Rectangle textures contain exactly one image; they cannot have mipmaps.
Therefore, any texture parameters that depend on LODs are irrelevant
when used with rectangle textures; attempting to set these parameters to
any value other than 0 will result in an error."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76496
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous commit stopped exporting 21 libGLESv2 and 88 libGLESv1_CM
functions. This removes the work-arounds for those functions from
ABI-check.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
This has been a long standing issue with the ES libraries. Functions
marked in the XML with 'static_dispatch=false' were still incorrectly
exported. ABI-check is supposed to detect this case, but we have to
paper over failures every time a new extension is added.
This change will cause a big pile of functions to disappear from
libGLESv2 and libGLESv1_CM.
libGLESv2 loses (20 functions):
glBindVertexArrayOES
glCompressedTexImage3DOES
glCompressedTexSubImage3DOES
glCopyTexSubImage3DOES
glDeleteVertexArraysOES
glDiscardFramebufferEXT
glDrawBuffersNV
glFlushMappedBufferRangeEXT
glFramebufferTexture3DOES
glGenVertexArraysOES
glGetBufferPointervOES
glGetProgramBinaryOES
glIsVertexArrayOES
glMapBufferOES
glMapBufferRangeEXT
glProgramBinaryOES
glReadBufferNV
glTexImage3DOES
glTexSubImage3DOES
glUnmapBufferOES
libGLESv1_CM loses (88 functions):
glAlphaFuncxOES
glBindFramebufferOES
glBindRenderbufferOES
glBlendEquationOES
glBlendEquationSeparateOES
glBlendFuncSeparateOES
glCheckFramebufferStatusOES
glClearColorxOES
glClearDepthfOES
glClearDepthxOES
glClipPlanefOES
glClipPlanexOES
glColor4xOES
glDeleteFramebuffersOES
glDeleteRenderbuffersOES
glDepthRangefOES
glDepthRangexOES
glDiscardFramebufferEXT
glDrawTexfOES
glDrawTexfvOES
glDrawTexiOES
glDrawTexivOES
glDrawTexsOES
glDrawTexsvOES
glDrawTexxOES
glDrawTexxvOES
glFlushMappedBufferRangeEXT
glFogxOES
glFogxvOES
glFramebufferRenderbufferOES
glFramebufferTexture2DOES
glFrustumfOES
glFrustumxOES
glGenerateMipmapOES
glGenFramebuffersOES
glGenRenderbuffersOES
glGetBufferPointervOES
glGetClipPlanefOES
glGetClipPlanexOES
glGetFixedvOES
glGetFramebufferAttachmentParameterivOES
glGetLightxvOES
glGetMaterialxvOES
glGetRenderbufferParameterivOES
glGetTexEnvxvOES
glGetTexGenfvOES
glGetTexGenivOES
glGetTexGenxvOES
glGetTexParameterxvOES
glIsFramebufferOES
glIsRenderbufferOES
glLightModelxOES
glLightModelxvOES
glLightxOES
glLightxvOES
glLineWidthxOES
glLoadMatrixxOES
glMapBufferOES
glMapBufferRangeEXT
glMaterialxOES
glMaterialxvOES
glMultiTexCoord4xOES
glMultMatrixxOES
glNormal3xOES
glOrthofOES
glOrthoxOES
glPointParameterxOES
glPointParameterxvOES
glPointSizePointerOES
glPointSizexOES
glPolygonOffsetxOES
glQueryMatrixxOES
glRenderbufferStorageOES
glRotatexOES
glSampleCoveragexOES
glScalexOES
glTexEnvxOES
glTexEnvxvOES
glTexGenfOES
glTexGenfvOES
glTexGeniOES
glTexGenivOES
glTexGenxOES
glTexGenxvOES
glTexParameterxOES
glTexParameterxvOES
glTranslatexOES
glUnmapBufferOES
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Cc: Paul Berry <stereotype441@gmail.com>
This is hella ugly. The same-named function in desktop OpenGL is
hidden, but it needs to be exposed by libGLESv2 for OpenGL ES 3.0.
There's no way to express in the XML that a function should be be hidden
in one API but exposed in another.
This won't affect any change now, but it will prevent a regression in a
later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Functions that are part of OpenGL ES 1.0 or 1.1 should have static
dispatch functions in libGLESv1_CM. This doesn't affect any change yet,
but it will prevent later regressions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
It looks like these were added accidentally by Paul in commit 1a1db174.
From the commit message and the look of the patch, I think this was just
some sed-job left overs.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It was my understanding that the writemask works in SIMD4x2 mode for
texturing instructions and doesn't require a message header. Some bit of
this logic must be wrong, so disable it until it's understood.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By doing GC the linker removes all the symbols that are not referenced
and/or used by the final library. This results in a saving of ~100K
up-to ~600K per (stripped) binary (classic vs gallium drivers).
If interested one can ask the compiler to print the sections that are
removed using -Wl,--print-gc-sections.
v2: Check if ld supports the flag before using it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com> (v1)
... apart from the dri drivers.
With this final change we can build mesa without fear that
the resulting libraries will have unresolved symbols.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Set the flag for all but the dri targets. They have missing
glapi symbols which are required for the normal operation with
the X server.
Jon, I fear that you'll need to carry the "no-undefined" hunk
locally when building the dri drivers under cygwin.
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Explicitly setting the linker variable was required for old and broken
build toolchains. At this point this should no longer be needed, and
setting the sources lists will trigger generation of the correct LINK
variables.
Explicitly include dummy.cpp to use g++ to link the static library which
in most cases is based upon C++ code.
v2: Reword commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This lets us have only one if HAVE_MESA_LLVM block, rather than
one for each driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every library uses the same libtool/linker flags. Compact those
into AM_LDFLAGS and append the version script to it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Or it compiles them in, but pretends they don't exist
v2: Rebase (Emil)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa does _not_ link against libudev. Additionally the only place
that deals with it is the loader, thus we can drop the CFLAGS.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It makes little sense to enable the vdpau, xvmc and omx state-trackers
as they do not make use of (don't work with) the software driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
grep -q is easier to read and consistent with the rest of configure.ac.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium osmesa links against the softpipe driver, thus the build
will fail if it's missing.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Evidently at least static OSMesa is still used as shared one
causes substantial increase in the load time for some programs
that use it (from seconds up-to ~30min).
Rather than forcing everyone to use shared mesa, revert commit
a6efbac9fb and default to shared
build when both shared and static are disabled.
v2: Whitespace cleanup, drop silly comment.
Reported-by: Burlen Loring <burlen.loring@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Note: This was briefly merged before, but exposed some breakage in gallium, so
was reverted. Hopefully it will stick this time.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_format_matches_format_and_type() returns true for
GL_RED/GL_RED_INTEGER (with an appropriate type) into an intensity
mesa_format.
We want the `red`-based format instead, regardless of the order we find
them in our walk of the mesa formats list.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The case for this was in the wrong function, and this format's store
func was not set in the table at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Whether or not the coords are normalized is handled in the texture
state. But we otherwise need to treat RECT sample instructions as 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In some cases, we need a register to be assigned up to three components
before the base. Since we can't have negative register #'s, just shift
everything up. May increase register usage for trivial shaders, but I
don't think we are shader limited in those cases. A proper solution is
going to require a better register assignment algorithm (which is on the
TODO list), this is just a hack to get us by until then.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
RB_FRAME_BUFFER_DIMENSION is not a banked context register, so we need
to wait for the GPU to idle before updating it. But we'd rather not
have unnecessary WFI's, so actually keep track if we need to emit it or
not.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is something that XA triggers. In some cases it will only use
SAMP[1] (composite mask) but not SAMP[0] (composite src).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Based on a patch by Ville Syrjälä.
As usual, these are placeholder values; actual values will come later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the closing for the "\defgroup IR Intermediate representation
nodes" all the way at the top of the file.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When doing software rendering (i.e. rendering to the selection buffer) we need
to make sure that we have valid index bounds before calling _tnl_draw_prims(),
otherwise we can crash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59455
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The underlying glDrawArrays() calls weren't getting compiled into
the display list. We simply need to use the current dispatch table
so the CALL_DrawArrays() is routed to the display list save function.
This patch also fixes glMultiModeDrawArraysIBM and
glMultiModeDrawElementsIBM.
Fixes the new piglit gl-1.4-dlist-multidrawarrays test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we only examined the GL_DEPTH_MODE state to determine the
sampler view swizzle for depth textures. Now we also consider the
texture base format for color textures too.
The basic idea is if we're sampling from a RGB texture we always
want to get A=1, even if the actual hardware format might be RGBA.
We had assumed that the texture's A values were always one since that's
what Mesa's texstore code does. But if we render to the RGBA texture,
the A values might not be 1. Subsequent sampling didn't return the
right values.
Now we examine the user-specified texture base format vs. the actual
gallium format to determine the right swizzle.
Fixes several fbo-blending-formats, fbo-clear-formats and fbo-tex-rgbx
failures with VMware/svga driver (and possibly other drivers).
No other piglit regressions with softpipe or VMware/svga.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
In preparation for following changes.
I used a temporary test harness to compare the old code to the new
for all possible swizzle inputs. No change in results.
This also happens to fix a leak of the current GS pull constant BO on
context destroy, by just not holding on to the pull const bos after the
surface state is generated.
No statistically significant performance difference on GLB2.7 on HSW at
1024x768 (n=40) or 320x240 (n=44), or on BYT at 320x240 (n=47).
v2: Rebase on intel_upload simplification.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementation kept a page-sized area for uploading data, and
uploaded chunks from that to a 64kb-sized streamed buffer. This wasted
cache footprint (and extra state tracking to do so) when we want to just
write our data into the buffer immediately.
Instead, build it around an interface like brw_state_batch() that just
gets you a pointer to BO memory to upload your stuff immediately.
Improves OpenArena on HSW by 1.62209% +/- 0.355299% (n=61) and on BYT by
1.7916% +/- 0.415743% (n=31).
v2: Rebase on Mesa master, drop old prototypes. Re-do performance
comparison on a kernel that doesn't punish CPU efficiency
improvements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
it's useful to know what the llvmbuildstore arguments are going to
be before executing it because it can crash and make sure to
print out the inputs only if we're not generating a gs because
it fetches inputs differently.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We used to overallocate the output buffer sometimes running out
of memory with applications rendering large geometries. The actual
maximum number of vertices out is simply the maximum number of
primitives in (number of gs invocations) multiplied by the maximum
number of output vertices per gs input primitive (i.e. gs invocation).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Don't pass null query object pointers into gallium functions.
This avoids segfaulting in the VMware driver (and others?) if the
pipe_context::create_query() call fails and returns NULL.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
sed commands:
s/ARGB2101010_UINT\b/B10G10R10A2_UINT/g
s/ABGR2101010_UINT\b/R10G10B10A2_UINT/g
Reviewed-by: Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is quite hard to meet the dependency of the libxml2 python bindings
outside Linux, and in particularly on MacOSX; whereas ElementTree is
part of Python's standard library. ElementTree is more limited than
libxml2: no DTD verification, defaults from DTD, or XInclude support,
but none of these limitations is serious enough to justify using
libxml2.
In fact, it was easier to refactor the code to use ElementTree than to
try to get libxml2 python bindings.
In the process, gl_item_factory class was refactored so that there is
one method for each kind of object to be created, as it simplifies
things substantially.
I confirmed that precisely the same output is generated for GL/GLX/GLES.
v2: Remove m4/ax_python_module.m4 as suggested by Matt Turner.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Release the references to the sampler views before
destroying the pipe context.
v2: remove TODO and unrelated change
v3: move to st_texture.[ch], rename callback, add comment
v4: fix rebase mess up and add further cleanups
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
This can get called in some circumstances if both src type and dst type
have same width (seen with float32->unorm32). While this particular case
was bogus anyway let's just fix that as it can work trivially (due to the
way it was called it actually worked anyway apart from the assert).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When deciding if a clear color is suitable for fast clear,
take into account if a color channel is active in the
buffer format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch introduces two pre-canned MOCS values: BDW_MOCS_WB
(write-back, all caches) and BDW_MOCS_WT (write-through, all caches).
We use write-through caching for render targets, and write-back for
all other data. (At least on Haswell, I believe write-back LLC/eLLC
didn't work for scan-out buffers, while write-through did.)
No performance analysis has been done on the impact of this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
All of the functionality is implemented in a private function in the one
file where it is used.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fix eglCreateImage() from a packed dma_buf surface with a non-zero offset
to pixels data. In particular, this fixes support for planar YUV surfaces
when they are individually mapped on a per-plane basis, i.e. when the
OES_EGL_image_external is not used and user application wants to use its
own shader code for composition, or processing on individual plane (OCL).
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implementation note:
I don't use context for ralloc (don't know how).
The check on PROGRAM_SEPARABLE flags is also done when the pipeline
isn't bound. It doesn't make any sense in a DSA style API.
Maybe we could replace _mesa_validate_program by
_mesa_validate_program_pipeline. For example we could recreate a dummy
pipeline object. However the new function checks also the
TEXTURE_IMAGE_UNIT number not sure of the impact.
V2:
Fix memory leak with ralloc_strdup
Formatting improvement
V3 (idr):
* Actually fix the leak of the InfoLog. :)
* Directly generate logs in to gl_pipeline_object::InfoLog via
ralloc_asprintf isntead of using a temporary buffer.
* Split out from previous uber patch.
* Change spec references to include section numbers, etc.
* Fix a bug in checking that a different program isn't active in a stage
between two stages that have the same program. Specifically,
if (pipe->CurrentVertexProgram->Name == pipe->CurrentGeometryProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
should have been
if (pipe->CurrentVertexProgram->Name == pipe->CurrentFragmentProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
v4 (idr): Rework to use CurrentProgram array in loops.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is much like _mesa_sampler_uniforms_are_valid, but it operates
across an entire pipeline object.
This function differs from _mesa_sampler_uniforms_are_valid in that it
directly creates the gl_pipeline_object::InfoLog instead of writing to
some temporary buffer.
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Fix the loop bounds. shProg isn't an array, so
ARRAY_SIZE(shProg) was 1, so only the vertex program was validated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2 (idr):
* Keep the behavior of other info logs in Mesa: and empty info log
reports a GL_INFO_LOG_LENGTH of zero.
* Use a NULL pointer to denote an empty info log.
* Split out from previous uber patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Test become green in piglit:
The updated ext_transform_feedback-api-errors:useprogstage_noactive useprogstage_active bind_pipeline
arb_separate_shader_object-GetProgramPipelineiv
arb_separate_shader_object-IsProgramPipeline
For the moment I reuse Driver.UseProgram but I guess it will be better
to create a UseProgramStages functions. Opinion is welcome
V2: formatting & rename
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in the
extension spec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now arb_separate_shader_object-GetProgramPipelineiv should pass.
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in
the extension spec.
* Split out from previous uber patch.
v4 (idr): Use _mesa_has_geometry_shaders in _mesa_UseProgramStages to
detect availability of geometry shaders.
v5 (idr): Whitespace cleanup, use _mesa_lookup_shader_program_err
instead of open-coding it again, and update some comments at the end of
_mesa_UseProgramStages. All suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Extend use_shader_program to support a different target. Allow to reuse the
function to update the pipeline state. Note I bypass the flush when target
isn't current. Maybe it would be better to create a new UseProgramStages
driver function
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
save and restore _Shader/Pipeline binding point. Rational we don't want any
conflict when the program will be unattached.
V2: formatting improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Basically a sed but shaderapi.c and get.c.
get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior
shaderapi.c => the old api stil update the Shader object directly
V2: formatting improvement
V3 (idr):
* Rebase fixes after a block of code was moved from ir_to_mesa.cpp to
shaderapi.c.
* Trivial reformatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To avoid NULL pointer check a default pipeline object is installed in
_Shader when no program is current
The spec say that UseProgram/UseShaderProgramEXT/ActiveProgramEXT got an
higher priority over the pipeline object. When default program is
uninstall, the pipeline is used if any was bound.
Note: A careful rename need to be done now...
V2: formating improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
* Trivial reformatting.
* Updated comment of gl_context::_Shader
v4 (idr): Reformat spec quotations to look like spec quotations. Update
comments describing what gl_context::_Shader can point to. Bot
suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
* Add HAVE_PTHREAD, we do have pthread support wrappers now for
non-native Haiku threaded applications.
* Viewport changed behavior recently breaking the build.
We fix this by looking at the gl_context ViewportArray
(Thanks Brian for the idea)
Acked-by: Brian Paul <brianp@vmware.com>
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
v2: Use _mesa_format_has_color_component() to properly handle ALPHA
formats (and generally be less fragile).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
WebGL Aquarium in Chrome 24 actually hits this.
v2: Move to core Mesa (wisely suggested by Ian); only consider
components which actually exist.
v3: Use _mesa_format_has_color_component to determine whether components
actually exist, fixing alpha format handling.
v4: Add a comment, as requested by Brian. No actual code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
When considering color write masks, we often want to know whether an
RGBA component actually contains any meaningful data. This function
provides an easy way to answer that question, and handles luminance,
intensity, and alpha formats correctly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
This gets us equivalent code paths on BDW and pre-BDW, except for stencil
(where we don't have MSAA stencil resolve code yet)
Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces
DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).
v2: Move the new meta code to brw_meta_updownsample.c, name it
brw_meta_updownsample(), add a comment about
intel_rb_storage_first_mt_slice(), and rename that function and move
the RB generation into it (review ideas by Ken).
v3: Fix 2 src vs dst pasteos in previous change.
v4: Skip this path pre-gen8 for now, until we can analyze the glxgears
performance delta some more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that BindRenderbufferTexImage() is a thing that drivers can do, winsys
FBOs *can* have NeedsFinishRenderTexture set.
v2: Keep the short-circuit for non-BindRenderbufferTexImage() drivers
(review by Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even if the singlesample_mt got reopened from DRI due to
pageflipping/buffer swapping, our private miptree shouldn't need any
changes.
Improves performance of a little swapbuffers-loving microbenchmark with
MSAA forced on, by 1.2371% +/- 0.624802% (n=102)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The formatting was weird, and the tests were duplicated, and it is
guaranteed that mt->region exists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a memory leak with MSAA winsys buffers since my move of
singlesample_mt to the rb in 4e0924c5de
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes it would be nice to benchmark some app with MSAA versus not, but
it doesn't offer the controls you want. Just provide a handy knob to
force the issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For each write, search previous instructions for unread writes to the
flag register and remove them. Note that this will not eliminate the
last unread write.
total instructions in shared programs: 788074 -> 788004 (-0.01%)
instructions in affected programs: 4930 -> 4860 (-1.42%)
Reviewed-by: Eric Anholt <eric@anholt.net>
With an awful O(n^2) algorithm that searches previous instructions for
dead writes.
total instructions in shared programs: 805582 -> 788074 (-2.17%)
instructions in affected programs: 144561 -> 127053 (-12.11%)
Reviewed-by: Eric Anholt <eric@anholt.net>
That is, modify
mad dst, a, b, c
to be
mad dst.xyz, a, b, c
if dst.w is never read.
total instructions in shared programs: 811869 -> 805582 (-0.77%)
instructions in affected programs: 168287 -> 162000 (-3.74%)
Reviewed-by: Eric Anholt <eric@anholt.net>
A future patch adds support for removing dead writes to the flag
register. This patch simplifies the logic until then.
total instructions in shared programs: 811813 -> 811869 (0.01%)
instructions in affected programs: 3378 -> 3434 (1.66%)
Reviewed-by: Eric Anholt <eric@anholt.net>
To be consistent with the fs backend. Also the instruction scheduler
incorrectly considered SEL with a conditional modifier to read the flag
register.
Reviewed-by: Eric Anholt <eric@anholt.net>
The ARB_framebuffer_object spec lists this case before the
FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER and
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases.
Fixes two broken cases in piglit's fbo-incomplete test, if
ARB_ES2_compatibility is not advertised. (If it is, this is masked
because the FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER /
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases are removed by that extension)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
With shared glx contexts it is possible that a texture is create and used
in one context and then used in another one resulting in incorrect
sampler view usage.
v2: avoid template copy
v3: add XXX comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 2919c3fdb4.
For formats like BGRX, looping through 0..num_components works fine.
But for formats like XRGB, we'd check the color mask for X and fail to
check it for B.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Currently, we don't use this path on Sandybridge because we suspect
other paths will be faster. But we potentially could. If we do, we
should allow it to support Y-tiled BLTs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested on ILK and CTG (with the GL3isms taken out of the piglits).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.
This fixes all the sampler2DMS/Array tests on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This fallback to triangle strips is silly and should be done in drivers
if they need it.
This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.
This fixes piglit:
NV_primitive_restart/primitive-restart-draw-mode-quad_strip
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.
I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
v2: Marek> removed gfx.flush in si_dma_copy_tile
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
libdl does not exist on many platforms which have dlopen in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version. So the assert you get depends on
whether threads.h or u_debug.h was included last.
In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows). In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper. So Adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.
The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm." Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I meant to include this fixes in v3 of commit
de7ad2c88f, but accidentally pushed a
previous version.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context. Computing them is
fairly expensive, and they take up a large amount of memory. Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.
Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow us to use the screen as a memory context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the GL API, so there's no reason to use GLboolean.
Using bool is safer: any non-zero value is treated as "true". When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.
The fragment shader compiler has a number of checks like:
if (dispatch_width == 16)
fail("...some reason...");
This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)
v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.
This gains us 78 SIMD16 programs in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.
This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.
Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.
This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.
This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.
This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)
Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.
This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritable to work
with.
v2: Add assert(remapped <= i), requested by Topi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.
We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.
Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.
This situation will improve in the next few patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.
brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To emphasize that the result is floating point 1.0 or 0.0, to match
other opcodes like SLE and SEQ.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Handle mapping edgeflag data similar to the code around it.
This fixes a crash in piglit test gl-2.0-edgeflag.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Enable EGL_EXT_platform_base and the Linux platform extensions layered
atop it: EGL_EXT_platform_x11, EGL_EXT_platform_wayland,
and EGL_MESA_platform_gbm.
Tested with Piglit's EGL_EXT_platform_base tests under an X11 session.
To enable running the Wayland and GBM tests, windowed Weston was running
and the kernel had render nodes enabled.
I regression tested my EGL_EXT_platform_base patch set with Piglit on
Ivybridge under X11/EGL, standalone Weston, and GBM with rendernodes. No
regressions found.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_EXT_wayland_spec, version 3:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to Wayland. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_MESA_platform_gbm spec, version 5:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to the GBM platform. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Internally, much of the EGL code uses EGLNativeDisplayType,
EGLNativeWindowType, and EGLPixmapType. However, the EGLNative type
often does not match the variable's actual type.
The concept of EGLNative types are a bad match for Linux, as explained
below. And the EGL platform extensions don't use EGLNative types at all.
Those extensions attempt to solve cross-platform issues by moving the
EGL API away from the EGLNative types.
The core of the problem is that eglplatform.h can define each EGLNative
type once only, but Linux supports multiple EGL platforms.
To work around the problem, Mesa's eglplatform.h contains multiple
definitions of each EGLNative type, selected by feature macros. Mesa
expects EGL clients to set the feature macro approrpiately. But the
feature macros don't work when a single codebase must be built with
support for multiple EGL platforms, *such as Mesa itself*.
When building libEGL, autotools chooses the EGLNative typedefs based on
the first element of '--with-egl-platforms'. For example,
'--with-egl-platforms=x11,drm,wayland' defines the following:
typedef Display* EGLNativeDisplayType;
typedef Window EGLNativeWindowType;
typedef Pixmap EGLNativePixmapType;
Clearly, this doesn't work well for Wayland and GBM. Mesa works around
the problem by casting the EGLNative types to different things in
different files.
For sanity's sake, and to prepare for the EGL platform extensions, this
patch removes from egl/main and egl/dri2 all internal use of the
EGLNative types. It replaces them with 'void*' and checks each explicit
cast with a static assertion. Also, the patch touches egl_gallium the
minimal amount to keep it compatible with eglapi.h.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_image, set it for each platform, and
let egl_dri2 dispatch eglCreateImageKHR to that.
To remove ambiguity, rename egl_dri2.c:dri2_create_image() to
dri2_create_image_from_dri().
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_initialize_x11_swrast() does a strange thing. For some extensions
it doesn't support, it sets the corresponding functions in
_EGLDriver::API to NULL. The intention here is clear, but misplaced.
NULL or not, the function pointers never get called because their
extensions aren't supported.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_wayland_buffer_from_image, set it for
each platform, and let egl_dri2 dispatch
eglCreateWaylandBufferFromImageWL to that.
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
egl_dri2.c:dri2_terminate() handled terminating X11 and DRM displays.
The Wayland platform implemented its own dri2_wl_terminate(), which was
nearly a copy of the common one.
To implement the EGL platform extensions, we either need to dispatch
eglTerminate per display or define a common implementation for all
platforms. This patch chooses consolidation. It removes
dri2_wl_terminate() by folding it into the common dri2_terminate().
It was necessary to invert the `if (disp->PlatformDisplay == NULL)` and
the switch statement because, unlike DRM and X11, Wayland's terminator
performed action even when EGL didn't own the native display. In the
inversion, I replaced `disp->PlatformDisplay == NULL` with
`dri2_dpy->own_device` because the two expressions are synonymous, but
the latter's meaning is clearer.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When the user calls eglGetDisplay(EGL_DEFAULT_DISPLAY), the Wayland and
DRM platforms set dri2_dpy->own_device=true. This patch makes the X11
platform do the same for consistency.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::post_sub_buffer, set it for each
platform, and let egl_dri2 dispatch eglPostSubBufferNV to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_region, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersRegionNOK to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::copy_buffers, set it for each
platform, and let egl_dri2 dispatch eglCopyBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::query_buffer_age, set it for each
platform, and let egl_dri2 dispatch API.QueryBufferAge to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::destroy_surface, set it for each
platform, and let egl_dri2 dispatch eglDestroySurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePbufferSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePixmapSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_window_surface, set it for each
platform, and let egl_dri2 dispatch eglCreateWindowSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_with_damage, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersWithDamageEXT to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers, set it for each platform, and
let egl_dri2 dispatch eglSwapBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_interval, set it for each platform, and
let egl_dri2 dispatch eglSwapInterval to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call it through the driver dispatch table. Just call it
statically.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Each of the egl_dri2 platforms (except Android) prefix their function
names with "dri2", not "dri2_${platform}". This means many function
names have three separate definitions in the egl_dri2 directory: one in
each of platform_drm.c, platform_wayland.c, and platform_x11.c. For
example, each of the three files defines dri2_create_window_surface().
The name collisions make it difficult to review patches for correctness
("Is this patch hunk calling a platform_x11 function or a global
egl_dri2 function?"), complicate debugging, and confuse code navigation
tools.
For each function in platform_x11.c prefixed with 'dri2', this patch
changes its prefix to 'dri2_x11'. Likewise for platform_drm.c and
'dri2_drm'; and platform_wayland.c and 'dri2_wl'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_egl_display has only one virtual function, 'authenticate'. Define
dri2_egl_display::vtbl and move 'authenticate' there.
This prepares for the EGL platform extensions, which will add many
more virtual functions to dri2_egl_display.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This pulls in EGL_EXT_platform_base, EGL_EXT_platform_wayland,
EGL_EXT_platform_x11, and EGL_MESA_platform_gbm.
This patch has a lot of churn because Khronos recently changed its
method of generating headers. Khronos now generates it headers from XML.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows retrieving the existing BO and incrementing its reference count,
instead of creating a separate winsys representation for it, when the kernel
reports that the BO was already assigned a virtual memory address.
This fixes problems with XWayland using radeonsi and the
xf86-video-wlglamor driver, which calls GEM flink outside of the radeon
winsys code and creates BOs from the flinked names using the same DRM file
descriptor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
install-gallium-links.mk fails to create the compat link for ilo_dri.so
because it looks for dri_LTLIBRARIES instead of noinst_LTLIBRARIES. Fix this
by switching to dri_LTLIBRARIES (and make the driver installable).
Since pci_id_driver_map.h and the DDX both tell libGL.so to look for "i965",
ilo_dri.so will never be loaded even enabled and installed. The change should
not create any more confusion.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The GL 4.4 spec says it's not color-renderable and not accepted
by RenderBufferStorage. The EXT_texture_shared_exponent spec says
it's not color-renderable but it's accepted by RenderBufferStorageEXT.
This seems to be a bug in the extension spec.
Let's do what GL 4.4 says.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can just check the polygon mode when updating the edge flag state.
Also, we can just flag ST_NEW_VERTEX_PROGRAM directly, which makes
ST_NEW_EDGEFLAGS_DATA useless.
Normally, nothing uses live intervals at this point, so this isn't
necessary. However, dump_instructions() calculates them and uses them
to show register pressure. So, calling dump_instructions() in this area
of the code would segfault due to the arrays being the wrong size.
This is not a candidate for stable branches because it only serves to
fix internal debugging code that you manually have to invoke by altering
the source code or using gdb.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, dump_instruction() would print output such as:
{ 2} 3: mov vgrf1:F, u0:F
{ 3} 4: mov vgrf7:F, u0:F
{ 4} 5: mov vgrf8:F, u0:F
which looked like either a scalar access or perhaps a constant-indexed
access of element 0, when it was really a variable index.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit e57d77280e, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.
To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF. But nothing ensured that inst->src[0].file was GRF.
In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.
{ 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{ 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{ 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{ 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{ 4} 4: mov vgrf10+0.0:F, vgrf6:F
{ 3} 5: mov vgrf10+1.0:F, vgrf27:F
{ 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
{ 5} 7: mov vgrf32:F, u1:F
{ 5} 8: mov.sat vgrf12:F, u1:F
From shader-db:
total instructions in shared programs: 1841932 -> 1841957 (0.00%)
instructions in affected programs: 5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.
This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others. The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out we can allow COHERENT storage/mappings all the time,
regardless of LLC vs non-LLC. It just means never using temporary
mappings to avoid GPU stalls, and on non-LLC we have to use the GTT intead
of CPU mappings. If we were to use CPU maps on non-LLC (which might be
useful if apps end up using buffer_storage on PBO reads, to avoid WC read
slowness), those would be PERSISTENT but not COHERENT, but doing that
would require us driving the clflushes from userspace somehow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like there's no big difference for write-only workloads, but
using a CPU map means that if they happen to read without having set the
MAP_READ_BIT, they get 100x the performance for those reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While in expected usage patterns nobody will ever hit this path, doubling
our bandwidth used seems like a waste, and it cost us extra code too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On LLC, it should always be better to use a cached mapping than the GTT.
On non-LLC, it seems pretty silly to try to optimize read performance for
the INVALIDATE_RANGE_BIT case. This will make the buffer_storage logic
easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Similar to the other cases, shift some weight/coord calculations to int
space. This should be slightly faster (on x86 sse it should actually safe one
instruction, and generally int instructions are cheaper).
The previous code used coords which were calculated as
(int) (f_coord * tex_size * 256) >> 8.
This is not only unnecessarily complex but can give the wrong texel due to
rounding for negative coords (as an example, after denormalization coords
from -1.0 to 0.0 should give -1, but this will give -1 for numbers from
-1.0-1/256 - 0.0-1/256.
Instead, juse use ifloor, dropping the shift stuff.
Unfortunately, this will most likely be slower - with arch rounding available
it shouldn't be too bad (trades a int shift for a round but also saves an int
mul (which is shared by all coords) but otherwise it's a mess.
The previous method for converting coords to ints was sligthly inaccurate
(effectively losing 1bit from the 8bit lerp weight). This is probably
especially noticeable when trying to draw a pixel-aligned texture.
As an example, for a 100x100 texture after dernormalization the texture
coords in this case would turn up as
0.5, 1.5, 2.5, 3.5, 4.5, ...
After the mul by 256, conversion to int and 128 subtraction, they end up as
0, 256, 512, 768, 1024, ...
which gets us the correct coords/weights of
0/0, 1/0, 2/0, 3/0, 4/0, ...
But even LSB errors (which are unavoidable) in the input coords may cause
these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a
coord/weight of 2/255 instead.
Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be
equally fast on x86 sse though other archs probably suffer a little.
Constify the offsets parameter to silence gcc warning 'assignment
from incompatible pointer type' due to function prototype miss-match.
Use a boolean changed as a shorthand for target != current_target.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is little point of continuing if fread returns zero, as it
indicates that either the file is empty or cannot be read from.
Bail out if fread returns zero after closing the file.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Core drm defines that the handle is of type int, while all drivers
treat it as uint internally. Typecast the value to silence gcc
warning messages and be consistent amongst all drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 3805a864b1d(nv50: assert before trying to out-of-bounds access
samplers) introduced a series of asserts as a precausion of a previous
illegal memory access.
Although it failed to encapsulate loop within nv50_sampler_state_delete
effectively failing to clear the sampler state, apart from exadurating
the illegal memory access issue.
Fixes gcc warning "array subscript is above array bounds" and
"Nesting level does not match indentation" and "Out-of-bounds read"
defects reported by Coverity.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Indent according to Mesa style, reuse sbc instead of making a new
swap_count field, and actually get a usable back before returning the
age of the back (fixing updated piglit tests). Changes by anholt.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Adel Gadllah <adel.gadllah@gmail.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
With the buffer_age code, I need to be able to potentially call this more
than once per frame, and it would be bad if a new special event showing up
meant I chose a different back mid-frame. Now, once we've chosen a back
for the frame, another find_back will choose it again since we know that
it won't have ->busy set until swap.
Note that this makes find_back return a buffer id instead of a backbuffer
index. That's kind of a silly distinction anyway, since it's an identity
mapping between the two (it's the front buffer that is at an offset).
Reviewed-By: Adel Gadllah <adel.gadllah@gmail.com>
This extension provides a way for an application to render to multiple
surfaces with different buffer formats without having to use multiple
contexts. An EGLContext can be created without an EGLConfig by passing
EGL_NO_CONFIG_MESA. In that case there are no restrictions on the surfaces
that can be used with the context apart from that they must be using the same
EGLDisplay.
_mesa_initialze_context can now take a NULL gl_config which will mark the
context as ‘configless’. It will memset the visual to zero in that case.
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig. Mesa needs to be aware that the
context is configless because it affects the initial value to use for
glDrawBuffer. The first time the context is bound it will set the initial
value for configless contexts depending on whether the framebuffer used is
double-buffered.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In eglCreateContext there is a check for whether the config parameter is zero
and in this case it will avoid reporting an error if the
EGL_KHR_surfacless_context extension is supported. However there is nothing in
that extension which says you can create a context without a config and Mesa
breaks if you try this so it is probably better to leave it reporting an
error.
The original check was added in b90a3e7d8b based on the API-specific
extensions EGL_KHR_surfaceless_opengl/gles1/gles2. This was later changed to
refer to EGL_KHR_surfacless_context in b50703aea5. Perhaps the original
extensions specified a configless context but the new one does not.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Under GLES 3 it is not valid to pass GL_FRONT to glDrawBuffers. Instead,
GL_BACK has a magic interpretation which means it will render to the front
buffer on single-buffered contexts and the back buffer on double-buffered. We
were incorrectly setting the initial value to GL_FRONT for single-buffered
contexts. This probably doesn't really matter at the moment except that
presumably it would be exposed in the API via glGetIntegerv.
When we switch to configless contexts this is more important because in that
case we always want to rely on the magic interpretation of GL_BACK in order to
automatically switch between the front and back buffer when a new surface with
a different number of buffers is bound. We also do this for GLES 1 and 2
because the internal value doesn't matter in that case and it is convenient to
use the same code to have the magic interpretation of GL_BACK.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In GLES 3 it is not possible to select rendering to the front buffer and
instead selecting GL_BACK has the magic interpretation that it is either the
front buffer on single-buffered configs or the back buffer on double-buffered.
GLES 1 and 2 have no way of selecting the draw buffer at all. In that case we
were initialising the draw buffer to either GL_FRONT or GL_BACK depending on
the context's config and then leaving it at that.
When we switch to having configless contexts we ideally want Mesa to
automatically switch between the front and back buffer whenever a double-
or single-buffered surface is bound. To make this happen we can just allow
the magic behaviour from GLES 3 in GLES 1 and 2 as well. It shouldn't matter
what the internal value of the draw buffer is in GLES 1 and 2 because there
is no way to query it from the external API.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Commit 6e8d04a caused a leak by allocating ctx->Debug but never freeing it.
Release the memory in _mesa_free_errors_data when destroying a context.
Use FREE to match CALLOC_STRUCT from _mesa_get_debug_state.
Reviewed-by: Brian Paul <brianp@vmware.com>
The few paths that were playing with framebuffers and renderbuffer were
saving and restoring them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was being applied in a subset of the places that
intel_prepare_render() was called, to set the same flag that
intel_prepare_render() was setting.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The flag wasn't getting updated correctly when the ctx->DrawBuffer or
ctx->ReadBuffer changed. It usually ended up working out because most
apps only have one window system framebuffer, or if they have more than
one and they have any front read/drawing, they will have called
glReadBuffer()/glDrawBuffer() on it when they get started on the new
buffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's the ctx->ReadBuffer that gets read from, not the ctx->DrawBuffer.
So, if you happened to have a ctx->ReadBuffer that was the winsys buffer,
and it had previously been intel_prepare_render()ed but not invalidated
since then, and you called glReadBuffer() to switch to front buffer
instead of back buffer reading on the winsys fbo while your drawbuffer was
a user FBO, you'd never get the front buffer's miptree fetched, and
segfault.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I think these are all equivalent to vertex buffer fetches which should be
dword-aligned. Scalar loads are also dword-aligned.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Buffers are disabled by VGT_STRMOUT_BUFFER_CONFIG, but the query only works
if VGT_STRMOUT_CONFIG.STREAMOUT_0_EN is enabled.
This moves VGT_STRMOUT_CONFIG to its own state. The register is set to 1
if either streamout or the primitives-generated query is enabled.
However, the primitives-emitted query is also incremented, so it's disabled
by setting VGT_STRMOUT_BUFFER_SIZE to 0 when there is no buffer bound.
This fixes piglit:
ARB_transform_feedback2/counting with pause
EXT_transform_feedback/primgen-query transform-feedback-disabled
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When doing fast clear for single-sample color buffers for the first time,
a CMASK buffer has to be allocated and the CMASK state in all pipe_surfaces
referencing the color buffer must be updated. Updating all surfaces is kinda
silly, so let's move the values to r600_texture instead.
This is only for Evergreen and later. R600-R700 don't have fast clear.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This looks like r600g. The shared Cayman MSAA code is used here.
The real motivation for this is that I need the ability to change values
of color registers after the framebuffer state is set. The PM4 state cannot
be modified easily after it's generated. With this, I can just change
r600_surface::cb_color_xxx and set framebuffer.atom.dirty=true and it's done.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the following build error on OpenBSD:
./.libs/libglsl.a(builtin_functions.o)(.text+0x973): In function `mtx_lock':
../../include/c11/threads_posix.h:195: undefined reference to `pthread_mutex_lock'
./.libs/libglsl.a(builtin_functions.o)(.text+0x9a5): In function `mtx_unlock':
../../include/c11/threads_posix.h:248: undefined reference to `pthread_mutex_unlock'
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Static and shared builds were possible in the good old days
of static makefiles. Currently the build system does not
distinguish nor does anything special when one requests a
static build.
Print a warning message for the packager that static builds
are not supported and continue building shared libs.
Currently only Debian and derivatives use static build, and
they use it for building a Xlib powered libGL. This patch
will only change the warning message they are seeing but
the binaries produced will be identical.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
- As of commit cb080a10b68(configure.ac: Don't require shared LLVM when
building OpenCL) opencl does not mandate using shared llvm.
- Add a warning message that building with static llvm may cause
compilation problems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
- xf86dri.h is the old dri1 header, not required by dri2 nor dri3
- fold xf86drm.h inclusiong inside dri2.h
- dri3_glx does not have any drm specific dependencies
- glapi.h is not required by the dri2 and dri3 codepaths
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Recent commit fixed build issues in dri2_query_renderer.c by
wrapping in defined(direct_rendering) && !defined(applegl)
This patch targets the query_renderer tests, so that make check
passes on platforms such as hurd and cygwin.
v2: (Emil)
- Rebase and update commit message.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some platforms different library extension - dll, dylib, a.
Honor that when we are creating the required links.
Rename LIB_EXTENSION to LIB_EXT while we're here.
With libglapi linking aside, building classic drivers on
non-linux platforms should be possible now.
v2: Resolve conflicts.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
In the cases where one links against the static glapi.la there
is no need to create temporary variables only to explicitly
link agaist it.
Instead use SHARED_GLAPI_LIB to explicitly indicate when one
is building and linking with the shared glapi provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All the variables were used before the automake conversion
and do not make sense (nor are used) currently.
Replace GL_LIB_NAME with lib$(GL_LIB).$(LIB_EXTENSION) for
apple-glx. The build has been broken for ages, but this will
ease the recovery process as it happens.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All three (xvmc and omx) do not have an alternative loading
similar to the dri modules. Thus one needs to explicitly install
them in order to use/test them.
v2:
- Keep vdpau targets, as an equivalent of LIBGL_DRIVERS_PATH
is being worked on.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This helper script will be used to minimise the duplication
during link generation across all gallium targets.
v2:
- Handle vdpau_LTLIBRARIES. Requested by Christian König.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little point in echoing everything that the script does
to stdout. Wrap it in AM_V_GEN so that a reasonable message is
printed as a indication of it's invocation.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little gain in printing whenever a folder is created.
v2:
- Use $(AM_V_at) over @ to have control in verbose builds.
Suggested by Erik Faye-Lund.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset draw_auto is capable
of rendering from buffers which have not been actually streamed
out to. Our interface didn't allow. This change functionally
shouldn't make any difference to OpenGL where instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The MESA_FORMAT_x enums in formats.h weren't declared in any sort
of reasonable order. Now it should be a little more logical.
This also required reordering tables in formats.c and s_texfetch.c
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Eric Anholt <eric@anholt.net>
There's no real reason to list all the formats in the comments.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Removes unnecessary MOV instructions in L4D2, TF2, Dota2, and many other
Steam games.
total instructions in shared programs: 1668126 -> 1657509 (-0.64%)
instructions in affected programs: 242235 -> 231618 (-4.38%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from needing to reemit all the other stage state just
because a surface changed.
Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869%
(n=296). [v1]
v2:
- Drop binding table packets from Gen8 unit state as well.
- Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table,
cutting even more code.
v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2).
Signed-off-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2, v3]
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
This makes both the empty and non-empty binding table paths exit through
the bottom of the function, which gives us a place to share code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Explicitly add radeon_drm_winsys_create and nouveau_drm_screen_create to
the dynamic list. This will ensure vdpau interop still works even when
the user links with -Bsymbolic-functions in hardened builds.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Rachel Greenham <rachel@strangenoises.org>
Reported-by: Peter Frühberger <peter.fruehberger@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This reverts most of commit d80f0c34b7. Upon a
closer reading, having the presumed offsets written is not enough to set the
flag. EXEC_OBJECT_NEEDS_GTT and/or EXEC_OBJECT_WRITE of the reloc entries
must also be set appropriately.
Rename
intel_winsys_check_aperture_size() to intel_winsys_can_submit_bo(),
intel_bo_exec() to intel_winsys_submit_bo(), and
intel_winsys_decode_commands() to intel_winsys_decode_bo().
Make a semantic change to ignore intel_context when the ring is not the render
ring.
The only alloc flag is INTEL_ALLOC_FOR_RENDER, which can as well be expressed
by specifying the initial write domain. The change makes it obvious that we
failed to set INTEL_ALLOC_FOR_RENDER in several places.
Rename
intel_bo_emit_reloc() to intel_bo_add_reloc(),
intel_bo_clear_relocs() to intel_bo_truncate_relocs(), and
intel_bo_references() to intel_bo_has_reloc().
Besides, we need intel_bo_get_offset() only to get the presumed offset afer
adding a reloc entry. Remove the function and make intel_bo_add_reloc()
return the presumed offset. While at it, switch to gem_bo->offset64 from
gem_bo->offset.
Patch adds a remap table for uniforms that is used to provide a mapping
from application specified uniform location to actual location in the
UniformStorage. Existing UniformLocationBaseScale usage is removed as
table can be used to set sequential values for array uniform elements.
This mapping helps to implement GL_ARB_explicit_uniform_location so that
uniforms locations can be reorganized and handled in a more easy manner.
v2: small fixes + rename parameters for merge and split functions (Ian)
improve documentation, remove old check for location bounds (Eric)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing uses this structure, removal fixes Klocwork error about
the possible oom condition in _mesa_symbol_table_iterator_ctor.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This logic is borrowed from the radeon code. The transfer logic will
only get called for PIPE_BUFFER resources, so it shouldn't be necessary
to worry about them becoming render targets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
It's only used in this one file as far as I can tell, and exporting a
symbol named 'devices' from a shared library is a recipe for trouble.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When building out of tree, the file ends up dangling which
may result in a binary with the old git sha.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That structure member is a pointer, so the loop with
the Elements macro only freed up the first entry.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After preprocessing by glcpp all adjacent spaces were replaced by
single one and glsl parser received column-shifted shader source.
It negatively affected ast location set up and produced wrong error
messages for heavily-spaced shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
70e7905608.
build/linux-x86_64-debug/mesa/libmesa.a(driverfuncs.os): In function `_mesa_init_driver_functions':
src/mesa/drivers/common/driverfuncs.c:99: undefined reference to `_mesa_meta_GenerateMipmap'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
I don't know how many people care about this case, but it's easy enough
to do, so we may as well. The tricky part is that for some reason Mesa
stores the number of array slices in Height, not Depth.
I thought the easiest way to handle that here was to make Height = 1
(the actual height), and srcDepth = srcImage->Height. This requires
some munging when calling _mesa_prepare_mipmap_level, so I created a
wrapper that sorts it out for us.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is largely a matter of looping over the number of slices/layers,
and not minifying depth (presumably that code exists for the unfinished
3D texture support).
Normally, I would have made the loop over array slices the outermost
loop. I suspect that would make it trickier to support 3D textures
someday, though, so I didn't. The advantage is that we would only have
one BufferData call per slice, rather than one per miplevel and slice.
However, a GenerateMipmaps microbenchmark indicates that either way is
basically just as fast. So I'm not sure it's worth bothering.
Improves performance in a GenerateMipmaps microbenchmark by nearly 5x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Almost the exact same code appeared twice, and it needs to expand to
handle additional texture targets. Refactor it to tidy up the code and
avoid duplicating more work in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
KHR_debug
Also update dispatch sanity removing ARB_debug_output checks and
removing KHR_debug placeholders as the checks have already been added
V2: Make sure we exit case statements with conditional breaks rather than
just dropping through.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the a build breakage caused by
6974eb9076 on build configurations where
all the following are true:
1. radeonsi is not being built
2. r600g is being built
3. opencl is disabled
4. --enable-r600-llvm-compiler is not being used
5. libelf is not installed
v2:
- Add $(RADEON_CFLAGS) to libllvmradeon_la_CFLAGS
Tested-by: Brian Paul <brianp@vmware.com>
And move its definition into r600_pipe_common.h; This struct is a just
a container for shader code and has nothing to do with LLVM.
v2:
- Drop unrelated Makefile change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
A user would have no idea what "_glthread_" is. This removes the
last remaining instance of the _glthread_ string in Mesa.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
To check that the st_mesa_format_to_pipe_format() and
st_pipe_format_to_mesa_format() functions correctly convert
all corresponding Mesa/Gallium formats.
This found that MESA_FORMAT_YCBCR_REV was missing in
st_mesa_format_to_pipe_format(). Fixed that too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: rotate in gen_rect_verts instead
v3: clear rotate in vl_compositor_clear_layers,
update calc_drawn_area as well
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Christian König <christian.koenig@amd.com>
It's never used, and it's equivalent to ralloc_parent(ht) if you really
need it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To match PIPE_FORMAT_R8G8B8A8_SRGB.
v2: fix component name copy&paste bugs
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We've had several problems now with FinishRenderTexture not getting called
enough, and we're ready to just give up on it ever doing what we need. In
particular, an upcoming Steam title had rendering bugs that could be fixed
by always_flush_cache=true.
Instead of hoping Mesa core can figure out when we need to flush our
caches, just track what BOs we've rendered to in a set, and when we render
from a BO in that set, emit a flush and clear the set.
There's some overhead to keeping this set, but most of that is just
hashing the pointer -- it turns out our set never even gets very large,
because cache flushes are so common (even on cairo-gl).
No statistically significant performance difference in cairo-gl (n=100),
despite spending ~.5% CPU in these set operations.
v1: (Original patch by Eric Anholt.)
v2: (Changes by Ken Graunke.)
- Rebase forward from May 7th 2013 -> March 4th 2014.
- Drop the FinishRenderTexture hook entirely; after rebasing the
patch, the hook was just an empty function.
- Move the brw_render_cache_set_clear() call from
intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush().
In theory, this could catch more cases where we've flushed.
- Consider stencil as a possible texturing source.
v3: (changes by anholt):
- Move set_clear() back to emit_mi_flush() -- it means we can drop
more forced flushes from the code. In the previous location, it
wouldn't have been called when we wanted pre-gen6.
- Move the set clear from batch init to reset -- it should be empty at
the start of every batch, since the kernel handled any inter-batch
flush for us.
v4: Drop the debug code in set.c that I accidentally committed.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com> [v2]
With a non-debug build, gcc has two complaints:
1. 'found' var not used. Silence with '(void) found;'
2. 'id' not initialized. It's assigned by the UniformHash->get()
call, actually. But init it to zero to silence gcc.
Reviewed-by: Matt Turner <mattst88@gmail.com>
sizeof(scissor) returns the size of the full array rather than a single
element. Fix it to consider just the one element.
Fixes: 0705fa35 ("st/mesa: add support for GL_ARB_viewport_array (v0.2)")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The texture formats of winsys fbo are always linear becase the st manager
(st/dri for example) could not know the colorspace used. But it does not mean
that we cannot make the fbo sRGB-capable. By
- setting rb->Visual.sRGBCapable to GL_TRUE when the pipe driver supports the
format in sRGB colorspace,
- giving rb an sRGB internal format, and
- updating code to check rb->Format instead of strb->texture->format,
we should be good.
Fixed bug 75226 for at least llvmpipe and ilo, with no piglit regression.
v2: do not set rb->Visual.sRGBCapable for GLES contexts to avoid surprises
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75226
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
We need the header setup to not be predicated on which pixels are
undiscarded. I'm not sure originally if I had thought that the mask
disable implied predicate disable, or if I had just misread the mask
disable as predicate disable. Either way, I know I had spent more time
thinking about this in the gen8 generator than the gen7 generator.
Plus, it turns out that I had mis-implemented the "the GPU will use the
predicate unless this header is present" comment, by skipping setting up
the pixel mask when the header was present.
Fixes GPU hangs in piglit glsl-fs-discard-mrt, Trine, Trine 2 and
preusmably MLL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75207
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was really only used in the radeon driver for a debug printf.
And evidently, libGL.so referenced it just to work around some sort
of linker issue.
This patch removes the two calls to the function and the function
itself.
Fixes undefined _glthread_GetID symbol in libGL reported by 'nm'.
Though, the missing symbol doesn't cause any issues on my system but
it does cause glxinfo to fail on one of our test systems.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before, it was kind of ugly to set the multisample fields with
assignments after we called _mesa_init_teximage_fields().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If scissor optimization is used (to avoid bringing scissored portions of
the render target into GMEM and then back out to system memory) in
combination with hw binning pass, the result would be a scissor mismatch
between binning pass and rendering pass. This would cause rendering
bugs in some scenarios with (for example) gnome-shell.
I would have expected that simply using the correct screen-scissor
during the binning pass would be enough, but seems like there is
something else missing. So for now disable binning pass if scissor
optimization is used.
Most of the logic refers to the local variable 'mt' directly but
a few cases use 'intelObj->mt' instead. These are the same for
now but will be different once stencil miptree gets used.
v2 (Ian): fixed also indentation in surrounding lines
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On earlier hardware, we had to implement math in the shader to translate
Y-tiled or untiled coordinates to W-tiled coordinates (which is what
BLORP does today in order to texture from stencil buffers).
On Broadwell, we can simply state that it's W-tiled in SURFACE_STATE,
and adjust the pitch. This is much easier.
In the surface state code, I chose to handle the "should we sample depth
or stencil?" question separately from the setup for sampling from
stencil. This should make it work with the BindRenderbufferTexImage
hook as well, and hopefully be reusable for GL_ARB_texture_stencil8
someday.
v2: Update docs/GL3.txt (caught by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While the GL_ARB_stencil_texturing extension does not allow the creation
of stencil textures, it does allow shaders to sample stencil values
stored in packed depth/stencil textures.
Specifically, applications can call glTexParameter* with a pname of
GL_DEPTH_STENCIL_TEXTURE_MODE and value of either GL_DEPTH_COMPONENT or
GL_STENCIL_INDEX to select which component they wish to sample. The
default value is GL_DEPTH_COMPONENT (for traditional depth sampling).
Shaders should use an unsigned integer sampler (presumably usampler2D)
to access stencil data. Otherwise, results are undefined. Using shadow
samplers with GL_STENCIL_INDEX selected also is undefined behavior.
This patch creates a new gl_texture_object field, StencilSampling, to
indicate that stencil should be sampled rather than depth. (I chose to
use a boolean since I figured it would be more convenient for drivers.)
It also introduces the [Get]TexParameter code to get and set the value,
and of course the extension plumbing.
v2: Also consider textures incomplete when sampling stencil with
non-NEAREST min/mag filters (caught by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The loader infrastructure for everything but DRI2 requires that udev be
present, so we can figure out an appropriate driver from the fd. We don't
have a portable solution yet, but presumably it will have similar lookup
based on the device node.
It will also be even more required for krh's udev-based hwdb support,
which lets us have a loader that actually loads DRI drivers not included
in the loader's source distribution.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75212
Reviewed-by: Matt Turner <mattst88@gmail.com>
Because in draw we always inject position at slot 0 whenever
fragment shader would take the maximum number of inputs (32) it
meant that we had PIPE_MAX_ATTRIBS + 1 slots to translate, which
meant that we were crashing with fragment shaders that took
the maximum number of attributes as inputs. The actual max number
of attributes we need to translate thus is PIPE_MAX_ATTRIBS + 1.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
draw_current_shader_* functions return a final output when considering
both the geometry shader and the vertex shader. But when code generating
vertex shader we can not be using output slots from the geometry shader
because, obviously, those can be completely different. This fixes a
number of very non-obvious crashes.
A side-effect of this bug was that sometimes the vertex shading code
could save some random outputs as position/clip when the geometry
shader was writing them and vertex shader had different outputs at
those slots (sometimes writing garbage and sometimes something correct).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
From OpenGL 3.3 spec, page 141:
"Textures with a base internal format of DEPTH_COMPONENT or DEPTH_STENCIL
require either depth component data or depth/stencil component data.
Textures with other base internal formats require RGBA component data.
The error INVALID_OPERATION is generated if one of the base internal
format and format is DEPTH_COMPONENT or DEPTH_STENCIL, and the other
is neither of these values."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From OpenGL 4.0 spec, page 398:
"The initial internal format of a texel array is RGBA
instead of 1. TEXTURE_COMPONENTS is deprecated; always
use TEXTURE_INTERNAL_FORMAT."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Starting with llvm-3.5svn r202574, LLVM expects C+11 mode.
commit f8bc17fadc8f170c1126328d203f0dab78960137
Author: Chandler Carruth <chandlerc@gmail.com>
Date: Sat Mar 1 06:31:00 2014 +0000
[C++11] Turn off compiler-based detection of R-value references, relying
on the fact that we now build in C++11 mode with modern compilers. This
should flush out any issues. If the build bots are happy with this, I'll
GC all the code for coping without R-value references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202574 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
`--enable-llvm-shared-libs` option was recently renamed as
`--with-llvm-shared-libs`, but several error messages still mention the
old option, causing confusing.
Trivial.
GetCurrentThread() returns a pseudo-handle (a constant which only makes
sense when used within the calling thread) and not a real handle.
DuplicateHandle() will return a real handle, but it will create a new
handle every time we call. Calling DuplicateHandle() here means we will
leak handles, which can cause serious problems.
In short, the Windows implementation of thrd_t needs a thorough make
over, and it won't be pretty. It looks like C11 committee
over-simplified things: it would be much better to have seperate objects
for threads and thread IDs like C++11 does.
For now, just comment out the thrd_current() implementation, so we get
build errors if anybody tries to use it.
Thanks to Brian Paul for spotting and diagnosing this problem.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
u_thread_self() expects thrd_current() to return a unique numeric ID
for the current thread, but this is not feasible on Windows.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
r600_translate_colorformat is rewritten to look like radeonsi.
r600_translate_colorswap is shared with radeonsi.
r600_colorformat_endian_swap is consolidated.
This adds some formats which were missing. Future "plain" formats will
automatically be supported.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
With commit 0432aa064b(configure: use shared-glapi when more than one
gl* API is used) we removed "disable shared-glapi when building without
dri" hunk.
In the good old days of classic mesa, dri and xlib-glx were mutually
exclusive thus the hunk made sense.
Currently enable-dri is used as a synonym for a range of things thus
it's more appropriate to handle xlib-glx explicitly.
Fixes a missing symbol '_glapi_Dispatch' in a xlib powered libGL,
build using the following
./autogen.sh --enable-xlib-glx --disable-dri --with-gallium-drivers=swrast
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The _glthread_LOCK/UNLOCK_MUTEX() macros are just wrappers around
the c11 mutex functions. Let's start getting rid of those wrappers.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the comments for the packed formats to accurately reflect the
layout of the bits in the pixel. For example, for the packed format
MESA_FORMAT_R8G8B8A8, R is in the least significant position while A
is in the most-significant position of the 32-bit word.
v2: also fix MESA_FORMAT_A1B5G5R5_UNORM, per Roland.
The spec incorrectly used void as return type, when it should have
been GLboolean. This has now been fixed. According to Nvidia, their
implementation always used GLboolean.
Reviewed-by: Christian König <christian.koenig@amd.com>
This is just a simple implementation that stores the extra values into the DRIimage
struct and just uses the fd importer. I haven't looked into what is required
to import YUV or deal with the extra parameters.
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 recently moved debug printfs to use stderr, including ones which
trigger on MESA_GLSL=dump. This resulted in scrambled output.
For drivers using ir_to_mesa, print_program was already using stderr,
yet all the code around it was using stdout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_algebraic was translating lrp(x, 0, a) into add(x, -mul(x, a)).
Unfortunately, this references "x" twice, which is invalid in the IR,
leading to assertion failures in the validator.
Normally, cloning IR solves this. However, "x" could actually be an
arbitrary expression tree, so copying it could result in huge piles
of wasted computation. This is why we avoid reusing subexpressions.
Instead, transform it into mul(x, add(1.0, -a)), which is equivalent
but doesn't need two references to "x".
Fixes a regression since d5fa8a9562, which isn't in any stable
branches. Fixes 18 shaders in shader-db (bastion and yofrankie).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The logic to count number of block outputs was out of sync with the
actual array construction. But to simplify / make things less fragile,
we can just allocate the arrays for worst case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A value may be assigned on only one side of an if/else. In this case we
can simply substitute a mov.f32f32.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add option to generate fragment shader to emulate two sided color.
Additional inputs are added to shader for BCOLOR's (on corresponding to
each COLOR input). CMP instructions are used to select whether to use
COLOR or BCOLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If vertex writes pointsize, there are a few extra bits we need to turn
on in the cmdstream here and there.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have the infrastructure for shader variants, add support to
generate an optimized shader for hw binning pass (with varyings/outputs
other than position/pointsize removed). This exposes the possibility
that the shader uses fewer constants than what is bound, so we have to
take care to not emit consts beyond what the shader uses, lest we
provoke the wrath of the HLSQ lockup!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also,
face support is needed to emulate two sided color.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
An unused input might not have a register assigned. We don't want bogus
regid to result in impossibly high max_reg..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
BRW_MAX_TEX_UNIT is the static limit on the number of textures we
support per-stage, not in total.
Core's `Unit` array is sized by MAX_COMBINED_TEXTURE_IMAGE_UNITS, which
is significantly larger, and across the various shader stages, up to
ctx->Const.MaxCombinedTextureImageUnits elements of it may be actually
used.
Fixes invisible bad behavior in piglit's max-samplers test (although
this escalated to an assertion failure on HSW with texture_view, since
non-immutable textures only have _Format set by validation.)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes glGetTexImage() when converting from MESA_FORMAT_Z32_FLOAT_S8X24_UINT
to GL_UNSIGNED_INT_24_8. Hit by the piglit
ext_packed_depth_stencil-getteximage test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Though it won't matter on Linux, use _mesa_align_free to release it.
Since i965 doesn't have sys_buffer, I overlooked this in the
GL_ARB_map_buffer_alignment work a few months ago. Fixes i915 (and
presumably i830) regressions in ARB_map_buffer_range tests and the
failure in arb_map_buffer_alignment-sanity_test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74960
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no reason to have more vertex texture units than fragment
texture units on this hardware. Since increasing the default maximum
number of texture units from 16 to 32, this has triggered some segfault
in i915 driver. There's probably some array or bitfield that isn't
properly sized now. This really papers over the bug, but I don't think
I'll lose any sleep over that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74071
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: vec4_visitor::pack_uniform_registers(): Use correct comparison in the
assert, this->uniforms is already adjusted. Compare the actual value used to
index uniform_size and uniform_vector_size instead.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Don't add function parameters, pass the required size in
prog_data->nr_params.
v3:
- Use the name uniform_array_size instead of uniform_param_count.
- Round up when dividing param_count by 4.
- Use MAX2() instead of taking the maximum by hand.
- Don't crash if prog_data passed to vec4_visitor constructor is NULL
v4: Rebase for current master
v5 (idr): Trivial whitespace change.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71254
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLX can be either dri or xlib based, while enable_dri is
used in a variety of contexts.
With enable_dri_glx the context is clearly visible.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All hardware drivers including the virtual vmwgfx require
the drm pipe-loader in order to be properly loaded by xa,
gbm and opencl.
Note this does _not_ add support for the above three it only
allows the pipe driver to be loaded by the library.
Eg. GBM will now properly open the pipe-i915 driver, should
one be working on the such hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75453
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Building to provide accelration using swrast does not make
sense.
Note: update your build script to explicitly mention svga
in the gallium drivers list, if you are building the vmwgfx
xa library.
v2: Update error message to provide more clarify, add an example.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Recent patch converted our logic to use test -n and test -z.
An emptry string variable (empty_str="") return true for both
thus making the check unreliable.
Fix this by correctly setting the variable when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The issue is caused by a thinko that an empty string will be
considered of zero length by 'test'. This is not the case,
thus we were building the 'core' of megadrivers even when no
classic drivers were built.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
glGetTexImage(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8) was just
using memcpy() instead of _mesa_unpack_uint_24_8_depth_stencil_row()
to convert texels from the hardware format to the GL format.
Fixes issue reported by David Meng at Intel. The new piglit
ext_packed_depth_stencil-getteximage test checks for this bug.
Also, add some format/type assertions. We don't yet handle the
GL_FLOAT_32_UNSIGNED_INT_24_8_REV type. That should be fixed in
a follow-on patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
API is always API_OPENGL_COMPAT (since commit 4e4a537ad5,
"meta: Push into desktop GL mode when doing meta operations."),
so most of these checks do nothing.
We could instead check save->API to only bother setting/restoring
relevant GL state, but I'm not sure saving a few _mesa_set_enable
calls is worth the complexity. My understanding is the point of
the ctx->API guards was to avoid raising GL errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_meta_begin(), we switch to API_OPENGL_COMPAT, then munge a lot
of state (including some that doesn't exist in the actual API - like
PolygonStipple in API_OPENGL_CORE).
It seems reasonable that in _mesa_meta_end(), we should restore it,
then switch back to the original API. This at least makes it symmetric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Some formats can't be handled - in particular cannot handle ints/uints formats,
which lack the pack_rgba_float/unpack_rgba_float functions. Instead of trying
to call these (and crash) return an error (I'm not sure yet if we should try
to translate such formats too here might not make much sense).
v2: suggested by Jose, use separate checks for pack/unpack of rgba_8unorm and
rgba_float functions (right now if one exists the other should as well).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently only two VUE map layouts: one for Gen4-5, and one
for everything else. We keep having to add new "case N+1" labels for
every new hardware generation, and so far it's always been the same.
This patch makes it so we only have to do work in the case where
something actually changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec's 3D workarounds page, this is unnecessary on
shipping Haswell hardware, and was never necessary on Broadwell. It
unfortunately doesn't say anything about Baytrail.
The workaround database confirms those results for Ivybridge, Haswell,
and Broadwell. Baytrail is less clear - one page says it's necessary,
while the other says it isn't. For now, be conservative and leave it
enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy to compare output between different cards, especially
for ones that you don't have (and/or not in the current machine).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This should pave the way to being able to use the compiler without a
context. Also leads to cleaner code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
By adding "#define gvec4 %svec4" to the top of our fragment shader, we
can write generic code without needing to specialize it to vec4, ivec4,
or uvec4 via asprintf.
This also makes the INT and UNSIGNED_INT merge function code identical,
so I combined those two cases.
It's not a big savings, but a little bit tidier.
v2: Rebase on Vinson's MSVC build fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes fbo-clear-formats GL_ARB_depth_texture on Ironlake, which
regressed since commit f128bcc7c2
("i965: Drop mt->levels[].width/height.") intel_miptree_copy_slice was
calling minify(.., 7) on a 2x2 texture with mt->first_level == 7.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75292
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tt's kind of a trap---calling do_common_optimization() after
lower_instructions() may cause opt_algebraic() to reintroduce
ir_triop_lrp expressions that were lowered, effectively defeating the
point. Because of this, nobody uses it.
v2: Delete more code (caught by Ian Romanick).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
When the vec4 backend encountered an ir_triop_lrp, it always emitted an
actual LRP instruction, which only exists on Gen6+. Gen4-5 used
lower_instructions() to decompose ir_triop_lrp at the IR level.
Since commit 8d37e9915a ("glsl: Optimize open-coded lrp into lrp."),
we've had an bug where lower_instructions translates ir_triop_lrp into
arithmetic, but opt_algebraic reassembles it back into a lrp.
To avoid this ordering concern, just handle ir_triop_lrp in the backend.
The FS backend already does this, so we may as well do likewise.
v2: Add a comment reminding us that we could emit better assembly if we
implemented the infrastructure necessary to support using MAC.
(Assembly code provided by Eric Anholt).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Similar to u_blitter, u_upload_mgr is now a client of the pipe context. Its
creation needs to be delayed until the context has been (almost) initialized.
The sort priorites for GLX_SAMPLES and GLX_SAMPLE_BUFFERS are
not defined in GL_ARB_multisample, but they are defined in
the GLX 1.4 specification.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The default values for GLX_DRAWABLE_TYPE and GLX_RENDER_TYPE are
GLX_WINDOW_BIT and GLX_RGBA_BIT respectively, as specified in
the GLX 1.4 specification.
This fixes the glx-choosefbconfig-defaults piglit test.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This variable is no longer needed after the cleanup to the
code prior to the first arrays of array series
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On my gentoo system, llvm libs are in /usr/lib64/llvm, and llvm-config
--ldflags does not provide the rpath (it does, of course, provide a -L).
This adds the llvm dir to the rpath. It should be harmless if the path
is a system path, and should make things work when it's not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
This will be used for changing texture properties without modifying
pipe_resource like r600g, but not in this series. For now, this change
allows consolidation of pipe_surface functions.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
db_z_info was unused. This just renames the variable to match the register
name.
Now, db_depth_info is unused on Evergreen.
Both variables will be needed on SI though.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
OpenGL allows a buffer to be mapped only once, but we also map buffers
internally, e.g. in the software primitive restart fallback, for PBOs,
vbo_get_minmax_index, etc. This has always been a problem, but it will
be a bigger problem with persistent buffer mappings, which will prevent
all Mesa functions from mapping buffers for internal purposes.
This adds a driver interface to core Mesa which supports multiple buffer
mappings and allows 2 mappings: one for the GL user and one for Mesa.
Note that Gallium supports an unlimited number of buffer and texture
mappings, so it's not really an issue for Gallium.
v2: fix unmapping in xm_dd.c, remove the GL errors there
v3: fix the intel driver (by Fredrik)
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: dropped the error that DYNAMIC_STORAGE is required for MAP_WRITE_BIT,
the error is removed in the latest revision of GL 4.4
This adds support for GL_ARB_texture_gather, and one step of
support for GL_ARB_gpu_shader5.
This adds support for passing the TG4 instruction, along
with non-constant texture offsets, and tracking them for the
optimisation passes.
This doesn't support native textureGatherOffsets hw, to do that
you'd need to add a CAP and if set disable the lowering pass,
and bump the MAX offsets to 4, then do the i0,j0 sampling using
those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lowering pass will be useful for gallium drivers as well, in order to support
the GL TG4 oddity that is textureGatherOffsets.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The offsets will be stored in the handles parameter. This makes
it possible to use sub-buffers.
v2:
- Style fixes
- Add support for constant sub-buffers
- Store handles in device byte order
v3:
- Use endian helpers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Current automake build does not try to resolve undefined
symbols thus we could end up with a broken library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With the introduction of the pipe_loader_sw_probe_dri helper we
require the sw/dri winsys during linking stage despite it being
unused by any of the targets. This will cause a minor increase
in the resulting library which will be cleaned up via linker
options with upcoming patches.
v2: Link with libswdri.la only when available.
Reported-and-tested-by: Tom Stellard <thomas.stellard@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
While looking at bug 75356, I've noticed that the presence of
x11 egl platform pulls in sw/xlib as "needed" but fails to
report so at the end of configure.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above function implies using the the xlib winsys, which
has additional library dependencies that should not be forced.
Make the software xlib pipe loader optional thus avoid all
the dependency hell. A user that wishes to use the particular
pipe-loader would need to set the following within configure.ac.
enable_gallium_xlib_loader=yes
v2:
- Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems
lacking X11 headers. Spotted by Christian Prochaska.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The blitter is completely ignorant of MSAA buffer layouts, so any
attempt to use BLT paths with MSAA buffers is likely to break
spectacularly.
In most cases, BLORP handles MSAA blits, so we never hit this bug.
Until recently, it also wasn't worth fixing, since Meta couldn't handle
MSAA either, so there was nothing to fall back to. But now there is.
+143 piglit tests on Broadwell (which doesn't have BLORP support).
Surprisingly, three also start failing. Since non-IMS MSAA buffers
store samples in successive array slices, using the blitter ought to
access sample 0 and ignore the rest, which is apparently good enough for
a few not-very-picky Piglit tests. Presumably the meta replacement code
is still broken.
No Piglit changes on Ivybridge.
v2: Move the early return to the top of the function (suggested by
Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output. This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).
Note: for the built-in shaders, I would not expect them to be register
starved. And in fact it is the solid frag shader that seems to have the
biggest impact. So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision. So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers. So for now it is not enabled by default except for the
built-in shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Start putting in place infrastructure to deal with multiple shader
variants. Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders. Possibly need for others later (such
as YUV vs RGB eglImage?).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost. (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan(). Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the spec.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the version number provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note the member function releaseTexBuffer was added without
bumping spec version, and currently no drivers implement it.
v2: releaseTexBuffer was introduced by version 3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While we want to be able to print to stdout for glsl_compiler, for
debugging drivers we want to be able to dump to stderr because that's
where other driver debug (like LIBGL_DEBUG) tends to go, and because some
apps actually close stdout to shut up their own messages (such as the X
Server, or NWN).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was only going to get worse when tesselation shows up, and was
causing too much extra duplication in my stderr changes coming up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unfortunately there's only one RT_ARRAY_MODE setting for all
attachments, so clears were previously truncated to the minimum number
of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the
max number of layers) before doing the clear. This fixes
gl-3.2-layered-rendering-clear-color-mismatched-layer-count.
Also fix clears of individual layered rt/zeta, in case it ever happens.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs()
because the latter may be combined depth/stencil format. hiz_can_clear_zs()
is no-op for GEN7+, but move the GEN check so that the assertions are tested.
Finally, call the fast depth clear function from ilo_clear().
Using this emit function implicitly creates three copies, which
is pointlessly inefficient.
1. Code creates the original instruction.
2. Calling emit(fs_inst) copies it into the function.
3. It then allocates a new fs_inst and copies it into that.
The second could be eliminated by changing the signature to
fs_inst(const fs_inst &)
but that wouldn't eliminate the third. Making callers heap allocate the
instruction and call emit(fs_inst *) allows us to just use the original
one, with no extra copies, and isn't much more of a burden.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These functions (modulo emit_lrp, necessitating the small fix-up) pass
these arguments by value unmodified to other functions. No point in
making an additional copy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C99 spec says the type of an enum is implementation defined (but can
be char, signed int, or unsigned int). gcc appears to always give enums
four bytes, even when they can fit in less. It does so because this is
what other compilers seem to do [0] and therefore to maintain ABI
compatibility with them.
gcc has an -fshort-enum flag that tells the compiler to use only as much
space as needed for an enum. Adding __attribute__((__packed__)) to an
enum definition has the same behavior, but on a per-enum basis.
brw_reg_type and register_file are not part of the ABI, so we can safely
mark them as PACKED so that they'll take only a byte, rather than four.
[0] http://gcc.gnu.org/onlinedocs/gcc/Non-bugs.html#index-fshort-enums-3868
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
4f37e52f37.
Compiling src/gallium/targets/libgl-xlib/xlib.c ...
src/gallium/targets/libgl-xlib/xlib.c:35:42: fatal error: state_tracker/xlib_sw_winsys.h: No such file or directory
#include "state_tracker/xlib_sw_winsys.h"
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75347
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
v2: Handle null_sw_create failure, add missing function return type
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Will be used in the following commits.
v2: Link gallium tests against the library.
v3: Handle dri_create_sw_winsys failure
v4: Rebase on top of the targets/xa changes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v2)
Will be used in the upcoming patches.
v2: handle xlib_create_sw_winsys failure, drop unneeded header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
The sw pipe-loader implicitly handles winsys_create, thus we
it would make sense to implicitly destroy it upon releasing
the loader.
Currently we leak the sw_winsys when releasing the pipe-loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We don't need an array mapping the shader index to "sampler2DMS",
"isampler2DMS", and so on. We can simply do "%ssampler2DMS" and pass in
vec4_prefix, which is "", "i", or "u".
This eliminates the use of C99 array initializers and should fix the
MSVC build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75344
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_texture_object's Target field is never a cube face enumeration, so
target_to_target is just the identity function. Aptly named, at least.
I verified this by putting an assert(!"ZOMG, CUBES!") in the cube face
case, and running Piglit. Nothing ever hit it. Beyond that, I
inspected the code in mesa/main.
This could probably also be deleted from i915, but I haven't tested
there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this SCons build error.
build/linux-x86_64-debug/mesa/libmesa.a(context.os): In function `init_attrib_groups':
src/mesa/main/context.c:815: undefined reference to `_mesa_init_pipeline'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fix build error introduced with commit f4c13a890f.
CC pixeltransfer.lo
main/pipelineobj.c: In function '_mesa_delete_pipeline_object':
main/pipelineobj.c:59:4: error: unknown type name 'unsinged'
unsinged i;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr):
* Trivial reformatting.
* Remove GL_COMPUTE_SHADER. Compute shaders don't participate in pipeline
objects anyway. Suggested by Matt Turner.
v3 (idr):
* Use _mesa_has_geometry_shaders.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Return early from _mesa_ActiveShaderProgram if
_mesa_lookup_shader_program_err returns an error. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v2]
This will allow the guts of the implementation to be shared with
_mesa_CreateShaderProgramv.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement IsProgramPipeline based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement GenProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement DeleteProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
V1:
* Extend gl_shader_state as pipeline object state
* Add a new container gl_pipeline_shader_state that contains
binding point of the previous object
* Update mesa init/free shader state due to the extension of
the attibute
* Add an init/free pipeline function for the context
V2:
* Rename gl_shader_state to gl_pipeline_object
* Rename Pipeline.PipelineObj to Pipeline.Current
* Formatting improvement
V3 (idr):
* Split out from previous uber patch.
* Remove '#if 0' debug printfs.
V4 (idr):
* Fix some errors in comments. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Future patches will use this function outside shaderapi.c.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nothings implemented yet but glProgramUniform* which are mostly a
copy/paste of the older function glUniform*
I create dedicated pipelineobj.[ch] file that will contains function
related to the "new" pipeline container object.
V2: formatting improvement
V3:
* indentation fix
* Update copyright
* Add a comment on ProgramParameteri already present in another extension
* Remove TODO, will be readded on correct patch
V4 (idr):
* Fix dispatch_sanity unit test
* Make extension string available in core profiles (instead of just
compatibility).
* Trivial reformating
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GL_ARB_separate_shader_objects adds the ability to specify location
layouts for interstage inputs and outputs.
In addition, this extension makes 'in' and 'out' generally available for
shader inputs and outputs. This mimics the behavior of
GL_ARB_explicit_attrib_location.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Current behaviour states that shared-glapi is usefull when building
with dri, which is not the case. Shared-glapi is used to dispatch
the gl* functions across the one or more gl api's which can be dri
based but do not need to be.
Fixed the following build
./configure --enable-gles2 --disable-dri --enable-gallium-egl \
--with-egl-platforms=fbdev --with-gallium-drivers=swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75098
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit ee55500c22a(configure: cleanup classic dri drivers handling)
cleaned up the logic handling autodetection of dri drivers, but missed
the case when one can explicitly disable dri, and still request opengl.
Fixes build issues for the following
./autogen.sh --disable-dri --with-gallium-drivers=swrast
While we're here, explicitly clear with_dri_drivers whenever building
without such drivers to prevent choking later on.
v2: Simplify with_dri_drivers handling.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75126
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compared to i965, the code generated doesn't use the AVG instruction. But
I'm not sure that multisampled integer resolves are really that important
to worry about.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are non-stretched, non-resolving blits, so it's just a matter of
sampling once from our gl_SampleID and storing that to our color/depth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of
that state. But to do MSAA to MSAA blits, we need to start handling more
state.
v2: Fix pasteo caught by Kenneth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blending of values would occur when doing GL_LINEAR filtering with
scaling, and in an upcoming commit when doing MSAA resolves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The
i965 code for that extension bakes in knowledge of the sample positions
(well, knowledge of the sample positions aligned to a lower-resolution
grid), which we would have to do at runtime somehow for meta.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We haven't been executing this code before the meta-blit case, because
we've been flagging the miptree as validated at texstorage time, and never
having to revalidate.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Identified by Valgrind memory check. Initialized block-opaque in a
different patch. This test seems unnecessary. If opaque must be true,
just set to true.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Define some additional convenience operators, clean up the
implementation slightly, and rename it to 'intrusive_ptr' for reasons
that will be obvious in the next commit.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
I'd neglected to port these to Broadwell. Most of this code is copy
and pasted from Gen7, but instead of using F32TO16/F16TO32, we just
use MOV with HF register types.
Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell removed the F32TO16 and F16TO32 instructions. However, it has
actual support for HF values, so they're actually just MOV.
Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code
relies on it happening. We can probably refactor this code to be
better later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
contexts are enabled (brw->hw_ctx != NULL), this will upload some
initial invariant state for the GPU. Without hardware contexts, we
rely on this state being uploaded via atoms that subscribe to the
BRW_NEW_CONTEXT bit.
Commit 46d3c2bf4d accidentally moved
the call to brw_init_state() before creating a hardware context.
This meant brw_upload_initial_gpu_state would always early return.
Except on Gen6+, we stopped uploading the initial GPU state via
state atoms, so it never happened.
Fixes a regression since 46d3c2bf4d.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make sure that both the Gen4 and Gen7 style messages work, I
initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization,
ran Piglit, re-enabled it, and ran Piglit again. Both worked fine.
Fixes 40 Piglit tests (most of the varying-packing category).
v2: Move num_regs assertion from gen8_fs_generator to
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new accessors will make it easy to do Gen7-style scratch messages.
v2: Move num_regs assertion from gen8_fs_generator into
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the past, 3DSTATE_PS took an absolute number of threads. Conversely,
on Broadwell you always program 64, and it implicitly scales based on
the GT-level with no special programming. So, I stored 64 in
brw_device_info::max_wm_threads.
However, I didn't realize that we also use max_wm_threads to compute the
size of the scratch space buffer. In that case, we really need the
absolute number of threads.
This patch hardcodes 3DSTATE_PS to use the value it expects, and changes
max_wm_threads back to a (completely fake) absolute thread count (once
again copied from Haswell).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR
puts some bits of that into "ignored" sections of our message header.
While this doesn't hurt, it's also not terribly /useful/. Using MOV
is sufficient to set the only interesting bits in this part of the
message header.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the latest documentation, any PIPE_CONTROL with the
"Command Streamer Stall" bit set must also have another bit set,
with five different options:
- Render Target Cache Flush
- Depth Cache Flush
- Stall at Pixel Scoreboard
- Post-Sync Operation
- Depth Stall
I chose "Stall at Pixel Scoreboard" since we've used it effectively
in the past, but the choice is fairly arbitrary.
Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.
Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there. Subsequent patches could add it to older platforms, provided
someone tests it there.
v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits
are set (suggested by Ian Romanick).
v3: Prefix the function with "gen8" (requested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
Grab the parsed invocation count, check for consistency
during linking, and finally save the result in
gl_shader_program Geom.Invocations.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
_mesa_glsl_parse_state in_qualifier->invocations will store the
invocations count.
v3:
* Use in_qualifier to allow the primitive to be specied
separately from the invocations count (merge_qualifiers)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We introduce a new merge_in_qualifier ast_type_qualifier
which allows specialized handling of merging input layout
qualifiers.
By merging layout qualifiers into state->in_qualifier, we
allow multiple input qualifiers. For example, the primitive
type can be specified specified separately from the
invocations count (ARB_gpu_shader5).
state->gs_input_prim_type is moved into state->in_qualifier->prim_type
state->gs_input_prim_type_specified is still processed separately
so we can determine when the input primitive is specified. This
is important since certain scenerios are not supported until after
the primitive type has been specified in the shader code.
v4:
* Merge with compute shader input layout qualifiers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Improves performance of a dolphin emulator trace I had laying around by
3.60131% +/- 0.995887% (n=128).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We generate steaming piles of these for the centroid workaround, and this
quickly cleans them up.
total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
instructions in affected programs: 26111 -> 24930 (-4.52%)
GAINED: 0
LOST: 0
(Improved apps are l4d2, csgo, and dolphin)
Reviewed-by: Matt Turner <mattst88@gmail.com>
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
We need to advertise 8x, 4x, and 2x multisamples. Previously, we only
claimed to support 0/1 samples.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.
I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior. However, it only mentions 4x MSAA - not even 8x.
After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA. However, that page didn't mention 2x MSAA at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the "Point Multisample Rasterization" of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.
However, if GL_POINT_SPRITE is enabled, you get square points no matter
what. Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.
Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The meaning and effects of this bit are surprisingly complicated.
See Rasterization > Windower > Multisampling > Multisample ModesState.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This restriction carries forward from earlier platforms. The code is
ported straight from gen7_wm_state.c.
v2: Actually do it right.
v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Also set the "oMask Present to Render Target" bit, which is required
for shaders that write oMask. Otherwise the hardware won't expect
the extra data.
v3: Add missing _NEW_MULTISAMPLE (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already set the number of samples, but were missing the MSAA layout
mode. Reusing gen7_surface_msaa_bits makes it easy to set both.
This also lets us drop the Gen8 surface_num_multisamples function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs().
This also makes it reusable for Broadwell, which has 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.
Suggested by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The const in
const unsigned foo(void);
is meaningless. Removing it silences this warning:
src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:
"In addition, all identifiers containing two consecutive underscores
(__) are reserved as possible future keywords."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Names simply containing __ are dangerous to use, but should
be allowed.
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
GLSL ES spec (all versions) say:
"All macro names containing two consecutive underscores ( __ ) are
reserved for future use as predefined macro names. All macro names
prefixed with "GL_" ("GL" followed by a single underscore) are also
reserved."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
the name of the extension), that should be an error. Names simply
containing __ are dangerous to use, but should be allowed. In similar
cases, the C++ preprocessor specification says, "no diagnostic is
required."
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Linking with LLVM static libraries is easily broken by changes to
the llvm-config program or when LLVM adds, removes, or changes library
components. Keeping up with these changes requires a lot of maintanence
effort to keep the build working on the master and stable branches.
Also, because of issues in the past LLVM static libraries, the release
manager is currently configuring with --with-llvm-shared-libs when
checking the build before release. Enabling shared libraries by
default would allow the release manager to run ./configure with
no arguments, and be reasonably confident that the build would succeed.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Useful because the total number of uniform components might exceed
MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be
passing as push constants.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Like the VEC4 back-end does. It will make dynamic allocation of the
param_size array easier in a future commit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.
Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There doesn't seem to be any reason for it to be a method, and it's
surprising that the expression 'reg.retype(t)' doesn't retype its
object but rather it creates a temporary with the new type. Use
'retype(reg, t)' instead.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add assertion that the register is not in the HW_REG or IMM file,
calculate the conjunction of the old and new mask instead of replacing
the old [consistent with the behavior of brw_writemask(), causes no
functional changes right now], make it static inline to let the
compiler do a slightly better job at optimizing things, and shorten
its name.
v2: Assert that the new writemask is not zero to avoid undefined
hardware behaviour.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And define non-mutating helper functions to retype fixed and normal
regs with a common interface. At some point we may want to get rid of
::fixed_hw_reg completely and have fixed regs use the normal register
data members (e.g. backend_reg::reg to select a fixed GRF number,
src_reg::swizzle to store the swizzle, etc.), I have the feeling that
this is not the last headache we're going to get because of the
multiple ways to represent the same thing and the different register
interface depending on the file a register is stored in...
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell's 3DSTATE_WM_HZ_OP packet makes this much easier.
Instead of programming the whole pipeline, we simply have to emit the
depth/stencil packets, a state override, and a pipe control. Then
arrange for the state to be put back. This is easily done from a single
function.
v2: Use minify(mt->logical_{width,height}0, level) in 3DSTATE_WM_HZ_OP
instead of intel_mipmap_level's width/height fields. Those were
based on the physical width/height, and thus wrong for MSAA buffers.
Eric also deleted those fields.
v3: Use 0xFFFF as the sample mask regardless of what the user set (as
this operation is unrelated); set the drawing rectangle to the
miplevel being operated on, rather than the whole surface; remove
unnecessary MAX2(..., 1) around mt->logical_depth0 (all suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code followed the vtable function signature, which is not a
great fit: many of the parameters are unused, and the function still
inspects global state, making it less reusable.
This patch refactors the depth buffer packet emission code into a new
function which takes exactly the parameters it needs, and which uses no
global state. It then makes the existing vtable function call the new
one.
Ideally, we would remove the vtable function, and clean up that
interface. But that can happen once HiZ is working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's "HiZ Resolve" operation still has the restriction that the
rectangle primitive must be 8x4 aligned. So I believe we still need
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ buffers still don't exist, but when they do, we'll set them up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_depthbuffer_format is not very reusable at the moment, since it
uses global state (ctx->DrawBuffer) to access a particular depth buffer.
For HiZ on Broadwell, I need a function which simply converts the
formats. However, at least one existing user of brw_depthbuffer_format
really wants the existing interface. So, I've created a new function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 1456ed85f0.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This reverts commit 498d10e230.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Even with the other limits raised, TestProxyTexImage would still reject
textures > 1GB in size. This is an artificial limit; nothing prevents
us from having a larger texture. I stayed shy of 2GB to avoid the
larger-than-aperture situation.
For 3D textures, this raises the effective limit:
- RGBA8: 645 -> 738
- RGBA16: 512 -> 586
- RGBA32F: 406 -> 465
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Gen4+ supports 8192x8192 cube maps. Ivybridge and later can actually
support 16384, but that would place GL_MAX_CUBE_MAP_TEXTURE_SIZE above
GL_MAX_TEXTURE_SIZE, which seems like a bad idea.
(Unfortunately, we can't bump GL_MAX_TEXTURE_SIZE to 16384 without
causing regressions due to awful W-tiled stencil buffer interactions.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ARB_ES2_compatibility doesn't say anything about shader linking
when one of the shaders (vertex or fragment shader) is absent. So,
the extension shouldn't change the behavior specified in GLSL
specification.
Tested the behavior on proprietary linux drivers of NVIDIA and AMD.
Both of them allow linking a version 100 shader program in OpenGL
context, when one of the shaders is absent.
Makes following Khronos CTS tests to pass:
successfulcompilevert_linkprogram.test
successfulcompilefrag_linkprogram.test
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This drops the MOVs for header setup, which are totally mis-scheduled.
total instructions in shared programs: 1590047 -> 1589331 (-0.05%)
instructions in affected programs: 43729 -> 43013 (-1.64%)
GAINED: 0
LOST: 0
glb27-trex:
x before
+ after
+-----------------------------------------------------------------------------+
| + x xx + + + |
| ++ + xxx ++x xx + ** *x+ + + + x * |
|+x xx x* x+++xx*x*xx+++*+*xx++** *x* x+***x*+xx+* + * + + *|
| |__|__________MA___A___________|___| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 49 62.33 65.41 63.49 63.53449 0.62757822
+ 50 62.28 65.4 63.7 63.6982 0.656564
No difference proven at 95.0% confidence
Reviewed-by: Matt Turner <mattst88@gmail.com>
It often confused people because it was unclear on whether it was the
physical or logical, and people needed the other one as well. We can
recompute it trivially using the minify() macro, clarifying which value is
being used and making getting the other value obvious.
v2: Fix a pasteo in intel_blit.c's dst flip.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since only window system renderbuffers can have a singlesample_mt, this
lets us drop a bunch of sanity checking to make sure that we're just a
renderbuffer-like thing.
v2: Fix a badly-written comment (thanks Kenneth!), drop the now trivial
helper function for set_needs_downsample.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only DRI2 vs DRI3 delta was just how to decide about frontbuffer-ness
for doing the upsample.
v2: Fix missing singlesample_mt->region->name update in the merged code,
which would have broken the DRI2 don't-recreate-the-miptree
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pretty silly to pass in values dereferenced out of one of the arguments.
v2: Get the destination size from the dst, even though the callers are
always dealing with src size == dst size cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So far it's happened to be that we're only ever calling
intel_miptree_blit() (up/downsampling) from the ReadBuffer, but I stumbled
over a null ReadBuffer case when debugging later parts of the series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly mt->target == 2D_MS just results in a few checks that we don't try
to allocate multiple LODs and don't try to do slice copies with them. But
with the introduction of binding renderbuffers to textures, we need more
consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us simplify our shaders, and rely on GLES-prohibited
functionality (like ARB_texture_multisample) when writing these
driver-internal functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes:
xa_tracker.c: In function 'xa_tracker_create':
xa_tracker.c:147:5: error: implicit declaration of function 'pipe_loader_drm_probe_fd' [-Werror=implicit-function-declaration]
in some build configurations, as XA now implicitly depends on
gallium_drm_loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Fixes radeonsi emitting command streams to the kernel even when there
have been no draw calls before a flush, potentially powering up the GPU
needlessly.
Incidentally, this also cuts the runtime of piglit gpu.py in about half
on my Kaveri system, probably because an X11 client going away no longer
always results in a command stream being submitted to the kernel via
glamor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761
Cc: "10.1" mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Required for libdrm 2.4.37 and earlier. Both scons and automake
require version 2.4.38 now so that guard is not longer needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
xorg-server and libkms is no longer required since the removal
of the xorg state-tracker.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the first version that introduced DRM_CAP_PRIME, which is
implicitly required by egl/wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* Make sure that only drivers that are handled by configure.ac
are included in DRI_DIRS.
* Change with_dri_drivers default value to auto, and set enable
autodetection, when enable_opengl is on.
v2: Move "test" to the correct location.
v3: Squash DRI_DIRS handling before the switch statement.
Suggested by Ilia Mirkin
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both x86_64|amd64 and *bsd, already set the full range of available
classic dri drivers. Drop the explicit assignment, and fall back to
the generic default.
Keep explicit list from plafroms/arches that do not handle the default
list.
Update help strings, to explicitly mention "classic" for applicable
DRI drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move all the cases within one switch statement and handle
i9{1,6}5 and r{adeon,200} independently.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If built without llvm, the following error occurs with mplayer:
Failed to open VDPAU backend .../libvdpau_r600.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
DRM_API_HANDLE_TYPE_SHARED is zero, so doesn't actually fix anything.
But we shouldn't rely on SHARED handle type being zero.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Build two versions of pipe-loader, with only the client version linking
in x11 client side dependencies. This will allow the XA state tracker
to use pipe-loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems texture sample instructions don't immediately consume there
src(s). In fact, some shaders from blob compiler seem to indiciate that
it does not even count the texture sample instructions when calculating
number of delay slots to fill for non-sample instructions. (Although so
far it seems inconclusive as to whether this is required.)
In particular, when a src register of a previous texture sample
instruction is clobbered, the (ss) bit is needed to synchronize with the
tex pipeline to ensure it has picked up the previous values before they
are overwritten.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Was supposed to be a '+', otherwise we end up with a negative offset and
choosing registers below the assigned range.
This seems to fix the scheduling mystery "solved" by adding in extra
delay slots.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since 'kill' does not produce a result, the new compiler was happily
optimizing them out. We need to instead track 'kill's similar to
outputs. But since there is no non-predicated kill instruction,
(and for flattend if/else we do want them to be predicated), we need
to track the topmost branch condition on the stack and use that as src
arg to the kill. For a kill at the topmost level, we have to generate
an immediate 1.0 to feed into the cmps.f for setting the predicate
register.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Thanks to figuring out 32bit float render target, and adding regdump
test in fdre-a3xx, I can more easily play around with instructions to
figure out range of inputs/outputs/etc. And from this I can conclude
that cmps.f works more like expected and I can do something much more
simple in trans_cmp() (compared to before which was more closely
emulating the instruction sequence of the blob compiler).
And using sel.b32 (binary 0/1) often makes more sense than sel.f32
(+/- float) or sel.u32 (+/- uint) as it can use the output directly
from cmps.f without needing the 'add.s tmp0, tmp0, -1'.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As documented, the _mesa_free_shader_program_data function:
"Frees all the data that hangs off a shader program object, but not
the object itself."
This means that this function may be called multiple times on the same object,
(and has been observed to). Meanwhile, the shProg->Label field was not being
set to NULL after its free(). This led to a second call to free() of the same
address on the second call to this function.
Fix this by setting this field to NULL after free(), (just as with all other
calls to free() in this function).
Reviewed-by: Brian Paul <brianp@vmware.com>
CC: mesa-stable@lists.freedesktop.org
The linux winsys needs to know whether a surface is shared.
For guest-backed surfaces we need this information to avoid allocating a
mob out of the mob cache for shared surfaces, but instead allocate a shared
mob, that is never put in the mob cache, from the kernel.
Also previously, all surfaces were given the "shareable" attribute when
allocated from the kernel. This is too permissive for client-local surfaces.
Now that we have the needed info, only set the "shareable" attribute if the
client indicates that it needs to share the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations, it may be desirable to bypass the cache at buffer
creation but to insert the buffer in the cache at buffer destruction.
One such situation is where we already have a kernel representation of a
buffer that we want to use, but we also want to insert it in the cache when
it's freed up.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations it's important to restrict the sizes of buffers that the
cached buffer manager is allowed to return
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds new interface functions for guest-backed surfaces and
adds a mobid parameter to the surface_relocation() function.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
The old svga3d_reg.h file is split into separate header files and we
add new items for guest-backed surfaces.
Plus some minor code fixes because of renamed symbols.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Commit f4ebcd133b ("dri/nouveau: NV17_3D class is not available for
NV1a chipset") fixed this partially by using the correct 3d class.
However there were a lot of checks left over comparing against the
chipset.
Reported-and-tested-by: John F. Godfrey <jfgodfrey@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2 (chk): fix eos handling
v3 (leo): implement scaling configuration support
v4 (leo): fix bitrate bug
v5 (chk): add workaround for bug in Bellagio
v6 (chk): fix div by 0 if framerate isn't known,
user separate pipe object for scale and transfer,
always flush the transfer pipe before encoding
v7 (chk): make suggested changes, cleanup a bit more,
only advertise encoder on supported hardware
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
v2: add fw version query
v3: add README.VCE
v4: avoid error msg when kernel doesn't support it
Signed-off-by: Christian König <christian.koenig@amd.com>
Commit 246ca4b001 ("nv50: implement multiple viewports/scissors, enable
ARB_viewport_array") added dirty tracking to scissors/viewports. However
it neglected to mark them all as dirty on a context switch. This fixes
an apparent regression in webgl in chrome, but probably in any
application that switches contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The bound range is disconnected from the viewport dimensions. This is
the relevant bit from glViewportArray:
"""
The location of the viewport's bottom left corner, given by (x, y) is
clamped to be within the implementaiton-dependent viewport bounds range.
The viewport bounds range [min, max] can be determined by calling glGet
with argument GL_VIEWPORT_BOUNDS_RANGE. Viewport width and height are
silently clamped to a range that depends on the implementation. To query
this range, call glGet with argument GL_MAX_VIEWPORT_DIMS.
"""
Just set it to +/-16384, as that is the minimum required by
ARB_viewport_array and the value that all current drivers provide.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids a CopyTexImage() on Intel i965 hardware without blorp.
v2: Move the !readAtt check up higher.
v3: Rebase on idr's changes, plus readAtt check is totally gone, and also
fix a typo in a comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will let us use meta's acceleration from renderbuffers without having
to do a CopyTexImage first.
This is like what we do for TFP, but just taking an existing renderbuffer
and binding it to a texture with whatever its format was. The
implementation won't work for stencil renderbuffers, and it only does
non-texture renderbuffers (but then, if you're using a texture
renderbuffer, you can just pull the texture object/level/slice out of the
renderbuffer, anyway).
v2: Don't forget to propagate NumSamples to the teximage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is only handling the color case. We can just unindent as
long as we're willing to do the check for the bit outside of the
function.
v2: Rebase on idr's changes, drop readAtt check that's always non-null
anyway (it's a pointer into to the statically-allocated attachments
array in the renderbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I want split some meta.c code off to a separate file, so these functions
can't be static any more.
v2: Rebase on idr's changes, also expose setup_blit_shader,
blit_shader_table_cleanup, setup_vertex_objects,
setup_ff_tnl_for_blit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I'd like to split some of our code to separate files, since 4k lines and
growing is pretty unreasonable for all these separate operations.
v2: Rebase on idr's changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There was this funny argument passed to setup for "did alloc decide we
need to allocate new texture storage?", which goes away if we don't have
the caller do alloc as a separate step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_ARB_fbo spec:
If the source and destination buffers are identical, and the
source and destination rectangles overlap, the result of the blit
operation is undefined.
As far as I know, that's the only thing that would have been of concern
for this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I tripped over one of these when debugging meta, and it's a lot nicer to
just see the internalFormat being complained about.
v2: Drop a note in the other errors path that there is one early return.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While these structs are generated per GLSL sampler type, they're structs
of data-about-shaders (notably, the ID of a shader program), not
data-about-samplers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Everyone was just immediately calling it and doing nothing else with the
shader program id.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The only thing that wants to track the glsl_sampler structure is the
shader string generator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the VEC4 back-end agrees on src_reg::swizzle being one of the
BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we
use Mesa's SWIZZLE macros. There is even a doxygen comment saying
that Mesa's macros are the right ones. They are incompatible swizzle
representations (3 bits vs. 2 bits per component), and the code using
Mesa's works by pure luck. Fix it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using ::subreg_offset. Remove the
less flexible alternative and define a convenience function to keep
the fs_reg interface sane.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using a combination of ::stride and
::subreg_offset. Remove the less flexible ::smear to keep the data
members of fs_reg orthogonal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Some improvements for copy propagation with non-contiguous
register strides and mismatching types.
v3: Add example of the situation that the copy propagation changes are
intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected
to work with zero strides too.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It would be nice if we could have a single 'reg_offset' field
expressed in bytes that would serve the purpose of both, but the
semantics of 'reg_offset' are quite complex currently (it's measured
in units of one, eight or sixteen dwords depending on the register
file and the dispatch width) and changing it to bytes would be a very
intrusive change at this stage. Add a separate 'subreg_offset' field
for now.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To fix MSVC compile breakage. Evidently, _restrict is an MSVC keyword,
though the docs only mention __restrict (with two underscores).
Note: we may want to also rename _volatile to volatile_flag to be
consistent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74900
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix formatting, add new comments, get rid of extraneous indentation.
Suggested by Ian in bug 74723.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Set driver-specified flag in NewDriverState when glUniform* is
used to bind an image unit.
v3: Abbreviate argument type check.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Because of the combinatorial explosion of different image built-ins
with different image dimensionalities and base data types, enumerating
all the 242 possibilities would be annoying and a waste of .text
space. Instead use a special path in the built-in builder that loops
over all the known image types.
v2: Generate built-ins on GLSL version 4.20 too. Rename
'_has_float_data_type' to '_supports_float_data_type'. Avoid
duplicating enumeration of image built-ins in create_intrinsics()
and create_builtins().
v3: Use a more orthodox approach for passing image built-in generator
parameters.
v4: Cosmetic changes.
Acked-by: Paul Berry <stereotype441@gmail.com>
No opaque types may be statically initialized in the shader, all
opaque variables must be declared uniform or be part of an "in"
function parameter declaration, no opaque types may be used as the
return type of a function.
v2: Add explicit check for opaque types in interface blocks. Check
for opaque types in ir_dereference::is_lvalue().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Aggregating images inside uniform blocks is explicitly disallowed by
the standard, aggregating them inside structures is not (as of GL
4.4), but there is a similar problem as with atomic counters: image
uniform declarations require either a "writeonly" memory qualifier or
an explicit format qualifier, which are explicitly forbidden in
structure member declarations. In the resolution of Khronos bug
#10903 the same wording applied to atomic counters was decided to mean
that they're not allowed inside structures -- Rejecting image member
declarations within structures seems the most reasonable option for
now.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Add comment next to the read_only and write_only qualifier flags.
Change temporary copies of the type qualifier mask to use uint64_t
too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add predicates to query if a GLSL type is or contains an image.
Rename sampler_coordinate_components() to coordinate_components().
v2: Use assert instead of unreachable.
v3: No need to use a separate code-path for images in
coordinate_components() after merging image and sampler fields in
the glsl_type structure.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Reuse the glsl_sampler_dim enum for images. Reuse the
glsl_type::sampler_* fields instead of creating new ones specific
to image types. Reuse the same constructor as for samplers adding
a new 'base_type' argument.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes bug 73200 "vdpau-GL interop fails due to different screen
objects" in the same way radeon does.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Only export __driDriverExtensions by default, and radeon_drm_winsys_create on radeons.
Remove -Bsymbolic which should no longer be needed.
As a side effect, it ought to fix a manifestation of bug 73200 on radeon.
Signed-off-by: Maarten Lankhorst<maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixed piglit test getteximage-targets S3TC CUBE_ARRAY on systems that
don't have libtxc_dxtn installed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be necessary to support cubemap array textures because they
use all four components.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rectangle textures were not necessary for mipmap generation (because
they cannot have mipmaps), but all of the future users of this common
code will need to support rectangle textures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is quite like code we want for blits. Pull it out so that it can
be shared by other paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow the same table of shader-per-sampler-type to be used for
paths in meta other than just mipmap generation. This is also the
reason the declarations of the structures was moved towards the top of
the file.
v2: Code formatting change suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also... glOrtho(-1.0, 1.0, -1.0, 1.0, -1.0, 1.0) *is* the identity
matrix, so drop the unnecessary call to _mesa_Ortho.
v2: Rename setup_ff_TNL_for_blit() to setup_ff_tnl_for_blit(). Seems
silly to capitalize one out of two to three acronyms in the name
(change by anholt, acked by idr).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
I set the "address modify enable" bit in the wrong DWord. The first
DWord is the high 16 bits of the address, while the second is the low
32-bits and enable bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never set it on previous generations, but I had to set it in
3DSTATE_PS for correct behavior. For symmetry, I set it in 3DSTATE_VS
as well, but there's no actual need to do so. Piglit works fine either
way. The documentation also remarks that there should never be a need
to program this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
My earlier patch (i965: Reserve space for "Vertex Count" in GS outputs.)
incremented Global Offset for most URB writes to make room for the new
"Vertex Count" field, but failed to shift the URB writes used for
writing control bits.
Confusingly, Global Offset must be incremented by 2 here, rather than 1.
The URB writes we use for actual data are HWord writes, which treat
Global Offset as a 256-bit offset. These are OWord writes, so it's
treated as a 128-bit offset instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Fixes Piglit's max-samplers test.
v2: Use get_element_ud() for the destination as well as the source.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Partially fixes Piglit's max-samplers test.
v2: Use get_element_ud() consistently, rather than using it for the
source but using brw_vec1_grf for the destination..
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MOV_RAW disables masking, but doesn't force the instruction to be
uncompressed. That needs to be done by hand.
Fixes textureGather and texture offset tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Almost every driver already supported it. All current and future
Gallium drivers always support it, and most existing classic drivers
support it.
This only changes radeon and nouveau.
This extension only adds data types that can be passed to, for example,
glTexImage2D. It does not add internal formats. Since you can already
pass GL_FLOAT to glTexImage2D this shouldn't pose any additional issues
with those drivers. Note that r200 and i915 already supported this
extension, and they don't support floating-point textures either.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Half-float TexBOs should require both GL_ARB_half_float_pixel and
GL_ARB_texture_float. This doesn't matter much in practice. Every
driver that supports GL_ARB_texture_buffer_object already supports
GL_ARB_half_float_pixel. We only expose the TexBO extension in core
profiles, and those require GL_ARB_texture_float.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function '_mesa_meta_CopyTexSubImage':
drivers/common/meta.c:3744:52: warning: unused parameter 'rb' [-Wunused-parameter]
Unfortunately, the parameter can't just be removed because it is part of
the dd_function_table::CopyTexSubImage interface.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function 'setup_drawpix_texture':
drivers/common/meta.c:1572:30: warning: unused parameter 'texIntFormat' [-Wunused-parameter]
setup_drawpix_texture has never used this paramater. Before the
refactor commit 04f8193aa it was used in several locations. After that
commit, texIntFormat was only used in alloc_texture.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the other meta routines have a particular pattern for creating
and tracking the VAO and VBO. This one function deviated from that
pattern for no apparent reason.
Almost all of the code added in this patch will be removed shortly.
v2: Drop glDeleteBuffers() of the old, now-uninitialized vbo variable.
Fixes getteximage-formats and fbo-mipmap-copypix regression when "2"
landed in the variable (change by anholt).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Final intermediate step leading to some code sharing. Note that the new
GemerateMipmap and decompress vertex structures are the same as the new vertex
structure in BlitFramebuffer and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new DrawPixels
vertex structure is the same as the new vertex structure in BlitFramebuffer
and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new Clear
vertex structure is the same as the new BlitFramebuffer and CopyPixels
vertex structure.
The "sizeof(float) * 7" hack is temporary. It will magically disappear
in a just a couple more patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new CopyPixels
vertex structure is the same as the new BlitFramebuffer vertex
structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Initial step of cleaning the exported symbols from targets/omx
- Mark omx_component_library_Setup as public
v2: Keep export-symbols-regex
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
There is no cpp files during the build process, thus we
can safely drop the unused cxxflags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function pointer is retrieved via VdpGetProcAddress just
like all the other vdpau functions and should not be exported.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Add VISIBILITY_CFLAGS to automake build, so that
only required symbols are exported.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This symbol is internal and was never part of the API.
Unused by any of the gbm backends, it makes sense to
simply not export it.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
VISIBILITY_CFLAGS
Currently the library exports every symbol imaginable,
rather than the ones defined by the API.
Note: This may cause issues for libraries that are linking
agaist libgbm's internals.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Currently we create a OPENGL_COMPAT context regardless of
what was requested by the program. Correct that by retaining
the program's request and passing it into _mesa_initialize_context.
Based on a similar commit for radeon/r200 by Ian Romanick.
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the explicit note about the required version during configure.
Require the same version (151) of udev when building the pipe-loader.
Mention the udev version requirement in GBM Requires.private.
v2: Resolve a couple of silly typos. Spotted by Ilia
v3: Cleanup platfrom/platform typo. Spotten by Stefan
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
As of recently we dlopen the library, additionally the only
code that is including the libudev.h header, is the loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
--enable-gallium-llvm is required by radeonsi. Currently we
check only for LLVM_VERSION_INT which is 0, whenever gallium-llvm
is disabled explicitly.
./configure --with-gallium-drivers=r600,radeonsi --disable-gallium-llvm
v2: Correct typo in error message. Spotted by Tom Stellard
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Matt Turner noted the incorrect order, but I somehow forgotten to
change it before pushing upstream. The other one is a typo during rebase.
Signed-off-by: Christian König <christian.koenig@amd.com>
If we don't recognize the PCI ID, we can't reasonably load the driver.
However, calling abort() is quite rude - it means the application that
tried to initialize us (possibly the X server) can't continue via
fallback paths. We already have a more polite mechanism - failing to
create the context. So, just use that.
While we're at it, improve the error message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73024
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Consider a multithreaded program with two contexts A and B, and the
following scenario:
1. Context A calls initialize(), which allocates mem_ctx and starts
building built-ins.
2. Context B calls initialize(), which sees mem_ctx != NULL and assumes
everything is already set up. It returns.
3. Context B calls find(), which fails to find the built-in since it
hasn't been created yet.
4. Context A finally finishes initializing the built-ins.
This will break at step 3. Adding a lock ensures that subsequent
callers of initialize() will wait until initialization is actually
complete.
Similarly, if any thread calls release while another thread is still
initializing, or calling find(), the mem_ctx/shader would get free'd while
from under it, leading to corruption or use-after-free crashes.
Fixes sporadic failures in Piglit's glx-multithread-shader-compile.
Bugzilla: https://bugs.freedesktop.org/69200
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1 10.0" <mesa-stable@lists.freedesktop.org>
In the first case, we can simply call stride(mask, 16, 8, 2) rather than
creating a new register with a different stride, then immediately
changing it a second time.
In the second case, the stride was already what we wanted, so we can
just use mask without any changes at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
stride(brw_vec1_reg(...) ...) takes some register, changes the strides,
then changes the strides again. Let's do it once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
this just ties the mesa code to the pre-existing gallium interface,
I'm not sure what to do with the CSO stuff yet.
0.2: fix min/max bounds
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds GS output and FS input support, even though FS input
support isn't supported until GLSL 4.30 from what I can see.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There are only two sensible placements for 2x MSAA samples - and one is
the mirror image of the other. I chose (0.25, 0.25) and (0.75, 0.75).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The 4x and 8x cases contained identical code for extracting the X and
Y sample offset values and converting them from U0.4 back to float.
Without this refactoring, we'd have to duplicate it a third time in
order to support 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Apparently some players are ill-prepared for us claiming that a decoder
exists only to have creating it fail, and express this poor preparation
with crashes (e.g. flash). Check that firmware is there to increase the
chances of there being a high correlation between reported capabilities
and ability to create a decoder.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
In commit eeed49f5f2, Mark accidentally
renamed MESA_FORMAT_S8_Z24 to MESA_FORMAT_Z24_UNORM_X8_UINT and
MESA_FORMAT_X8_Z24 to MESA_FORMAT_Z24_UNORM_S8_UINT, reversing their
sense. The commit message was correct, but what sed commands actually
got run didn't match that.
This patch swaps the two enum names, reversing them. This should undo
the damage, but might break things if people have manually fixed a few
instances in the meantime...
Mark's commit also failed to mention renames:
s/MESA_FORMAT_ARGB2101010_UINT\b/MESA_FORMAT_B10G10R10A2_UINT/g
s/MESA_FORMAT_ABGR2101010\b/MESA_FORMAT_R10G10B10A2_UNORM/g
but those seem okay.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nvfx_fragprog_assign_generic only allows for up to 10/8 texcoords for
nv40/nv30. This fixes compilation of the varying-packing tests.
Furthermore it appears that the last 2 inputs on nv4x don't seem to
work in those tests, so just report 8 everywhere for now.
Tested on NV42, NV44. NV4B appears to have additional problems.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.1 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
We don't need to allocate all the state related to GL_ARB_debug_output
until some aspect of that extension is actually needed.
The sizeof(gl_debug_state) is huge (~285KB on 64-bit systems), not even
counting the 54(!) hash tables and lists that it contains. This change
reduces the size of gl_context alone from 431KB bytes to 145KB bytes on
64-bit systems and from 277KB bytes to 78KB bytes on 32-bit systems.
Reviewed-by: Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The WHILE instruction doesn't have UIP. It only has JIP.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The bits which normally contain the source register descriptions
actually contain the JIP/UIP jump targets, which we already printed.
Interpreting JIP/UIP as source registers results in some really creepy
looking output, like IF statements with acc14.4<0,1,0>UD sources.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's 3DSTATE_CLEAR_PARAMS packet expects a floating point value
regardless of format. This means we need to stop converting it to
UNORM.
Storing the value as float would make sense, but since we already have a
uint32_t field, this patch continues shoehorning it into that. In a
sense, this makes mt->depth_clear_value the DWord you emit in the
packet, rather than the clear value itself.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix pasteo of an extra abs being inserted (caught by many). Rewrite
to drop the silly switch statement.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
We've had various bug reports over the years where miptrees are missing,
and when I screwed it up while adding DRI2 to the modesetting driver, I
figured I should put the info necessary for debug here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it work on Broadwell, too.
v2: Drop bogus double write to 3DPRIM_BASE_VERTEX register
(caught by Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64
distinction.
Placing the function in intel_batchbuffer.c is rather arbitrary; there
wasn't really an obvious place for it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Since commit 9cee3ff562, INTEL_DEBUG=vs
has caused a NULL pointer dereference for fixed-function/ARB programs.
In the vec4 generators, "prog" is a gl_program, and "shader_prog" is the
gl_shader_program. This is different than the FS visitor.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mesa fails to retain the precision qualifier when parsing:
#version 300 es
centroid in mediump vec2 v;
Consider how the parser's type_qualifier production is applied.
First, the precision_qualifier rule creates a new ast_type_qualifier:
<precision: mediump>
Then the storage_qualifier rule creates a second one:
<flags: in>
and calls merge_qualifier() to fold in any previous qualifications,
returning:
<flags: in, precision: mediump>
Finally, the auxiliary_storage_qualifier creates one for "centroid":
<flags: centroid>
it then does $$ = $1 and $$.flags |= $2.flags, resulting in:
<flags: centroid, in>
Since precision isn't stored in the flags bitfield, it is lost. We need
to instead call merge_qualifier to combine all the fields.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If st_GetTexImage() is to decompress the texture, avoid the fallback
path even if prefer_blit_based_texture_transfer = false. For drivers
that returned PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER = 0, we
were always taking the fallback path for texture decompression rather
than rendering a quad. The later is a lot faster.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the specification text of NV_vertex_program1_1, the upper
limit of the RCC instruction is written as 1.884467e+19 in
scientific notation, but as 0x5F800000 in binary. But the binary
version translates to 1.84467e+19 rather than 1.884467e+19 in
scientific notation.
Since the lower-limit equals 2^-64 and the binary version equals
2^+64, let's assume the value in scientific notation is a typo
and implement this using the value from the binary version
instead.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLSurface by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLSurface,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLContext by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLContext,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If a driver enables ARB_gpu_shader5 and sets Const.MaxVertexSteams >= 4,
then piglit's arb_gpu_shader5-minmax test should now pass.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the tests they were the same so it didn't matter, but indications are
that this is the correct behaviour. Also take this opportunity to
(trivially) support using gl_Layer in fp.
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
GLX_ARB_create_context allows making a GLX context current with None
drawable and readables, but this was never implemented correctly in GLX.
We would create a __DRIdrawable for the None GLX drawable and pass that
to the DRI driver and that would somehow work. Now it's somehow broken.
The way this should have worked is that we pass a NULL DRI drawable
to the DRI driver when the GLX user calls glXMakeContextCurrent()
with None for drawable and readables.
https://bugs.freedesktop.org/show_bug.cgi?id=74143
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Flush the context when we unmap a buffer, otherwise VDPAU might
start rendering the next frame while we still reference that buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
Bugzilla (bug needs XBMC changes as well):
https://bugs.freedesktop.org/show_bug.cgi?id=73191
When VL uploads vertex buffers, it uses PIPE_TRANSFER_DONTBLOCK, which always
flushes the context in the winsys if the buffer being mapped is busy. Since
I added handling of DISCARD_RANGE, DONTBLOCK has had no effect when combined
with DISCARD_RANGE and I think the context isn't flushed anywhere else,
so no commands are submitted to the GPU until the IB is full, which takes
a lot of frames.
Using DISCARD_RANGE is not the only way to trigger this bug. The other way
is to reallocate the vertex buffer before every upload.
BTW, I'm not sure if this is the right place for flushing, but it does fix
the bug.
v2 (chk): move the flush to the right place.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
v2: This doesn't change the behavior. It only moves the tiling check
to r600_init_resource and removes the usage parameter.
Reviewed-by: Christian König <christian.koenig@amd.com>
Featuring a full grown MPEG2 and H264 decoder and a couple of hundred bugs.
v2 (Leo): fix an error for pic_order_cnt_type 1
v3 (Leo): implement support for field decoding
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
dri2_dup_image was not copying the dri_format field.
This was causing some bugs, for example:
. we create an gbm_bo.
. we get an EGLImage from the gbm_bo.
. Bug: impossible to get again the gbm_bo from the EGLImage by
importing. (gbm dri2 backend)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract
type that doesn't match the hardware description. dump_instruction()
was using reg_encoding[] from brw_disasm.c, which no longer matches
(and was incorrect for Gen8+ anyway).
This patch introduces a new function to convert the abstract enum values
into the letter suffix we expect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa now has a real, feature-rich EGL implementation on X11 via xcb.
Therefore I believe there is no longer a practical need for the egl_glx
driver.
Furthermore, egl_glx appears to be unmaintained. The most recent
nontrivial commit to egl_glx was 6baa5f1 on 2011-11-25.
Tested by running weston-smoke in windowed Weston on X with i965.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
ureg_program is allocated on the heap so we can just bump the
number of immediates that it can handle. It's needed for d3d10.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to handle a lot more immediates and in order to do that
we also switch from allocating this structure on the stack to
allocating it on the heap.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We only supported up to 256 immediates, which isn't enough. We had
code which was allocating immediates as an allocated array, but it
was always used along a statically backed array for performance
reasons. This commit adds code to skip that performance optimization
and always use just the dynamically allocated immediates if the
number of them is too great.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The number of allowed temporaries increases almost with every
iteration of an api. We used to support 128, then we started
increasing and the newer api's support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage would allow we just treat them as indexable
temporaries and allocate them as an array from the start.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If multiple color outputs are written, this shader is unlikely to be
useful with a winsys framebuffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver is supposed to ensure buffers before any drawing operation, but in
do_blit_drawpixels() and do_blit_copypixels() we inspect the buffer format
before calling intel_prepare_render(). That was covered up by the
unconditional call to intel_prepare_render() in intelMakeCurrent(), but we
now only do this on the initial intelMakeCurrent call for a context
(to get the size for the initial viewport values).
https://bugs.freedesktop.org/show_bug.cgi?id=74083
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexander Monakov <amonakov@gmail.com>
This will allow testing of compute shader functionality before it is
completed.
To enable ARB_compute_shader functionality in the i965 driver, set
INTEL_COMPUTE_SHADER=1.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupCount is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupSize is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adds MESA_SHADER_COMPUTE to the gl_shader_stage enum.
Also, where it is trivial to do so, it adds a compute shader case to
switch statements that switch based on the type of shader. This
avoids "unhandled switch case" compiler warnings.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Linker loops that iterate through all the stages in the pipeline need
to use MESA_SHADER_FRAGMENT as a bound, so that we can add an
additional MESA_SHADER_COMPUTE stage, without it being erroneously
included in the pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were really doing F2I. And also move it to generic section.
(Note that for llvmpipe the code generated is definitely bad, due to lack
of unsigned conversions with sse. I think though what llvm does (using scalar
conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit)
including lots of domain changes is quite suboptimal, could do something like
is_large = arg >= 2^31
half_arg = 0.5 * arg
small_c = fptoint(arg)
large_c = fptoint(half_arg) << 1
res = select(is_large, large_c, small_c)
which should be much less instructions but that's something llvm should do
itself.)
This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs
GL 3.0 version override to run.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This moves the value from the GS shader to the copy shader so the registers
are setup correctly.
fixes tests/spec/glsl-1.50/execution/geometry/clip-distance-out-values.shader_test
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
attempt to calculate a better value for array size to avoid breaking apps.
v2: use 0xfff like streamout, suggested by Grigori
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
cayman has a different end of program bit, so do that properly.
fixes hangs with geom shader tests on cayman.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
set the correct values so the misc out register is setup correctly
for the copy shader.
This also updates the state for the gs copy shader so the hw
gets programmed correctly.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just makes r600 and evergreen do what the radeonsi codepaths do
for layered rendering. This makes the 2d amd_vertex_shader_layer test
pass on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just adds support for emitting the proper value in the VS out misc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just enables the workarounds we have for vertex/pixel shaders
for geom shaders as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This follows what fglrx does, it unpacks the input we are
going to indirect into a bunch of registers and indirects
inside them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We need to be able to write to the ring using a base register
for when we emit vertices in a loop, in theory the SB compiler
could collapse these indirect writes to direct writes if the
register value is constant and known, but that is outside my
pay grade.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Vadim's code derived it from the info.mode, but it needs
to be takes from the geometry shader output primitive.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The instance cnt register was missing for a few kernels,
with a new enough kernel we can output it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If the shader has no CF clauses at all emit an nop
If the last instruction is an ENDLOOP add a NOP for the LOOP to go to
if the last instruction is CALL_FS add a NOP
These fix a bunch of hangs in the geometry shader tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
SB needs fixes for three GS instructions it seems to raise
them outside loops etc despite my best efforts.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Although we don't use SB on geom shaders, the VS copy shader will use it
so we might as well implement MEM_RING support in sb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This can happen in normal operation, so don't report an error on it,
just continue.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is Vadim's initial work with a few regression fixes squashed in.
v2: (airlied)
fix regression in glsl-max-varyings - need to use vs and ps_dirty
fix regression in shader exports from rebasing.
whitespace fixing.
v2.1: squash fix assert
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
For geometry shaders we need to call this code from a second place.
Just move it out for now to keep future patches cleaner.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From the GLSL 4.40 spec, section 6.4 (Jumps):
The continue jump is used only in loops. It skips the remainder of
the body of the inner most loop of which it is inside. For while
and do-while loops, this jump is to the next evaluation of the
loop condition-expression from which the loop continues as
previously defined.
Previously, we incorrectly treated a "continue" statement as jumping
to the top of a do-while loop.
This patch fixes the problem by replicating the loop condition when
converting the "continue" statement to IR. (We already do a similar
thing in "for" loops, to ensure that "continue" causes the loop
expression to be executed).
Fixes piglit tests:
- glsl-fs-continue-inside-do-while.shader_test
- glsl-vs-continue-inside-do-while.shader_test
- glsl-fs-continue-in-switch-in-do-while.shader_test
- glsl-vs-continue-in-switch-in-do-while.shader_test
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In addition to making it public, we also need to change its first
argument from an ir_loop * to an exec_list *, so that it can be used
to insert the condition anywhere in the IR (rather than just in the
body of the loop).
This will be necessary in order to make continue statements work
properly in do-while loops.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is really not needed as blorp blit programs already sample
XRGB normally and get alpha channel set to 1.0 automatically by
the sampler engine. This is simply copied directly to the payload
of the render target write message and hence there is no need for
any additional blending support from the pixel processing pipeline.
The blending formula is anyway broken for color components, it
multiplies the color component with itself (blend factor is the
component itself).
Alpha blending in turn would not fix the alpha to one independent
of the source but simply used the source alpha as is instead
(1.0 * src_alpha + 0.0 * dst_alpha).
Quoting Eric:
"If we want to actually make the no-alpha-bits-present thing work,
we need to override the bits in the surface state or in the
generated code. In the normal draw path, it's done for sampling
by the swizzling code in brw_wm_surface_state.c, and the blending
overrides is just to fix up the alpha blending stage which
doesn't pay attention to that for the destination surface."
If one modifies piglit test gl-3.2-layered-rendering-blit to use
color component values other than zero or one, this change will
kick in on IVB. No regressions on IVB.
This is effectively revert of c0554141a9:
i965/blorp: Support overriding destination alpha to 1.0.
Currently, Blorp requires the source and destination formats to be
equal. However, we'd really like to be able to blit between XRGB and
ARGB formats; our BLT engine paths have supported this for a long time.
For ARGB -> XRGB, nothing needs to occur: the missing alpha is already
interpreted as 1.0. For XRGB -> ARGB, we need to smash the alpha
channel to 1.0 when writing the destination colors. This is fairly
straightforward with blending.
For now, this code is never used, as the source and destination formats
still must be equal. The next patch will relax that restriction.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This moves the intel_batchbuffer_flush before the drm_intel_bo_busy
call, which is a change in behavior. However, the old behavior was
broken.
In the future, we may want to only flush in the batchbuffer references
the BO being mapped. That's certainly more typical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This additionally measures the time stalled, while also simplifying the
code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mapping a buffer is a common place where we could stall the CPU.
In a few places, we've added special code to check whether a buffer is
busy and log the stall as a performance warning. Most of these give no
indication of the severity of the stall, though, since measuring the
time is a small hassle.
This patch introduces a new brw_bo_map() function which wraps
drm_intel_bo_map, but additionally measures the time stalled and reports
a performance warning. If performance debugging is not enabled, it
simply maps the buffer with negligable overhead.
We also add a similar wrapper for drm_intel_gem_bo_map_gtt().
This should make it easy to add performance warnings in lots of places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Hw binning pass doesn't seem to have broken anything. And optimizing
compiler fixes a lot of shaders and doesn't seem to break anything. So
re-org slightly FD_MESA_DEBUG params and make both hw binning and
optimizer enabled by default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.pnghttp://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
163b6306b1
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For the time being, keep old compiler as fallback for things that the
new compiler does not support yet. Split out as it's own commit to make
the later new-compiler commits easier to follow.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not really used for anything anymore. So strip it out and avoid
conflicting symbols with upcoming new-compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When we clipped a line weren't copying the provoking vertex
color to the second vertex. We also weren't checking for
first vs. last provoking vertex.
Fixes failures found with the new piglit line-flat-clip-color test.
Cc: "10.0, 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has been wrong for many years. It was originally 0x000FFFFF and long
ago there was discussion about whether GL_ALL_ATTRIB_BITS should include
the then-new GL_MULTISAMPLE_BIT bit. Eventually the ARB decided that
glPushAttrib(GL_ALL_ATTRIB_BITS) should save all current and future
attribute groups (hence ~0). Unfortunately, Mesa's gl.h was never updated.
This was just recently spotted by Eric Anholt and reported as a bug to the
ARB. Ian, Jon Leech and I discussed it at the ARB meeting and decided to
change Mesa's value to reflect the ARB's decision.
Acked-by: Eric Anholt <eric@anholt.net>
If the shader is too large, plug in a dummy shader. This patch also
reworks the existing dummy shader code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gallivm soa code supported only a single level of nesting for
control flow opcodes (if, switch, loops...) but the d3d10 spec
clearly states that those are nested within functions. To support
nesting of conditionals inside functions we need to store the
nesting data inside function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper
than 32 then we simply ignore all subsequent 'call' invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
DirectX and most hardware documentation use the term "Index Buffer" to
refer to a buffer containing indexes into arrays of vertex data, which
allows random access to vertex data, rather than sequential access.
OpenGL uses a different term for this concept: "Element Array Buffer".
However, "Index Buffer" has become much more widespread. A quick
Google search shows 29,300 hits for "Element Array Buffer" vs.
82,300 hits for "Index Buffer."
Arguably, "Index Buffer" is clearer: an "element of an array" (or list)
usually refers to an actual item stored in the array, not the index used
to refer to it.
The terminology is also already used in Mesa: some VBO module code for
dealing with ElementArrayBufferObj names local variables "ib".
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/ElementArrayBufferObj/IndexBufferObj/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For consistency with the previous renames.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_lookup_arrayobj/_mesa_lookup_vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_update_vao_client_arrays() is less of a mouthful than
_mesa_update_array_object_client_arrays(), and generally clearer.
Generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_\([^_]*\)_array_object/_mesa_\1_vao/g'
with manual whitespace and indentation fixes applied.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I considered replacing it with "gl_vao", but spelling it out seemed to
fit better with Mesa's traditional style. Mesa doesn't shy away from
long type names - consider gl_transform_feedback_object,
gl_fragment_program_state, gl_uniform_buffer_binding, and so on.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/gl_array_object/gl_vertex_array_object/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the field is named "VAO" instead of "ArrayObj", it makes sense
to call the local variables "vao" instead of "arrayObj".
Completely generated by:
$ find . -type f -print0 | xargs 0 sed -i 's/arrayObj/vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When reading through the Mesa drawing code, it's not immediately obvious
to me that "ArrayObj" (gl_array_object) is the Vertex Array Object (VAO)
state. The comment above the structure explains this, but readers still
have to remember this and translate accordingly.
Out of context, "array object" is a fairly vague. Even in context,
"array" has a lot of meanings: glDrawArrays, vertex data stored in user
arrays, gl_client_arrays, gl_vertex_attrib_arrays, and so on.
Using the term "VAO" immediately associates these fields with the OpenGL
concept, clarifying the situation and aiding programmer sanity.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
-e 's/ArrayObj;/VAO;/g' \
-e 's/->ArrayObj/->VAO/g' \
-e 's/Array\.ArrayObj/Array.VAO/g' \
-e 's/Array\.DefaultArrayObj/Array.DefaultVAO/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Silences many GCC warnings of the form:
drivers/common/meta.c: In function 'cleanup_temp_texture':
drivers/common/meta.c:1208:41: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_blit_framebuffer':
drivers/common/meta.c:1453:46: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_blit_cleanup':
drivers/common/meta.c:1998:43: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_clear_cleanup':
drivers/common/meta.c:2287:44: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_generate_mipmap':
drivers/common/meta.c:3365:45: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_generate_mipmap_cleanup':
drivers/common/meta.c:3556:54: warning: unused parameter 'ctx' [-Wunused-parameter]
There are a couple other similar warnings, but they are less trivial. I
want to investigate these further before axing them.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Array textures can't be used with fixed-function, so don't. Instead,
just drop the decompress request on the floor. This is no worse than
what was done previously because generating the GL error (in
_mesa_set_enable) broke everything anyway.
A later patch will get GL_TEXTURE_2D_ARRAY targets working.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no need to use pixel coordinates, and using NDC directly will
simplify the GLSL paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For these objects, meta was already using the non-Apple function to
delete the objects. Everywhere else in the file uses
_mesa_GenVertexArrays and _mesa_BindVertexArrays.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
The hardware decompression path isn't even close to being able to handle
this. This converts the crash (assertion failure) in
"EXT_texture_compression_s3tc/getteximage-targets S3TC CUBE_ARRAY" to a
plain old failure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
_mesa_meta_DrawPixels creates a VAO and (potentially) two fragment
programs, but none of them are ever released. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
decompress_texture_image creates an FBO, an RBO, a VBO, a VAO, and a
sampler object, but none of them are ever released. Later patches will
add program objects, exacerbating the problem. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
TEXTURE_BUFFER_INDEX has to be specially called out because it is not
allowed in any of the glTexParameter or glGetTexParameter functions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The next patch will use this function in another file.
v2: Rename _mesa_target_enum_to_index to _mesa_tex_target_to_index.
Suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glTexSubImage(), glCopyTexSubImage() and glCompressedTexSubImage()
only change the texel data, not other state like texture size or format.
If a driver really needs do something special it can hook into the
corresponding driver functions or Map/UnmapTextureImage().
This should avoid some needless state validation effort.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The _mesa_get_current_tex_object() function is now used everywhere that
_mesa_select_tex_object() was formerly used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As far as I know, this should be safe. If not, we have to decide whether
to have variable lookup of the functions, or just drop support for .so.0
(which is a year and a half old it looks like)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74127
Reviewed-by: Matt Turner <mattst88@gmail.com>
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there
is potentially pending rendering. Track this better, and add fd_wfi()
calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium can leave const buffers bound above what is used by the current
shader. Which can have a couple bad effects:
1) write beyond const space assigned, which can trigger HLSQ lockup
2) double emit of immed consts, first with bound const buffer vals
followed by with actual immed vals. This seems to be a sort of
undefined condition.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Currently lowers the following instructions:
DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4,
DP3, DPH, DP2
translating these into equivalent simpler TGSI instructions.
This probably should be moved to util so other drivers can use
it, but just adding under freedreno for now so that I can clear
out a lot of the lowering code in a3xx compiler before beginning
to add new compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The ctx should hold ref to dev to avoid problems if screen is destroyed
before ctx. Doesn't really fix the egl/glx issues, but at least it
prevents things from getting much worse.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In the final revision of my gen8_generator patch, I updated the MATH
instruction's assertion from (dst.hstride == 1) to check that source and
destination hstride matched. Unfortunately, I didn't test this enough,
and many Piglit tests fail this test.
The documentation indicates that "scalar source is also supported",
which we believe means <0,1,0> access mode (hstride == 0). If hstride
is non-zero, then it must match the destination register.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This puts the PCI IDs in place so it's easy to enable support. However,
it doesn't actually enable support since it's very preliminary still,
and a few crucial pieces (such as BLORP) are still missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Eric believes this to be wrong and unnecessary, as the command is
supposed to emit an implicit rectangle primitive. However, empirically
the pixel pipeline is completely unreliable without it. So for now, it
stays until someone comes up with a better solution.
We'll need to do better than this when we implement multisampling, HiZ,
or fast clears...but for now, this will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is quite similar to the Gen7 code. The main changes:
- 48-bit relocations
- Thread count is specified as U/2-1 instead of U-1.
- An extra DWord (DW9) with clip planes, URB entry output length/offsets
- We need to program the "Expected Vertex Count" (VerticesIn)
v2: Set the number of binding table entries so they can be prefetched
(requested by Eric Anholt).
v3: Add a WARN_ONCE for a missing workaround.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On previous platforms, 3DSTATE_MULTISAMPLE contained the number of
samples, pixel location, and the positions of each sample within a pixel
for each multisampling mode (4x and 8x). It was also a non-pipelined
command, presumably since changing the sample positions is fairly
drastic.
Broadwell improves upon this by splitting the sample positions out into
a separate non-pipelined state packet, 3DSTATE_SAMPLE_PATTERN. With
that removed, 3DSTATE_MULTISAMPLE becomes a pipelined state packet.
Broadwell also supports 2x and 16x multisampling, in addition to the 4x
and 8x supported by Gen7. This patch, however, does not implement 2x
and 16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The amount of cut and paste from Gen7 is rather ugly, and should
probably be cleaned up in the future. Even the Gen7 code is in need of
some tidying though; many of the function parameters aren't used on
platforms that use level/layer rather than tile offsets. Tidying both
can be left to a future patch series. This at least gets things going.
v2: Rebase on Paul's rename of NumLayers -> MaxNumLayers.
v3: Shift QPitch by 2 when storing it in the packet. Bits 14:0 store
bits 16:2 of the actual value. Fixes tests.
v4: Add missing stencil buffer QPitch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
v2: Allow logic ops on all surface types. The UNORM restriction was
lifted with Haswell and I simply hadn't noticed. Also, add missing
BRW_NEW_STATE_BASE_ADDRESS dirty bit. Both caught by Eric Anholt.
v3: Fix swapped per-RT DWord pairs. Eliminates bizarre hacks.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It has additional fields to support clipping to the viewport even if
guardband clipping is enabled.
v2: Update for viewport array changes.
v3: No, seriously, update for viewport array changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
v2: Add missing SCS setting in gen8_emit_buffer_surface_state (caught by
Eric Anholt).
v3: Use stored QPitch rather than recomputing it.
v4: Shift QPitch by 2 when setting it in the packet; bits 14:0 store
bits 16:2 of the actual value (fixes myriads of cube and array
texturing tests). Also, only enable cube face bits for cubemaps
(matches Chris Forbes' commit on master). Port to use offset64.
v5: s/gl_format/mesa_format/g
v6: Fix DW5 of renderbuffer state, which neglected to subtract
irb->mt->first_level. Use vertical_alignment() rather than
hardcoding 4. Use ffs for multisample counts rather than a
large switch statement (all caught/suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike on Gen7, we can directly set the offset via the state packet.
We also -have- to: the kernel SOL reset code won't work anymore.
v2: Fix copy and paste mistake in buffer stride setup; drop stale
comment (caught by Eric Anholt). Add a perf_debug for missing
MOCS setup.
v3: Rebase on Paul Berry's changes to CurrentVertexProgram.
v4: Fix SO Write Offset handling. We need to set bits 20 and 21 so the
hardware both loads and saves the offset. There's also a
restriction that 3DSTATE_SO_BUFFER can only be programmed once per
buffer between primitives, so the "reset to zero" code needed
reworking. Fixes most of the transform feedback Piglit tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Also disable 3DSTATE_WM_CHROMAKEY for safety.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Broadwell's winding order, polygon fill, and viewport Z test fields have
moved to DWord 1 of 3DSTATE_RASTER.
v2: Add a perf_debug for a future optimization and improve commit
message (both suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Emit a dummy 3DSTATE_VF_SGVS packet when not needed.
v3: Add WARN_ONCE and perf_debugs requested by Eric Anholt.
v4: Program 3DSTATE_SGVS even in the no-elements case so gl_VertexID
continues working. Fix 3DSTATE_VF_INSTANCING to not use an
element index to access the buffers array. Some ARB_draw_indirect
prep work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix missing "change" bit on instruction state base address
(caught by Haihao Xiang).
v3: Add a perf_debug for missing MOCS setup, requested by Eric.
v4: Fix buffer sizes. The value, specified at bit 12 and up, is
actually measured in 4k pages. We need to round up to the
next multiple of 4k.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v4]
v2: Fix setting of GEN8_PSX_ATTRIBUTE_ENABLE after rebases.
v3: Add missing binding table entry counts. Don't worry about alpha
testing or alpha to coverage when setting the "Kill Pixel" bit;
those are specified in 3DSTATE_PS_BLEND (caught by Eric Anholt).
Drop unused _NEW_BUFFERS. Tidy comments.
v4: Rebase on Paul Berry's changes to CurrentFragmentProgram.
v5: Re-enable line stippling. It doesn't crash or anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
v2: Remove incorrect MOCS shifts; rename urb_entry_write_offset to
urb_entry_output_offset to closer match the documentation.
v3: Only emit a non-zero constant buffer read length when active.
v4: Add missing binding table counts (caught by Eric).
v5: Rebase on Paul Berry's changes to CurrentVertexProgram.
v6: Drop bogus SBE read length/offset field code. We were programming
the wrong values, and our 3DSTATE_SBE code overrides any value we
put here anyway with the correct one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v4]
v2: Only set GEN8_PS_BLEND_HAS_WRITEABLE_RT if color buffer writes are
enabled (caught by Eric Anholt).
v3: Set non-blending flags (writeable RT, alpha test, alpha to coverage)
for integer formats too. +14 Piglits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Use stencil->_WriteEnabled instead of setting
GEN8_WM_DS_STENCIL_BUFFER_WRITE_ENABLE twice (suggested by Eric).
v3: Mask stencil->WriteMask and stencil->ValueMask with 0xff. The field
is only 8-bits, so we'd trip the new SET_FIELD assertion when core
Mesa gave us a value like 0xFFFFFFFF. The Gen7 code uses structure
field widths to implicitly do this truncation. Fixes Piglit tests.
v4: Use uint32_t for dw1/dw2, not uint8_t. Worst. Typo. Ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
The attribute override portion of 3DSTATE_SBE was split out into
3DSTATE_SBE_SWIZ; various bits of 3DSTATE_SF were split out into
3DSTATE_RASTER.
v2: Set Force URB Read Offset bit. Eventually the URB read offset
should be set in 3DSTATE_VS, but that will require some refactoring.
v3: Rebase on viewport array changes.
v4: Improve comments about URB read length/offset overrides.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I haven't investigated whether these are necessary on Broadwell or not,
but for paranoia's sake, we may as well continue doing them for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It's going to diverge significantly. Starting out with a copy allows
future patches to change atoms one by one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The -p option we now use when calling bison means that this variable will be
named glcpp_parser_debug not yydebug. This was not caught when the -p option
was added because this variable isn't used in the code as committed. (I prefer
the declaration to remain since it allows a developer to easily find this
variable name to enable debugging.)
This is the innocent-looking but killer test case to verify the bug fixed in
the preceding commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 6005e9cb28 a new start state of NEWLINE_CATCHUP was added to the
lexer. This start state is used whenever the lexer is emitting a NEWLINE token
to emit additional NEWLINE tokens for any newline characters that were skipped
by an immediately preceding multi-line comment.
However, that commit erroneously entered the NEWLINE_CATCHUP state for
single-line comments. This is not desired since in the case of a single-line
comment, the lexer is not emitting any NEWLINE token. The result is that the
lexer will remain in the NEWLINE_CATCHUP state and proceed to fail to emit a
NEWLINE token for the subsequent newline character, (since the case to match \n expects only the INITIAL start state).
The fix is quite simple, remove the "BEGIN NEWLINE_CATCHUP" code from the
single-line comment case, (preserving it only in exactly the cases where the
lexer is actually emitting a NEWLINE token).
Many thanks to Petri Latvala for reporting this bug and for providing the
minimal test case to exercise it. The bug showed up only with a multi-line
comment which was followed immediately by a single-line comment (without any
intervening newline), such as:
/*
*/ // Kablam!
Since 6005e9cb28, and before this commit, that very innocent-looking
combination of comments would yield a parse failure in the compiler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This automatically adjusts the number of buffers that we want based on
what swapping mode the X server is using and the current swap interval:
swap mode interval buffers
copy > 0 1
copy 0 2
flip > 0 2
flip 0 3
Note that flip with swap interval 0 is currently limited to twice the
underlying refresh rate because of how the kernel manages flipping. Moving
from 3 to 4 buffers would help, but that seems ridiculous.
v2: Just update num_back at the point that the values that change num_back
change. This means we'll have the updated value at the point that the
freeing of old going-to-be-unused backbuffers happens, which might not
have been the case before (change by anholt, acked by keithp).
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The __DRIimage createImageFromFds function takes a fourcc code, but there was
no fourcc code that match __DRI_IMAGE_FORMAT_SARGB8. This adds a define for
that format, adds a translation in DRI3 from __DRI_IMAGE_FORMAT_SARGB8 to
__DRI_IMAGE_FOURCC_SARGB8888 and then adds translations *back* to
__IMAGE_FORMAT_SARGB8 in both the i915 and i965 drivers.
I'll refrain from comments on whether I think having two separate sets of
format defines in dri_interface.h is a good idea or not...
Fixes piglit glx-tfp and glx-visuals-depth
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
XCB doesn't flush the output buffer automatically, so we have to call
xcb_flush ourselves before waiting.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we're tracking SBC values correctly, and the X server has the
ability to send the GLX swap events from a PresentPixmap request, enable
this extension.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric figured out that glXWaitForSbcOML wanted to block until the requested
SBC had been completed, which means to wait until the
PresentCompleteNotify event for that SBC had been received.
This replaces the simple sleep(1) loop (which was bogus) with a loop that
just checks to see if we've seen the specified SBC value come back in a
PresentCompleteNotify event yet.
The change is a bit larger than that as I've broken out a piece of common
code to wait for and process a single Present event for the target
drawable.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tracking the full 64-bit SBC values makes it clearer how those values are
being used, and simplifies the wait_msc code. The only trick is in
re-constructing the full 64-bit value from Present's 32-bit serial number
that we use to pass the SBC value from request to event.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates how we link the libraries into the build directory.
It works for lib_LTLIBRARIES but not custom shared libraries like DRI
drivers or gallium state trackers which needs special casing (cf dri
mega drivers, for example)
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Spamming the pci id is not beneficial. Make sure it's printed
only when needed.
v2: Change severity to _LOADER_DEBUG, rather than removing
the message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The former symbol is never defined within mesa. Based on the code
it seems that the original intent was to use NDEBUG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Makefiles need hard tabs, let's not make that harder than it needs to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
v2: Making GL_ARB_map_buffer_alignment a desktop OpenGL extension only.
v3: Squash two commits together.
v4 (idr): MIN_MAP_BUFFER_ALIGNMENT queries don't have any dependencies.
In previous versions of the patch it depended on EXTRA_API_GL which
would prevent the query from working in core profile contexts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This driver does not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These drivers do not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ian manually ran the map_buffer_range* tests and the
arb_map_buffer_alignment-* tests, but he did not do a full piglit run.
v2 (idr): Use 64 instead of 4096
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Move parameter loads out of loops, and use the instruction offset
instead of a VGPR for the vertex attribute offset when writing to the
ESGS ring buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just always bind the current states before drawing.
Besides the simplification, as a bonus this makes sure the VS hardware
shader stage always uses the GS copy shader when a geometry shader is
active, fixing a number of GS related piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It needs to increment at shader runtime, not at shader compile time, as
the geometry shader can emit vertices in loops. LLVM automagically
converts the ID back to an immediate value if its value can be
determined at compile time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Transforms, for example,
mul vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3
into
mul.sat vgrf3, vgrf2, vgrf1
mov vgrf4, vgrf3
which gives register_coalescing an opportunity to remove the MOV
instruction.
total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0
LOST: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
OpenGL 3.3 spec expects GL_INVALID_OPERATION:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT AND BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID OPERATION."
But OpenGL 4.0 spec changed the error code to GL_INVALID_ENUM:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID_ENUM."
This patch changes the behaviour to match OpenGL 4.0 spec
Fixes Khronos OpenGL CTS draw_buffers_api.test.
V2: Update the comment in code.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I sometimes build without EGL just for speed purposes, however
it no longer finds my drivers when I do due to the HAVE_LIBUDEV
defines being wrong.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the tests for the depth parameter for TexImage3D calls when the
target type is GL_TEXTURE_2D_ARRAY or GL_TEXTURE_CUBE_MAP_ARRAY
so that a depth value of 0 is accepted. Previously, the check
incorrectly required the depth argument to be atleast 1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code depending on the definitions is already wrapped
in the same conditional so go ahead and wrap the include.
Otherwise we'll brake compilation on platforms that are
missing the header. Add assert.h in there as well, as it
is introduced and used in the same fashon.
Cc: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74122
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We have code generation paths that carry out swizzles of AoS vectors via
bitwise shifts, as these tend to generate more efficient code than
straightforward byte shuffles. But when the input is a constant the
additional bitwise arithmetic operations somehow don't really get
constant propagated properly, evenutally causing assertion failure in
InstCombine pass.
Therefore avoid the bug by using the trivial shuffles for constant
inputs.
Although the sample LLVM IR can cause a crash with any LLVM version,
this was only seen in practice with LLVM 3.2.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The check was in the wrong place, such that if a shader incorrectly put
a preprocessor token before the #version declaration, the version would
be resolved twice, leading to a segmentation fault when attempting to
redefine the __VERSION__ macro.
#extension GL_ARB_sample_shading: require
#version 130
void main() {}
Also, rename glcpp_parser_resolve_version to
glcpp_parser_resolve_implicit_version to avoid confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
radeonsi now reports PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE = true if UVD support
isn't available. It's what all the other drivers do.
Also, some #include directives were missing in radeon_uvd.h.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Implemented by the common code. You can now visualize the statistics
with the HUD, see GALLIUM_HUD=help for all available queries. For example:
GALLIUM_HUD=clipper-primitives-generated
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Replace Type A _INT formats names with _SINT to match naming spec,
and update type C formats as follows:
s/MESA_FORMAT_R_INT8\b/MESA_FORMAT_R_SINT8/g
s/MESA_FORMAT_R_INT16\b/MESA_FORMAT_R_SINT16/g
s/MESA_FORMAT_R_INT32\b/MESA_FORMAT_R_SINT32/g
s/MESA_FORMAT_RG_INT8\b/MESA_FORMAT_RG_SINT8/g
s/MESA_FORMAT_RG_INT16\b/MESA_FORMAT_RG_SINT16/g
s/MESA_FORMAT_RG_INT32\b/MESA_FORMAT_RG_SINT32/g
s/MESA_FORMAT_RGB_INT8\b/MESA_FORMAT_RGB_SINT8/g
s/MESA_FORMAT_RGB_INT16\b/MESA_FORMAT_RGB_SINT16/g
s/MESA_FORMAT_RGB_INT32\b/MESA_FORMAT_RGB_SINT32/g
s/MESA_FORMAT_RGBA_INT8\b/MESA_FORMAT_RGBA_SINT8/g
s/MESA_FORMAT_RGBA_INT16\b/MESA_FORMAT_RGBA_SINT16/g
s/MESA_FORMAT_RGBA_INT32\b/MESA_FORMAT_RGBA_SINT32/g
s/\bMESA_FORMAT_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_SNORM/g
s/\bMESA_FORMAT_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_SNORM/g
s/\bMESA_FORMAT_L_LATC1\b/MESA_FORMAT_L_LATC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_L_LATC1\b/MESA_FORMAT_L_LATC1_SNORM/g
s/\bMESA_FORMAT_LA_LATC2\b/MESA_FORMAT_LA_LATC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_LA_LATC2\b/MESA_FORMAT_LA_LATC2_SNORM/g
Update comments. Replace format names containing SIGNED with
SNORM appended w/decoration per the format name spec:
s/MESA_FORMAT_SIGNED_R8\b/MESA_FORMAT_R_SNORM8/g
s/MESA_FORMAT_SIGNED_RG88_REV\b/MESA_FORMAT_R8G8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBX8888\b/MESA_FORMAT_X8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888\b/MESA_FORMAT_A8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_SNORM/g
s/MESA_FORMAT_SIGNED_R16\b/MESA_FORMAT_R_SNORM16/g
s/MESA_FORMAT_SIGNED_GR1616\b/MESA_FORMAT_R16G16_SNORM/g
s/MESA_FORMAT_SIGNED_RGB_16\b/MESA_FORMAT_RGB_SNORM16/g
s/MESA_FORMAT_SIGNED_RGBA_16\b/MESA_FORMAT_RGBA_SNORM16/g
s/MESA_FORMAT_SIGNED_A8\b/MESA_FORMAT_A_SNORM8/g
s/MESA_FORMAT_SIGNED_I8\b/MESA_FORMAT_I_SNORM8/g
s/MESA_FORMAT_SIGNED_L8\b/MESA_FORMAT_L_SNORM8/g
s/MESA_FORMAT_SIGNED_A16\b/MESA_FORMAT_A_SNORM16/g
s/MESA_FORMAT_SIGNED_I16\b/MESA_FORMAT_I_SNORM16/g
s/MESA_FORMAT_SIGNED_L16\b/MESA_FORMAT_L_SNORM16/g
s/MESA_FORMAT_SIGNED_AL88\b/MESA_FORMAT_L8A8_SNORM/g
s/MESA_FORMAT_SIGNED_RG88\b/MESA_FORMAT_G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RG1616\b/MESA_FORMAT_G16R16_SNORM/g
Change all 4 color component unsigned byte formats to meet spec for P
Type formats:
s/MESA_FORMAT_RGBA8888\b/MESA_FORMAT_A8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_UNORM/g
s/MESA_FORMAT_ARGB8888\b/MESA_FORMAT_B8G8R8A8_UNORM/g
s/MESA_FORMAT_ARGB8888_REV\b/MESA_FORMAT_A8R8G8B8_UNORM/g
s/MESA_FORMAT_RGBX8888\b/MESA_FORMAT_X8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBX8888_REV\b/MESA_FORMAT_R8G8B8X8_UNORM/g
s/MESA_FORMAT_XRGB8888\b/MESA_FORMAT_B8G8R8X8_UNORM/g
s/MESA_FORMAT_XRGB8888_REV\b/MESA_FORMAT_X8R8G8B8_UNORM/g
v2: Note that Fredrik Höglund is working on GL_ARB_multi_bind, not
Maxence Le Doré. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The define was only available if
gl_extensions::AMD_shader_trinary_minmax was set, but no driver set it.
Since the extension is advertised by default, remove that field too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Maxence Le Doré <maxence.ledore@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a Radeon 7500 (OpenGL renderer string: Mesa DRI R100 (RV200
5157) TCL DRI2). I couldn't do a full piglit run because it would tank
the system with or without this patch. I just ran all the blit tests
(-t blit to piglit-run.py). Only fbo-sys-sub-blit failed. All of the
other tests that weren't skipped (i.e., all the multisample and sRGB
tests skip) passed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a FireGL 8800 (OpenGL renderer string: Mesa DRI R200 (R200
5148) TCL DRI).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the glTexStorage3D failure in
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need almost identical code in the glTexStorage path.
v2: Fix typo in a comment noticed by Topi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All versions of the OpenGL spec are quite clear that
GL_INVALID_OPERATION should be generated. I added a quotation from the
3.3 core profile spec.
Fixes the glTexImage3D subcases of
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2. The same subtests
of oes_packed_depth_stencil-depth-stencil-texture_gles1 fail, but they
fail with a different wrong error code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cover both loader and glx/dri_glx
Drop \n from the default loader logger
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the loader changes, there has been a compiler warning that the
prototype didn't match. It turns out that if a loader error message was
ever thrown, you'd segfault because of trying to use the warning level as
a format string.
Reviewed-by: Keith Packard <keithp@keithp.com>
Tested-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This allows Mesa to choose to rename driver .sos (or split drivers),
without needing a flag day with the corresponding 2D driver.
v2: Undo the loader-only-for-dri3 change.
Reviewed-by: Keith Packard <keithp@keithp.com> [v1]
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
I want to stop trusting the server for the driver name, and instead decide
on our own based on the fd, so I needed this code motion.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Steam links against libudev.so.0, while we're linking against
libudev.so.1. The result is that the symbol names (which are the same in
the two libraries) end up conflicting, and some of the usage of .so.1
calls the .so.0 bits, which have different internal structures, and
segfaults happen.
By using a dlopen() with RTLD_LOCAL, we can explicitly look for the
symbols we want, while they get the symbols they want.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Some of the hardware support is missing. The NVIDIA-provided driver,
which claims seamless cube map support fails the relevant tests as well.
As this is the last extension before we can have OpenGL 3.2, doing this
allows us to expose geometry shaders without doing the additional
work involved in supporting ARB_geometry_shader4.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Creates two areas in the AUX constbuf:
- Sample offsets for MS textures
- Per-texture MS settings
When executing a texelFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.
With this change, all the ARB_texture_multisample piglits pass, so turn
on PIPE_CAP_TEXTURE_MULTISAMPLE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each code BO is a heap that allocates at the end first, and so GPs are
allocated at the very end of the allocated space. When executing, we see
PAGE_NOT_PRESENT errors for the next page. Just over-allocate to make
sure that there's something there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Make sure that we never try to use a 0-sized map. This can happen when
using a gp, so add a dummy mapping when computing vp_gp_mapping in that
case.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marks gl_Layer as only having one component, and makes sure to keep
track of where it is and emit it in the output map, since it is not an
input to the FP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Note that the primitive id is stored in a[0x18], while usually the
geometry instructions are of the form a[$a1 + 0x4] which gets mapped to
p[] space. We need to avoid the change from a[] to p[] here, so it's
keyed on whether the access is indirect or not.
Note that there's also a use-case for accessing e.g. a[$r1], however
that's not supported for now. (Could be added by checking the register
file of the indirect parameter.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Layer output probably doesn't work yet, but other than that everything seems
to be working.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: fix up minor bugs, code formatting]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of emitting an SHL 4 io an address register on the TGSI ARL and UARL
instructions, emit the shift when the loaded address is actually used. This
is necessary because input vertex and attribute indices in geometry shaders on
nv50 need to be shifted left by 2 instead of 4.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling
same as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is needed since commit 9baa45f78b (st/mesa: bind NULL colorbuffers
as specified by glDrawBuffers).
This implementation is highly based on a larger commit by
Christoph Bumiller <e0425955@student.tuwien.ac.at> in his gallium-nine
branch.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It doesn't make sense to do an OP_NEG from U32 to U32. This was
manifested on nv50 in glsl-fs-atan-3 which was generating a
UMAD TEMP[0].x, TEMP[0].xxxx, -TEMP[5].xxxx, TEMP[0].xxxx
instruction. (For some reason, nvc0 causes a different shader to be
generated.) This led to a
cvt neg u32 $r1 u32 $r1
Which did not yield the desired result. This changes the final output to
cvt neg s32 $r1 u32 $r1
which produces the desired output and the piglit tests passes. My
assumption is that this is also what we want on nvc0, but could not test
as there was no suitable shader that generated the problem instruction.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When the min_index is very large (or very negative), the multipliation
can overflow 32 bits and result in an incorrect map pointer
modification.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was discovered as a result of the draw-elements-base-vertex-neg
piglit test, which passes very negative offsets in, followed up by large
indices. The nouveau code correctly adjusts the pointer, but the
translate code needs to do the proper inverse correction. Similarly fix
up the SSE code to do a 64-bit multiply to compute the proper offset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Broadwell requires software to specify QPitch in a bunch of packets,
so we decided to store it in the miptree. However, when I did that
refactoring, I missed a subtlety: the hardware expects QPitch to be
"in units of rows in the uncompressed surface".
This is the value we originally compute. However, for compressed
surfaces, we then divided it by 4 (the block height), to obtain the
physical layout. This is no longer the QPitch Broadwell expects.
So, store the original undivided value in mt->qpitch, but continue to
use the divided value in brw_miptree_layout_texture_array(). For
non-Broadwell platforms, this should have no impact at all.
Helps fix Piglit's "getteximage-targets S3TC CUBE" test on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes the NetBSD build.
NetBSD does not have pthread_mutex_timedlock.
CC glapi_dispatch.lo
threads_posix.h: In function 'mtx_timedlock':
threads_posix.h:216:5: error: implicit declaration of function 'pthread_mutex_timedlock'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The type of all three parameters are identical, so we don't need to
specify it three times. The predicate is always identical too, so we
don't need to make it a parameter, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Simple shaders such as:
void splat(vec2 v, float f) {
v[0] = v[1] = f;
}
failed to compile with the following error:
error: value of type vec2 cannot be assigned to variable of type float
First, we would process v[1] = f, and transform:
LHS: (expression float vector_extract (var_ref v) (constant int (1)))
RHS: (var_ref f)
into:
LHS: (var_ref v)
RHS: (expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f))
Note that the LHS type is now vec2, not a float. This is surprising,
but not the real problem.
After emitting assignments, this ultimately becomes:
(declare (temporary) vec2 assignment_tmp)
(assign (xy)
(var_ref assignment_tmp)
(expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f)))
(assign (xy) (var_ref v) (var_ref assignment_tmp))
We would then return (var_ref assignment_tmp) as the rvalue, which has
the wrong type---it should be float, but is instead a vec2.
To fix this, we simply return (vector_extract (var_ref assignment_temp)
<the appropriate channel>) to pull out the desired float value.
Fixes Piglit's chained-assignment-with-vector-constant-index.vert and
chained-assignment-with-vector-dynamic-index.vert tests.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74026
Reported-by: Dan Ginsburg <dang@valvesoftware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Transform feedback may come from either the geometry shader or the
vertex shader, so we can't use
ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to find the current
post-link transform feedback information. Fortunately we can use
ctx->TransformFeedback.CurrentObject->shader_program.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, the _mesa_{Begin,Resume}TransformFeedback
functions were using ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to
find the program that would be the source of transform feedback data.
This isn't correct--if there's a geometry shader present it should be
ctx->Shader.CurrentProgram[MESA_SHADER_GEOMETRY]. (These might be
different if separate shader objects are in use).
This patch creates a function get_xfb_source(), which figures out the
correct program to use based on GL state, and updates
_mesa_{Begin,Resume}TransformFeedback to call it. get_xfb_source() is
written in terms of the gl_shader_stage enum, so it should not need
modification when we add tessellation shaders in the future. It also
creates a new driver flag, NewTransformFeedbackProg, which is flagged
whenever this program changes.
To reduce future confusion, this patch also rewords some comments and
error message text to avoid referring to vertex shaders.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
v2: make the for loop in get_xfb_source() clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "shader" field in fs_generator, vec4_generator, and gen8_generator
was only used for one purpose; to figure out if we were compiling an
assembly program or a GLSL shader (shader is NULL for assembly
programs). And it wasn't being used properly: in vec4 shaders we were
always initializing it based on
prog->_LinkedShaders[MESA_SHADER_FRAGMENT], regardless of whether we
were compiling a geometry shader or a vertex shader.
This patch simplifies things by using the "prog" field instead; this
is also NULL for assembly programs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of defining preprocessor macros in glcpp_parser_create based on
the GL API, wait until the shader version has been resolved. Doing this
allows us to correctly set (and not set) preprocessor macros for
extensions allowed by the API but not the shader, as in the case of
ARB_ES3_compatibility.
The shader version has been resolved when the preprocessor encounters
the first preprocessor token, since the GLSL spec says
"The #version directive must occur in a shader before anything else,
except for comments and white space."
Specifically, if a #version token is found the version is known
explicitly, and if any other preprocessor token is found then the GLSL
version is implicitly 1.10.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71630
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes glean fragProg1 regression caused by commit b9f68d927e
(implement TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS). This bug
only appears when the fragment shader emits fragment.Z before
color outputs. The bug was caused by confusion between register
indexes and semantic indexes.
Also added some comments to better explain register indexing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise we pull libudev as a dependency and crash
games/programs that ship their own version of libudev.
Either way we should link the loader lib only when needed.
This fixes a regression caused by
commit eac776cf77
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Sat Jan 11 02:24:43 2014 +0000
glx: use the loader util lib
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73854
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Leaving it set to zero isn't really correct since every allocation has
at least an alignment of 1 byte. It also caused a problem in the i965
driver after I removed the MAX(64, ...) from the alignment calculation.
That's what I get for changing a patch without retesting it. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73907
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lu Hua <huax.lu@intel.com>
Sed job:
grep -lr BEGIN_BATCH_NO_AUTOSTATE src/mesa/drivers/dri/ | while read f
do
cat $f | sed 's/BEGIN_BATCH_NO_AUTOSTATE/BEGIN_BATCH/g' > x
mv x $f
done
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
This parameter hasn't been used since January 2010 (commit 29e02c7).
Fixes the following warning in both radeon and r200:
radeon_common.c: In function 'r200_rcommonBeginBatch':
radeon_common.c:762:14: warning: unused parameter 'dostate' [-Wunused-parameter]
Note that now BEGIN_BATCH and BEGIN_PATCH_NO_AUTOSTATE are identical.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
radeon_common.c: In function 'radeon_draw_buffer':
radeon_common.c:237:3: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
When parameters were removed from dd_function_table::Viewport (commit
065bd6ff), radeon_viewport (in both radeon and r200) started generating
a warning.
radeon_common.c: In function 'r200_radeon_viewport':
radeon_common.c:415:15: warning: assignment from incompatible pointer type [enabled by default]
radeon_common.c:419:23: warning: assignment from incompatible pointer type [enabled by default]
I didn't notice this initially, and it's harmless because the function is
never called through the incorrectly typed pointer.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Otherwise they will be NULL when stage destroy is invoked prematurely,
(i.e, on out of memory).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use some new helper functions to make the code much more readable.
And fix wrong value for XPD's w result.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In some cases we were converting generic formats to sized formats
and vice versa. The point is to simply convert sRGB formats to
corresponding linear formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Became apparent with the C11 thread changes. Unfortunately I didn't
have all dependencies to build the driver, and only noticed
this issue on build server.
Implementation is based of https://gist.github.com/2223710 with the
following modifications:
- inline implementatation
- retain XP compatability
- add temporary hack for static mutex initializers (as they are not part
of the stack but still widely used internally)
- make TIME_UTC a conditional macro (some system headers already define
it, so this prevents conflict)
- respect HAVE_PTHREAD macro
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Previously the reason we needed is_array was because we used array_size == NULL to
represent both non-arrays and unsized arrays. Now that we use a non-NULL
array_specifier to represent an unsized array, is_array is redundant.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need to insert outermost dimensions in the correct spot otherwise
the dimension order will be backwards
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
array
This change does not help fix or prevent any bugs
it just seems reasonable to do
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No regressions on IVB (piglit quick + unit tests).
v2 (Paul):
- no need to patch the unit tests anymore. Original logic
was altered and unit tests updated to match the
fs-generator
- lrp emission moves from the blorp compiler core into the
emitter here (previously there was a separate refactoring
patch which is not really needed anymore as the lrp logic
got refactored when the original lrp logic got fixed).
- pass 'BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX' to the
generator in fs_inst::target instead of hardcoding it
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The compiler for blorp programs likes to emit instructions for
the message construction itself meaning that the generator needs
to skip any such when blorp programs are translated for the hw.
In addition, the binding table control is special for blorp
programs and the generator does not need to update the binding
tables associated with the compiler bookkeeping (this in fact
gets thrown away as the blorp compiler sets the program data
in its own way).
v2 (Paul): do not hardcode the binding table index but use
fs_inst::target instead.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Unit tests comparing generated blorp programs to known good need
to have the dump in designated file instead of in default
standard output. The comparison also expects the jump counters
of if-else-instructions to be correctly set and hence the dump
needs to be taken _after_ 'patch_IF_ELSE()' is run (the default
dump of the fs_generator does this before).
v2 (Paul): dropped the redundant 'dump_enabled' argument
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the special case requiring explicit execution size
control is wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the two special cases requiring explicit execution
size control are wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2 (Paul): pass the combining opcode as an argument to emit_combine().
This keeps manual_blend_average() selfcontained
documentation wise.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Resolving of the hardware message type is moved into the
emitter also in preparation for switching to use fs_generator.
The generator wants to translate the high level op-code into
the message type and hence the emitter needs to know the
original op-code.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for the introduction of non-compressed multi-sampled
lookup used in the blorp programs.
v2: now also taking into account gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The combination of four separate comparison operations and
and the masked "and" require special treatment when moving
to FS LIR.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for presenting blorp blit programs using FS IR that
allows EU-assembly generation using i965 glsl-compiler
backend (fs_generator).
v2: rebased on top of endif-jump counter fix (moving the
added brw_set_uip_jip() into the emitter)
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The Intel closed source OpenGL driver recently began supporting 32
texture image units on Haswell. This makes the open source driver
support 32 as well.
Earlier generations don't have the message header field required to
support more than 16 sampler states, so we continue to advertise 16
there.
On Haswell, this causes us to advertise:
- GL_MAX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
instead of the old values of 16, 16, and 48.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
BRW_MAX_TEX_UNIT is about to grow, but only Gen7+ will be able to
support the new larger value. On older platforms, we don't want to
allocate the extra space - it would just be a waste.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Like the scalar backend, we add an offset to the "Sampler State Pointer"
field to select a group of 16 samplers, then use the "Sampler Index"
field to select within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The next patch adds an additional case where the message header is
necessary. So we want to do the g0 copy if inst->header_present is set,
rather than inst->texture_offset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In theory, a shader might use textureOffset() but set all the texel
offsets to zero. In that case, we don't actually need to set up the
message header - zero is the implicit default.
By moving the texture_offset setup before the header_present setup, we
can easily only set header_present when there are non-zero texel offset
values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The message descriptor's "Sampler Index" field is only 4 bits (on all
generations of hardware), so it can only represent indices 0 through 15.
Haswell introduced a new field in the message header - "Sampler State
Pointer". Normally, this is copied straight from g0, but we can also
add a byte offset (as long as it's a multiple of 32).
This patch uses a "Sampler State Pointer" offset to select a group of
16 sampler states, and then uses the "Sampler Index" field to select
the state within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the code to copy g0 to the message header existed in two
places - one for the texture offset case, and one for any other case.
By treating texture_offset as a special case of header_present, we can
remove this duplication and shorten the code. Future patches which add
new header fields also won't have to add additional duplication.
This also clarifies a confusing construct. The old code contained:
} else if (inst->header_present) {
if (brw->gen >= 7) {
...explicit copy from g0 to the message header...
} else {
/* Set up an implied move from g0 to the MRF. */
}
}
This looks like it might set up an implied move on Sandybridge, which
doesn't support those. However, Sandybridge only uses a message header
for texture offsets, so it would never hit this code path. The new code
avoids this implicit knowledge by only setting up an implied move on
Gen4-5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Use the WINDOW and VPORT scissors for the framebuffer and scissor test,
respectively. The other two scissors are disabled (they cover the max fb size).
We actually have 16 VPORT scissors, which will map well to ARB_viewport_array.
Also, we don't need to write SC_WINDOW_OFFSET with this commit, because it's
disabled everywhere.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Also set the unsynchronized flag if the whole resource was discarded
to avoid doing buffer-busy checks again.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's unused and shouldn't be used at all in my opinion.
If some driver doesn't support the unsynchronized flag, u_upload_mgr should
avoid the synchronization by other means, e.g. by using the DONTBLOCK flag.
This is the recommended way for streaming vertices. Always use this if you
need to upload vertices every frame.
Reviewed-by: Christian König <christian.koenig@amd.com>
Commit 05da4a7a5e attempts to eliminate the
call to intel_update_renderbuffer() in the case where we already have a
drawbuffer for the drawable. Unfortunately this only checks the
back left renderbuffer, which breaks in case of single buffer drawables.
This means that the initial viewport will not be set in that case. Instead,
we now check whether the initial viewport has not been set, in which case
we call out to intel_update_renderbuffer().
https://bugs.freedesktop.org/show_bug.cgi?id=73862
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Most of the time it is not necessary to perform type inference to
compile GLSL; the type of every expression can be inferred from the
contents of the expression itself (and previous type declarations).
The exception is aggregate initializers: their type is determined by
the LHS of the variable being assigned to. For example, in the
statement:
mat2 foo = { { 1, 2 }, { 3, 4 } };
the type of { 1, 2 } is only known to be vec2 (as opposed to, say,
ivec2, uvec2, int[2], or a struct) because of the fact that the result
is being assigned to a mat2.
Previous to this patch, we handled this situation by doing some type
inference during parsing: when parsing a declaration like the one
above, we would call _mesa_set_aggregate_type(), which would infer the
type of each aggregate initializer and store it in the corresponding
ast_aggregate_initializer::constructor_type field. Since this
happened at parse time, we couldn't do the type inference using
glsl_type objects; we had to use ast_type_specifiers, which are much
more awkward to work with. Things are about to get more complicated
when we add support for ARB_arrays_of_arrays.
This patch simplifies things by postponing the call to
_mesa_set_aggregate_type() until ast-to-hir time, when we have access
to glsl_type objects. As a side benefit, we only need to have one
call to _mesa_set_aggregate_type() now, instead of six.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specs say "If the argument is a buffer object, the arg_value
pointer can be NULL or point to a NULL value in which case a NULL
value will be used as the value for the argument declared as a
pointer to __global or __constant memory in the kernel."
So don't crash when somebody does that.
v2: Insert NULL into input buffer instead of buffer handle pair
Fix constant_argument too
Drop r600 driver changes
v3: Fix inserting NULL pointer
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
No known bugs fixed but this is now in line with fs-generator.
No regresssions on IVB.
Eric further explained that:
"The endif jump, since it's forward, is just an optimization to
have set right -- otherwise, the GPU will just step forward
instruction by instruction until it hits something else that
updates the per-channel PC."
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we have a ctx->Shader.CurrentProgram array, we can just use
it directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since ctx->Shader.Current{Vertex,Geometry,Fragment}Program is an
array, this allows some meta code to be rolled up into loops.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are replaced with
ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}].
In patches to follow, this will allow us to replace a lot of ad-hoc
logic with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \
-print0 | xargs -0 sed -i \
-e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \
-e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \
-e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g'
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than maintain separately named arrays and counts for vertex,
geometry, and fragment shaders, just maintain these as arrays indexed
by the gl_shader_type enum.
v2: When there is neither a vertex nor a geometry shader, set
prog->LastClipDistanceArraySize = 0, and clarify that the values is
not used.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces code in _mesa_new_shader() and delete_shader_cb()
that checks the type of a shader with calls to
_mesa_validate_shader_target(). This has two advantages: it allows
for a more thorough check (since _mesa_validate_shader_target()
doesn't permit shader targets that aren't supported by the back-end),
and it reduces the amount of code that will need to be modified when
adding new shader stages.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow this function to be used in circumstances where there
is no context available, such as when building built-in GLSL
functions.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
In my recent zeal to refactor Mesa's handling of the gl_shader_stage
enum, I accidentally wound up with two functions that do the same
thing: _mesa_program_index_to_target(), and
_mesa_shader_stage_to_program().
This patch keeps _mesa_shader_stage_to_program(), since its name is
more consistent with other related functions. However, it changes the
signature so that it accepts an unsigned integer instead of a
gl_shader_stage--this avoids awkward casts when the function is called
from C++ code.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make the functions available to work around a GLEW bug (see
comments already in the code), but if an application calls one of these
functions we should still generate GL_INVALID_OPERATION.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch handles the use of 'centroid' qualifier with 'in' variables
in a fragment shader when persample shading is enabled. Per sample
shading for the whole fragment shader can be enabled by:
glEnable(GL_SAMPLE_SHADING) or using {gl_SamplePosition, gl_SampleID}
builtin variables in fragment shader. Explaining it below in more
detail.
/* Enable sample shading using OpenGL API */
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0);
Example fragment shader:
in vec4 a;
centroid in vec4 b;
main()
{
...
}
Variable 'a' will be interpolated at sample location. But, what
interpolation should we use for variable 'b' ?
ARB_sample_shading recommends interpolation at sample position for
all the variables. GLSL 400 (and earlier) spec says that:
"When an interpolation qualifier is used, it overrides settings
established through the OpenGL API."
But, this text got deleted in later versions of GLSL.
NVIDIA's and AMD's proprietary linux drivers (at OpenGL 4.3)
interpolates at sample position. This convinces me to use
the similar approach on intel hardware.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Current implementation of arb_sample_shading doesn't set 'Barycentric
Interpolation Mode' correctly. We use pixel barycentric coordinates
for per sample shading. Instead we should select perspective sample
or non-perspective sample barycentric coordinates.
It also enables using sample barycentric coordinates in case of a
fragment shader variable declared with 'sample' qualifier.
e.g. sample in vec4 pos;
A piglit test to verify the implementation has been posted on piglit
mailing list for review.
V2: Do not interpolate all the 'in' variables at sample position
if fragment shader uses 'sample' qualifier with one of them.
For example we have a fragment shader:
#version 330
#extension ARB_gpu_shader5: require
sample in vec4 a;
in vec4 b;
main()
{
...
}
Only 'a' should be sampled at sample location, not 'b'.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be useful in my next patch which depends on a functionality
of _mesa_get_min_invocations_per_fragment() to ignore the sample
qualifier (prog->IsSample) based on a flag passed to it.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by
4.61%, and CS:GO by 5.71%.
total instructions in shared programs: 1500153 -> 1498191 (-0.13%)
instructions in affected programs: 59919 -> 57957 (-3.27%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented for ir_swizzles currently, but perhaps will be useful
for other IR types in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This flag was really just a proxy for determining whether the backend
was vector (AOS) or scalar (SOA). It will be used to apply a future
optimization only for vector backends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Dumping the number of live registers at each IP allows us to see
register pressure and identify any local maxima. This should
aid in debugging passes designed to reduce register pressure, as
well as optimizations that suddenly trigger spilling.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Calling it after value numbering (added in the next commit) prevents
some instruction count regressions.
total instructions in shared programs: 1524387 -> 1523905 (-0.03%)
instructions in affected programs: 13112 -> 12630 (-3.68%)
GAINED: 0
LOST: 3
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we simply considered two registers whose live ranges
overlapped to interfere. Cases such as
set A ------
... |
mov B, A -- |
... | B | A
use B -- |
... |
use A ------
would be considered to interfere, even though B is an unmodified copy of
A whose live range fit wholly inside that of A.
If no writes to A or B occur between the mov B, A and the use of B then
we can safely coalesce them.
Instead of removing MOV instructions, we make them NOPs and remove them
at once after the main pass is finished in order to avoid recomputing
live intervals (which are needed to perform the previous step).
total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46
LOST: 22
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
mov takes only a single source argument. Example instruction
inexplicably changed from add to mov in commit f10f5e49.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Unnamed record types are assigned to separate types per stage, e.g. if
uniform struct { ... } a;
is defined in both vertex and fragment shader, two separate types will
result with different names. When linking the shader, this results in a
type conflict. However, there is no reason why this should not be
allowed according to GLSL specifications. Compare and match record types
when linking shader stages to avoid this conflict.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes several colorbuffer tests, including piglit "fbo-drawbuffers-none"
for "gl_FragColor" and "glDrawPixels" cases.
v2: rework patch to only avoid creating extra shader variants when
TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS is not specified. Per Jose.
Use a write_color0_to_n_cbufs key field to replicate color0 to N
color buffers only when N > 0 and WRITES_ALL_CBUFS is set.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The new TYPE_DOUBLEN_2 type was added in 0e60d850 but the code to
return values of that type wasn't completed.
Fixes conform's default state test. glGetFloatv(GL_DEPTH_RANGE)
wasn't returning anything.
v2: remove stray 'break' statements.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These messages are in code that is shared between the VS and GS
back-ends, so use the terminology "vec4" to avoid confusion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even with depth clipping disabled, vertices which have negative w coords
must be discarded. And since we don't have a proper guardband implementation
yet (relying on driver to handle all values except infs/nans in rasterization
for such points) we need to kill them off manually (as they can end up with
coordinates inside viewport otherwise).
v2: use 0.0f instead of 0 (spotted by Brian).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Also increment ir->offset in the GS visitor, rather than at the
final assembly generation stage (requested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the
48-bit addressing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have a helper function that handles the PIPE_CONTROL
variations between the various platforms, these are basically the same.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I believe that PIPE_CONTROL uses the length field to decide whether to
do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write,
while a length of 5 would do a 64-bit write. (I haven't verified this,
though.)
For workaround writes, we don't care what value gets written, or how
much data. We're only writing something because hardware bugs mandate
that do so. So using a 64-bit write should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The PIPE_CONTROL packet actually has 5 DWords on Gen6+:
1. Header
2. Flags
3. Address
4. Immediate Data: Lower DWord
5. Immediate Data: Upper DWord
We just never emitted the last one. While it appears to work, it's
probably safer to emit the entire thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intel_buffer_objects.c: In function 'old_intel_bufferobj_buffer':
intel_buffer_objects.c:471:17: warning: unused parameter 'flag' [-Wunused-parameter]
The parameter hasn't been used since the i915 and i965 drivers had their
breakup. i965 got the flags, and i915 got to cry itself to sleep.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not actually tested, but the changes are identical to the i965 changes
that are tested.
v2: Remove MAX2(64, ...). Suggested by Ken (in the i965 version of this
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
No piglit regressions on IVB.
With minor tweaks to the arb_map_buffer_alignment-map-invalidate-range
test (disable the extension check, set alignment to 64 instead of
querying), the i965 driver would fail the test without this patch (as
predicted by Eric). With this patch, it passes.
v2: Remove MAX2(64, ...). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
v2 (idr): Only enable the extension on GEN7+ w/core profile because it
requires geometry shaders.
v3 (idr): Add some casting to fix setting of ViewportBounds.Min.
Negating an unsigned value, then casting to float doesn't do what you
might think it does.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
noop_scissor (correctly) only examines the scissor rectangle for
viewport 0. Therefore, it should only be called when that scissor
rectangle is enabled.
v2: Remove spurious change to radeon code. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently MaxViewports is still 1, so this won't affect any change.
v2: Minor code reformatting suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers that currently use _Xmin and friends to set their scissor
rectangle will need to use this code directly once they are updated for
GL_ARB_viewport_array.
v2: Use different bit-test idiom and fix mixed tabs and spaces. Both
were suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At various stages the hardware clamps the gl_ViewportIndex to these
values. Setting them to zero effectively makes gl_ViewportIndex be
ignored. This is acutally useful in blorp (so that we don't have to
modify all of the viewport / scissor state).
v2: Use INTEL_MASK to create GEN6_CLIP_MAX_VP_INDEX_MASK. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Define API connections to extension entry points added in previous
commits. Update entry points to use floating point arguments as
required by the extension.
Add get tokens for ARB_viewport_array state.
v2: Include review feedback.
v3 (idr): Fix 'make check'. Add missing Get infrastructure (some was
culled from other pathces).
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_viewport_no_notify / set_depth_range_no_notify (and
manually notify the driver) instead of calling _mesa_set_viewporti /
_mesa_set_depthrangei. Refactor bodies of _mesa_ViewportIndexed and
_mesa_ViewportIndexedv into a shared function. Remove spurious CLAMP
calls in _mesa_DepthRangeArrayv and _mesa_DepthRangeIndexed.
v3 (idr): Add some missing return-statements after calls to _mesa_error.
v4 (idr): Only perform the ViewportBounds.Min / ViewportBounds.Max
clamping in set_viewport_no_notify if GL_ARB_viewport_array is enabled.
Otherwise the driver may not have set ViewportBounds, and the clamping
will do bad things.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_scissor_no_notify (and manually notify the driver)
instead of calling _mesa_set_scissori. Refactory bodies of
_mesa_ScissorIndexed and _mesa_ScissorIndexedv into a shared function.
Perform parameter validation in the same order in all three functions.
Pull MaxViewports comparison fix (in _mesa_ScissorArrayv) from the next
patch to this patch.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the scissor enable state is a bitfield need a custom function
to extract the correct value from gl_context. Modeled
Scissor.EnableFlags after Color.BlendEnabled.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Fix several "comparison between signed and unsigned integer
expressions" warnings.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This matches the expectations of GL_ARB_viewport_array and the storage
type where the values will land.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the restore code would enable all scissor rectangles if any
scissor rectangles were enabled on entry to meta. When there is only
one scissor rectangle, this is fine. As soon as a driver supports
multiple viewports, this will be a problem.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Scissor, make sure that ctx->Driver.Scissor is only called once
instead of once per scissor rectangle.
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Viewport and _mesa_DepthRange, make sure that
ctx->Driver.Viewport is only called once instead of once per viewport or
depth range.
v2: Make _mesa_DepthRange actually set all of the depth ranges (instead
of just index 0). Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
v3: Handle all viewport entries in update_viewport_matrix and
_mesa_copy_context too. This was previously in an earlier patch.
Having the code in the earlier patch could cause _mesa_copy_context to
access a matrix that hadn't been constructed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Create an internal function that just writes data into the scissor
rectangle. In future patches this will see more use because we only
want to call dd_function_table::Scissor once after setting all of the
scissor rectangles instead of once per scissor rectangle.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the viewport. In
future patches this will see more use because we only want to call
dd_function_table::Viewport once after setting all of the viewport
instead of once per viewport.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the depth range.
In future patches this will see more use because we only want to call
dd_function_table::DepthRange once after setting all of the depth ranges
instead of once per depth range.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only element 0 of the array is used anywhere at this time, so there
should be no changes.
v4: Split out from a single megapatch. Suggested by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v4: Split out from a single megapatch. Suggested by Ken. Also make
meta's save_state::ViewportX, ::ViewportY, ::ViewportW, and ::ViewportH
to match gl_viewport_attrib.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used when the viewport near and far plane are stored as
doubles instead of as floats.
v4 (idr): Split out from a single megapatch. Suggested by Ken. Also
drop value_double_4. It's never used anywhere in the patch series.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Update Mesa and drivers to access updated gl_scissor_attrib.
Now have an enable bitfield and array of gl_scissor_rects.
Drivers have been updated to the new scissor enable state
attribute (gl_context.scissor.EnableFlags) but still treat it
as a single boolean which is okay as mesa will only use
bit 0 when communicating with a driver that does not support
ARB_viewport_array.
v2 (idr): Rebase fixes.
v3 (idr): Small code formatting fix suggsted by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These limits will be queryable by GL_MAX_VIEWPORTS,
GL_VIEWPORT_SUBPIXEL_BITS, and GL_VIEWPORT_BOUNDS_RANGE. Drivers that
actually implement the extension must set values for these constants
that comply with the minimum-maximums from the spec.
Most of these changes were part of other patches. They were separated out
because it make reordering of later patches easier. Also, MaxViewports wasn't
set by that patch, and I completely overlooked it in review. It's now obvious
that it's set. :)
v2 (idr): Split these changes out from the original patches. Keep
MaxViewportWidth and MaxViewportHeight as GLuint.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Split these changes out from the original patch. Only
advertise GL_ARB_viewport_array in a core profile because it requires
geometry shaders.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were calling draw_total_vs_outputs() too early. The call to
draw_pt_emit_prepare() could result in the vertex size changing.
So call draw_total_vs_outputs() after draw_pt_emit_prepare().
This fix would seem to be needed for the non-LLVM code as well,
but it's not obvious. Instead, I added an assertion there to
try to catch this problem if it were to occur there.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72926
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of skipping x/y clipping completely if there's point_tri_clip points
use guard band clipping. This should be easier (previously we could not disable
generating the x/y bits in the clip mask for llvm path, hence requiring custom
clip path), and it also allows us to enable this for tris-as-points more easily
too (this would require custom tri clip filtering too otherwise). Moreover,
some unexpected things could have happen if there's a NaN or just a huge number
in some tri-turned-point, as the driver's rasterizer would need to deal with it
and that might well lead to undefined behavior in typical rasterizers (which
need to convert these numbers to fixed point). Using a guardband should hence
be more robust, while "usually" guaranteeing the same results. (Only "usually"
because unlike hw guardbands draw guardband is always just twice the vp size,
hence small vp but large points could still lead to different results.)
Unfortunately because the clipmask generated is completely unaffected by guard
band clipping, we still need a custom clip stage for points (but not for tris,
as the actual clipping there takes guard band into account).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These are components which were originally developed by Tungsten Graphics,
which was in turn acquired by VMware, but are de facto now being maintained
by third-party contributors of the Mesa open-source community.
This matches what's reported by swrast driver and a few other components.
Suggested by Ian Romanick.
Currently the MSVC build is broken because of conflicting definitions of
'log' function. I didn't investigate thoroughly, but I suspect the
it is conflicting standard math.h's log.
log_ is admittedly not a great name, but it is better than a broken build.
A better one can be used in a follow-on build.
By highlighting these special cases makes it clearer to switch
to the fs-generator as the wider scoped compression control
settings used in the current implementation can be simply
dropped.
No regressions on IVB (piglit quick + unit tests).
v2 (Ian): typo in a comment
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Effectively only the mask control bit gets altered for the single
addition in question and hence there is no real need to use a
fresh state control level for it -- that is more useful when
multiple intructions share the same mask and compression settings.
This is a preparation step for removing the explicit compression
control modifiers in the blit compiler. After this patch there
are no nested state control levels making the constant nature of
the compression settings more apparent.
No regressions on IVB (piglit quick + unit tests).
v2 (Matt, Ian): use temporary variable instead of assigning
directly on the same line with a function call.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We call intel_prepare_render() in intelMakeCurrent() to make sure we have
renderbuffers before calling _mesa_make_current(). The only reason we
do this is so that we can have valid defaults for width and height.
If we already have buffers for the drawable we're making current, we
don't need this step.
In itself, this is a small optimization, but it also avoids a round trip
that could block on the display server in a unexpected place.
https://bugs.freedesktop.org/show_bug.cgi?id=72540https://bugs.freedesktop.org/show_bug.cgi?id=72612
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
NV3x cards don't support NPOT textures. Technically this restriction
could be worked around, but since it also doesn't expose any video
decoding hw, just turn it off entirely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
pipe_loader_drm.c: In function 'pipe_loader_drm_probe_fd':
pipe_loader_drm.c:120:4: error: implicit declaration of function 'loader_get_pci_id_for_fd' [-Werror=implicit-function-declaration]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Mesa provides the flexibility of building without the
need to have libdrm present on the system. The situation
has regressed with the recent commit
commit 8c2e7fd846
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Fri Jan 10 23:36:16 2014 +0000
loader: introduce the loader util lib
By isolating libdrm code by #ifndef __NOT_HAVE_DRM_H we
can have libdrm-less builds on across all build systems.
This patch converts Android's _EGL_NO_DRM to __NOT_HAVE_DRM_H
to provide consistency with the other cases within mesa, allows
compilation of libloader on libdrm-less scons and conditionally
links against libdrm if present under automake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73776
BUgzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73777
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only difference is that STATE_SIP takes a 48-bit address, so we need
to output two zeroes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This replaces the old fs_generator backend.
v2: Port to the C-based representation of assembly instructions.
Fix texturing after the texture-grf merge.
v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET.
v4: Pass brw_context to gen8_instruction functions as required.
v5: Fixes for MRT, as well as zero render targets (alpha test only).
v6: Replace n-wide with SIMDn in comments and messages; port over
Topi's blorp-generator changes; add missing TXF_MCS opcode,
fix missing high quality derivatives for DDX; fix typo (all caught
by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+"
comment (caught by Matt). Emit SIMD16 versions of three source
instructions (caught by both Eric and Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the old vec4_generator backend.
v2: Port to use the C-based instruction representation. Also, remove
Geometry Shader offset hacks - the visitor will handle those instead
of this code.
v3: Texturing fixes (including adding textureGather support).
v4: Pass brw_context to gen8_instruction functions as required.
v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes
(caught by Eric). Simplify ADDC/SUBB handling; add comments to
gen8_set_dp_message calls (suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the brw_eu_emit.c layer for Broadwell. It will be
used by both the vector and scalar shader backends.
v2: Port to use the C-based instruction representation.
v3: Fix destination register type for CMP.
v4: Pass brw to gen8_instruction functions (required by rebase).
v5: Remove bogus assertion on math instructions (caught by Piglit).
v6: Remove more restrictions on math instructions (caught by Eric).
Make ADDC and SUBB helpers set accumulator writes, like MAC and
MACH (caught by Matt).
v7: Don't implicitly force ALU3 operations to SIMD8 (we've been able
to do SIMD16 versions since Haswell, but didn't when I originally
wrote this code).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Heavily based on Keith Packard's existing brw_disasm.c code. I've tried
to go through most of the pieces (like SFIDs) and update the lists to
include features added in recent generations.
v2: Port to use the C-based instruction emitters. This allows us to use
C99 array initializers, which tidies up some of the code.
v3: Improve decoding of render target write messages.
v4: Update for BRW_REGISTER_TYPE becoming an abstraction.
v5: Rebase on Chris Forbes' SFID message defines.
v6: Fix disassembly of UV immediates; remove silly casts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell significantly changes the EU instruction encoding. Many of
the fields got moved to different bit positions; some even got split
in two.
With so many changes, it was infeasible to continue using struct
brw_instruction. We needed a new representation.
This new approach is a bit different: rather than a struct, I created a
class that has four DWords, and helper functions that read/write various
bits. This has several advantages:
1. We can create several different names for the same bits. For
example, conditional modifiers, SFID for SEND instructions, and the
MATH instruction's function opcode are all stored in bits 27:24.
In each situation, we can use the appropriate setter function:
set_sfid(), set_math_function(), or set_cond_modifier(). This
is much easier to follow.
2. Since the fields are expressed using the original 128-bit numbers,
the code to create the getter/setter functions follows the table in
the documentation very closely.
To aid in debugging, I've enabled -fkeep-inline-functions when building
gen8_instruction.c. Otherwise, these functions cannot be called by
gdb, making it insanely difficult to print out anything.
Kenneth Graunke wrote most of this code. Damien Lespiau ported it to
C99. Xiang Haihao added media fields. Zhao Yakui added indirect
addressing support. Eric Anholt added an assertion to make sure that
values fit in the alloted number of bits.
v2: Update for brw_reg_type_to_hw_type(), which necessitates passing
brw_context pointers around everywhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
While we probably won't ever use these, having them makes it easy to
share disassembler code between intel-gpu-tools and Mesa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is not observed to actually fix anything, but the PRM says this
field must be zero for other surface types.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously this was missing many interesting fields. Having them decoded
makes debugging views much easier.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The index passed to the function is already unsigned, and internally
we threat it as unsigned.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of NV50_MAX_PIPE_CONSTBUFS
per shader stage. Currently the nv50 driver handles only 3 shader
stages, thus we wreck chaos when accessing array-out-of-bounds.
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of PIPE_MAX_SAMPLERS per shader stage.
Currently nv50 driver handles only 3 shader stages, thus we wreck chaos when
accessing array-out-of-bounds.
Fixes a segfault in piglit/bin/arb_texture_buffer_object-data-sync -fbo -auto
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use the kernel driver name are returned by drmGetVersion() for
non-pci(platform) devices.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2 (Emil): Rebased and weaked commit message.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Culled out of the "loader: refactor duplicated code into loader util lib"
patch by Rob Clark.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
As per original approach by Rob, each user of the loader lib should include
loader.h and the pci_id_driver_map.h header will be used exclusively by the
loader.
Add back the include guard __IS_LOADER and remove no longer needed include
folder in the scons build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Additionally this commit removes the following exported functions
_gbm_udev_device_new_from_fd()
_gbm_fd_get_device_name()
_gbm_log()
All three were erroneously marked as exported since their inception.
Neither of them has ever been a part of the API thus there should be
no users of them.
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All the various window system integration layers duplicate roughly the
same code for figuring out device and driver name, pci-id's, etc. Which
is sad. So extract it out into a loader util lib.
v2 (Emil)
* Separate the introduction of libloader from the code de-duplication.
* Strip out non-pci devices support.
* Add scons + Android build system support.
* Add VISIBILITY_CFLAGS to avoid exporting the loader funcs.
v3 (Emil)
* PIPE_OS_ANDROID is undefined at this scope, use ANDROID
* Make sure we define _EGL_NO_DRM when building only swrast
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
N Min Max Median Avg Stddev
x 200 29200 40500 34900 34750 958.43256
+ 200 31000 40300 34700 34622 916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
N Min Max Median Avg Stddev
x 63 64.1 71.36 70.69 70.113175 1.6782026
+ 63 63.6 71.18 70.75 70.223651 1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started assertion failing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Those are the terms used in the docs, and think "n-wide" was something I
just happened to say. Note that shader-db needs updating for the
INTEL_DEBUG=fs parsing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The original intent was that we'd keep a driver-private copy, and there
would be the normal copy for swrast to make use of without the tuning (or
anything more invasive we might do) specific to i965. Only, we don't
generate swrast code any more, because swrast can't render current shaders
anyway. Thus, our private copy is rather a waste, and we can just do our
backend-specific operations on the linked shader.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/alanh@tungstengraphics.com/alanh@vmware.com/
s/jens@tungstengraphics.com/jowen@vmware.com/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
s/michel@tungstengraphics.com/daenzer@vmware.com/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/zack@tungstengraphics.com/zackr@vmware.com/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression since 9baa45f78b
but some of the piglit fbo-drawbuffers-none tests still don't
pass.
v2: use the right pointer type for 'h'
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes regression from 9baa45f78b
v2: incorporate a few small changes suggested by Roland.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The whole round-pointsize-to-int stuff must only be done with GL legacy
rules (no point_quad_rasterization) or all the wrong edges are lit up.
This was previously in a private branch (d3d pointsprite test complains
loudly otherwise) and got lost in a merge. However, it should certainly
apply to GL point sprite rasterization as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenGL does whole-point clipping, that is a large point is either fully
clipped or fully unclipped (the latter means it may extend beyond the
viewport as long as the center is inside the viewport). d3d9 (d3d10 has
no large points) however requires points to be clipped after they are
expanded to a rectangle. (Note some IHVs are known to ignore GL rules at
least with some hw/drivers.)
Hence add a rasterizer bit indicating which way points should be clipped
(some drivers probably will always ignore this), and add the draw interaction
this requires. Drivers wanting to support this and using draw must support
large points on their own as draw doesn't implement vp clipping on the
expanded points (it potentially could but the complexity doesn't seem
warranted), and the driver needs to do viewport scissoring on such points.
Conflicts:
src/gallium/drivers/llvmpipe/lp_context.c
src/gallium/drivers/llvmpipe/lp_state_derived.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit c13970808 (mesa: GL_EXT_secondary_color is not optional) changed
CHECK_EXTENSION2(EXT_secondary_color, ARB_vetex_program, cap)
to
CHECK_EXTENSION(ARB_vertex_program, cap)
However CHECK_EXTENSION2 checks that either extension is available, not
both. Remove the extension check entirely since the intent was for it to
always be enabled.
v2: Fix glGet*(GL_COLOR_SUM) too. Suggested by Ian.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 9.2 10.0 <mesa-stable@lists.freedesktop.org>
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 95bf222603 (cso_context: Fix cso_context::sample_mask initial
value.) fixed the cso sample mask to be initialized to ~0. The cso code
is also careful not to needlessly call set_sample_mask, so we ended up
with the ctx->sample_mask never being set. This broke a number of
EXT_framebuffer_multisample piglit tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
ctx->DrawIndirectBuffer wasn't being free'd in _mesa_free_buffer_objects
With this patch, "valgrind --leak-check=full glxgears" on evergreen (CEDAR)
now shows:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 70,228 bytes in 651 blocks
suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The radeonsi code was not cleaning up either of these items leading to
leaked memory.
v2: Move cleanup to r600_common_context_cleanup instead of duplicating
the logic for SI
CC: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The i830 and i915 drivers used them, but they didn't really need to.
They will just be annoying in future patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future patch will rename some of the fields of gl_viewport_attrib, and
I don't want to update dead code that I can't test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Dave Airlie <airlied@redhat.com>
The ES and desktop GL specs diverge here. Yay!
In desktop OpenGL, the driver can perform online compression of
uncompressed texture data. GL_NUM_COMPRESSED_TEXTURE_FORMATS and
GL_COMPRESSED_TEXTURE_FORMATS give the application a list of formats
that it could ask the driver to compress with some expectation of
quality. The GL_ARB_texture_compression spec calls this "suitable for
general-purpose usage." As noted above, this means
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT is not included in the list.
In OpenGL ES, the driver never performs compression.
GL_NUM_COMPRESSED_TEXTURE_FORMATS and GL_COMPRESSED_TEXTURE_FORMATS give
the application a list of formats that the driver can receive from the
application. It is the *complete* list of formats. The
GL_EXT_texture_compression_s3tc spec says:
"New State for OpenGL ES 2.0.25 and 3.0.2 Specifications
The queries for NUM_COMPRESSED_TEXTURE_FORMATS and
COMPRESSED_TEXTURE_FORMATS include COMPRESSED_RGB_S3TC_DXT1_EXT,
COMPRESSED_RGBA_S3TC_DXT1_EXT, COMPRESSED_RGBA_S3TC_DXT3_EXT,
and COMPRESSED_RGBA_S3TC_DXT5_EXT."
Note that the addition is only to the OpenGL ES specification!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
See-also: http://lists.freedesktop.org/archives/mesa-dev/2013-October/047439.html
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Returning a reference is incorrect if the specified pair was a
temporary -- Instead of that, use decltype() to deduce the correct
return type qualifiers. Fixes a crash in clCreateProgramWithBinary().
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
According to the spec it's allowed to call clBuildProgram() on a
program created from a user-specified binary. We don't need to do
anything to build the program in that case.
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
This avoids the inefficient multiple evaluation of the map result in
the code below. It should cause no functional changes.
Tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
From ARB_shader_image_load_store:
If a texture object bound to one or more image units is deleted by
DeleteTextures, it is detached from each such image unit, as though
BindImageTexture were called with <unit> identifying the image unit
and <texture> set to zero.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And uncomment the relevant lines of the dispatch sanity test.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Name image format classes consistently, fix array and 3D teximage
selection with layered = GL_FALSE, make sure that the
user-specified layer is less than the number of texture layers,
add some asserts.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. ARB_shader_image_load_store
requires support for the GL_RG8_SNORM and GL_RG16_SNORM formats, which
map to MESA_FORMAT_SIGNED_GR88 and MESA_FORMAT_SIGNED_GR1616 on
little-endian hosts, and MESA_FORMAT_SIGNED_RG88 and
MESA_FORMAT_SIGNED_RG1616 respectively on big-endian hosts -- only the
former were already present, add support for the latter.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. This texture format is a
requirement for ARB_shader_image_load_store.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Increase MAX_IMAGE_UNITS to what i965 wants and add a separate
MAX_IMAGE_UNIFORMS define, clarify a couple of comments.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And to check if it can have layers at all. This will be used by the
implementation of ARB_shader_image_load_store.
v2: Fix constness of texobj argument, use assert and return reasonable
default rather than calling unreachable() in default switch case.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The temporary variable used to store _ColorDrawBufferIndexes must be
signed (GLint), otherwise the following conditional will be incorrectly
evaluated. Leading to crashes in the driver/mesa or accessing/writing
to arbitrary memory location. The bug dates back to 2009.
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
_ColorDrawBufferIndexes is defined as GLint* and using a GLuint*
will result in the first part of the conditional to be evaluated to
true always.
Unintentionally introduced by the following commit, this will result
in a driver segfault if one is using an old version of the piglit test
bin/clearbuffer-mixed-format -auto -fbo
commit 03d848ea10
Author: Marek Olšák <marek.olsak@amd.com>
Date: Wed Dec 4 00:27:20 2013 +0100
mesa: fix interpretation of glClearBuffer(drawbuffer)
This corresponding piglit tests supported this incorrect behavior instead of
pointing at it.
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If it wasn't necessary for Haswell, it's likely not to be necessary for
Broadwell either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to disable HiZ for non-8x4 aligned levels, except for level 0, layer
0. For the very first layer we can adjust Width and Height fields of
3DSTATE_DEPTH_BUFFER to make it aligned.
Specifically, add ILO_TEXTURE_HIZ and set the flag only for properly aligned
levels. ilo_texture_can_enable_hiz() is updated to check for the flag.
In tex_layout_validate(), align the depth bo to 8x4 so that we can adjust
Width/Height of 3DSTATE_DEPTH_BUFFER without introducing out-of-bound access.
Finally in rectlist blitter, add the ability to adjust 3DSTATE_DEPTH_BUFFER.
Add tex_layout_init_hiz() before tex_layout_init_format() to decide whether
HiZ should be enabled.
On GEN6, because of layer offsetting, HiZ is enabled only when the texture is
non-mipmapped and non-array. PIPE_USAGE_STAGING is also taken as a hint to
disable HiZ.
These can't use foreach_list since they want to skip over the first few
list elements. Just doing the ad-hoc list walking isn't too bad.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When handling function calls, we often want to walk through the list of
formal parameters and list of actual parameters at the same time.
(Both are guaranteed to be the same length.)
Previously, we used a pattern of:
exec_list_iterator 1st_iter = <1st list>.iterator();
foreach_iter(exec_list_iterator, 2nd_iter, <2nd list>) {
...
1st_iter.next();
}
This was awkward, since you had to manually iterate through one of
the two lists.
This patch introduces a foreach_two_lists macro which safely walks
through two lists at the same time, so you can simply do:
foreach_two_lists(1st_node, <1st list>, 2nd_node, <2nd list>) {
...
}
v2: Rename macro from foreach_list2 to foreach_two_lists, as suggested
by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formal function parameters are always ir_variable objects, not an
arbitrary ir_instruction. So there's no need to dynamically cast here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A function call's parameters are always rvalues. ir_rvalue may not
always be a subclass of ir_instruction in the future, so we should use
the right one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
foreach_iter and exec_list_iterators have been deprecated for some time now;
we just hadn't ever bothered to convert code to the newer foreach_list
and foreach_list_safe macros.
In these cases, we aren't editing the list, so we can use foreach_list
rather than foreach_list_safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to this patch, if we ran out of aperture space during
brw_try_draw_prims(), we would rewind the batch buffer pointer
(potentially throwing some state that may have been emitted by
brw_upload_state()), flush the batch, and then try again. However, we
wouldn't reset the dirty bits to the state they had before the call to
brw_upload_state(). As a result, when we tried again, there was a
danger that we wouldn't re-emit all the necessary state. (Note: prior
to the introduction of hardware contexts, this wasn't a problem
because flushing the batch forced all state to be re-emitted).
This patch fixes the problem by leaving the dirty bits set at the end
of brw_upload_state(); we only clear them after we have determined
that we don't need to rewind the batch buffer.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
An example why it is required:
Let's say there's a fragment shader writing to gl_FragData[0..1].
The user calls: glDrawBuffers(2, {GL_NONE, GL_COLOR_ATTACHMENT0});
That means gl_FragData[0] is unused and gl_FragData[1] is written
to GL_COLOR_ATTACHMENT0.
st/mesa was skipping the GL_NONE draw buffer, therefore gl_FragData[0]
was written to GL_COLOR_ATTACHMENT0, which was wrong.
This commit fixes it, but drivers must also be fixed not to crash when
binding NULL colorbuffers. There is also a new set of piglit tests for this.
The MSAA state also had to be fixed not to crash when reading fb->cbufs[0].
Reviewed-by: Brian Paul <brianp@vmware.com>
yInverted is used by EGL_NOK_texture_from_pixmap to indicate that
window system rendering is y-inverted compared to OpenGL texture
representation. This extension is only known to be used with X11
window system where sane default is GL_TRUE.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73371
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri2_add_config makes decisions based on NOK_texture_from_pixmap so
it needs to be enabled before calling dri2_add_configs_for_visuals.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Piglit was recently changed to expect the correct error code (piglit
commit 271b998), so it started failing on Mesa. This corrects that
failing and adds some spec quotations to justify the errrors set.
The code was rearranged a little bit to match the order listed in the
spec.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_queryobj.c needs a version of write_timestamp that works on all
generations for the QueryCounter() driver hook. So there's no point in
duplicating it in gen6_queryobj.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, Mesa enforced the following rule (from
ARB_geometry_shader4's list of criteria for framebuffer completeness):
* If any framebuffer attachment is layered, all attachments must have
the same layer count. For three-dimensional textures, the layer count
is the depth of the attached volume. For cube map textures, the layer
count is always six. For one- and two-dimensional array textures, the
layer count is simply the number of layers in the array texture.
{ FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB }
However, when ARB_geometry_shader4 was adopted into GL 3.2, this rule
was dropped; GL 3.2 permits different attachments to have different
layer counts. This patch brings Mesa in line with GL 3.2.
In order to ensure that layered clears properly clear all layers, we
now have to keep track of the maximum number of layers in a layered
framebuffer.
Fixes the following piglit tests in spec/!OpenGL 3.2/layered-rendering:
- clear-color-all-types 1d_array mipmapped
- clear-color-all-types 1d_array single_level
- clear-color-mismatched-layer-count
- framebuffer-layer-count-mismatch
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
From section 4.4.4 (Framebuffer Completeness) of the GL 3.2 spec:
If any framebuffer attachment is layered, all populated
attachments must be layered. Additionally, all populated color
attachments must be from textures of the same target.
We weren't checking that the attachments were from textures of the
same target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 1a92881 added extra flushes to fix a HiZ hang in
WebGL Google Maps. With the extra flushes emitted by the previous two
patches, the flushes added by 1a92881 are redundant.
Tested with the same criteria as in 1a92881: by zooming in and out
continuously for 2 hours on Sandybridge Chrome OS (codename
Stumpy) without a hang.
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unconditionally set brw->need_workaround_flush at the top of gen6 blorp
state emission.
The art of emitting workaround flushes on Sandybridge is mysterious and
not fully understood. Ken and I believe that
intel_emit_post_sync_nonzero_flush() may be required when switching from
regular drawing to blorp. This is an extra safety measure to prevent
undiscovered difficult-to-diagnose gpu hangs.
I verified that on ChromeOS, pre-patch, need_workaround_flush was not
set at the top of blorp, as Paul expected. To verify, I inserted the
following debug code at the top of gen6_blorp_exec(), restarted the ui,
and inspected the logs in /var/log/ui. The abort gets triggered so early
that the browser never appears on the display.
static void
gen6_blorp_exec(...)
{
if (!brw->need_workaround_flush) {
fprintf(stderr, "chadv: %s:%d\n", __FILE__, __LINE__);
abort();
}
...
}
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch makes the workaround code in gen6 blorp follow the pattern
established in the regular draw path. It shouldn't result in any
behavioral change.
On gen6, there are two places where we emit 3D_CMD_PRIM: brw_emit_prim()
and gen6_blorp_emit_primitive(). brw_emit_prim() sets
need_workaround_flush immediately after emitting the primitive, but
blorp does not. Blorp sets need_workaround_flush at the bottom of
brw_blorp_exec().
This patch moves the need_workaround_flush from brw_blorp_exec() to
gen6_blorp_emit_primitive(). There is no need to set
need_workaround_flush in gen7_blorp_emit_primitive() because the
workaround applies only to gen6.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
We weren't handling the LUMINANCE_SNORM, LUMINANCE_ALPHA_SNORM and
INTENSITY_SNORM cases. Note that adding these cases here does not
require a driver to support rendering to these surface types. If
the driver can't do it we'll report an incomplete framebuffer.
NVIDIA doesn't support GL_EXT_texture_snorm but their driver
accepts these formats in glRenderBufferStorage().
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This allows the caller to execute it in a loop rather than
hand-rolling a separate call for each stage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are replaced with
ctx->Const.Program[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In
patches to follow, this will allow us to replace a lot of ad-hoc logic
with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' -o -iname '*.py' \
-o -iname '*.y' ')' -print0 | xargs -0 sed -i \
-e 's/Const\.VertexProgram/Const.Program[MESA_SHADER_VERTEX]/g' \
-e 's/Const\.GeometryProgram/Const.Program[MESA_SHADER_GEOMETRY]/g' \
-e 's/Const\.FragmentProgram/Const.Program[MESA_SHADER_FRAGMENT]/g'
Suggested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit eda21d2a30 fixed the rasterization
of points for Direct3D but ended up breaking the rasterization of OpenGL
non-sprite points, in particular conform's pntrast.c test.
The only way to get both working is to properly honour
pipe_rasterizer::point_quad_rasterization, and follow the weird OpenGL
rule when it is false.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We definitely want to fall through to the unsynchronized map case, instead
of wasting bandwidth on a copy. Prevents a -43.2407% +/- 1.06113% (n=49)
performance regression on aa10perf when teaching glamor to provide the
GL_INVALIDATE_RANGE_BIT information.
This is a performance fix, which I usually wouldn't cherry-pick to stable.
But this was really was just a bug in the code, its presence would
discourage developers from giving us the best information they can, and I
think we've got fairly high confidence in the unsynchronized map path
already.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While incorrect, it probably wouldn't affect anyone ever: You'd have to do
an appropriately-formatted readpixels into a PBO, then overwrite the tail
end of the updated area of the PBO with glBufferSubData(), and you
wouldn't get appropriate synchronization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The earlier assert made sure that our math didn't exceed our bounds, but
this makes sure that we don't overflow from the high bits X into the low
bits of Y. We've already put checks in intel_miptree_blit(), but I've
wanted to expand the type in our protoype from short to uint32_t, and we
could get in trouble with intel_emit_linear_blit() if we did.
v2: Add Ken's comment about the funny language extension used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
This was one of the things we always wanted to do to this, to make it more
useful than just (value << FIELD_MASK).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
With all of the flipping and pitch twiddling and miptree layout involved
in our blits, there are lots of ways for us to scribble outside of a
buffer. Put in a check that we're not about to do so.
This catches a bug that glamor was running into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes the following compile error:
src\glsl\ir_constant_expression.cpp(1405) : error C2666: 'copysign' : 3
overloads have similar conversions
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we now have the cmdstream patch mechanism needed for hw binning,
might as well also use it for RB_RENDER_CONTROL updates. This avoids
the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The hardware is broken with nonzero texel offsets and unnormalized
coordinates; instead of doing correct offsetting, we get garbage.
This just extends the existing workaround for ir_txf and
ir_tg4+gsampler2DRect to also consider ir_tex+gsampler2DRect.
Fixes broken rendering in 'tesseract' when 'mesa_texrectoffset_bug' is
not enabled; also fixes the new piglit test
'tests/spec/glsl-1.30/execution/fs-textureOffset-Rect'.
Has been broken ~forever; suggesting including this in only 10.0 because
the lowering pass doesn't exist in 9.2 or earlier so would require quite
a different patch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lee Salzman <lsalzman@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename "shaderType" param of is_varying_var() to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces confusion since gl_shader::Type is sometimes
GL_SHADER_PROGRAM_MESA but is more frequently
GL_SHADER_{VERTEX,GEOMETRY,FRAGMENT}. It also has the advantage that
when switching on gl_shader::Stage, the compiler will alert if one of
the possible enum types is unhandled. Finally, many functions in
src/glsl (especially those dealing with linking) already use
gl_shader_stage to represent pipeline stages; using gl_shader::Stage
in those functions avoids the need for a conversion.
Note: in the process I changed _mesa_write_shader_to_file() so that if
it encounters an unexpected shader stage, it will use a file suffix of
"????" rather than "geom".
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also move the related #define MESA_SHADER_STAGES. This will allow
gl_shader_stage to be used in struct gl_shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we had an enum called gl_shader_type which represented
pipeline stages in the order they occur in the pipeline
(i.e. MESA_SHADER_VERTEX=0, MESA_SHADER_GEOMETRY=1, etc), and several
inconsistently named functions for converting between it and other
representations:
- _mesa_shader_type_to_string: gl_shader_type -> string
- _mesa_shader_type_to_index: GLenum (GL_*_SHADER) -> gl_shader_type
- _mesa_program_target_to_index: GLenum (GL_*_PROGRAM) -> gl_shader_type
- _mesa_shader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
This patch tries to clean things up so that we use more consistent
terminology: the enum is now called gl_shader_stage (to emphasize that
it is in the order of pipeline stages), and the conversion functions are:
- _mesa_shader_stage_to_string: gl_shader_stage -> string
- _mesa_shader_enum_to_shader_stage: GLenum (GL_*_SHADER) -> gl_shader_stage
- _mesa_program_enum_to_shader_stage: GLenum (GL_*_PROGRAM) -> gl_shader_stage
- _mesa_progshader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
In addition, MESA_SHADER_TYPES has been renamed to MESA_SHADER_STAGES,
for consistency with the new name for the enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename the "target" field of _mesa_glsl_parse_state and the
"target" parameter of _mesa_shader_stage_to_string to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
The adjustment needs to be applied to the y coordinates and not the x
coordinates, just like the equivalent code for lines and triangles in
lp_setup_line.c and lp_setup_tri.c.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This was inadvertently forgotten when replacing gl_rasterization_rules
with lower_left_origin and half_pixel_center (commit
2737abb44e).
This makes a difference when lower_left_origin != half_pixel_center, e.g,
D3D10.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
When the depth buffer is to be read, perform a Depth Buffer Resolve if it has
been rendered. When the depth buffer is to be rendered, perform a HiZ Buffer
Resolve when the depth buffer is modified externally.
Add blitter functions to perform Depth Buffer Clear, Depth Buffer Resolve, and
Hierarchical Depth Buffer Resolve. Those functions set ilo_blitter up and
pass it to the pipelines to emit the commands.
Allow HiZ op to be specified in 3DSTATE_WM. Pass depth format directly in
gen7_emit_3DSTATE_SF. Use tex->hiz.bo to determine if HiZ exists. Fix
3DSTATE_SF for the case when there is no ilo_rasterizer_state. Fix
3DSTATE_PS for the case when there is no ilo_shader_state.
GEN6 has several requirements regarding the LOD/Depth/Width/Height of the
render targets and the depth buffer. We used to offset to the layers in
question unconditionally to meet the requirements. With this commit,
offseting is done only when the requirements are not met.
On Haswell, POW takes 24 cycles, while EXP2 only takes 14. Plus, using
POW requires putting 2.0 in a register, while EXP2 doesn't.
I believe that EXP2 will be faster than POW on basically all GPUs, so
it makes sense to optimize it.
Looking at the savage2 subset of shader-db:
total instructions in shared programs: 113225 -> 113179 (-0.04%)
instructions in affected programs: 2139 -> 2093 (-2.15%)
instances of 'math pow': 795 -> 749 (-6.14%)
instances of 'math exp': 389 -> 435 (11.8%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch creates a new generic is_value() method, which checks if an
ir_constant has a particular value. (For vectors, it must have the
single value repeated across all components.)
It then rewrites the is_zero/is_one/is_negative_one methods to use this
generic helper. All three were basically identical except for the value
they checked for. The other difference is that is_negative_one rejects
boolean types. The new is_value function maintains this behavior, only
allowing boolean types when checking for 0 or 1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using the get_local_param_pointer helper ensures that the LocalParams
arrays have actually been allocated before attempting to use them.
glProgramLocalParameters4fvEXT needs to do a bit of extra checking,
but it can be simplified since the helper has already validated the
target.
Fixes crashes in programs that use Cg (for example, Awesomenauts,
Rocketbirds: Hardboiled Chicken, and Tiny and Big: Grandpa's Leftovers)
since commit e5885c119d
(mesa: Dynamically allocate the storage for program local parameters.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73136
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
We don't support MSAA (ie, number of samples is always one) therefore
sample_mask boils down to a synonym of the rasterizer_discard flag.
Also, this change makes setup actually use the value received in
lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe
upper layers to re-fetch it.
Based on Si Chen's draft.
With this patch `wgf11multisample Coverage passes 100%` on the UMD
D3D10 state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
The initial value of cso_context::sample_mask_saved is irrelevant as it
will be overwritten with cso_context::sample_mask in
cso_save_sample_mask. Therefore it is cso_context::sample_mask that
needs to be properly initialized.
This fixes regressions in blits and mipmap generation after adding
support for sample_mask to llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Implement Alpha to Coverage by discarding a fragment alpha component is
less than 0.5. This is a joint work of Jose and Si.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 9119269ca1 moved the texel
buffer allocation to _swrast_texture_span(), however, when compiled
with OpenMP support this code already runs multi-threaded so a
critical section is required to prevent multiple allocations and
rendering errors.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Evidently, there's some other definition of "min" and "max" that
causes MSVC to choke on these function names. Renaming to min2()
and max2() fixes things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both brw_defines.h and intel_reg.h defined PIPE_CONTROL fields, which
had similar names, but couldn't be used in the same way. (One had
built-in shifts, and the other didn't...)
Delete the unused set to preserve sanity.
(Eric wrote an almost identical patch back in August, so I believe he
approves.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this build error with gcc <= 4.5 and clang <= 3.1.
CC clientattrib.lo
In file included from ../../include/GL/glx.h:333:0,
from glxclient.h:45,
from clientattrib.c:32:
../../include/GL/glxext.h:275:13: error: redefinition of typedef 'GLXContextID'
../../include/GL/glx.h:171:13: note: previous declaration of 'GLXContextID' was here
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70591
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* The Haiku renderers need to link to libGL to function properly
in all usage contexts. As mesa drivers build before gallium
targets, we couldn't properly link the mesa swrast driver to
the gallium libGL target for Haiku.
* This is likely better as it mimics how glx is laid out ensuring
the Haiku libGL is better understood.
* All renderers properly link in libGL now.
Acked-by: Brian Paul <brianp@vmware.com>
These are just software flag values (not hardware specific values), and
aren't used anywhere. Delete them to avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add extra null check in auto generated indirect_init.c via
src/mapi/glapi/gen/glX_proto_send.py
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we left the size of this vgrf as 1, which caused register
allocation to be subtly broken. If we were lucky we would explode in
the post-alloc instruction scheduler; if we were unlucky we'd just stomp
on someone else and get broken rendering.
Fixes crash when running `tesseract` with the following settings:
msaa 4
glineardepth 0
Also fixes the piglit test:
arb_sample_shading-builtin-gl-sample-id
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72859
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The preprocessor currently accepts multiple else/elif-groups
per if-section. The GLSL-preprocessor is defined by the C++
specification, which defines the following parse-rule:
if-section:
if-group elif-groups(opt) else-group(opt) endif-line
This clearly only allows a single else-group, that has to come
after any elif-groups.
So let's modify the code to follow the specification. Add test
to prevent regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).
The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:
#define FOO a/*
*/b
FOO
Prior to commit bd55ba568b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.
In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).
Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).
It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
Finally, this commit also changes the expected output of a few, existing glcpp
tests. The change here is that the space character resulting from the
multi-line comment is now emitted before the newlines corresponding to that
comment. (Previously, the newlines were emitted first, and the space character
afterward.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Two things make this code confusing:
1. The uncharacteristic manipulation of lexer start state outside of
flex rules.
2. The confusing semantics of the skip_stack (including the
"lexing_if" override and the SKIP_NO_SKIP state).
This new comment is intended to bring a bit more clarity for any readers.
There is no intended beahvioral change to the code here. The actual code
changes include better indentation to avoid an excessively-long line, and
using the more descriptive INITIAL rather than 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Support all levels of a supported texture format.
Using 1024x1024, RGBA 8888 source, mipmap
internal-format Before (MB/sec) mipmap (MB/sec)
GL_RGBA 627.15 615.90
GL_RGB 456.35 611.53
512x512
GL_RGBA 597.00 619.95
GL_RGB 440.62 611.28
256x256
GL_RGBA 487.80 587.42
GL_RGB 376.63 585.00
Benchmark has been sent to mesa-dev list: teximage_enh
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
MESA_FORMAT_XRGB8888 is equivalent to MESA_FORMAT_ARGB8888 in terms
of storage on the device, so okay to use this optimized copy routine.
This series builds on work from Frank Henigman to optimize the
process of uploading a texture to the GPU. This series adds support for
MESA_XRGB_8888 and full miptrees where were found to be common activities
in the Smokin' Guns game. The issue was found while profiling the app
but that part is not benchmarked. Smokin-Guns uses mipmap textures with
an internal format of GL_RGB (MESA_XRGB_8888 in the driver).
These changes need a performance tool to run against to show how they
improve execution performance for specific texture formats. Using this
benchmark I've measured the following improvement on my Ivybridge
Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz.
1024x1024 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 628.15 627.15
GL_RGB 265.95 456.35
512x512 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 600.23 597.00
GL_RGB 255.50 440.62
256x256 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 489.08 487.80
GL_RGB 229.03 376.63
Benchmark has been sent to mesa-dev list: teximage
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Only a Mesa bug could cause this function to be called with an
out-of-range index, so raise an assertion if that ever happens.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces the following pattern:
foo bar[MESA_SHADER_TYPES] = {
...
};
With:
foo bar[] = {
...
};
STATIC_ASSERT(Elements(bar) == MESA_SHADER_TYPES);
This way, when a new shader type is added in a future version of Mesa,
we will get a compile error to remind us that the array needs to be
updated.
Reviewed-by: Brian Paul <brianp@vmware.com>
This argument was carrying the name of the shader target (as a
string). We can get this just as easily by calling
_mesa_shader_enum_to_string().
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously, _mesa_glsl_shader_target_name() had an overload for GLenum
and an overload for the gl_shader_type enum, each of which behaved
differently. However, since GLenum is a synonym for unsigned int, and
unsigned ints are often used in place of gl_shader_type (e.g. in loop
indices), there was a big risk of calling the wrong overload by
mistake. This patch gives the two overloads different names so that
it's always clear which one we mean to call.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 136a12ac98.
According to belak51 on IRC, this commit broke Allegro, which would no
longer compile. Applications apparently expect the GLXContextID typedef
to exist in glx.h; removing it breaks them. A bit of searching around
the internet revealed other complaints since upgrading to Mesa 10.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
According to git blame, this hasn't been used in over two years:
commit d2235b0f46
Author: Eric Anholt <eric@anholt.net>
Date: Thu Nov 17 17:01:58 2011 -0800
i965: Always handle GL_DEPTH_TEXTURE_MODE through the shader.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
instead of ignoring the argument and always dumping to
standard output.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Previously we were creating a new LLVMContext every time that we called
radeon_llvm_parse_bitcode, which caused us to leak the context every time
that we compiled a CL program.
Sadly, we can't dispose of the LLVMContext at the point that it was being
created because evergreen_launch_grid (and possibly the SI equivalent) was
assuming that the context used to compile the kernels was still available.
Now, we'll create a new LLVMContext when creating EG/SI compute state, store
it there, and pass it to all of the places that need it.
The LLVM Context gets destroyed when we delete the EG/SI compute state.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash where old_view->context was already freed in the
pipe_sampler_view_reference function contained in
src/gallium/auxiliary/utils/u_inlines.h. As a result, the
sampler_view_destroy function pointer contained 0xfeeefeee indicating
freed heap memory.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Useful in its own right, but also needed for adaptive vsync.
No regressions in the piglit glx-oml-sync-control-getmscrate test.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
It is the maximum number of back buffers, but the name is confusing and is
easily read as the maximum back buffer index. Chage to DRI3_NUM_BACK to make
the intended usage a bit clearer.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just copying code from the dri2 path to set up the fast color clear state.
This also removes a couple of bogus intel_region_reference calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The buffer-object is the persistent thing passed through the loader, so when
updating an image buffer, check to see if it is already bound to the provided
bo. The region, on the other hand, is allocated separately for the miptree,
and so will never be the same as that passed back from the loader.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the depth field up with width and height.
Remove unused previous_time and frames fields.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libxshmfence v1.0 foolishly used 'int32_t *' for the fence type, which
works when the fence is a linux futex. However, version 1.1
changes the exported datatype to 'struct xshmfence *'
Require libxshmfence version 1.1 and switch the API around.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While looking through the documentation, I found this in the Sandybridge
PRM (Volume 4, Part 1, Page 140):
"Use of sample_c with SURFTYPE_CUBE surfaces is undefined with the
following surface formats: I24X8_UNORM, L24X8_UNORM, A24X8_UNORM,
I32_FLOAT, L32_FLOAT, A32_FLOAT."
I haven't observed this to be true, but it suggests that we may want to
use other formats.
We already perform DEPTH_TEXTURE_MODE swizzling in the shaders, and
don't rely on the surface format to splat things appropriately. So
using RED should work just as well as INTENSITY.
A few notes about the formats:
- R24_UNORM_X8_TYPELESS has the exact same properties as I24X8_UNORM.
- R16_UNORM and R32_FLOAT are additionally supported as a render target,
while the old I16_UNORM/I32_FLOAT formats are not.
- R32_FLOAT_X8X24_TYPELESS is not supported as a render target, while
the old format (R32G32_FLOAT) was. However, it shares the same
properties as the formats we use for Z24, so it should suffice.
This makes translate_tex_format and brw_blorp_surface_info::set
a bit more similar.
No Piglit changes on Sandybridge or Ivybridge. No oglconform changes on
Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emitting flushes before depth and hiz resolves at the top of blorp's
state emission fixes the hang. Marchesin and I found the fix
experimentally, as opposed to adhering to a documented hardware
workaround. A more minimal fix likely exists, but this gets the job
done.
Fixes HiZ hangs in the new WebGL Google maps on Sandybridge Chrome OS.
Tested by zooming in and out continuously for 2 hours.
This patch is based on
8bc07bb701
CC: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70740
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell allows us to specify an arbitrary value for QPitch, rather
than baking a specific formula into the hardware and requiring software
to lay things out to match. The only restriction is that the software
provided QPitch needs to be large enough so successive array slices do
not overlap.
In order to support this flexibility, software needs to specify QPitch
in a bunch of packets. Storing QPitch makes that easy, and allows us to
adjust it in a single place should we wish to change it in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell introduces support for Q, UQ, and HF types. It also extends
DF support to allow immediate values.
Irritatingly, although HF and DF both support immediates, they're
represented by a different value depending on the register file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge, Baytrail, and Haswell support double float register types,
but do not support them as immediate values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On released hardware, values 4-6 are overloaded. For normal registers,
they mean UB/B/DF. But for immediates, they mean UV/VF/V.
Previously, we just created #defines for each name, reusing the same
value. This meant we could directly splat the brw_reg::type field into
the assembly encoding, which was fairly nice, and worked well.
Unfortunately, Broadwell makes this infeasible: the HF and DF types are
represented as different numeric values depending on whether the
source register is an immediate or not.
To preserve sanity, I decided to simply convert BRW_REGISTER_TYPE_* to
an abstract enum that has a unique value for each register type, and
write translation functions. One nice benefit is that we can add
assertions about register files and generations.
I've chosen not to convert brw_reg::type to the enum, since converting
it caused a lot of trouble due to C++ enum rules (even though it's
defined in an extern "C" block...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions use a different encoding for register types
(and have a much more limited set to choose from).
Previously, we translated those into BRW_REGISTER_TYPE_* values, then
reused the existing reg_encoding mapping.
Doing it directly is more straightforward and actually less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
UB types have never been supported as immediates. On Gen4-5, register
encoding 4 is "Reserved." On Gen6+, it means UV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Calling the local variables flat_enable and point_sprite_enable is
clearer than dw16 and such. It also matches the names used in
calculate_attr_overrides, which computes them.
v2: Add /* dw16 */ and /* dw10 */ comments, requested by Jordan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
calculate_attr_overrides is responsible for computing the point sprite
and flat-shading enable bitfields. It does so by OR'ing in a bunch of
bits. However, it relied on the caller to set the initial value to
zero. This is pretty fragile - if the caller neglects to zero out those
variables, then the enable bitfields end up full of garbage, which shows
up as random things being flat-shaded.
This patch moves the zero-initialization into calculate_attr_overrides,
so that the computation is completely in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git blame ascribes this to the initial commit of the driver.
No released hardware has ever supported half float, according to the
documentation for SrcType in the ISA reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If no function signature is found for a function name, report that the
function is not found instead of printing an empty list of candidates.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch changes the error reporting behavior for incorrect function
invocation (triggered by match_function_by_name() unable to find a
matching function call) from using the line number information
associated to the function name term to using the line number
information of the entire function expression. Fixes bug #72264.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72264
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
For GLSL 1.50 we can get frag shaders with primitive id as an
input, add support to the translator for this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In load_texunit_bumpmap tc_array is asserted so lets assert
rot_mat_0 and rot_mat_1 also which are coming from same path.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for malloc() returning null to fix Klocwork warnings.
Minor clean-ups by BrianP.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RGB32F, GL_RGB32UI and GL_RGB32I texture buffer formats are
only supposed to be allowed if the GL_ARB_texture_buffer_object_rgb32
extension is supported. Note that the texture buffer extensions
require a core profile. This patch adds those checks.
Fixes the soon-to-be-added
arb_clear_buffer_object-negative-bad-internalformat piglit test.
Add function to test if the buffer is already mapped and if so,
if the mapped range overlaps the given range.
Modify the _mesa_InvalidateBufferSubData function to use
the new function.
Enable buffer_object_subdata_range_good() to use bufferobj_range_mapped
Reviewed-by: Brian Paul <brianp@vmware.com>
alpha, lumincance and intensity formats are illegal in a core context.
Add a check to return MESA_FORMAT_NONE if one of those is requested within
a core context.
Reviewed-by: Brian Paul <brianp@vmware.com>
- change storage class from static to extern
- rename validate_texbuffer_format to _mesa_validate_texbuffer_format
Reviewed-by: Brian Paul <brianp@vmware.com>
- add xml file for extension
- add reference in gl_API.xml
- add pointer to device driver function table (dd.h)
- update dispatch_sanity.cpp
Reviewed-by: Brian Paul <brianp@vmware.com>
Specs say it's legal for implementations to use internal copies, and
the write synchronization seems to work. Fixes clCreateBuffer
(together with previous patches) and buffer-flags piglits.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
v2: Use fewer if statements and functional tricks instead of single-use method,
suggested by Francisco Jerez.
Squash two small patches into one.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When LLVM is build with Clang, "llvm-config --cxxflags" contains the
-fcolor-diagnostics flag. It is not recognized by gcc and the build
fails. Fix by removing the flag.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
The dri2 state tracker is checking for driver support before enabling
dri2ImageExtension version 7. This commit adds a check that also the
kernel driver supports fd sharing through prime.
Note that this adds a libdrm dependency on dri2.c.
v2: Removed unnecessary clamping of bool expression
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This will avoid compiler warnings in the patch that follows. There
should be no user-visible effect because the change only affects the
behaviour when an invalid enum is passed to
_mesa_shader_type_to_index(), and that can only happen if there is a
bug elsewhere in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
All this cruft was ported from r600g and isn't needed on SI and later
according to hw docs. If we implemented HiS, we would set it to 0.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Replicate some of the gallium pipe transfer functionality.
Also bump minor to signal availability of this feature.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
* These make up the base of what C++ GL Haiku applications
use for 3D rendering.
* Not placed in includes/GL to prevent Haiku headers from
getting installed on non-Haiku systems.
Acked-by: Brian Paul <brianp@vmware.com>
This fixes MSAA resolving for 32-bit integer colorbuffers, which isn't
implemented by the hardware.
It also fixes VM protection faults when resolving MSAA 2D array textures.
This may be a CB bug, because shader-based resolving works fine.
It may also be faster for upside-down and scaled blits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For scaled resolve. The filter is only good for magnification.
If somebody has an idea how to implement a good filter for minification,
I'm all ears. I'd have to use derivatives probably.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need this for integer formats and upside-down blits, which Radeons don't
support for MSAA resolving.
It can be used by calling util_blitter_blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
The argument is a i8 pointer not a i32 pointer (even though the value actually
stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do
leading to crashes.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code was always problematic, and with 64bit rasterization we no longer
need it at all.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The target parameter to _mesa_get_tex_image() is a target enum, not an index.
When we're setting up faces for a cubemap, it should be
CUBE_MAP_POSITIVE_X .. CUBE_MAP_NEGATIVE_Z; for all other targets it
should be the same as the texobj's target.
Fixes broken cubemaps [had only +X face but claimed to have all] produced by
glTextureView, which then caused various crashes in the driver when we
tried to use them.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: - add assert so we don't run into trouble on Gen6.
- adjust for Tapani's rearrangement of ir_variable
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL_ARB_shadow spec says the shadow compare mode should have no
effect when sampling a color texture. As it was, it was up to
drivers to check for that (softpipe, llvmpipe, svga and probably
the rest don't do that). Note: it looks like DX10 allows shadow
compare with some non-depth formats, so this case really should be
handled in the state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Depending on the depth texture format, we may or may not have to
emit explicit fs code to do the shadow comparison. Before, we
were emitting it more often than needed.
v2: check the actual texture format rather than the screen->depth.z16
field. The screen->depth.z16, x8z24, s8z24 fields may not all be set
to a consistent set of depth formats.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Call TextureView helper function to set TextureView state
appropriately for the TexStorage calls.
Misc updates from review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add helper function to set texture_view state from TexStorage calls.
Include review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add Mesa TextureView logic.
Incorporate feedback on ARB_texture_view:
- Add S3TC VIEW_CLASSes to compatibility table
- Use existing _mesa_get_tex_image
- Clean up error strings
- Use bool instead of GLboolean for internal functions
- Split compound level & layer test into individual tests
- eliminate helper macro for VIEW_CLASS table
- do not call driver if ptr null.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Refactor to make next_mipmap_level_size defined in mipmap.c a
_mesa_ helper function that can then be used by texture_view
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add support for ARB_texture_view get parameters:
GL_TEXTURE_VIEW_MIN_LEVEL
GL_TEXTURE_VIEW_NUM_LEVELS
GL_TEXTURE_VIEW_MIN_LAYER
GL_TEXTURE_VIEW_NUM_LAYERS
Incorporate feedback regarding when to allow query of
GL_TEXTURE_IMMUTABLE_LEVELS.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add state needed by glTextureView to the gl_texture_object.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Stub in glTextureView API call to go with the
glTextureView API xml definition.
Includes dispatch test for glTextureView
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes the error condition to satisfy below statement
from OpenGL 4.3 core specification:
"An INVALID_OPERATION error is generated if id is the name of a query
object with a target other SAMPLES_PASSED, ANY_SAMPLES_PASSED, or
ANY_SAMPLES_PASSED_CONSERVATIVE, or if id is the name of a query
currently in progress."
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not sure why this change is necessary. When I've built previous tar files
(such as 9.2.4) with the "make tarballs" target, they include the
bin/test-driver file. But at my first attempt to build the tar files for the
10.0.1 release this file was not being included and the build failed.
(cherry picked from commit d573899b93)
[The cherry pick is because I original applied this on the 10.0 branch while
working on the 10.0.1 release. But if we don't have this on master as well,
this issue will trip us up again the next time we make a new major-release
branch off of master.]
The driverPrivate pointer is opaque to the driver and we can't assume
it's a struct gl_context in dri_util.c. Instead provide a helper function
to set the struct gl_context flags from the incoming DRI context flags.
v2 (idr): Modify the other classic drivers to also use
driContextSetFlags. I ran all the piglit GLX_ARB_create_context tests
with i965 and classic swrast without regressions.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1 on Gallium nouveau]
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patches add MESA_copy_sub_buffer support to the dri sw loader and
then to gallium state tracker, llvmpipe, softpipe and other bits.
It reuses the dri1 driver extension interface, and it updates the swrast
loader interface for a new putimage which can take a stride.
I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.
I could probably split this patch up as well.
v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.
v3: finish off winsys interfaces, add swrast classic support as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
swrast: add support for copy_sub_buffer
Required for glClearBuffer, which only clears one colorbuffer attachment.
Example:
If the first colorbuffer is float and the second one is int:
pipe->clear(pipe, PIPE_CLEAR_COLOR0, float_clear_color, ...);
pipe->clear(pipe, PIPE_CLEAR_COLOR1, int_clear_color, ...);
This doesn't need any driver changes yet, because all drivers just use:
if (flags & PIPE_CLEAR_COLOR) ..
The drivers which support GL 3.0 will have to implement it properly though.
This adds 2 optimizations for radeonsi:
- handling of DISCARD_RANGE
- mapping an uninitialized buffer range is automatically UNSYNCHRONIZED
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
v2: Renamed r600_buffer.c to r600_buffer_common.c. The stupid build system
doesn't allow 2 files of the same name in different directories.
If we assume that all buffers allocated by the DDX are scanout, a new flag
that says "this is not scanout" has to be added to support the non-scanout
buffers and maintain backward compatibility.
This fixes bad rendering on Wayland.
The flag is defined as:
#define RADEON_TILING_R600_NO_SCANOUT RADEON_TILING_SWAP_16BIT
AFAIK, RADEON_TILING_SWAP_16BIT is not used on SI.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Patch copies the whole data structure at once instead of
assigning individual variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields and variables to the data
structure:
explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields in to the data structure:
used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Data section helps serialization and cloning of a ir_variable. This
patch includes the helper bits used for read only ir_variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Newer virtual HW versions support smooth/stipple/wide lines.
Use that instead of 'draw' fallbacks when possible.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On evergreen we have to reserve 1 stack element in some additional cases
besides the ones mentioned in the docs, but stack size computation was
recently reimplemented exactly as described in the docs by the patch that
added workarounds for stack issues on EG/CM, resulting in regressions
with some apps (Serious Sam 3).
This patch fixes it by restoring previous behavior.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Caching in the vbuf module meant that once a vertex has been
emitted it was cached, but it's possible for a vertex at the
same location to be emitted again, but this time with a different
front-face semantic. Caching was causing the first version of the
vertex to be emitted, which resulted in the renderer getting
incorrect front-face attributes. By reseting the vertex_id (which
is used for caching) we make sure that once a front-face info
has been injected the vertex will endup getting emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
v2: Fix up queryImage return for ATTRIB_FD
Use driver_descriptor.configuration to determine whether the driver
supports DMA-BUF import/export.
v3: Really, truly, fix up queryImage return for ATTRIB_FD
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
First off, nv50_program only has 16 in/out varyings. However reporting
16 makes 'm' become 68 in nv50_fp_linkage_validate with the
varying-packing-simple piglit test. (Subverting the assert makes it
compile but fail.) With this patch, varying-packing-simple passes.
See: https://bugs.freedesktop.org/show_bug.cgi?id=69155
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
To help the transition period when DRI loaders are being updated
to support the newer __driDriverExtensions_foo mechanism,
we populate __driDriverExtensions with the extensions returned
by __driDriverExtensions_foo during a library contructor
function.
We find the driver foo's name by using the dladdr function
which gives the path of the dynamic library's name that
was being loaded.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
The _EGLSurface struct which is embedded into dri2_egl_surface also contains a
swap interval member so the other member is redundant. Nothing was using it as
far as I can tell.
On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently
ignores the fenced flag:
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
need_fence = false;
Thanks to Eric for noticing this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that loop_controls no longer creates normatively bound loops,
there is no need for ir_loop::normative_bound or the
lower_bounded_loops pass.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when loop_controls analyzed a loop and found that it had a
fixed bound (known at compile time), it would remove all of the loop
terminators and instead set the loop's normative_bound field to force
the loop to execute the correct number of times.
This made loop unrolling easy, but it had a serious disadvantage.
Since most GPU's don't have a native mechanism for executing a loop a
fixed number of times, in order to implement the normative bound, the
back-ends would have to synthesize a new loop induction variable. As
a result, many loops wound up having two induction variables instead
of one. This caused extra register pressure and unnecessary
instructions.
This patch modifies loop_controls so that it doesn't set the loop's
normative_bound anymore. Instead it leaves one of the terminators in
the loop (the limiting terminator), so the back-end doesn't have to go
to any extra work to ensure the loop terminates at the right time.
This complicates loop unrolling slightly: when deciding whether a loop
can be unrolled, we have to account for the presence of the limiting
terminator. And when we do unroll the loop, we have to remove the
limiting terminator first.
For an example of how this results in more efficient back end code,
consider the loop:
for (int i = 0; i < 100; i++) {
total += i;
}
Previous to this patch, on i965, this loop would compile down to this
(vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
(notice that both g8 and g4 are loop induction variables; one is used
to terminate the loop, and the other is used to accumulate the total).
After this patch, the same loop compiles to:
mov(8) g4<1>.xD 0D
loop:
cmp.ge.f0(8) null g4<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This value is now redundant with
loop_variable_state::limiting_terminator->iterations and
ir_loop::normative_bound.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The old logic of loop_unroll_visitor::visit_leave(ir_loop *) was:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
else if (loop contains one jump) {
if (the jump is an unconditional "break" at the end of the loop) {
remove the break and set iteration count to 1;
fall through to simple loop unrolling code;
} else {
for (each "if" statement in the loop body)
see if the jump is a "break" at the end of one of its forks;
if (the "break" wasn't found)
return;
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unrolling code;
return;
}
}
simple loop unrolling code;
return;
These tasks have been moved to their own functions:
- splice the remainder of the loop into the other fork of the "if"
- simple loop unrolling code
- complex loop unrolling code
And the logic has been flattened to:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
if (loop contains no jumps) {
simple loop unroll;
return;
}
if (the jump is an unconditional "break" at the end of the loop) {
remove the break;
simple loop unroll with iteration count of 1;
return;
}
for (each "if" statement in the loop body) {
if (the jump is a "break" at the end of one of its forks) {
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unroll;
return;
}
}
This will make it easier to modify the loop unrolling algorithm in a
future patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the sole responsibility of loop_analysis was to find all
the variables referenced in the loop that are either loop constant or
induction variables, and find all of the simple if statements that
might terminate the loop. The remainder of the analysis necessary to
determine how many times a loop executed was performed by
loop_controls.
This patch makes loop_analysis also responsible for determining the
number of iterations after which each loop terminator will terminate
the loop, and for figuring out which terminator will terminate the
loop first (I'm calling this the "limiting terminator").
This will allow loop unrolling to make use of information that was
previously only visible from loop_controls, namely the identity of the
limiting terminator.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patches to follow will introduce code into the loop_terminator
constructor. Allocating loop_terminator using new(mem_ctx) syntax
will ensure that the constructor runs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When loop_control_visitor::visit_leave(ir_loop *) is analyzing a loop
terminator that acts on a certain ir_variable, it doesn't need to walk
the list of induction variables to find the loop_variable entry
corresponding to the variable. It can just look it up in the
loop_variable_state hashtable and verify that the loop_variable entry
represents an induction variable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These fields were part of some planned optimizations that never
materialized. Remove them for now to simplify things; if we ever get
round to adding the optimizations that would require them, we can
always re-introduce them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the
i965 fs and vec4 visitors) had nearly identical logic for handling
bounded loops. This replaces the duplicate logic with an equivalent
lowering pass that is used by all the back-ends.
Note: on i965, there is a slight increase in instruction count. For
example, a loop like this:
for (int i = 0; i < 100; i++) {
total += i;
}
would previously compile down to this (vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) break(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
After this patch, the "(+f0) break(8)" turns into:
(+f0) if(8)
break(8)
endif(8)
because the back-end isn't smart enough to recognize that "if
(condition) break;" can be done using a conditional break instruction.
However, it should be relatively easy for a future peephole
optimization to properly optimize this.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, loop analysis would set
this->conditional_or_nested_assignment based on the most recently
visited assignment to the variable. As a result, if a vaiable was
assigned to more than once in a loop, the flag might be set
incorrectly. For example, in a loop like this:
int x;
for (int i = 0; i < 3; i++) {
if (i == 0)
x = 10;
...
x = 20;
...
}
loop analysis would have incorrectly concluded that all assignments to
x were unconditional.
In practice this was a benign bug, because
conditional_or_nested_assignment is only used to disqualify variables
from being considered as loop induction variables or loop constant
variables, and having multiple assignments also disqualifies a
variable from being considered as either of those things.
Still, we should get the analysis correct to avoid future confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting an ir_call, loop analysis would only mark
the innermost enclosing loop as containing a call. As a result, when
encountering a loop like this:
for (i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
foo();
}
}
it would incorrectly conclude that the outer loop ran three times.
(This is not certain; if foo() modifies i, then the outer loop might
run more or fewer times).
Fixes piglit test "vs-call-in-nested-loop.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting a variable dereference, loop analysis would
only consider its effect on the innermost enclosing loop. As a
result, when encountering a loop like this:
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
...
i = 2;
}
}
it would incorrectly conclude that the outer loop ran three times.
Fixes piglit test "vs-inner-loop-modifies-outer-loop-var.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fast color clears of MSAA buffers work just like fast color clears
with non-MSAA buffers, except that the alignment and scaledown
requirements are different.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will make it easier to add fast color clear support to MSAA
buffers, since they have different alignment and scaling requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we didn't do multisample blorp clears because we couldn't
figure out how to get them to work. The reason for this was because
we weren't setting the brw_blorp_params num_samples field consistently
with dst.num_samples. Now that those two fields have been collapsed
down into one, we can do multisample blorp clears.
However, we need to do a few other pieces of bookkeeping to make them
work correctly in all circumstances:
- Since blorp clears may now operate on multisampled window system
framebuffers, they need to call
intel_renderbuffer_set_needs_downsample() to ensure that a
downsample happens before buffer swap (or glReadPixels()).
- When clearing a layered multisample buffer attachment using UMS or
CMS layout, we need to advance layer by multiples of num_samples
(since each logical layer is associated with num_samples physical
layers).
Note: we still don't do multisample fast color clears; more work needs
to be done to enable those.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch renames the enum that's used to keep track of fast clear
state from "mcs_state" to "fast_clear_state", and it removes the enum
value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA
buffer, so we're not keeping track of fast clear state"). The only
real purpose that enum value was serving was to prevent us from trying
to do fast clear resolves on MSAA buffers, and it's just as easy to
prevent that by checking the buffer's msaa_layout.
This paves the way for implementing fast clears of MSAA buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "layer" parameters used in blorp, and the
intel_renderbuffer::mt_layer field, represent a physical layer rather
than a logical layer. This is important for 2D multisample arrays on
Gen7+ because the UMS and CMS multisample layouts use N physical
layers to represent each logical layer, where N is the number of
samples.
Also add an assertion to blorp to help catch bugs if we fail to follow
these conventions.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Clarify the fact that we only optimize full buffer clears using fast
color clear, and why.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create the ref_bo without any storage type flags set for now. The issue
probably arises from our use of the additional buffer space at the end
of the ref_bo. It should probably be split up in the future.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Martin Peres <martin.peres@labri.fr>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.
lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The Wayland EGL platform now respects the eglSwapInterval value. The value is
clamped to either 0 or 1 because it is difficult (and probably not useful) to
sync to more than 1 redraw.
The main change is that if the swap interval is 0 then Mesa won't install a
frame callback so that eglSwapBuffers can be executed as often as necessary.
Instead it will do a sync request after the swap buffers. It will block for
sync complete event in get_back_bo instead of the frame callback. The
compositor is likely to send a release event while processing the new buffer
attach and this makes sure we will receive that before deciding whether to
allocate a new buffer.
If there are no buffers available then instead of returning with an error,
get_back_bo will now poll the compositor by repeatedly sending sync requests
every 10ms. This is a last resort and in theory this shouldn't happen because
there should be no reason for the compositor to hold on to more than three
buffers. That means whenever we attach the fourth buffer we should always get
an immediate release event which should come in with the notification for the
first sync request that we are throttled to.
When the compositor is directly scanning out from the application's buffer it
may end up holding on to three buffers. These are the one that is is currently
scanning out from, one that has been given to DRM as the next buffer to flip
to, and one that has been attached and will be given to DRM as soon as the
previous flip completes. When we attach a fourth buffer to the compositor it
should replace that third buffer so we should get a release event immediately
after that. This patch therefore also changes the number of buffer slots to 4
so that we can accomodate that situation.
If DRM eventually gets a way to cancel a pending page flip then the compositors
can be changed to only need to hold on to two buffers and this value can be
put back to 3.
This also moves the vblank configuration defines from platform_x11.c to the
common egl_dri2.h header so they can be shared by both platforms.
Consider a typical game-style main loop which might be like this:
while (1) {
draw_something();
eglSwapBuffers();
}
In this case the game is relying on eglSwapBuffers to throttle to a sensible
frame rate. Previously this game would end up using three buffers even though
it should only need two. This is because Mesa decides whether to allocate a
new buffer in get_back_bo which would be before it has tried to read any
events from the compositor so it wouldn't have seen any buffer release events
yet.
This patch just moves the block for the frame callback to get_back_bo.
Typically the compositor will send a release event immediately after one of
the attaches so if we block for the frame callback here then we can be sure to
have completed at least one roundtrip and received that release event after
attaching the previous buffer before deciding whether to allocate a new one.
dri2_swap_buffers always calls get_back_bo so even if the client doesn't
render anything we will still be sure to block to the frame callback. The code
to create the new frame callback has been moved to after this call so that we
can be sure to have cleared the previous frame callback before requesting a
new one.
This patch fixes this MinGW build error.
CC glapi_gentable.lo
glapi_gentable.c:47:19: fatal error: dlfcn.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drivers will need to look at this to decide if they need to do
per-sample fragment shader dispatch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.
V3: Use actual MSAA layout from the texture's mt, rather
then computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)
This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bspec says:
"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"
I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For some reason this was left out when the version was changed...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:
(+f0) IF
MOV dst0 ...
MOV dst1 ...
MOV dst2 ...
ELSE
MOV dst0 ...
MOV dst1 ...
MOV dst3 ...
ENDIF
This commit extends the peephole to handle this case.
No shader-db changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).
This peephole fixes both of these and operates as a normal optimization
pass.
fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.
total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs: 167120 -> 153824 (-7.96%)
GAINED: 13
LOST: 6
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the last thing that register_coalesce() still handled.
total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
instructions in affected programs: 15758 -> 15606 (-0.96%)
Reviewed-by: Eric Anholt <eric@anholt.net>
parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.
Reviewed-by: Eric Anholt <eric@anholt.net>
make_list is just a one-line wrapper and was confusingly called by
NULL objects. E.g., cur_if == NULL; cur_if->make_list(mem_ctx).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we made the basic block following an ENDIF instruction a
successor of the basic blocks ending with IF and ELSE. The PRM says that
IF and ELSE instructions jump *to* the ENDIF, rather than over it.
This should be immaterial to dataflow analysis, except for if, break,
endif sequences:
START B1 <-B0 <-B9
0x00000100: cmp.g.f0(8) null g15<8,8,1>F g4<0,1,0>F
0x00000110: (+f0) if(8) 0 0 null 0x00000000UD
END B1 ->B2 ->B4
START B2 <-B1
break
0x00000120: break(8) 0 0 null 0D
END B2 ->B10
START B3
0x00000130: endif(8) 2 null 0x00000002UD
END B3 ->B4
The ENDIF block would have no parents, so dataflow analysis would
generate incorrect results, preventing copy propagation from eliminating
some instructions.
This patch changes the CFG to make ENDIF start rather than end basic
blocks, so that it can be the jump target of the IF and ELSE
instructions.
It helps three programs (including two fs8/fs16 pairs).
total instructions in shared programs: 1561126 -> 1561060 (-0.00%)
instructions in affected programs: 837 -> 771 (-7.89%)
More importantly, it allows copy propagation to handle more cases.
Disabling the register_coalesce() pass before this patch hurts 58
programs, while afterward it only hurts 11 programs.
Reviewed-by: Eric Anholt <eric@anholt.net>
This extension enabled the use of texture array with fixed-function and
assembly fragment shaders. No applications are known to use this
extension.
NOTE: This patch regresses GL_TEXTURE_1D_ARRAY and GL_TEXTURE_2D_ARRAY
cases of the copyteximage piglit test. The test is incorrectly using
texture arrays with fixed function while only requiring the
GL_EXT_texture_array extension. A fix for the test has been posted to
the piglit mailing list.
http://lists.freedesktop.org/archives/piglit/2013-November/008639.html
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver that enables one also enables the other. The difference
between the two is MESA adds support for fixed-function and assembly
fragment shaders, but EXT only adds support for GLSL. The MESA
extension was created back when Mesa did not support GLSL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function 'count_tex_size':
main/texobj.c:886:23: warning: unused parameter 'key' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function '_mesa_test_texobj_completeness':
main/texobj.c:553:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:193: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:254: warning: signed and unsigned type in conditional expression [-Wsign-compare]
main/texobj.c:553:148: warning: signed and unsigned type in conditional expression [-Wsign-compare]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
That enum requires GL_ARB_texture_cube_map_array, and it is only
available on desktop GL. It looks like this has been an un-noticed
issue since GL_ARB_texture_cube_map_array support was added in commit
e0e7e295.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds an extension called EGL_WL_create_wayland_buffer_from_image
which adds the following single function:
struct wl_buffer *
eglCreateWaylandBufferFromImageWL(EGLDisplay dpy, EGLImageKHR image);
The function creates a wl_buffer which shares its contents with the given
EGLImage. The expected use case for this is in a nested Wayland compositor
which is using subsurfaces to present buffers from its clients. Using this
extension it can attach the client buffers directly to the subsurface without
having to blit the contents into an intermediate buffer. The compositing can
then be done in the parent compositor.
The extension is only implemented in the Wayland EGL platform because of
course it wouldn't make sense anywhere else.
If we're not using EGL_EXT_swap_buffers_with_damage, we have to
damage the full extent. EGL operates on buffer coordinates, but
wl_surface.damage takes surface coordinates. EGL doesn't know the
buffer transformation (rotated or scaled) and can't post accurate
damage in surface coordinates. The damage event however is clipped to
the surface extents so we can just damage the maximum rectangle.
In case of EGL_EXT_swap_buffers_with_damage, the application knows
the buffer transform and is expected to pass in rectangles in
surface space.
https://bugs.freedesktop.org/show_bug.cgi?id=70250
Cc: "10.0" mesa-stable@lists.freedesktop.org
flush_with_flags, when available, allows the driver to throttle.
Using this suppress input lag issues that can be observed in heavy
rendering situations on non-intel cards.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
This typically won't make a difference, since we only send the requests at
wl_display_flush() time. There might be a small race
with another thread calling wl_display_flush() after our commit request,
but before we flush the DRI driver. Moving the commit below the DRI
driver flush call looks more natural and eliminates the small race.
Cc: "10.0" mesa-stable@lists.freedesktop.org
We would like the compositor to receive the commited buffer
as soon as possible, so it has the time to treat it, and
release old ones. We shouldn't rely on the client
to flush the queue for us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Display lists allocate memory in chunks of 256 tokens (1KB) at a time.
If an app creates many short display lists or uses glXUseXFont() this
can waste quite a bit of memory.
This patch uses realloc() to trim short lists and reduce the memory
used.
Also, null/zero-out some list construction fields in _mesa_EndList().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now, sizeof(gl_dlist_node)==4 even on 64-bit systems. This can
halve the memory used by some display lists on 64-bit systems.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a first step in reducing memory used by display lists on
64-bit systems. On 64-bit systems, the gl_dlist_node union type
is 8 bytes because of the 'data' and 'next' fields. This causes
every display list node/token to occupy 8 bytes instead of 4 as
originally designed. This basically doubles the memory used by
some display lists on 64-bit systems.
The fix is to remove the 64-bit 'data' and 'next' pointer fields
from the union and instead store them as a pair of 32-bit values.
Easily done with a few helper functions.
The next patch will take care of the 'next' field.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a memory leak in some situations. Also avoids emitting an
extra fence if the kick handler does the call to nouveau_fence_next
itself.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
So that it acts like ordinary free(). This lets us remove a bunch of
if statements where the function is called.
v2:
- Avoiding compile error on MSVC and possible warnings on other compilers.
- Added comment regards passing NULL pointer being safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
I missed this in the boolean -> enum conversion. C cheerfully casts
false -> 0 -> UNKNOWN_RING. On Gen4-5, this causes the render ring
prelude hook to get called in the middle of the batch, which is crazy.
BRW_BATCH_STRUCT is not used on Gen6+.
Fixes regressions since 395a32717d
("i965: Introduce an UNKNOWN_RING state.").
Fixes "fips -v glxgears" on Ironlake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit a4bf7f6b6e.
It breaks occlusion queries on Gen4-5. Doing this right will likely
require larger changes, which should be done at a future date.
Some Piglit tests still passed due to other bugs; fixing those revealed
this problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I guarded half of the callers to start/stop_oa_counters with generation
checks, but missed the other half (which were added later). OACONTROL
doesn't exist on Ironlake, so we better not write it. Also, there's no
need---Ironlake's performance counters are always running.
This patch moves the generation checks into start/stop_oa_counters,
rather than requiring the caller to do them.
Fixes assertion failures in Piglit's AMD_performance_monitor/measure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
One can select if they want to fallback to softpipe.
Current approach makes this not possible, whereas other
targets (dri-swrast) handle this approapriately.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The BSpec states that the aligment for the non-msrt clear rectangle must
be doubled; the BSpec does not restricit the workaround to specific
hardware.
Commit 9a1a67b applied the workaround to Haswell GT3. Commit 8b659ce
expanded the workaround to all Haswell variants. This commit expands it
to all hardware.
No Piglit regressions on Ivybridge 0x0166. No fixes either.
I know no Ivybridge nor Baytrail bug related to this workaround.
However, the BSpec says the extra alignment is required, so let's do it.
v2: Apply to all hardware, not just gen7.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
All bound layers (from first_layer to last_layer) should be cleared.
This uses a vertex shader which outputs gl_Layer = gl_InstanceID, so each
instance goes to a different layer. By rendering a quad and setting
the instance count to the number of layers, it will trivially clear all
layers.
This requires AMD_vertex_shader_layer (or PIPE_CAP_TGSI_VS_LAYER), which only
radeonsi supports at the moment. r600 could do this too. Standard DX11
hardware will have to use a geometry shader though, which has higher overhead.
This is a subset of geometry shaders. It's all about setting first_layer and
last_layer correctly.
Also some code between st_render_texture and update_framebuffer_state is
consolidated. It doesn't use rtt_level and derives the level from dimensions
instead as the code in st_atom_framebuffer.c did.
Commit a594cec broke EGL X11 backend by adding dependency between
X11 and DRM backends requiring HAVE_EGL_PLATFORM_DRM defined for X11.
This patch fixes the issue by adding additional define for libdrm
detection independent of which backend is being compiled. Tested by
compiling Mesa with '--with-egl-platforms=x11' and running es2gears_x11
+ glbenchmark2.7 successfully.
v2: return true for dri2_auth if running without libdrm (Samuel)
v3: check libdrm when building EGL drm platform + AM_CFLAGS fix (Emil)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72062
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: mesa-stable@lists.freedesktop.org
MI_STORE_REGISTER_MEM has to take a 48-bit address, so the existing code
doesn't work. But supposedly Broadwell has a register whitelist and
just works out of the box anyway, so there's no need to check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Gen7 sampler state code still works. Increasing the alignment to
64 bytes makes bit 5 zero, which is good because it's now reserved.
Since we don't use the new filter bits, we can leave those as zero too,
which means we don't need to update the code to update the pointer.
(We probably should anyway, for clarity, but alas, another day.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation is really hard to follow, but apparently a 32-bit x
32-bit multiply just works without the MACH macro. The macro apparently
is only necessary to get the full 64-bit value.
Fixes Piglit tests [vf]s-op-mult-int-int.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like Haswell, we do this in SURFACE_STATE rather than shader
workarounds.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell doesn't support a surface vertical alignment of 2. It only
supports VALIGN_4, VALIGN_8, or VALIGN_16. I chose 4 since it's the
least wasteful.
v2: Replace my comment with a better one from Eric. Move Broadwell
checks earlier so it's more obvious that "return 2" won't be hit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to SEND from a GRF, and we can only obtain those prior to
register allocation.
This allows us to do pull constant loads without the MRF hack.
v2: Reword comments (suggested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
SEND can't deal with swizzles, source modifiers, and so on. This should
avoid problems with VS pull constant loads on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Pre-patch, the workaround was applied to only HSW GT3. However, the
workaround also fixes render corruption on the HSW GT1 Chromebook,
codenamed Falco.
Also, update the BSpec quote that discusses the workaround to reflect
the latest BSpec.
The BSpec states that the workaround is required for Ivybridge and
Baytrail as well as Haswell. But, we apply the workaround to only
Haswell because (a) we suspect that is the only hardware where it is
actually required and (b) we haven't yet validated the workaround for
the other hardware.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
OTC-Tracker: CHRMOS-812
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we stored an array of up to 16 additional shaders to link,
as well as a count of how many each shader actually needed.
Since the built-in functions rewrite, all the built-ins are stored in a
single shader. So all we need is a boolean indicating whether a shader
needs to link against built-ins or not.
During linking, we can avoid creating the temporary array if none of the
shaders being linked need built-ins. Otherwise, it's simply a copy of
the array that has one additional element. This is much simpler.
This patch saves approximately 128 bytes of memory per gl_shader object.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since the built-in functions rewrite, num_builtins_to_link is always either
0 or 1, so we don't need tho crazy loop starting at -1 with a special
case.
All we need to do is print the prototypes from the current shader, and
the single built-in function shader (if present).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when we hit a "no matching function" error, it looked like:
0:0(0): error: no matching function for call to `cos()'
0:0(0): error: candidates are: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
Now it looks like:
0:0(0): error: no matching function for call to `cos()'; candidates are:
0:0(0): error: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
This is not really any worse and removes the need for the prefix variable.
It will also help with the next commit's refactoring.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no need to loop through the "parameters" list and remove every
element; move_nodes_to(¶meters) already throws away all elements of
the destination list.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
make[5]: Entering directory `/jhbuild/build/mesa/mesa/src/mapi/glapi/tests'
CXX check_table.o
/jhbuild/checkout/mesa/mesa/src/mapi/glapi/tests/check_table.cpp:29:30: fatal error: glapi/glapitable.h: No such file or directory
We should look for the generated file glapi/glapitable.h in builddir, not srcdir
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
On gen6, multisamble resolve blits use the SAMPLE message to blend
together the 4 samples for each texel. For some reason, SAMPLE
doesn't blend together the proper samples when the source format is
L32_FLOAT or I32_FLOAT, resulting in blocky artifacts.
To work around this problem, sample from the source surface using
R32_FLOAT. This shouldn't affect rendering correctness, because when
doing these resolve blits, the destination format is R32_FLOAT, so the
channel replication done by L32_FLOAT and I32_FLOAT is unnecessary.
Fixes piglit tests on Sandy Bridge:
- spec/ARB_texture_float/multisample-formats 2 GL_ARB_texture_float
- spec/ARB_texture_float/multisample-formats 4 GL_ARB_texture_float
No piglit regressions on Sandy Bridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70601
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) have been assuming this
for some time. Thanks to the preceding patch, the compiler front-end
no longer breaks this assumption.
This patch adds code to validate the assumption so that if we have
future bugs, we'll be able to catch them earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) assume that when
ir_loop::counter is non-null, it points to a fresh ir_variable that
should be used as the loop counter (as opposed to an ir_variable that
exists elsewhere in the instruction stream).
However, previous to this patch:
(1) loop_control_visitor did not create a new variable for
ir_loop::counter; instead it re-used the existing ir_variable.
This caused the loop counter to be double-incremented (once
explicitly by the body of the loop, and once implicitly by
ir_loop::increment).
(2) ir_clone did not clone ir_loop::counter properly, resulting in the
cloned ir_loop pointing to the source ir_loop's counter.
(3) ir_hierarchical_visitor did not visit ir_loop::counter, resulting
in the ir_variable being missed by reparenting.
Additionally, most optimization passes (e.g. loop unrolling) assume
that the variable mentioned by ir_loop::counter is not accessed in the
body of the loop (an assumption which (1) violates).
The combination of these factors caused a perfect storm in which the
code worked properly nearly all of the time: for loops that got
unrolled, (1) would introduce a double-increment, but loop unrolling
would fail to notice it (since it assumes that ir_loop::counter is not
accessed in the body of the loop), so it would unroll the loop the
correct number of times. For loops that didn't get unrolled, (1)
would introduce a double-increment, but then later when the IR was
cloned for linking, (2) would prevent the loop counter from being
cloned properly, so it would look to further analysis stages like an
independent variable (and hence the double-increment would stop
occurring). At the end of linking, (3) would prevent the loop counter
from being reparented, so it would still belong to the shader object
rather than the linked program object. Provided that the client
program didn't delete the shader object, the memory would never get
reclaimed, and so the shader would function properly.
However, for loops that didn't get unrolled, if the client program did
delete the shader object, and the memory belonging to the loop counter
got re-used, this could cause a use-after-free bug, leading to a
crash.
This patch fixes loop_control_visitor, ir_clone, and
ir_hierarchical_visitor to treat ir_loop::counter the same way the
back-ends treat it: as a freshly allocated ir_variable that needs to
be visited and cloned independently of other ir_variables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72026
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If an ir_loop has a non-null "counter" field, the variable referred to
by this field is implicitly read and written by the loop. We need to
account for this in ir_variable_refcount, otherwise there is a danger
we will try to dead-code-eliminate the loop counter variable.
Note: at the moment the dead code elimination bug doesn't occur due to
a bug in ir_hierarchical_visitor: it doesn't visit the "counter"
field, so dead code elimination doesn't treat it as a candidate for
elimination. But the patch to follow will fix that bug, so we need to
fix ir_variable_refcount first in order to avoid breaking dead code
elimination.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The flags value is a bitfield so use the union's 'bf' field, not 'e'
(enum) field. There's no actual change in behavior here since both
fields of the union are the same size.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Trying to compile any of these functions into a display list
now just generates a GL_INVALID_OPERATION error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that it is possible to query drivers for the max sampler view it should
be safe to increase this without crashing.
Not entirely convinced this really works correctly though if state trackers
using non-linked sampler / sampler_views use this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support for this to more drivers, in particular for all the "special"
ones useful for debugging.
HW drivers are left alone, some should be able to support it if they want but
they may not be interested at this point.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Only allow __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS in brwCreateContext if
intelInitScreen2 also enabled __DRI2_ROBUSTNESS (thereby enabling
GLX_ARB_create_context).
This fixes a regression in the piglit test
"glx/GLX_ARB_create_context/invalid flag"
v2: Remove commented debug code. Noticed by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reported-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch fixes this build error with Oracle Solaris Studio.
libtool: link: /opt/solarisstudio12.3/bin/cc -g -o glcpp/glcpp glcpp.o prog_hash_table.o ./.libs/libglcpp.a
Undefined first referenced
symbol in file
sqrt prog_hash_table.o
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In brw_update_renderbuffer_surfaces(), if there are no color draw
buffers, we always set up a null render target at surface index 0 so we
have something to use with the FB write marking the end of thread.
However, when we recently began computing surface indexes dynamically,
we failed to reserve space for it. This meant that the first texture
would be assigned surface index 0, and our closing FB write would
clobber the texture.
Fixes Piglit's EXT_packed_depth_stencil/fbo-blit-d24s8 test on Gen4-5,
which regressed as of commit 4e5306453d
("i965/fs: Dynamically set up the WM binding table offsets.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70605
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: lu hua <huax.lu@intel.com>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Now that we have dynamic binding tables there's no good reason anymore
to expose so few atomic counter buffers. Increase it to 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If reparent_ir() is called on invalid IR, then there's a danger that
it will fail to reparent all of the necessary nodes. For example, if
the IR contains an ir_dereference_variable which refers to an
ir_variable that's not in the tree, that ir_variable won't get
reparented, resulting in subtle use-after-free bugs once the
non-reparented nodes are freed. (This is exactly what happened in the
bug fixed by the previous commit).
This patch makes this kind of bug far easier to track down, by
transforming it from a use-after-free bug into an explicit IR
validation error.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 065da16 (glsl: Convert lower_clip_distance_visitor to be an
ir_rvalue_visitor), we failed to notice that since
lower_clip_distance_visitor overrides visit_leave(ir_assignment *),
ir_rvalue_visitor::visit_leave(ir_assignment *) wasn't getting called.
As a result, clip distance dereferences appearing directly on the
right hand side of an assignment (not in a subexpression) weren't
getting properly lowered. This caused an ir_dereference_variable node
to be left in the IR that referred to the old gl_ClipDistance
variable. However, since the lowering pass replaces gl_ClipDistance
with gl_ClipDistanceMESA, this turned into a dangling pointer when the
IR got reparented.
Prior to the introduction of geometry shaders, this bug was unlikely
to arise, because (a) reading from gl_ClipDistance[i] in the fragment
shader was rare, and (b) when it happened, it was likely that it would
either appear in a subexpression, or be hoisted into a subexpression
by tree grafting.
However, in a geometry shader, we're likely to see a statement like
this, which would trigger the bug:
gl_ClipDistance[i] = gl_in[j].gl_ClipDistance[i];
This patch causes
lower_clip_distance_visitor::visit_leave(ir_assignment *) to call the
base class visitor, so that the right hand side of the assignment is
properly lowered.
Fixes piglit test:
- spec/glsl-1.50/execution/geometry/clip-distance-itemized-copy
Cc: Ian Romanick <idr@freedesktop.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit fixes a bug wherein we would incorrectly refer to
stale geometry shader prog_data when no geometry shader was active.
This patch reduces the likelihood of that sort of bug occurring in the
future by setting prog_data to NULL whenever there is no GS program.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, in brw_gs_upload_binding_table(), we checked whether
brw->gs.prog_data was NULL in order to determine whether a geometry
shader was active. This didn't work: brw->gs.prog_data starts off as
NULL, but it is set to non-NULL when a geometry shader program is
built, and then never set to NULL again. As a result, if we called
brw_gs_upload_binding_table() while there was no geometry shader
active, but a geometry shader had previously been active, it would
refer to a stale (and possibly freed) prog_data structure.
This patch fixes the problem by modifying
brw_gs_upload_binding_table() to use the proper technique to determine
whether a geometry shader is active: by checking whether
brw->geometry_program is NULL.
This fixes the crash reported in comment 2 of bug 71870 (the incorrect
rendering remains, however).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than always advertising the extension but failing to create a
context with reset notifiction, just don't advertise it. I don't know
why it didn't occur to me to do it this way in the first place.
NOTE: Kristian requested that I provide a follow-up for master that
dynamically generates the list of DRI extensions instead of selected
between two hardcoded lists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Replace all occurences of the macro with its expansion.
It seems that the macro intended to provide cross-platform static mutex
intialization. However, it had the same definition in all pre-processor
paths:
#define _EGL_DECLARE_MUTEX(m) _EGLMutex m = _EGL_MUTEX_INITIALIZER
Therefore this abstraction obscured rather than helped.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Insert two fields into _egl_global to hold the client extensions and
statically initialize them:
ClientExtensions // a struct of bools
ClientExtensionString
Post-patch, Mesa supports exactly one client extension,
EGL_EXT_client_extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The fast tiled texture upload code does not compile with GCC 4.8's -Og
optimization flag.
memcpy() has the always_inline attribute set. This poses a problem,
since {x,y}tile_copy_faster calls it indirectly via {x,y}tile_copy,
and {x,y}tile_copy normally aren't inlined at -Og.
Using __attribute__((flatten)) tells GCC to inline every function call
inside the function, which I believe was the author's intent.
Fix suggested by Alexander Monakov.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
8 bit precision is required by d3d10 but unfortunately
requires 64 bit rasterizer. This commit implements
64 bit rasterization with full support for 8bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
.. and mark them off on the extensions list as done.
V2: Enable only if pipelined register writes work.
V3: Also update relnotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just prior to emitting the 3DPRIMITIVE command, we load each of the
indirect registers. The values loaded are either from offsets into the
current indirect BO, or constant zero if the parameter is not used for
this draw.
Enabling use of the indirect registers is done by turning on a bit in
the first dword of the 3DPRIMITIVE command itself.
V3: - Deduplicate the common part of both indexed and nonindexed indirect
setup.
- Just refer to the indirect bo out of the context directly.
V4: - Fix bo reference to specify the range we care about.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- MMIO registers for draw parameters
- New bit in 3DPRIMITIVE command to enable indirection
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
V3: - Disallow primcount==0 for DrawMulti*Indirect. The spec is unclear
on this, but it's silly. We might go back on this later if it
turns out to be a problem.
- Make it clear that the caller has dealt with stride==0
V4: - Allow primcount==0 again.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We will reuse the same extension flag for ARB_multi_draw_indirect since
it can always be supported by looping.
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Split from patch implementing ARB_draw_indirect.
v2: Const-qualify the struct gl_buffer_object *indirect argument.
v3: Fix up some more draw calls for new argument.
v4: Fix up rebase conflicts in i965.
v5: Undo const-qualification
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2: drop description of `fall` and `wm`, which have been removed by the
previous patch; describe `stats`.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
DEBUG_IOCTL comes from i965, and is about to be removed. Both defines
have the same value (4).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Limit the max glsl version level to what the state tracker supports.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
V2: Return after error to avoid cascading error messages and
removed redundant "to" from error message
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I recently made us try two different things that tried to reduce register
pressure so that we would be more likely to allocate successfully. But
now that we have the logic for trying two, we can make the first thing we
try be the normal, not-prioritizing-register-pressure heuristic.
This means one less scheduling pass in the common case of that heuristic
not producing spills, plus the best schedule we know how to produce, if
that one happens to succeed. This is important, because our register
allocation produces a lot of possibly avoidable dependencies for the
post-register-allocation schedule, despite ra_set_allocate_round_robin().
GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31)
nexuiz: No difference (n=5)
lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86)
minecraft apitrace: No difference (n=15)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The canary is basically just to give a better debugging message when you
ralloc_free() something that wasn't rallocated. Reduces maximum memory
usage of apitrace replay of the dota2 demo by 60MB on my 64-bit system (so
half that on a real 32-bit dota2 environment).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think I was thinking of the batch command packet cache when I pasted
this in, but this counter is only used for dumping out streamed state for
INTEL_DEBUG=batch and for putting annotations in our aub files.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 2f89662 added the driconf option 'clamp_max_samples'. In that
commit, the option did not alter the context version. The neglect to
alter the context version is a fatal issue for some apps.
For example, consider running Chromium with clamp_max_samples=0.
Pre-patch, Mesa creates a GL 3.0 context but clamps GL_MAX_SAMPLES to
0. This violates the GL 3.0 spec, which requires GL_MAX_SAMPLES >= 4.
The spec violation causes WebGL context creation to fail in many
scenarios because Chromium correctly assumes that a GL 3.0 context
supports at least 4 samples.
Since the driconf option was introduced largely for Chromium, the issue
really needs fixing.
This patch fixes calculation of the context version to respect the
post-clamped value of GL_MAX_SAMPLES. This in turn fixes WebGL on
Chromium when clamp_max_samples=0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
clamp_max_samples() and intel_quantize_num_samples() each maintained
their own list of which MSAA modes the hardware supports. This patch
removes the duplication by making intel_quantize_num_samples() use the
same list as clamp_max_samples(), the list maintained in
brw_supported_msaa_modes().
By removing the duplication, we prevent the scenario where someone
updates one list but forgets to update the other.
Move function `brw_context.c:static brw_supported_msaa_modes()` to
`intel_screen.c:(non-static) intel_supported_msaa_modes()` and patch
intel_quantize_num_samples() to use the list returned by that function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This simplifies the loop logic in a subsqequent patch that refactors
intel_quantize_num_samples() to use brw_supported_msaa_modes().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These degenerate instructions can often be emitted by state trackers
when the semantics of instructions don't match precisely.
Reviewed-by: Brian Paul <brianp@vmware.com>
Several of links the were contributed by Keith Whitwell and Roland Scheidegger.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
From section 6.1.18 (Renderbuffer Object Queries) of the GL 3.2 spec,
under the heading "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE
is TEXTURE, then":
If pname is FRAMEBUFFER_ATTACHMENT_LAYERED, then params will
contain TRUE if an entire level of a three-dimesional texture,
cube map texture, or one-or two-dimensional array texture is
attached. Otherwise, params will contain FALSE.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/framebuffer-layered-attachments
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture-defaults
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Don't include "EXT" in the error message, since this query only
makes sensen in context versions that have adopted
glGetFramebufferAttachmentParameteriv().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using the code path for validating
glFramebufferTextureLayer(). But glFramebufferTexture() allows
additional texture types.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/gl-layer-cube-map
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Clarify comment above framebuffer_texture().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the fast depth clear path.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-depth".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the blorp clear path for color buffers.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-color".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes meta clears to properly clear all layers of a layered
framebuffer attachment. We accomplish this by adding a geometry
shader to the meta clear program which sets gl_Layer to a uniform
value. When clearing a layered framebuffer, we execute in a loop,
setting the uniform to point to each layer in turn.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to properly clear layered framebuffers, we need to know how
many layers they have. The easiest way to do this is to record it in
the gl_framebuffer struct when we check framebuffer completeness.
This patch replaces the gl_framebuffer::Layered boolean with a
gl_framebuffer::NumLayers integer, which is 0 if the framebuffer is
not layered, and equal to the number of layers otherwise.
v2: Remove gl_framebuffer::Layered and make gl_framebuffer::NumLayers
always have a defined value. Fix factor of 6 error in the number of
layers in a cube map array.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevents a GPU page fault if somehow the uniform bo gets evicted
before the screen_create pushbuf has been submitted.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Previously, we checked for interstage uniform interface block link
errors in validate_interstage_interface_blocks(), which is only called
on pairs of adjacent shader stages. Therefore, we failed to detect
uniform interface block mismatches between non-adjacent shader stages.
Before the introduction of geometry shaders, this wasn't a problem,
because the only supported shader stages were vertex and fragment
shaders, therefore they were always adjacent. However, now that we
allow a program to contain vertex, geometry, and fragment shaders,
that is no longer the case.
Fixes piglit test "skip-stage-uniform-block-array-size-mismatch".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Rename validate_interstage_interface_blocks() to
validate_interstage_inout_blocks() to reflect the fact that it no
longer validates uniform blocks.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v3: Make validate_interstage_inout_blocks() skip uniform blocks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when attempting to link a vertex shader and a geometry
shader that use different GLSL versions, we would sometimes generate a
link error due to the implicit declaration of gl_PerVertex being
different between the two GLSL versions.
This patch fixes that problem by only requiring interface block
definitions to match when they are explicitly declared.
Fixes piglit test "shaders/version-mixing vs-gs".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: In the interface_block_definition constructor, move the assignment
to explicitly_declared after the existing if block.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 7.1 (Built-In Language Variables) of the GLSL 4.10
spec:
Also, if a built-in interface block is redeclared, no member of
the built-in declaration can be redeclared outside the block
redeclaration.
We have been regarding this text as a clarification to the behaviour
established for gl_PerVertex by GLSL 1.50, so we apply it regardless
of GLSL version.
This patch enforces the rule by adding an enum to ir_variable to track
how the variable was declared: implicitly, normally, or in an
interface block.
Fixes piglit tests:
- gs-redeclares-pervertex-out-after-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-global-redeclaration.vert
- gs-redeclares-pervertex-out-after-other-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-other-global-redeclaration.vert
- gs-redeclares-pervertex-out-before-global-redeclaration
- vs-redeclares-pervertex-out-before-global-redeclaration
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Don't set "how_declared" redundantly in builtin_variables.cpp.
Properly clone "how_declared".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts. I call these "bookend" snapshots.
Since there can be multiple performance monitors active at a time, we
store the bookend snapshots in a global BO, shared by all monitors.
For monitors that span multiple batches, acquiring results involves
adding up three segments:
BeginPerfMonitor --> End of Batch 1 ("head")
Start of Batch 2 --> End of Batch 2
... ("middle")
Start of Batch N-1 --> End of Batch N-1
Start of Batch N --> EndPerfMonitor ("tail")
Monitors that refer to bookend BO snapshots are considered "unresolved".
We delay resolving them (and adding up deltas to obtain the results) as
long as possible to avoid blocking on mapping monitor->oa_bo.
We can also run out of space in the bookend BO, at which point we have
to resolve all unresolved monitors. Then we can throw away the
snapshots and begin writing at the beginning of the buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to use the Observability Architecture effectively, we'll need
to take snapshots of the OA counters via MI_REPORT_PERF_COUNT at the
start and end of each batch.
Experimentation reveals that we need to flush before and after each
MI_REPORT_PERF_COUNT to get working values. For simplicitly, I chose to
use intel_batchbuffer_emit_mi_flush(), which unfortunately expands to
triple pipe controls on Sandybridge.
We may want to start computing per-generation reserved batch space to
avoid the insanity of Sandybridge's PIPE_CONTROL cost. That said, much
of this cost existed before I rewrote the query object support to use
hardware contexts, so it's at least not entirely new.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, this only considers the monitor start and end snapshots.
This is woefully insufficient, but allows me to add a bunch of the
infrastructure now and flesh it out later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to start OA at the beginning of each batch where monitors are
active. OACONTROL isn't part of the hardware context, so to avoid
leaving counters enabled for other applications, we turn them off at the
end of the batch too.
We also need to start them at BeginPerfMonitor time (unless they've
already been started). We stop them when the monitor last ends as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll need to write this register to start/stop performance counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MI_REPORT_PERF_COUNT writes a snapshot of the Observability Architecture
counters to a buffer. Exactly how it works varies between generations:
Ironlake requires two packets, Sandybridge has to use GGTT, and Ivybridge
and later use PPGTT.
v2: Assert that we didn't use more space than we reserved (suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using the OA counters requires some per-batch work. When starting and
ending a batch, it's useful to know whether any monitors are actually
interested in OA data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In addition to listing the counter names, we include several "remap"
tables. Confusingly, counters are documented with names like "A23",
are written to some buffer offset other than 23, and exposed by core
Mesa under a counter ID that is different still.
The first is inevitable; MI_REPORT_PERF_COUNT writes certain counters to
fixed locations in the buffer. The latter could be avoided, but core
Mesa uses the "Counters" array index as the ID for a counter. We could
do remapping there, but it would just complicate the core Mesa code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is fairly simple:
- At BeginPerfMonitor time, take an opening snapshot.
- At EndPerfMonitor time, take a closing snapshot.
- The first time the application asks for results, subtract the two and
store that value. Then free the BO containing the snapshots.
- On subsequent requests for the results, just return the saved value.
- On reset, throw away the results.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, we only support these on Gen6+, since that's what currently
uses hardware contexts. When we add Ironlake hardware context support,
we can add pipeline statistics register support for that as well.
In theory, we could support pipeline statistics counters even without
hardware contexts, but it would be annoyingly painful.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we don't support any counters, there are zero groups.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Observability Architecture counters are 32-bit unsigned values, and
the Pipeline Statistics Register counters are 64-bit unsigned values.
These convenience macros make it easy to create those types of counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's useful to see the state of all outstanding monitors; the start
of a new batch seems like a reasonable time to print them out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These stub functions will be filled out in later patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will enable debugging printfs for the AMD_performance_monitor code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without hardware contexts, the pipeline statistics registers are
free-running and include data from every 3D application running.
In order to find out the contributions of one particular context, we
need to take a snapshot at the start and end of each batch.
Previously, we emitted the PIPE_CONTROL necessary to capture
PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it
happened only on the first draw of the batch, rather than on every draw.
Moving this to brw_new_batch increases symmetry, since the final
snapshot has always been in brw_finish_batch, which is just a few lines
below. It should be basically equivalent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new intel_batchbuffer_emit_render_ring_prelude() hook will be called
when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a
place to emit state that should go at the start of each render ring
batch, with minimal overhead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When we first create a batch buffer, it's empty. We don't actually
know what ring it will be targeted at until the first BEGIN_BATCH or
BEGIN_BATCH_BLT macro.
Previously, one could determine the state of the batch by checking
brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs.
unknown).
This should be functionally equivalent, but the tri-state enum is a bit
clearer.
v2: Catch three explicit require_space callers (thanks to Carl and Eric).
v3: Split the boolean -> enum change from the UNKNOWN_RING change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more
obvious than passing true or false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rounding errors could crop up when calculating a0. Use a more accurate
method (barycentric interpolation essentially) to fix this, though to fix
the REAL problem (which is that our interpolation will give very bad results
with small triangles far away from the origin when they have steep gradients)
this does absolutely nothing (actually makes it worse). (To fix the real
problem, either would need to use a vertex corner (or some other point inside
the tri) as starting point value instead of fb origin and pass that down to
interpolation, or mimic what hw does, use barycentric interpolation (using
the coordinates extracted from the rasterizer edge functions) - maybe another
time.)
Some (silly) tests though really want a high accuracy at fb origin and don't
care much about anything else (Just. Don't. Ask.).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
D3D9 Shader Model 2 restricted the fog register to one component,
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172945.aspx ,
but that restriction no longer exists in Shader Model 3, and several
WHCK tests enforce that.
So this change:
- lifts the single-component restriction TGSI_SEMANTIC_FOG
from Gallium interface
- updates the Mesa state tracker to enforce output fog has (f, 0, 0, 1)
- draw module was updated to leave TGSI_SEMANTIC_FOG output registers
alone
Several gallium drivers that are going out of their way to clear
TGSI_SEMANTIC_FOG components could be simplified in the future.
Thanks to Si Chen and Michal Krol for identifying the problem.
Testing done: piglit fogcoord-*.vpfp tests
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Same as Si Chen's commit e7a5905d8a for
tgsi_exec module.
Not actually tested, because softpipe is failing the test that caught
this bug due to unrelated issues.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Earlier comments suggest this was removed from GL core spec but it is
still there. Enabling makes 'texture_lod_bias_getter' Khronos
conformance tests pass, also removes some errors from Metro Last Light
game which is using this API.
v2: leave NOTE comment (Ian)
Cc: "9.0 9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The alignment of a virtual memory area must always be at least 4096 bytes.
It only worked because size was aligned to 4096 outside of the function.
Signed-off-by: Christian König <christian.koenig@amd.com>
If a user set MESA_INFO and the OpenGL application uses a
3.0 or later context then the MESA_INFO debug output will have
an error when it queries for extensions using the deprecated
enum GL_EXTENSIONS. Passing context argument allows code
to return extension list directly regardless of profile.
Commit title updated as recommended by Kenneth Graunke.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BLORP is essential. However, porting it to Gen8 is a huge amount of
work. Disabling it for now allows us to proceed with basic hardware
enablement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ is difficult to implement, and while it's essential for performance,
we don't need it right away for purposes of hardware enabling.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As always, the chipset limits here are placeholders, rather than the
actual values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 passed piglit, but swrast and gallium both segfaulted without this.
i965 happened to work because it never ran _mesa_load_state_parameters()
on the new program before the test called glProgramLocalParameter(), which
was allocating a LocalParams array for the fallback path.
v2: Since v1 threw away old localparams data, leaked old LocalParams
memory, only fixed fragment programs, and I was dubious of my previous
invariants already (nothing but program_parse.y will generate
LocalParams, and only that one path of program_parse.y will), just
late-allocate localparams at the other point of dereferencing them.
This adds overhead to _mesa_load_state_parameter, which is
uncomfortable, but I'm pretty sure that giant switch statement is
super slow already.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71734
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.
total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)
(no change in vertex shaders)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For commit 4df56177 Paul discovered that the hardware restriction that
Align16 instructions cannot be compressed was lifted on Haswell. This
has prevented us from emitting compressed three-source instructions.
For added confirmation, the bspec lists a work around called
WaBreakSimd16TernaryInstructionsIntoSimd8 that hasn't been applicable
since very early Haswell silicon.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, register_coalesce() would modify
mov vgrf1:f vgrf2:f
cmp null vgrf3:d vgrf1:d
to be
cmp null vgrf3:d vgrf2:f
and incorrectly use vgrf2's type in the instruction that the mov was
coalesced into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not necessary to scale down cubemap texture coords when generating
mipmaps: we are doing a 2x minification therefore it's guaranteed that
the texture coords will always be at least 1 texel away of the edges.
Scaling down can actually be harmful, as it may cause artefacts when
generating mipmaps with nearest filtering. Sample points will lie
exactly in the middle each 2x2 texels, so the scaling factor was causing
different texels to be take on each quadrant of the cube face. This is
apparent with a 1x1 checkerboard pattern in the base mipmap level:
instead of next mipmap level receiving a constant color throughout the
face, it will have different colors for each quadrant of the face.
The behaviour for blits is left untouched for now, but the cubemap
texture coord scaling hack should be reconsidered eventually.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to check the drawbuffer's orientation before inverting Y
coordinates. Fixes piglit feedback tests when running with the
-fbo option.
Cc: "9.2" "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The exec_mask must be taken in consideration, just like emit_kill above.
The tgsi_exec module has the same bug and should be fixed in a future
change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Gen7 does not allow render targets to have a vertical alignment of 2.
So, when creating a surface, if its format is renderable, and its
vertical alignment is 2, force it to use X tiling.
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen6+ allows for color buffers to use a vertical alignment of either 4
or 2. Previously we defaulted to 2. This may have caused problems on
Gen7 because Y-tiled render targets are not allowed to use a vertical
alignment of 2.
This patch changes the vertical alignment to 4 on Gen7, except for the
few formats where a vertical alignment of 2 is required.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 70953b5 (i965: Initialize all member variables of
vec4_instruction on construction) inadvertently added a line to the
vec4_instruction constructor setting this->ir to NULL, wiping out the
previously set value. As a result, ever since then, the output of
INTEL_DEBUG=vs and INTEL_DEBUG=gs has been missing IR annotations.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is basically a a respin of f1dfcf4bce35e6796f873d9a00103b280da81e4c
per Jose's suggestion.
Just set the SVGA3dSurfaceFormatCaps flags for 3D and cube textures
when checking the texture format capabilities. This will filter out
unsupported combinations like 3D+DXT.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Refine 8c533022. Provide a stub __glXGetCurrentContext() function when
$(DEFINES) are such that it is not a macro.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
We were always passing PIPE_TEXTURE_2D, but not all formats are
supported for all types of textures. In particular, the driver may
not supported texture compression for all types of textures.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add new OSMesaPostprocess() function to allow using the gallium
postprocessing filters. This only works for OSMesa with gallium
drivers, not the legacy swrast OSMesa.
Bump OSMESA_MAJOR/MINOR_VERSION numbers to 10.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move private data structures and function prototypes out of the
public postprocess.h header file.
Create a pp_private.h for the shared, private data structures, functions.
Remove pp_program.h header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
- Indent items under a GL version to allow context diffs to do their work.
- Move complete drivers into the GL version line - this should make the
stuff a little bit easier to read.
v2: keep the fd.o link (Emil Velikov)
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Joerg Mayer <jmayer@loplof.de>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above two variables are unused as of commit
commit 024fe6852a
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Apr 2 10:42:50 2013 -0700
radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA v2
which removed the only cpp file from drivers/radeon, but missed to
remove the CXXFLAGS. The sequential commit reintroduced and empty
LLVM_CPP_FILES.
Lets cleanup and remove both.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If a performance monitor has never ended, then no result can be
available. Core Mesa can easily handle this, saving drivers a tiny bit
of complexity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If a monitor has ended, it means a result should eventually become
available, pending some flushing.
This is distinct from !m->Active; if a monitor has not been started,
then m->Active == false and m->Ended == false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 implementation uses calloc, so I missed this. It's best to
simply initialize it to avoid requiring a zeroing allocator, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that branch 10.0 is created, bump the minor version in
master.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll need this for Broadwell code as well.
Normally, when we make things public, we add the "brw" prefix. I'm not
crazy about that in this case, since it deals with prog_instruction.h's
SWIZZLE_XYZW values, rather than the BRW_SWIZZLE_XYZW enums. However,
I can't think of a better name, and at least the comments and code make
it clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Broadwell code should not include brw_eu.h (since it is for Gen4-7
assembly encoding), but needs the URB write flags enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Only Gen4 color write setup uses the force_sechalf flag, and it only
sets it on a single instruction. It also already has to get a pointer
to the instruction and manually set the saturate flag, so we may as well
just set force_sechalf the same way and avoid the complexity of a stack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Previous assumption was that the same set of flags can be reused
for both classic and gallium drivers. With megadriver work done
the classic drivers ended up using their own (single) instance of
the flags.
Move these into Automake.inc and rename to indicate that those
are gallium specific. Additionally silence an automake/autoconf
warning "XXX is not a standard libtool library name", due to
the parsing issues of the module tag.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Greatly reduce duplication and provide a sane minimum of
CFLAGS for all DRI targets.
Note: This commit adds VISIBILITY_CFLAGS to the following:
* freedreno
* i915
* ilo
* nouveau
* vmwgfx
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order to use the trace driver, one needs to define
GALLIUM_TRACE. Neither one of the two targets was
defining it, thus we're safe to remove libtrace.la.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Automake.inc already has GALLIUM_VIDEO_CFLAGS, which
provide the essential compiler flags needed.
Note: this commit adds VISIBILITY_CFLAGS to nouveau.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Cleanup the duplicating flags and consolidate into a sigle variable.
Note: this patch adds VISIBILITY_CFLAGS to the following targets
* freedreno/drm
* i915/{drm,sw}
* nouveau/drm
* sw/fbdev
* sw/null
* sw/wayland
* sw/wrapper
* sw/xlib
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order for one to use trace, noop, rbug and/or galahad, they must
set the corresponding GALLIUM_* CFLAG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Store the compiler flags into a variable, in order to minimise
flags duplication (amongst vdpau and xvmc).
Note: this commit add VISIBILITY_CFLAGS to the nouveau target
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* minimise flags duplication
* distingush between VISIBILITY C and CXX flags
* set only required flags - C and/or CXX
v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* Allow the lists to be shared among build systems.
* Update automake and Android build systems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly everything within the three Makefile.am's is identical.
Let's simplify things a little.
v2: Rebase and rewrite the commit message (Emil Velikov)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The array was 64kb per struct gl_program, plus we statically stored a copy
of one on disk for _mesa_DummyProgram. Given that most struct gl_programs
we generate are for GLSL shaders that don't have local parameters, this
was a waste.
Since you can store and fetch parameters beyond what the program actually
uses, we do have to do a late allocation if necessary at
GetProgramLocalParameter time.
Reduces peak memory usage in the dota2 trace I made by 76MB (4.5%)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to env parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to local parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Notably, ENV and LOCAL aren't used any more (replaced by STATE_VAR), but
apparently CONSTANT is.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment was stale, because the lowering in question wasn't happening
in lower_instructions.cpp. Presumably if the lowering ever moves there,
we can plumb the lowering mask through to opt_algebraic.
total instructions in shared programs: 1618696 -> 1616810 (-0.12%)
instructions in affected programs: 243018 -> 241132 (-0.78%)
GAINED: 0
LOST: 0
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, brw_new_batch was called just after execbuf, but before
intel_batchbuffer_reset. Essentially, it prepared for the creation of a
new batch, that wasn't yet available, and which it didn't create. This
was a bit awkward.
This patch makes brw_new_batch call intel_batchbuffer_reset as the very
first operation. This means that brw_new_batch actually creates a new
batchbuffer, and thus has it available. It brings the creation of the
new batchbuffer and BRW_NEW_BATCH flagging together into one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
More rebase fail. This code was written long before i915 and i965 were
split, so most of the code in i9[16]5/intel_screen.c only needed to
exist in one place. It looks like I fixed n-1 of those places after
rebasing on the split.
I only found this from the defined-but-not-used warning for
intelRendererQueryExtension. I noticed this while fixing the other,
related warnings.
(Note: During review, we decided to *not* pick this back to 10.0.)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
radeon_llvm_compile allocates memory for binary.code, binary.config,
or neither depending on what's being done.
We need to make sure to free that memory after it's no longer needed.
v2: Don't bother checking for null before FREE()
CC: "10.0" <mesa-stable@lists.freedesktop.org>
use memset to initialize to 0's... otherwise code_size and config_size
could be uninitialized when read later in this method.
It's also hard to do NULL checks on uninitialized pointers.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
v2: Fix indentation
CC: "10.0" <mesa-stable@lists.freedesktop.org>
For DX9-level shaders, there's only limited support for indirect
indexing of registers (with the loop counter register, not the
general address register.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
After we blit/copy to a dest texture image we need to mark it as
being defined. This fixes broken mipmap generation for quite a
few texture formats. Mipgen involves making texture views and
svga_texture_view_surface() skips texture images that are undefined.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The index translation code expects the number of indexes to be
consistent with the primitive type (ex: a multiple of 3 for
PIPE_PRIM_TRIANGLES). If it's not, we can write out of bounds
in the destination buffer.
Fixes failed assertions in the pipebuffer debug code found with
Piglit primitive-restart-draw-mode test.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, when doing intrastage and interstage interface block
linking, we only checked the interface type; this prevented us from
catching some link errors.
We now check the following additional constraints:
- For intrastage linking, the presence/absence of interface names must
match.
- For shader ins/outs, the interface names themselves must match when
doing intrastage linking (note: it's not clear from the spec whether
this is necessary, but Mesa's implementation currently relies on
it).
- Array vs. nonarray must be consistent, taking into account the
special rules for vertex-geometry linkage.
- Array sizes must be consistent (exception: during intrastage
linking, an unsized array matches a sized array).
Note: validate_interstage_interface_blocks currently handles both
uniforms and in/out variables. As a result, if all three shader types
are present (VS, GS, and FS), and a uniform interface block is
mentioned in the VS and FS but not the GS, it won't be validated. I
plan to address this in later patches.
Fixes the following piglit tests in spec/glsl-1.50/linker:
- interface-blocks-vs-fs-array-size-mismatch
- interface-vs-array-to-fs-unnamed
- interface-vs-unnamed-to-fs-array
- intrastage-interface-unnamed-array
v2: Simplify logic in intrastage_match() for handling array sizes.
Make extra_array_level const. Use an unnamed temporary
interface_block_definition in validate_interstage_interface_blocks()'s
first call to definitions->store().
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From the Sandy Bridge PRM, Vol 1 Part 1 7.18.3.4 (Alignment Unit
Size):
j [vertical alignment] = 4 for any render target surface is
multisampled (4x)
From the Ivy Bridge PRM, Vol 4 Part 1 2.12.2.1 (SURFACE_STATE for most
messages), under the "Surface Vertical Alignment" heading:
This field is intended to be set to VALIGN_4 if the surface was
rendered as a depth buffer, for a multisampled (4x) render target,
or for a multisampled (8x) render target, since these surfaces
support only alignment of 4.
Back in 2012 when we added multisampling support to the i965 driver,
we forgot to update the logic for computing the vertical alignment, so
we were often using a vertical alignment of 2 for multisampled
buffers, leading to subtle rendering errors.
Note that the specs also require a vertical alignment of 4 for all
Y-tiled render target surfaces; I plan to address that in a separate
patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53077
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For both vertex and fragment shaders we default MaxUniformComponents
to 4 * MAX_UNIFORMS. It makes sense to do this for geometry shaders
too; if back-ends have different limits they can override them as
necessary.
Fixes piglit test:
spec/glsl-1.50/built-in constants/gl_MaxGeometryUniformComponents
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* Inherit gl_context so we always have access to it
* Thanks curro for the idea.
* Last Haiku cannidate for 10.0.0
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The debug printfs wouldn't actually compile when enabled, so kill them off
and insert some new one in another place, and make sure it keeps compiling
by enclosing it in a if-0 clause.
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Systems with little physical memory installed will report less than
2GiB, and some systems may (hypothetically?) have a larger address space
for the GPU. My IVB still reports 1534.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch implements accelerated path for glDrawPixels from a PBO in
i965. The code follows what intel_pixel_read, intel_pixel_copy,
intel_pixel_bitmap and intel_tex_image are doing. Piglit quick.tests
show no regressions. In my testing on IVB, performance improvement is
huge (about 30x, didn't measure exactly) since generic path goes via
_mesa_unpack_color_span_float, memcpy, extract_float_rgba.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
createContextAttribs is a superset of what createNewContext provides.
Also remove the function typedef, since createNewContext is deprecated
and no longer used in multiple interfaces.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This lets us allocate color buffers as __DRIimages and pass them into
the driver instead of having to create a __DRIbuffer with the flink
that requires.
With this patch, we can now run gbm on render-nodes. A render-node is a
drm device that doesn't support modesetting and all the legacy DRI ioctls.
flink is also not supported, but now that gbm doesn't need flink, we can
run piglit on head-less gbm or head-less GPGPU.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since LIFO fails on some shaders in one particular way, and non-LIFO
systematically fails in another way on different kinds of shaders, try
them both, and pick whichever one successfully register allocates first.
Slightly prefer non-LIFO in case we produce extra dependencies in register
allocation, since it should start out with fewer stalls than LIFO.
This is madness, but I haven't come up with another way to get unigine
tropics to not spill while keeping other programs from not spilling and
retaining the non-unigine performance wins from texture-grf.
total instructions in shared programs: 1626728 -> 1626288 (-0.03%)
instructions in affected programs: 1015 -> 575 (-43.35%)
GAINED: 50
LOST: 0
Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Long ago, the HW_REG usage in assign_curb/urb_setup() were scheduling
barriers, so we had to run scheduler before them in order for it to be
able to do basically anything. Now that that's fixed, we can delay the
scheduling until we go to allocate (which will make the next change less
scary).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We care about depth-until-program-end, as a proxy for "make sure I
schedule those early instructions that open up the other things that can
make progress while keeping register pressure low", not actual latency
(since we're relying on the post-register-alloc scheduling to actually
schedule for the hardware).
total instructions in shared programs: 1609931 -> 1609931 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 55
LOST: 43
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the SIMD16 spilling changes, I replaced a "1" in the spill path with
"mlen", but obviously it wasn't mlen before because spills have the g0
header along with the payload. The interface I was trying to use was
asking for how many physical regs we're writing, so we're looking for "1"
or "2".
I'm guessing this actually passed piglit because the high 8 bits of the
execution mask in SIMD8 mode are all 0s.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, the best thing we had was to schedule the things unblocked by
the last chosen instruction, on the hope that it would be consuming two
values at the end of their live intervals while only producing one new
value. But that's just a guess, and we can do counting of usage of
registers to know when an instruction would (almost surely) reduce
register pressure.
The only failure mode I know of in this new dominant heuristic is that
inside of a loop when scheduling the iterator (for example), choosing the
last use of the iterator doesn't actually reduce the live interval of the
iterator. But it doesn't seem to matter in shader-db:
total instructions in shared programs: 1618700 -> 1618700 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 13
LOST: 0
Note: The new functions are made virtual because I expect we'll soon lift
the pre-regalloc scheduling heuristic over to the vec4 backend.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes infinite loop in find_grid_optimal_factor() in cases where the
user specifies a grid size with less dimensions than the device
supports.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since we explicitly require a integer input we should avoid using exp2 math
(even if we were using optimized versions), which turns the exp2 into a int
sub (plus some casts).
v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise, the function would enable generic vertex attributes 0
and 1 of the array object it does not own. This was causing crashes
in Euro Truck Simulator 2, since the incorrectly enabled generic
attribute 0 in the foreign context got precedence before vertex
position attribute at later time, leading to NULL pointer dereference.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Petr Sebor <petr@scssoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adding a vl_mpeg-based helper didn't seem to work, as it produced data
that the card couldn't handle. (And I didn't investigate further.) This
makes the decoding functionality only accessible via XvMC and avoids
crashes when attempting to use VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash in glamor when mesa links against static LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
This makes it possible to use clover with statically linked LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
X_f, Y_f, Xp_f, Yp_f variables are used just inside
translate_dst_to_src().So, they can be defined just
as local variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When this function was added, the returned value was signed in some
places, unsigned in others.
v2: also add unsigned in the unit test, per Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we would bogusly replace the entire statement containing the
ir_texture node with an ir_dereference_variable.
Correct this to just replace the ir_texture node itself as intended.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b16b3c87 began performing CSE on CMP instructions with null
destinations. I relaxed the restrictions a bit too much, thereby
allowing CSE to be performed on instructions with, for instance, an
explicit accumulator destination.
This broke the arb_gpu_shader5/fs-imulExtended shader tests because
they emit MUL instructions with the accumulator as the destination. CSE
would instead cause the MUL to write to a GRF, which is lower precision
than the accumulator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Uses the __DRIimage loader interfaces.
v2: Fix _XIOErrors when DRI3 isn't present (change by anholt). Apparently
XCB just terminates your connection if you don't check for extensions
before using them, instead of returning an error like you'd expect.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
These provide an interface between the driver and the loader to allocate
color buffers through the DRIimage extension interface rather than through a
loader-specific extension (as is used by DRI2, for instance).
The driver uses the loader 'getBuffers' interface to allocate color buffers.
The loader uses the createNewScreen2, createNewDrawable, createNewContext,
getAPIMask and createContextAttribs APIS (mostly shared with DRI2).
This interface will work with the DRI3 loader, and should also work with GBM
and other loaders so that drivers need not be customized for each new loader
interface, as long as they provide this image interface.
v2: Fix build of i915 and i965 together (by anholt)
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The __DRI_IMAGE_FORMAT codes are used by the image extension, drivers need to
be able to translate between them. Instead of duplicating this translation in
each driver, create a shared version.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Instead of assuming that the size will be height * pitch, have the caller pass
in the size explicitly.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This just renames them so that they can be used with the DRI3 extension
without causing too much confusion.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There's only one minor functional change, for immediates the pixel offsets
are no longer added since the values are all the same for all elements in
any case (it might be better if those weren't stored as soa vectors in the
first place maybe).
Reviewed-by: Zack Rusin <zackr@vmware.com>
After adding $(DEFINES) to AM_CPPFLAGS, the __glXGetCurrentContext
wrapper function is no longer needed and causes compile errors. Using
the correct defines causes it to be a macro!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
These tests primarilly ensure that the functions added by this extension
don't abuse other interfaces (e.g., glx_screen::query_renderer_integer)
when provided bad data.
These tests helped me find a couple small bugs in the initial
implementation.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The enumerated values are currently allocated from Intel's range.
v2: Fix a typo. Update the list of functions to which the new enums can
be passed. The "Current" versions were previously missing. Both things
noticed by Marek.
v3: Fix typo in return type of glXQueryRendererIntegerMESA in the spec
body (noticed by Ken). Fix typo in issue #14 referencing itself instead
of issue #13 (noticed by Dave).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
The new functions for this extension were added to a separate file
(dri2_query_renderer.c) to facilitate unit testing. I tried putting
them in dri2_glx.c, and it resulting in an unending chain of
dependencies. It was the proverbial threading hanging from a sweater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This structures will be accessed by internal functions that will be
added in a file separate from dri2_glx.c. The new code will be added to
a new file to facilitate unit testing.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use sysconf instead of sysinfo for improved portability. Suggested
by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use sysconf instead of sysinfo for improved portability. Suggested
by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add assertions that the version string has the expected format.
This will catch build errors (or changes to the version string format)
in debug build without exposing release builds to buffer over-runs.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will soon be used in intel_screen.c from a function that doesn't
have a gl_context.
v2: Delete local variables that are now unused. This matches v1 of the
changes to the i915 driver.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will soon be used in intel_screen.c from a function that doesn't
have a gl_context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will soon be used in intel_screen.c from a function that doesn't
have a gl_context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will soon be used in intel_screen.c from a function that doesn't
have a gl_context.
v2: Remove spurious break after return.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to let apps query hardware and driver limits before
creating a GL context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the application requests reset notifiction, connect up the reset
status query method and set gl_context::ResetStrategy.
v2: Update based on kernel interface / libdrm changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Soon some drivers will support a different set of flags than other
drivers. If some flags have to be filtered in the driver, we might as
well filter all of them in the driver.
The changes in nouveau use tabs because nouveau seems to have it's own
indentation rules.
v2: Fix some rebase failures noticed by Ken (returning the wrong types,
etc.).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No drivers advertise the DRI2 extension yet, so no driver should ever
see a value other than false for notify_reset.
The changes in nouveau use tabs because nouveau seems to have it's own
indentation rules.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers still have to implement dd_function_table::GetGraphicsResetStatus.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These will be used to determine whether to signal a GPU reset after
another context in the share group has observed a reset.
v2: Change ShareGroupReset from GLboolean to bool. Suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows drivers to determine whether a GPU reset has occured. It
should return non-zero status if a reset was observed by the specified
context. Another mechanism will be used to observe resets occuring in
other contexts in the share group.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This isn't going to be used in the actual implemenation of
glGetGraphicsResetStatus.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add comments on the purpose of the auxiliary data structures.
Check for atomic counter overlaps. Use the contains_atomic()
convenience method. Add static assert with the number of expected
shader stages.
v3: Don't resize atomic arrays.
v4: Add comment on the reason why we don't resize atomic counter
arrays. Use 'strcmp(...) == 0' instead of '!strcmp(...)'.
v5 (idr): Don't use STL in the linker.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Mark atomic counters as read-only variables. Move offset overlap
code to the linker. Use the contains_atomic() convenience method.
v3: Use pointer to integer instead of non-const reference. Add
comment so we remember to add a spec quotation from the next GLSL
release once the issue of atomic counter aggregation within
structures is clarified.
v4 (idr): Don't use std::map because it's overkill. Add an assertion
that ctx->Const.MaxAtomicBufferBindings <= MAX_COMBINED_ATOMIC_BUFFERS.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts most of commit 0f2da77307.
(I chose to leave the additions to brw_defines.h.)
My previous Ironlake implementation was somewhat broken: counter data
was global, rather than per-context. This meant that performance
monitors captured data from your compositor, 2D driver, and other 3D
programs.
Originally, I believed that Sandybridge and later had an easy way to
avoid this problem (setting per-context flags in OACONTROL), while
Ironlake did not. So I'd intended to leave it as a known limitation of
performance monitoring support on Ironlake. However, this turned out
not to be true.
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts.
For occlusion queries, this batch bookending approach is fairly simple:
only one occlusion query can be active at a time, and the result is a
single integer. Performance monitors are more complex: an arbitrary
number of monitors can be active at a time, each monitoring some subset
of our ~30 observability counters. Individual monitors can be started
and stopped at any point during the batch. Tracking where each monitor
started/ended relative to batch flushes ends up being a pain. And you
can run out of space in the buffer.
Properly supporting this required some serious rearchitecting of the
code. Rather than writing patches to try and morph a broken system into
a working one (which operates quite differently), I decided it would be
simplest to revert the old code and start fresh. Parts will look
familiar, but other parts are new.
I also decided it would be best to include Sandybridge and Ivybridge
support from the start, since the newer platforms have added complexity
that I wanted to make sure worked. They're also what most people care
about these days.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we only exposed them in desktop GL or with:
#extension GL_OES_standard_derivatives : enable
GLSL ES 3.00 includes these without an extension, so we need to expose
them by default.
Note that the above #extension line results in an error or desktop GL,
so we don't need to worry about this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This just tells the state tracker to turn on the GL_ARB_shader_texture_lod
extension. This simply allows the GLSL compiler to emit TXL and TXD
instructions for both vertex and fragment shaders. We already support
these opcodes in the svga driver. Though, the shadow2DGrad() Piglit
tests are failing.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Improves performance of RoboHornet's 2D Canvas toDataURL benchmark
[http://www.robohornet.org/#e=canvastodataurl] by approximately 5x
on Baytrail on ChromiumOS.
Elapsed time drops by -81.4861% +/- 1.22619% (n=3 s=14.9105, confidence=95%).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Uses SSE 4.1's MOVNTDQA instruction (streaming load) to read from
uncached memory without polluting the cache.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch make changes to correctly set up the Dispatch GRF Start
Register in case of 'SIMD16 only' FS dispatch.
This fixes an issue of incorrect rendering on dolphin emulator with
GL_SAMPLE_SHADING enabled.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This theoretically works on earlier hardware as well, but the extension
requires at least GL3.0.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2: fix interaction with VertexAttribFormat, since that landed after
this was originally written
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
With this patch, the llvmpipe and draw modules will calculate the depth bias
according to floating point depth buffer semantics described in the
arb_depth_buffer_float specification, when the driver has a z buffer bound
with a format type of UTIL_FORMAT_TYPE_FLOAT.
By default, the driver will use the existing UNORM calculation for depth bias.
A new function, draw_set_zs_format, was added to calculate the Minimum
Resolvable Depth value and floating point depth sense for the draw module.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This brings over the batch-wrap-prevention and aperture space checking
code from the normal brw_draw.c path, so that we don't need to flush the
batch every time.
There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't
high enough up in the state emit sequences -- before, we implicitly had
one at the batch flush before any state was emitted, so Mesa's workaround
emits didn't really matter. Since the SNB fixes by Ken, I didn't see any
regressions after 3 piglit runs.
Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
No statistically significant performance difference on unigine tropics
(n=10)
No statistically significant performance difference on openarena (n=755)
The two apps that are hurt happen to include stalls on busy buffer
objects, so I think this is an effect of missing out on an opportune
flush.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
update_array() and update_array_format() are changed to update the new
attrib and binding states, and the client arrays become derived state.
Reviewed-by: Eric Anholt <eric@anholt.net>
...and rename it to _mesa_bind_buffer_gen().
This is so the function can be called from _mesa_BindVertexBuffer().
This patch also adds a caller parameter so we can report the right
entry point in error messages.
Based on a patch by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will become derived state as part of the ARB_vertex_attrib_binding
support.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Split out the code for updating the array format into a new function
called update_array_format(). This function will be called by both
update_array() and the new glVertexAttrib*Format() entry points in
ARB_vertex_attrib_binding.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, has_surface_tile_offset is equivalent to gen == 4 && !is_g4x.
We already use it for related checks in brw_wm_surface_state.c, so it
makes sense to use it here too. It's simpler and more future-proof.
Broadwell also lacks surface tile offsets. With this patch, I won't
need to update any generation checking; I can simply not set the flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Hardware docs say we can only use SIMD8 dispatch in this condition.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch functions,
plus don't name a straight float type float4 which is just confusing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Otherwise OutputSurface interop has funny results sometimes.
This fixes interop with the mpv media player.
v2 (chk): add proper locking
Signed-off-by: Christian König <christian.koenig@amd.com>
V2: Add comment explaining what emit_alpha_test() is for;
fix spurious temp and bogus whitespace.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
The same setup is required here as when the user-provided shader
explicitly uses KIL or discard.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
We have to do this in the shader instead, since these gens lack an
independent RT0 alpha value in their render target write messages.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that brw_update_texture_buffer_surface() uses the virtual
emit_buffer_surface_state() function, it works for Gen7+ too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that brw_create_constant_surface uses a virtual function internally,
it doesn't need to be virtual itself. We can delete the Gen7+ variant
and simplify things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will allow us to combine the Gen4-6 and Gen7 variants of these
functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This entails adding "mocs" and "rw" parameters to the Gen4-5 version.
I made it actually pay attention to the rw flag (even though it is
always false), but mocs is always ignored.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
fix: intel_screen.c:1320:4: warning: initialization from
incompatible pointer type [enabled by default]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before the series with 3c9dc2d31b to
dynamically assign our binding table indices, we didn't really track our
binding table count per shader, so we never filled in these fields.
Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions per 8-wide
lp_build_minify). This wouldn't work for "generic" 32bit shifts though
since we've got only 24bits of mantissa (actually for left shifts it would
work by using sse41 int mul instead of float mul but not for right shifts).
Note that this has very limited scope for now, since this is only used with
per-pixel lod (otherwise we're avoiding the non-constant shift count by doing
per-quad shifts manually), and only 1d textures even then (though the latter
should change).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This will enable removing the dd_function_table::Scissor hook in the
near future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will enable removing the dd_function_table::DepthRange hook in the
near future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The x, y, width, and height parameters aren't used by radeon_viewport,
so don't pass them. This should make future changes to the
dd_function_table::Viewport interface a little easier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
The i830 and the i915 driver have the same dd_function_table::Viewport
function... it just has two names and lives in two places. Using a
single implementation allows cleaning up the saved_viewport nonsense
too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
The i965 driver never installed a dd_function_table::Viewport function,
so this wrapper never actually did anything.
No piglit regressions on IVB on DRI2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
* Goodbye BeOS, we hardly knew thee
* As BeOS was gcc2 only, there was little chance
of this being useful.
* Doesn't effect Haiku in any meaningful way
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously, when packing geometry shader input varyings like this:
in float foo[3];
in float bar[3];
lower_packed_varyings would declare a packed varying like this:
(declare (shader_in flat) (array ivec4 3) packed:foo[0],bar[0])
That's confusing, since the packed varying acutally stores all three
values of foo and all three values of bar.
This patch causes it to generate the more sensible declaration:
(declare (shader_in flat) (array ivec4 3) packed:foo,bar)
Note that there should be no functional change for users of geometry
shaders, since the packed name is only used for generating debug
output. But this should reduce confusion when using INTEL_DEBUG=gs.
Reviewed-by: Eric Anholt <eric@anholt.net>
* Call mesa viewport call on winndow resize
* Add initial postprocessing code
* Pass hgl_context to private statetracker
as it is more useful than GalliumContext
* Use Lock and Unlock functions to standardize
GalliumContext locking
* Create texture resources in texture validation
Acked-by: Brian Paul <brianp@vmware.com>
The changes between Gen6-7 are minimal, and can easily be solved with
an extra generation check. This cuts a lot of duplicated code.
It also helps prevent even more duplication for Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The latency information has been obtained empirically from
measurements taken on Haswell and Ivy Bridge.
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This can deal with all the 15 32-bit untyped atomic operations the
hardware supports, but only INC and PREDEC are going to be exposed
through the API for now.
v2: Represent atomics as GLSL intrinsics. Add support for variably
indexed atomic counter arrays.
v3: Add comment on why we don't need to assign uniform storage for
atomic counters.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This can deal with all the 15 32-bit untyped atomic operations the
hardware supports, but only INC and PREDEC are going to be exposed
through the API for now.
v2: Represent atomics as GLSL intrinsics. Add support for variably
indexed atomic counter arrays. Fix interaction with fragment
discard.
v3: Add comment on why we don't need to assign uniform storage for
atomic counters.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch fixes the three dead code elimination passes and the
VEC4/FS instruction scheduling passes so they leave instructions with
side effects alone.
At some point it might be interesting to have the instruction
scheduler calculate the exact memory dependencies between atomic ops,
but they're rare enough that it seems unlikely that it will make any
practical difference.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Inspired by a patch sent to the mailing list by Tom Stellard, but
using a different algorithm to calculate the optimal block size that
has been found to be considerably more effective.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
dso_list was added as an argument for createInternalizePass in 3.4, and then
it was removed again in the same llvm version.
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The new option clamps GL_MAX_SAMPLES to a hardware-supported MSAA mode.
If negative, then no clamping occurs.
v2: (for Paul)
- Add option to i965 only, not to all DRI drivers.
- Do not realy on int->uint cast to convert negative
values to large positive values. Explicitly check for
clamp_max_samples < 0.
v3: (for Ken)
- Don't allow clamp_max_samples to alter context version.
- Use clearer for-loop and correct comment.
- Rename variables.
v4: (for Ken)
- Merge identical if-branches.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Actually link VS out / FS in based on semantic info, keeping in mind
that position/pointsize can also be an input to the FS. This fixes a
few fragment shaders which were using gl_Position.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes use of full-precision in fragment shader (ie. don't clobber r0.x
since that can be used by future bary instructions for varying fetch).
And makes use of full-precision the default in fragment shader (but can
be overriden via FD_MESA_DEBUG=fraghalf).
Seems like half precision is often not enough for texture coordinates.
The blob compiler is clever enough to keep texture coords in full
precision registers while using half precision for everything else. But
we aren't quite that clever yet, so better to default to full precision.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Handle some relative addressing constraints: cannot handle const or
relative in cat5 and src2 of cat3.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN7_WM_8_DISPATCH_ENABLE or GEN7_WM_16_DISPATCH_ENABLE
when GEN7_WM_MSDISPMODE_PERSAMPLE is enabled. Refer IVB PRM Vol. 2, Part 1,
Page 288 for details.
V2:
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
V3:
- Enable simd16 dispatch with per sample shading.
- Make changes to give preference to 'simd16 only' mode over
'simd8 only' mode in case of non 1x per sample shading.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE
when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
Refer SNB PRM Vol. 2, Part 1, Page 279 for details.
V2:
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
V3:
- Enable simd16 dispatch with per sample shading.
- Make changes to give preference to 'simd16 only' mode over
'simd8 only' mode in case of non 1x per sample shading.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2:
- Update comments
- Add a special backend instructions to compute sample_mask.
- Add a new variable uses_omask in brw_wm_prog_data.
V3:
- Make changes to support simd16 mode.
- Delete redundant AND instruction and handle the register
stride in FS backend instruction.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2:
- Update comments
- Add compute_sample_id variables in brw_wm_prog_key
- Add a special backend instruction to compute sample_id.
V3:
- Make changes to support simd16 mode.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is required while adding builtin system value vec{2, 3, 4}
variables. For example:
(declare (sys) vec2 gl_SamplePosition)
Without this patch above glsl ir splits in to:
(declare (temporary) float gl_SamplePosition_x)
(declare (temporary) float gl_SamplePosition_y)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function is used to test if we need to do per sample shading or
per fragment shading.
V2: Use MAX2() to make sure the function returns a number >= 1.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
New builtins added by GL_ARB_sample_shading:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]
V2: - Use SWIZZLE_XXXX for STATE_NUM_SAMPLES.
- Use "result.samplemask" in arb_output_attrib_string.
- Add comment to explain the size of gl_SampleMask[] array.
- Make gl_SampleID and gl_SamplePosition system values.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Number of samples will be required in fragment shader program by new
GLSL builtin uniform "gl_NumSamples".
V2: Use "state.numsamples" in place of "state.num.samples"
Use _NEW_BUFFERS flag in place of _NEW_MULTISAMPLE
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
New functions added by GL_ARB_sample_shading:
glMinSampleShadingARB()
New enums:
GL_SAMPLE_SHADING_ARB
GL_MIN_SAMPLE_SHADING_VALUE_ARB
V2: Update comments.
Create new GL4x.xml.
Remove redundant code in get.c.
Update the API_XML list in Makefile.am.
Add extra_gl40_ARB_sample_shading predicate to get.c.
V3:
Fix make check failure.
Add checks for desktop GL.
Use GLfloat in place of GLclampf in glMinSampleShading().
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
This patch implements the common support code required for the
GL_ARB_sample_shading extension.
V2: Move GL_ARB_sample_shading to ARB extension list.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
Only one program's instruction count is changed, but a shader in Tropics
is also affected.
instructions in affected programs: 326 -> 320 (-1.84%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
total instructions in shared programs: 1409124 -> 1406971 (-0.15%)
instructions in affected programs: 158376 -> 156223 (-1.36%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Helps a lot of Steam games.
total instructions in shared programs: 1409360 -> 1409124 (-0.02%)
instructions in affected programs: 20842 -> 20606 (-1.13%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This only operates on constant/uniform values for now, because otherwise I'd
have to deal with killing my available CSE entries when assignments happen,
and getting even this working in the tree ir was painful enough.
As is, it has the following effect in shader-db:
total instructions in shared programs: 1524077 -> 1521964 (-0.14%)
instructions in affected programs: 50629 -> 48516 (-4.17%)
GAINED: 0
LOST: 0
And, for tropics, that accounts for most of the effect, the FPS
improvement is 11.67% +/- 0.72% (n=3).
v2: Use read_only field of the variable, manually check the lod_info union
members, use get_num_operands(), rename cse_operands_visitor to
is_cse_candidate_visitor, move all is-a-candidate logic to that
function, and call it before checking for CSE on a given rvalue, more
comments, use private keyword.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prior to the GLSL CSE pass, all of our testing happened to have a freshly
computed temporary in op[1], from the multiply by 16 to get a byte offset.
As of CSE you'll get var_refs of a reused value when you've got multiple
loads from the same offset.
Make a proper temporary for computing our temporary value, to avoid
shifting the value farther and farther down. Avoids a regression in
gs-float-array-variable-index
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, the write of each 32-bit half might land in separate batch
buffers, which is insane.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This depends on ARB_transform_feedback2, so I've predicated it on the
ability to do register writes.
It also depends on ARB_transform_feedback3, which is the only reason we
couldn't expose it previously.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is written a bit strangely. Although it introduces the
concept of multiple transform feedback streams, it doesn't actually
provide more than a single stream.
The ARB_gpu_shader5 extension is what introduces the ability to write to
streams other than stream #0 and increases the required number of streams.
Since we don't yet support ARB_gpu_shader5, we can safely enable
ARB_transform_feedback3 even though we only support a single stream.
This does provide some useful functionality: applications can now use
more than one interleaved transform feedback buffer.
v2: Only expose the extension if ARB_transform_feedback2 is also
available, to avoid confusing applications (suggested by Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ARB_transform_feedback3 allows applications to insert blank space
between interleaved varyings by adding fake 1, 2, 3, or 4-component
varyings named gl_SkipComponents[1234].
Mesa's core data structures don't explicitly track these, instead simply
tracking the buffer offset for each real varying. If there is padding
due to gl_SkipComponents, these will not be contiguous.
Our hardware takes the specification quite literally. Instead of
specifying offsets for each varying, it assumes they're all contiguous
and requires you to program fake varyings for each "hole".
This patch adds support for emitting SO_DECL structures for these holes.
Although we've lost the information about exactly how the application
specified their padding (i.e. gl_SkipComponents2, gl_SkipComponents2
vs. a single gl_SkipComponents4), it shouldn't matter. We just need to
emit the right amount of space. This patch emits the minimal number of
hole SO_DECL structures.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, we emit one SO_DECL structure per output, so we use the index
in the Outputs[] array as the index into the so_decl[] array as well.
In order to support the fake "gl_SkipComponents[1234]" varyings from
ARB_transform_feedback3, we'll need to emit SO_DECLs to fill in the
holes between successive outputs. This means we'll likely emit more
SO_DECLs than there are outputs, so we need to count it explicitly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With Linux 3.12, register writes work on Ivybridge and Baytrail, but not
Haswell. That will be fixed in a future kernel revision, at which point
this extension will automatically be enabled.
v2: Use I915_GEM_DOMAIN_INSTRUCTION for the register read, and also
correctly set the writeable flag when mapping (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We only want to enable ARB_transform_feedback2 if we can write to
registers from batchbuffers. In order to test that, we need to be able
to submit batches. And for batches to work, we need to program the
initial pipeline state (like PIPELINE_SELECT), which is done from
brw_state_init().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implementing the GetTransformFeedbackVertexCount() driver hook allows
the VBO module to call us with the right number of vertices.
The hardware doesn't directly count the number of vertices written by
SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply
by the number of vertices per primitive.
Unfortunately, counting the number of primitives generated is tricky:
a program might pause a transform feedback operation, start a second one
with a different object, then switch back and resume. Both transform
feedback operations share the SO_NUM_PRIMS_WRITTEN counters.
To work around this, we save the counter values at Begin, Pause, Resume,
and End. This "bookends" each section where transform feedback is
active for the current object. Adding up differences of pairs gives
us the number of primitives generated. (This is similar to what we
do for occlusion queries on platforms without hardware contexts.)
v2: Fix missing parenthesis in assertion (caught by Eric Anholt).
v3: Reuse prim_count_bo rather than freeing it and immediately
allocating a new one (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Renaming it makes it obvious that it isn't used, and the assertion
verifies that the VBO module never passes us such an object.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
DrawTransformFeedback() needs to obtain the number of vertices written
to a particular stream during the last Begin/EndTransformFeedback block.
The new driver hook returns exactly that information.
Gallium drivers already implement this by passing the transform feedback
object to the drawing function, counting the number of vertices written
on the GPU, and using draw indirect. This is efficient, but doesn't
always work:
If vertex data comes from user arrays, then the VBO module needs to
know how many vertices to upload, so we need to synchronously count.
Gallium drivers are currently broken in this case.
It also doesn't work if primitive restart is done in software. For
normal drawing, vbo_draw_arrays() performs software primitive restart,
splitting the draw call in two. vbo_draw_transform_feedback() currently
doesn't because it has no idea how many vertices need to be drawn.
The new driver hook gives it that information, allowing us to reuse
the existing vbo_draw_arrays() code to do everything right.
On Intel hardware (at least Ivybridge), using the draw indirect approach
is difficult since the hardware counts primitives, rather than vertices,
which requires doing some simple math. So we always use this hook.
Gallium drivers will likely want to use this hook in some cases, but
want to use the existing draw indirect approach where possible. Hence,
I've added a flag to allow drivers to opt-in to this call.
v2: Make it possible to implement this hook but only use this path
when necessary (suggested by Marek).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The ARB_transform_feedback2 extension introduces the ability to pause
and resume transform feedback sessions. Although only one can be active
at a time, it's possible to switch between multiple transform feedback
objects while paused.
In order to facilitate this, we need to save/restore the SO_WRITE_OFFSET
registers so that after resuming, the GPU continues writing where it
left off.
This functionality also exists in ES 3.0, but somehow we completely
forgot to implement it.
v2: Reduce alignment from 4096 to 64 (it seemed excessive).
v3: Use I915_GEM_DOMAIN_INSTRUCTION instead of RENDER, for consistency
with other writes. It shouldn't matter on IVB+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the basic driver hooks to allocate/free the brw variant.
It doesn't contain any additional information yet, but it will soon.
v2: Use the new _mesa_init_transform_feedback_object helper function
(requested by Eric and Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This picks up a missing obj->EverBound = GL_FALSE line, and will catch
any new fields that get added in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Both Gallium and i965 subclass gl_transform_feedback_object, which
requires implementing a custom NewTransformFeedback hook. Creating a
helper function to initialize the fields avoids code duplication and
divergence.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'd like to CSE some instructions, like CMP, that often have null
destinations. Instead of replacing them with MOVs to null, just don't
emit the MOV.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This avoids a lot of message setup we had to do otherwise. Improves
GLB2.7 performance with register spilling force enabled by 1.6442% +/-
0.553218% (n=4).
v2: Use BRW_PREDICATE_NONE, improve a comment (by Paul).
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were clearing the reg_offset before trying to use it. Oops. Fixes
glsl-fs-texture2drect with the reg spilling debug enabled.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Things blew up when I enabled the debug register spill code without
disabling 16-wide, so I decided to just fix 16-wide spilling.
We still don't generate 16-wide when register spilling happens as part of
allocation (since we expect it to be slower), but now we can experiment
with allowing it in some cases in the future.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I believe this will never happen in SIMD8 mode, but it could for SIMD16
when we fix it.
v2: Fix off-by-one in my register counting comment (caught by Paul).
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.
Instead, move the debug code into the register allocator right near where
we'd be doing spilling anyway, which should more accurately reflect how
register spilling occurs in the wild.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I'm going to need to reuse this for fixing register spilling on SIMD16.
Note that BRW_MAX_MRF is 16, which is the same as BRW_MAX_GRF -
GEN7_MRF_HACK_START.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The ICD loader should be responsible for installing headers.
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When faced with a million instructions that all became candidates at the
same time (none of which individually reduce register pressure), the ones
on the critical path are more likely to be the ones that will free up some
candidates soon.
shader-db:
total instructions in shared programs: 1681070 -> 1681070 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 40
LOST: 74
Fixes indistinguishable-from-hanging behavior in GLES3conform's
uniform_buffer_object_max_uniform_block_size test, regressed by
c3c9a8c857. Given that
93bd627d5a was unlocked by that commit, the
net effect on 16-wide program count is still quite positive, and I think
this should give us more stable scheduling (less dependency on original
instruction emit order).
v2: Comment suggestions by Paul
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70943
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is a step in doing scheduling as described in Muchnick (p538). A
difference is that our latency function is only specific to one
instruction (it doesn't describe, for example, the different latency
between WAR of a send's arguments and RAW of a send's destination), but
that's changeable later. We also don't separately compute the postorder
traversal of the graph, since we can use the setting of the delay field as
the "visited" flag.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Commit aec20d66d9
(automake: properly handle non-default expat installation),
assumed that up-to date distributions use a recent version
of expat that handles security vunerabilities CVE-2012-1147
and CVE-2012-1148. Seems like this is not always the case
and they prefer to backport only the fix, rather than use
the updated library.
This commit adds a default case -lexpat whenever expat is
not found, while properly handling expat.pc if present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71022
Reported-By: Bryce Harrington <b.harrington@samsung.com>
Reported-By: Vinson Lee <vlee@freedesktop.org>
Tested-by: Bryce Harrington <b.harrington@samsung.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This will simplify the addition of layout(location) qualifiers for
separate shader objects. This was validated with new piglit tests
arb_explicit_attrib_location/1.30/compiler/not-enabled-01.vert and
arb_explicit_attrib_location/1.30/compiler/not-enabled-02.vert.
v2: Refactor error checking to check_explicit_attrib_location_allowed
and eliminate the gotos. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Use mode_string to get the name of the variable mode. Slightly change
the control flow. Both of these changes make it easier to support
separate shader object location layouts.
The format of the message changed because mode_string can return a
string like "shader output". This would result in an awkward message
like "vertex shader shader output..."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I made this a function (instead of a method of ir_variable) because it
made the change set smaller, and I expect that there will be an overload
that takes an ir_var_mode enum. Having both functions used the same way
seemed better.
v2: Add missing case for ir_var_system_value.
v3: Change the ir_var_mode_count case to just break. Move the assertion
and the return outside the switch-statment. In the unlikely event that
var->mode is an invalid value other than ir_var_mode_count, the
assertion will still fire, and in release builds we won't wind up
returning a garbage pointer. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since the separation of ir_var_function_in and ir_var_shader_in (similar
for out), this check is no longer necessary. Previously, global_scope
was the only way to tell which was which.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Future patches will add some extra code to this path, and some of that
code will want to exit from the explicit location code early.
v2: Change a geometry shader "break" to a "return" so that try to apply
a bogus geometry shader location qualifier (which could cause cascading
errors). Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The return value has been unused since commit d348b0c. This was
originally included in another patch, but it was split out by Ian
Romanick.
v2: Drop unnecessary final return. Suggested by Paul.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Incremental builds were failing because not all generated source files
were missing dependencies to src/mapi/glapi/gen/*.xml.
Hopefully this change will be the end of these incremental build
failures.
This avoids a defect in lower_output_reads.
The problem is lower_output_reads treats the gl_FragData array as a single
variable. It first redirects all output writes to a temporary variable (array)
and then writes the whole temporary variable to the output, generating
assignments to all elements of gl_FragData.
BTW this pass can be modified to lower all arrays, not just inputs and outputs.
The question is whether it is worth it.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: addressed Paul Berry's comments
Use PKG_CHECK_MODULE over requesting the user to setup the
option at configure time. Drop unused EXPAT_INCLUDE and
update all targets.
NOTE: The this commit removes the --with-expat configure
option. One should ensure that the expat they wish to use
has expat.pc file accessible by pkg-config.
v2:
* Add note about the removal of --with-expat
(per Tom Stellard)
* Drop EXPAT_CFLAGS for targets that do not build DRI_COMMON
(spotted by Matt Turner)
v3:
* Rebase on top of megadrivers (drop EXPAT_CFLAGS from swrast)
Acked-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v2)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
configure.ac
src/mesa/drivers/dri/common/Makefile.am
The function should have never used it in the first place as it was
a left over from the DRI1 days of the nouveau ddx. While we're around
check if KMS is supported before opening the nouveau device, and
add support for Fermi & Kepler cards.
Compile tested only due to the lack of a Fermi/Kepler card.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The function xf86GetEntityInfo() retrieves the entity rather than
doing any changes. Remove this no-op code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
v2: Remove xf86PciInfo.h, all drivers provide their own PCI ID list
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A convenient front end to indices generate/translate code, for emulating
primitives which are not supported natively by the driver.
This handles saving/restoring index buffer state, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea of the original order was that you'd dead code eliminate accesses
to push constants. But I've never seen a case of that (nor has
shader-db), while we frequently see sparse accesses of large constant
arrays that would overflow into pull constants.
Cuts pull constant use on csgo, serious sam, planeshift, and the cave:
total instructions in shared programs: 1695103 -> 1688795 (-0.37%)
instructions in affected programs: 92024 -> 85716 (-6.85%)
GAINED: 339
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* Use more consistant data sources
* Fix improper color space assignments
* Remove unnecessary comments and code
* Drop unnecessary round_up function (this was leftover
from moving winsys code out of renderer)
Acked-by: Brian Paul <brianp@vmware.com>
* Instead of assuming the displaytarget is the same
stride / colorspace as the destination, lets
actually check the source bitmap.
* Fixes random stride issues in rendering
Acked-by: Brian Paul <brianp@vmware.com>
The MRF variant is going to be used extensively by the atomic counter
intrinsics to assemble untyped atomic and surface read messages
easily.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The maximum number of atomic buffer objects is somewhat arbitrary, we
can change it in the future easily if it turns out it's not enough...
v2: Add comments with the relevant mesa dirty bits. Fix usage of
BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
v3: Update binding table layout diagrams.
v4: Resolve conflicts with the recent dynamic surface index assignment changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Almost a trivial change, it boils down to renaming a few identifiers
so their names still make sense for opaque types other than sampler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix the linker to deal with intrinsic functions which are undefined
all the way down to the driver back-end, and introduce intrinsic
definition helpers in the built-in generator.
We still need to figure out what kind of interface we want for drivers
to communicate to the GLSL front-end which of the supported intrinsics
should use a default GLSL implementation and which should use a
hardware-specific override. As there's no default GLSL implementation
for atomic ops, this seems like something we can worry about later on.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Define local helper function to generate ir_call nodes in the
builtin generator.
And use it to forbid comparisons of opaque operands. According to the
GL 4.2 specification:
> Except for array indexing, structure member selection, and
> parentheses, opaque variables are not allowed to be operands in
> expressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Fix GLSL version in which the type became available. Add
contains_atomic() convenience method. Split off atomic counter
comparison error checking to a separate patch that will handle all
opaque types. Include new ir_variable fields for atomic types.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch implements the common support code required for the
ARB_shader_atomic_counters extension. It defines the necessary data
structures for tracking atomic counter buffer objects (from now on
"ABOs") associated with some specific context or shader program, it
implements support for binding buffers to an ABO binding point and
querying the existing atomic counters and buffers declared by GLSL
shaders.
v2: Fix extension checks. Drop unused MAX_ATOMIC_BUFFERS constant.
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add XML file for the dispatch code generator, update the
dispatch_sanity test and add stub definition for the new entry point.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These ralloc contexts belong to a specific object and are being
deallocated manually from the class destructor. Now that we've hooked
up destructors to ralloc there's no reason for them to be children of
any other context, and doing so might to lead to double frees under
some circumstances. The class destructor has all the responsibility
of freeing class memory resources now.
This patch makes sure that class destructors are called as they should
be when a C++ object allocated by ralloc is released.
Based on a previous patch by Kenneth Graunke, but it doesn't exhibit
the ~0.8% performance regression in shader compilation times because
we now use the HAS_TRIVIAL_DESTRUCTOR() macro to detect the typical
case where the indirect function call can be avoided because the
object's destructor doesn't need to do anything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented on GCC and Clang for now. Other compilers use a
dummy implementation that always returns false, which should be a safe
[but slightly inefficient] assumption in all cases.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use strcasecmp() from anywhere inside Mesa without
having to worry about the fact that it doesn't exist in MSVC.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was actually calculated in
lp_scene_begin_rasterization() hence too late (so setup was using the value
from the _previous_ scene or just zero if it was the first scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our "map" functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)
v2: improve comments
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before we were only checking the st->vertex_array_out_of_memory flag
after updating array state. But if there's two consecutive glDrawArrays
calls and the first one is skipped because of OOM, the second one should
be skipped too.
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.
By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.
Compare to f179f419d1 on the FS side.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This prevents unnecessary (and wrong) register allocation in the
scheduler for preloaded values in fixed registers.
Fixes interpolation-mixed.shader_test on rv770
(and probably on all other pre-evergreen chips).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
I noticed this in a shader in Unigine Heaven that was spilling. While it
doesn't really reduce register pressure, it shaves a few instructions
anyway (7955 -> 7882).
v2: Fix turning "0 >> x" into "x" instead of "0" (caught by Erik
Faye-Lund).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While ir_builder is slightly less efficient, we're only increasing the
work when there's actual optimization being done, and it's way more
readable code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I had each screwed up these common required patterns recently, in
ways that wouldn't have been noticed for a long time if not for code
review. Just enforce it in the caller so that we don't rely on code
review catching these bugs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is nothing in the OpenGL specification which prevents the user from
calling glGenQueries to generate a new query object while another object is
active. Neither is there anything in the Mesa implementation which prevents
this. So remove the INVALID_OPERATION errors in this case.
Similarly, it is explicitly allowed by the OpenGL specification to delete an
active query, so remove the assertion for that case, replacing it with the
necesssary state updates to end the query, (clear the bindpt pointer and call
into the driver's EndQuery hook).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The normal drawing path does this, and it's necessary on Ivybridge,
so let's try it on Sandybridge too. It's not explicitly documented
as necessary, but might help with hangs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
We normally do this, but BLORP was failing to do so in the case where it
disables depth.
Not observed to fix anything yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
For some reason, we put the flush in the caller, rather than just before
emitting the packet. This is more than a cosmetic problem: BLORP calls
gen6_emit_3dstate_multisample() directly, and so it missed the flush.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the comments above intel_emit_post_sync_nonzero_flush:
"[DevSNB-C+{W/A}] Before any depth stall flush (including those
produced by non-pipelined state commands), software needs to first
send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
This suggests that every non-pipelined (0x79xx) command needs a
post-sync non-zero flush before it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Otherwise the gen6 w/a in the kernel won't kick in and the write will
land nowhere.
Inspired by a patch Ken pointed me at which had the same issue (but
isn't yet merged and also for a gen7+ feature). An audit of the entire
driver didn't reveal any other case than the one in in the write_reg
helper used by the gen6 queryobj code.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
The main purpose of this patch is to increase readability of
the array code by introducing is_unsized_array() to glsl_types.
Some redundent is_array() checks are also removed, and small number
of other related clean ups.
The introduction of is_unsized_array() should also make the
ARB_arrays_of_arrays code simpler and more readable when it arrives.
V2: Also replace code that checks for unsized arrays directly with the
length variable
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
v3 (Paul Berry <stereotype441@gmail.com>): clean up formatting.
Separate whitespace cleanups to their own patch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output. When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.
Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.
This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID. In the case where the FS input really is
gl_PrimitiveID, this produces the intended result. In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.
Fixes piglit test primitive-id-no-gs.
v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID. This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Actually implement interop between the gallium
state tracker and the VDPAU backend.
v3: Make it also available in non legacy contexts,
fix video buffer sharing.
v4: deny interop if we don't have the same screen object
v5: rebased on upstream changes
v6: implemented VDPAUGetSurfaceivNV, improved error handling,
unregister all surfaces in VDPAUFiniNV
v7: squash merge with Mareks changes
Signed-off-by: Christian König <christian.koenig@amd.com>
ir_txf expects an ivec* coordinate, and may be larger than ivec2;
shuffle things around so that this will work.
V2: Fix style nits, use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out that nonzero offsets with gsampler2DRect don't work -- they
just return garbage. Work around this by folding the offset into the
coord.
Done as an IR pass rather than yet another hack in the visitors because
it's clear what's going on this way. Can possibly reuse this to replace
the existing txf coord+offset hacks.
V2: Use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have a message that does 4 independent offsets; a lowering
pass needs to lower it to 4 normal gather4s before reaching this
point.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that gather4_po_c's parameters are too long for SIMD16. It might be
worth emitting 2xSIMD8 messages in this case at some point.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gather4_c's argument layout is straightforward -- refz just goes on the
end.
gather4_po_c's layout however -- the array index is replaced with refz.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_gpu_shader5's textureGather*() functions which take shadow samplers
have a separate `refz` parameter rather than adding it to the
coordinate.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V3: fixup crazy check for whether we need to emit the coordinate after
custom handling.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some texturing ops are about to have nonconstant offset support; the
offset in the header in these cases should be zero.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to ARB_gpu_shader5 / GLSL 4.0, the offset is required to be
a constant expression.
With that extension, it is relaxed to be an arbitrary expression.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since 062317d667 (i965: Go back to using the kernel SOL reset feature.)
we've been flushing the batch on BeginTransformFeedback(). So it's not
necessary to do it on EndTransformFeedback(). A PIPE_CONTROL will work.
This makes gen7_end_transform_feedback() exactly the same as the gen6
variant. However, they'll diverge again shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was a hack to avoid choosing to schedule all texturing before
consumption of any texture results due to the way dependency chains worked
out in the presence of MRFs. On gen7, we don't have MRFs, so the problem
doesn't apply, and this was just badly constraining our scheduling.
total instructions in shared programs: 1615306 -> 1612534 (-0.17%)
instructions in affected programs: 9958 -> 7186 (-27.84%)
GAINED: 259
LOST: 9
Reviewed-by: Matt Turner <mattst88@gmail.com>
The LIFO plan was simple: Take the most recently made available
instructions, and pick those first.
But because of the order we were pushing things onto our list of
available-to-schedule instructions, it meant that when a set of
instructions was made available at the same time (for example, everything
at the start of the program that didn't depend on other instructions) we'd
schedule them in reverse order.
If you had 10 texture calls in a row in your program, each with
independent argument setup, we'd set up the last texture call's args and
execute it first, even though we wouldn't be able to consume its results
until we'd finished the other 9 texture calls (assuming consumption of
texture results happens near each texture call, and combines it with
another texture result, which is normal for a convolution shader).
To fix this, walk the list for doing LIFO in the order that instructions
were originally generated in the program, but choose to push
newly-made-available instructions to the other end of the list instead.
total instructions in shared programs: 1587242 -> 1586290 (-0.06%)
instructions in affected programs: 7801 -> 6849 (-12.20%)
GAINED: 76
LOST: 67
Thanks to Chia-I Wu for pointing out the bug in my first version of the
patch that made it a huge loss.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When PIPE_CAP_MIXED_FRAMEBUFFER_SIZES is not provided, parts of
ARB_framebuffer_object can't be supported, such as on NV30.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This CAP will determine whether ARB_framebuffer_object can be enabled.
The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
In commit 1b4a737 (glsl: Support redeclaration of VS and GS
gl_PerVertex output), I added code to ensure that when an unnamed
gl_PerVertex interface block is redeclared, any ir_variables that
weren't included in the redeclaration are removed from the IR (and the
symbol table). This ensures that only those variables that were
explicitly redeclared may be used.
However, when I wrote this code, I neglected to match the variable
mode when finding variables to remove. This meant that redeclaring a
built-in output block might cause the built-in input gl_in to be
accidentally removed.
Fixes piglit test gs-redeclares-pervertex-out-only.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GLSL 4.10 rules for redeclaration of built-in interface blocks
(which we've chosen to regard as clarifications of GLSL 1.50) only
require gl_PerVertex blocks to match in shaders that actually use
those blocks. The easiest way to implement this is to detect
situations where a compiled shader doesn't refer to any elements of
gl_PerVertex, and remove all the associated ir_variables from the
shader at the end of ast-to-ir conversion.
Fixes piglit tests
linker/interstage-{pervertex,pervertex-in,pervertex-out}-redeclaration-unneeded.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Normally when a built-in array (such as gl_ClipDistance) is
redeclared, we call get_variable_being_redeclared() to do the
redeclaration, and it in turn calls check_builtin_array_max_size() to
make sure that the redeclared array size isn't too large.
However when a built-in array is redeclared as part of redeclaring
gl_in, we don't call get_variable_being_redeclared() (since the
individual built-ins aren't each represented by their own ir_variable
anymore). So we need to add an explicit call to
check_builtin_array_max_size() to make sure the new array size isn't
too large.
Note: at the moment this is redundant with a test that's done at link
time, so there's no change to piglit results. But the patch that
follows will prevent link errors from being reported if gl_PerVertex
isn't used, so in order to prevent that patch from causing
regressions, we need to add the compile check now. Besides, it's
nicer to report this error at compile time anyhow.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The queries GEOMETRY_VERTICES_OUT, GEOMETRY_INPUT_TYPE, and
GEOMETRY_OUTPUT_TYPE (defined by GL 3.2) differ from the corresponding
queries in ARB_geometry_shader4 in the following ways:
- They use different enum values
- They can only be queried; they cannot be set.
- Attempting to query them yields INVALID_OPERATION if the program is
not linked, or lacks a geometry shader.
This patch switches us over from the ARB_geometry_shader4 behaviour to
the GL 3.2 behaviour.
Fixes piglit test query-gs-prim-types.
v2: Improve comment above has_core_gs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When program_resource_visitor visits variables that were created by
lower_named_interface_blocks, it needs to do extra work to un-do the
effects of lower_named_interface_blocks and construct the proper API
names.
Fixes piglit test
spec/glsl-1.50/execution/interface-blocks-api-access-members.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These variables will need to be treated specially by
program_resource_visitor, so that they can be addressed through the
API using their interface block name (and array index, for interface
block arrays).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Later patches will use this information to do proper error checking of
interpolation qualifiers that appear inside of interface blocks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In future patches, we will need this in order to interpret
interpolation qualifiers that appear inside interface blocks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Although in principle there is no hardware limitation that prevents
gl_MaxGeometryInputComponents from being set to 128 on Gen7, we have
the following limitations in the vec4 compiler back end:
- Registers assigned to geometry shader inputs can't be spilled or
later re-used for any other purpose.
- The last 16 registers are set aside for the "MRF hack", meaning they
can only be used to send messages, and not for general purpose
computation.
- Up to 32 registers may be reserved for push constants, even if there
is sufficient register pressure to make this impractical.
A shader using 128 geometry input components, and having an input type
of triangles_adjacency, would use up:
- 1 register for r0 (which holds URB handles and various pieces of
control information).
- 1 register for gl_PrimitiveID.
- 102 registers for geometry shader inputs (17 registers per input
vertex, assuming DUAL_INSTANCED dispatch mode and allowing for one
register of overhead for gl_Position and gl_PointSize, which are
present in the URB map even if they are not used).
- Up to 32 registers for push constants.
- 16 registers for the "MRF hack".
That's a total of 152 registers, which is well over the 128 registers
the hardware supports.
Fortunately, the GLSL 1.50 spec allows us to reduce
gl_MaxGeometryInputComponents to 64. Doing that frees up 48
registers, brining the total down to 104 registers, leaving 24
registers available to do computation.
Fixes piglit test
spec/glsl-1.50/execution/geometry/max-input-components.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is similar to what we do for 16-wide vs 8-wide fragment shaders.
First we try compiling the geometry shader in DUAL_OBJECT mode. If we
can't do that without spilling, we fall back on DUAL_INSTANCED mode,
which should require less spilling (since it uses an interleaved
layout of payload registers).
In an ideal world we'd fall back to SINGLE mode, which would allow us
to interleave general-purpose registers too (resulting in even less
likelihood of spilling). But at the moment, the vec4 generator and
visitor classes don't have the infrastructure to interleave general
purpose registers, so DUAL_INSTANCED is the best we can do.
As a side benefit this paves the way for implementing instanced
geometry shaders (which are incompatible with DUAL_OBJECT mode).
Since most geometry shaders used in piglit testing are small,
DUAL_INSTANCED mode won't get exercised very much in a normal piglit
run. To force DUAL_INSTANCED mode to be used for all geometry
shaders, set INTEL_DEBUG=nodualobj.
Reviewed-by: Eric Anholt <eric@anholt.net>
Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs
in vec4's. This means that when compiling gl_PointSize input
swizzling (a MOV instruction which uses a geometry shader input as
both source and destination), we need to do two things:
- Set force_writemask_all to ensure that the MOV happens regardless of
which channels are enabled.
- Set the source register region to <4;4,1> (instead of <0;4,1> to
satisfy register region restrictions.
v2: move the source register region fixup to the top of
vec4_generator::generate_vec4_instruction(), so that it applies to all
instructions rather than just MOV.
Reviewed-by: Eric Anholt <eric@anholt.net>
In future patches, this will allow us to first try compiling a
geometry shader in DUAL_OBJECT mode (which is more efficient but uses
more registers) and then if spilling is required, fall back on
DUAL_INSTANCED mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise the scheduler would be invoked with prog_data->total_grf ==
0, causing havoc.
In a future patch, this will allow us to try compiling a geometry
shader in DUAL_OBJECT mode with spilling disabled, and then fall back
to DUAL_INSTANCED mode if that failed.
Reviewed-by: Eric Anholt <eric@anholt.net>
When geometry shaders are operated in "single" or "dual instanced"
mode, a single set of geometry shader inputs is interleaved into the
thread payload (with each payload register containing a pair of
inputs) in order to save register space.
This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that
it can handle the interleaved format.
Reviewed-by: Eric Anholt <eric@anholt.net>
All geometry shaders begin this instruction:
mov(1) g0.2<1>:ud 0x0:ud { align1 }
which sets up GRF0 properly for scratch reads and writes. Since this
instruction has a SIMD size of 1, it will only have an effect if the
first channel is enabled. In practice, the hardware seems to always
dispatch geometry shaders with the first channel enabled, but I can't
find anything in the docs to guarantee that.
So to be on the safe side, set force_writemask_all on the instruction,
which guarantees that it will have the desired effect regardless of
which channels are enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When lower_named_interface_blocks lowers a built-in interface block
member to an ir_variable, it needs to set explicit_location in the
ir_variable. Otherwise the linker gets confused and treats the
variable as a generic varying.
Fixes the following piglit tests, which were regressed by commit
63974c0 (glsl: Simplify the interface to
link_invalidate_variable_locations):
- clip-distance-bulk-copy
- clip-distance-in-bulk-read
- clip-distance-in-explicitly-sized
- clip-distance-in-param
- clip-distance-in-values
- core-inputs
- gs-redeclares-both-pervertex-blocks
- gs-redeclares-pervertex-in-only
- redeclare-pervertex-subset-vs-to-gs
- unsized-in-named-interface-block-gs
- unsized-in-named-interface-block-multiple
- unsized-in-unnamed-interface-block-gs
- unsized-in-unnamed-interface-block-multiple
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70820
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This should never have been in the program key in the first place,
since it's determined by the shader source, not by GL state. Change
the code to just refer to gl_program::UsesClipDistanceOut directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will make it easier for back-ends to share code between geometry
shader and vertex shader compilation. Also, it is renamed to
"UsesClipDistanceOut" to clarify that (a) in geometry shaders, it
refers to the gl_ClipDistance output rather than the gl_ClipDistance
input, and (b) it is irrelevant in fragment shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since gl_ClipDistance is lowered from an array of floats to an array
of vec4's during compilation, transform feedback has special logic to
keep track of the pre-lowered array size so that attempting to perform
transform feedback on gl_ClipDistance produces a result with the
correct size.
Previously, this special logic always consulted the vertex shader's
size for gl_ClipDistance. This patch fixes it so that it uses the
geometry shader's size for gl_ClipDistance when a geometry shader is
in use.
Fixes piglit test spec/glsl-1.50/transform-feedback-type-and-size.
v2: Change the type of LastClipDistanceArraySize to "unsigned", and
clarify the comment above it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've always overriden
ctx->Const.{Vertex,Fragment}Program.MaxTextureImageUnits to reflect
the number of texture image units supported by the hardware (rather
than using the default values assigned by Mesa core) so it seems
sensible to do that for GeometryProgram.MaxTextureImageUnits too. We
set it to 0 if geometry shaders aren't supported.
Once that is done, we can just unconditionally add
GeometryProgram.MaxTextureImageUnits to MaxCombinedTextureImageUnits.
Fixes piglit test "spec/glsl-1.50/built-in
constants/gl_MaxCombinedTextureImageUnits".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The encoding of constant, relative, and relative-const src registers is
a bit more complex than originally thought, which gives an extra bit to
encode const reg # at expense of taking a bit from relative offset.
In most cases a3xx seems to actually use a scheme whereby it can encode
an extra bit for const register. You have three possible encodings in
thirteen bits:
register: (11 bits for N.c)
00........... rN.c
relative: (10 bits for N)
010.......... r<a0.x + N>
011.......... c<a0.x + N>
const: (12 bits for N.c)
1............ cN.c
Which means we can deal w/ more consts than previously thought.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering code, since we need per-texel
weights which our 2d lerp doesn't (and can't) do. And of course the (now
per element) weights need to be adjusted too for it to work.
Invoke the new filtering code whenever there's an edge to keep things simpler,
as it will work for edges too not just corners but of course it's only needed
with corners.
More ugly code for not much gain but at least a hacked up cubemap demo
shows very nice corners now... Not sure yet if and how this should be
configurable...
v2: incorporate feedback from Jose, only use special corner filtering code
when there's a corner not when there's only an edge (as corner filtering code
is slower, though a perf difference was only measureable when always
forcing edge code). Plus some minor style fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
No driver uses it any more, and it's been replaced by megadrivers.
v2: Remove always-on conditional for NEED_LIBPROGRAM (review by Emil)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: drop dridir now that it's unused.
v3: Fix linking after rebase when building just swrast from classic but a
drm-using gallium driver.
v4: Consistently put spaces around += in the updated Makefile.am block.
v5: Set a global driverAPI variable so loaders don't have to update to
createNewScreen2() (though they may want to for thread safety).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v3)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This required some reordering of headers to ensure that the symbol name
redefines happened before any prototypes.
v2: drop dridir now that it's unused.
v3: Consistently put spaces around += in the updated Makefile.am blocks.
v4: Set a global driverAPI variable so loaders don't have to update to
createNewScreen2() (though they may want to for thread safety).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
i915 has symbols for formerly-shared code that conflict with i965, so we
define them away using gen-symbol-redefs.py. Options considered:
- This option. Downsides: The symbols in profiling and debugging don't
match the source. The symbol list may change in the future and we won't
notice without manually running the tool again.
- Use objcopy --localize-hidden to automatically demote our symbols to
locals. This didn't work on i965 due to c++ weak symbols (which can't
be localized), but could work on i915. We could do it on i915 only, but
it does produce libtool warnings at link time due to libtool not knowing
if the resulting .o file is safe to link (stupid libtool). Plus you end
up with different symbols of the same name, which is confusing for
debugging too. On the other hand, no future symbol conflicts long term.
- Write our own libelf tool that handles c++ weak symbols like we want and
apply it to all drivers. All the downsides of above, but applies
uniformly across drivers.
- Edit the files to just rename all the i915 or i965 symbols that
conflict. There are on the order of 100 that have a prefix we used to
share, so it would take a bit of typing. Fewest downsides, but still
can have conflicts long term.
Ultimately, this is the least invasive change at the moment, and we can
see if the "more symbol conflicts appear later" thing is a real concern or
not.
Note that the ability to compile a version of i915 without INTEL_DEBUG env
support is dropped. It's too useful.
v2: drop dridir now that it's unused.
v3: Consistently put spaces around += in the updated Makefile.am block.
v4: Set a global driverAPI variable so loaders don't have to update to
createNewScreen2() (though they may want to for thread safety).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: drop dridir now that it's unused.
v3: Consistently put spaces around += in the updated Makefile.am block.
v4: Set a global driverAPI variable so loaders don't have to update to
createNewScreen2() (though they may want to for thread safety).
v5: Fix missed public symbol in nouveau. (caught by Emil)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously, we've split things such that mesa core is in libdricore,
exposing the whole Mesa core interface in the global namespace, and the
i965_dri.so code all links against that. Along with polluting application
namespace terribly, it requires extra PLT indirections and prevents LTO.
Instead, we can build all of the driver contents into the same .so with
just a few symbols exposed to be referenced from the actual driver .so
file, allowing LTO and reducing our exposed symbol count massively.
FPS improvement on GLB2.7 with INTEL_NO_HW=1: 2.61061% +/- 1.16957% (n=50)
(without LTO, just the PLT reductions from this commit)
Note that the X Server requires commit
7ecfab47eb221dbb996ea6c033348b8eceaeb893 to successfully load this driver!
v2: Set a global driverAPI variable so loaders don't have to update to
createNewScreen2() (though they may want to for thread safety).
v3: Drop AM_CPPFLAGS addition (Emil pointed out I'd missed some cflags
that would be necessary, though only if we actually relied on them).
v4: Fix install with DESTDIR set.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v2)
As we move to megadrivers, we are unable to build multiple drivers with
the same public global symbol per driver (Think an X Server with an intel
and a nouveau driver, and the X Server implementing indirect for both --
we have to actually talk to the right driver). By slipping the
driDriverAPI vtable into the driver's extension list, we can replace the
usage of the global symbol with usage of the loader-dlsym()ed driver
information.
v2: Pull in the hunk to avoid crashing on null driver_extensions. Thanks,
Emil!
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This will allow a megadrivers build to reference the actual driver being
loaded from the shared dri_util screen creation code.
v2: Fix indentation, fallback case in EGL (review by Emil).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The previous interface relied on a static struct, which meant that the
driver didn't get a chance to edit the struct before the struct got used.
For megadrivers, I want struct specific to the driver being loaded.
v2: Fix the prototype in the docs (caught by Marek). Since the driver
name was in the function, we didn't need to also pass it in.
v3: Fix asprintf error checking (caught by Matt's gcc).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Turns out already we have this nice mechanism for providing optional
things from the driver to the loader, and I was going to have to rename
the public global symbol to avoid conflicts when doing megadrivers.
While the former __driConfigOptions is technically loader interface, this
is the only loader that made use of that symbol. Continue paying
attention to it if we can't find the new option, to retain compatibility
with old drivers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Based on a similar fix from Aaron Watry. It seems unlikely that we
will ever need a kernel-specific setting for this, and the Gallium API
doesn't support it. Remove kernel::max_block_size() altogether.
The gallium vbuf module, which we've been using for some time now, takes
care of uploading user-space vertex/index data into real buffers. The
upload code in the svga driver was unused.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Print info about packing, format, type, and tiling. This will help debug
future issues with this fastpath.
Reviewed-by: Frank Henigman <fjhenigman@google.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes texture corruption of Weston clients on cairo-glesv2 backend.
Commit 49ed599 introduced the bug.
Corruption occured when glTexSubImage called
intel_texsubimage_tiled_memcpy() with:
x,y=10,9
w,h=7,7
format=GL_ALPHA(0x1906)
type=GL_UNSIGNED_BYTE(0x1401)
gl_format=MESA_FORMAT_A8(0x18)
packing.alignemnt=4
The function miscalculated the source image's stride as w*cpp=7 without
taking into account the packing alignment. The actual stride was 8.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70435
Reported-by: U. Artie Eoff <ullysses.a.eoff@intel.com>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by:Frank Henigman <fjhenigman@google.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
In commit 800610f (i965/fs: Improve accuracy of dFdy() to match
dFdx()) I unrolled the high-accuracy dFdy() computation from a single
SIMD16 instruction to two SIMD8 instructions because of text I found
in the i965 (gen4) PRM saying that instruction compression could not
be used in align16 mode. I couldn't find similar text in later
hardware docs, and I observed problems trying to use instruction
compression on align16 mode on Ivy Bridge, so I assumed that the
restriction still applied and the associated documentation had simply
been lost.
After consultation with the hardware engineers, it turns out this is
not the case. In point of fact, the restriction was dropped in gen5,
re-introduced in Ivy Bridge, and dropped again in Haswell. The reason
I didn't notice this is that in the Ivy Bridge documentation, the
restriction was in a different section, and described using different
language.
Now that we know that the restriction only applies to Gen4 and Ivy
Bridge, we can limit the unrolling to those platforms.
Tested on gen5, gen6, and gen7 (both Ivy Bridge and Haswell).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the GLSL 1.50 spec, section 4.3.8.1 (Input Layout Qualifiers):
The layout qualifier identifiers for geometry shader inputs are
layout-qualifier-id
points
lines
lines_adjacency
triangles
triangles_adjacency
And from section 4.3.8.2 (Output Layout Qualifiers)
The layout qualifier identifiers for geometry shader outputs are
layout-qualifier-id
points
line_strip
triangle_strip
max_vertices = integer-constant
We were erroneously allowing line_strip and triangle_strip to be used
as input qualifiers, and we were allowing lines, lines_adjacency,
triangles, and triangles_adjacency to be used as output qualifiers.
Fixes piglit tests "glsl-1.50-gs-{input,output}-layout-qualifiers *".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On DOTA2, framerate on dota2-de1.dem in windowed mode on my laptop
improves by 7.69854% +/- 0.909163% (n=3). In a microbenchmark hitting
this code path (wall time of piglit vbo-subdata-many), runtime decreases
from 0.8 to 0.05 seconds.
v2: Use out of range start/end instead of separate bool for the active
flag (suggestion by Jordan), fix double-upload in the stalling path.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The brw_prepare_vertices that sets up buffers[] depends on these
parameters, so don't let brw_prepare_vertices() skip it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Supporting this extension turns out to simplify our code a bit over not
supporting this extension, once the glBufferSubData() synchronization code
lands.
v2: Use 16 byte alignment like we do for uniform buffers, due to unaligned
access penalties.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
This was mostly for the i915 system-memory VBO code, which we don't have
any more, but since that existed we've ended up producing dependencies on
it being there.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If glBufferData(), glBufferSubData(0, obj->Size), or similar happens, we
get a new drm_intel_bo for the buffer object, and thus need to re-upload
texture buffer state so we point at the new data.
Fixes the new piglit GL_ARB_texture_buffer_object/data-sync
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().
Note: at this time, it's expected that the 'start' parameter will
always be zero.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Writing a 64-bit register value to memory is sufficiently complicated
that it makes sense to reuse this function rather than duplicating it.
Exposing it outside of gen6_queryobj.c means it needs a more descriptive
function name. It could probably be moved to brw_util.c or somewhere
else, but this works too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The current callers just want to write a single register, so combining
the register read with a pipeline flush made sense. However, in the
future we'll want to do multiple register reads back to back, and we'll
only want to flush once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The unit tests added in the previous commits prove some things about the
state of some internal data structures. The most important of these is
that all built-in input and output variables have explicit_location
set. This means that link_invalidate_variable_locations doesn't need to
know the range of non-generic shader inputs or outputs. It can simply
reset location state depending on whether explicit_location is set.
There are two additional assumptions that were already implicit in the
code that comments now document.
- ir_variable::is_unmatched_generic_inout is only used by the linker
when connecting outputs from one shader stage to inputs of another
shader stage.
- Any varying that has explicit_location set must be a built-in. This
will be true until GL_ARB_separate_shader_objects is supported.
As a result, the input_base and output_base parameters to
link_invalidate_variable_locations are no longer necessary, and the code
for resetting locations and setting is_unmatched_generic_inout can be
simplified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Validates:
- ir_variable::explicit_location should not be modified.
- If ir_variable::explicit_location is not set, ir_variable::location,
ir_variable::location_frac, and
ir_variable::is_unmatched_generic_inout must be reset to 0.
- If ir_variable::explicit_location is set, ir_variable::location
should not be modified. ir_variable::location_frac, and
ir_variable::is_unmatched_generic_inout must be reset to 0.
Previous unit tests have shown that all non-generic inputs / outputs
have explicit_location set.
v2: Split the link_invalidate_variable_locations interface change out to
a separate patch. Remove the vertex_in_builtin_without_explicit and
vertex_out_builtin_without_explicit tests. There was a lot of good
discussion about this on the mailing list to which I refer the
interested reader. Both changes suggested by Paul.
http://lists.freedesktop.org/archives/mesa-dev/2013-October/046652.html
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will make it easier to unit test this function in successive
patches. Also, correct the prototype in linker.h. It was... wrong.
v2: Split the interface change from adding the unit tests. Suggested by
Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Checks that the variables generated meet certain criteria.
- Geometry shader inputs have an explicit location.
- Geometry shader outputs have an explicit location.
- Fragment shader-only varying locations are not used.
- Geometry shader uniforms and system values don't have an explicit
location.
- Geometry shader constants don't have an explicit location and are
read-only.
- No other kinds of geometry variables exist.
It does not verify that an specific variables exist.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Checks that the variables generated meet certain criteria.
- Fragment shader inputs have an explicit location.
- Fragment shader outputs have an explicit location.
- Vertex / geometry shader-only varying locations are not used.
- Fragment shader uniforms and system values don't have an explicit
location.
- Fragment shader constants don't have an explicit location and are
read-only.
- No other kinds of fragment variables exist.
It does not verify that an specific variables exist.
v2: Use _mesa_varying_slot_in_fs in
fragment_builtin.inputs_have_explicit_location. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Checks that the variables generated meet certain criteria.
- Vertex shader inputs have an explicit location.
- Vertex shader outputs have an explicit location.
- Fragment shader-only varying locations are not used.
- Vertex shader uniforms and system values don't have an explicit
location.
- Vertex shader constants don't have an explicit location and are
read-only.
- No other kinds of vertex variables exist.
It does not verify that an specific variables exist.
v2: Fix memory management mistakes in
common_builtin::string_starts_with_prefix. Clean up error message
reporting in common_builtin::no_invalid_variable_modes. Both suggested
by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Ever since the addition of interface blocks with instance names, we have
had an implicit invariant:
var->type->is_interface() ==
(var->type == var->interface_type)
The odd use of == here is intentional because !var->type->is_interface()
implies var->type != var->interface_type.
Further, if var->type->is_array() is true, we have a related implicit
invariant:
var->type->fields.array->is_interface() ==
(var->type->fields.array == var->interface_type)
However, the ir_variable constructor doesn't maintain either invariant.
That seems kind of silly... and I tripped over it while writing some
other code. This patch makes the constructor do the right thing, and it
introduces some tests to verify that behavior.
v2: Add general-ir-test to .gitignore. Update the description of the
ir_variable invariant for arrays in the commit message. Both suggested
by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
After some discussions about the correct way to update
_mesa_program_state_string, I decided to make a unit test for the
function. It turns out that the function didn't work quite the way I
thought. The unit test proves that the code was already correct.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Silence a bunch of MSVC type conversion warnings.
Changed return type of S_FIXED to int32_t (signed). The result
is the same. It just seems more intuitive that a signed conversion
function should return a signed value.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This extension never saw any real use so remove it.
v2: also update tests/num_strings.cpp for 'make check'
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Dead code elimination would get rid of the extra instructions, but
skipping this saves iterations through the optimization loop.
From shader-db:
N Min Max Median Avg Stddev
x 14672 3 16 3 3.1334515 0.59904168
+ 14672 1 16 3 2.8955153 0.77732963
Difference at 95.0% confidence
-0.237936 +/- 0.0158798
-7.59342% +/- 0.506783%
(Student's t, pooled s = 0.693935)
Embarassingly, the classic shadow mapping shader:
void main() { }
used to require three iterations through the optimization loop.
With this patch, it only requires one (which makes no progress).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously one side could be UD while the other was float.
V2: Prefer float; apparently IVB can dispatch float ops faster. (Thanks
Eric)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gather unconditionally uses a header, but in some cases the
texture_offset value will be zero.
V2: Don't introduce a bogus conversion.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, Mesa followed the linkage rules outlined in the GLSL
1.20-1.40 specs, which (collectively) said that GLSL versions 1.10 and
1.20 could be linked together, but no other versions could be linked.
In GLSL 4.30, the linkage rules were relaxed so that any two desktop
GLSL versions can be linked together. This change was made because it
reflected the behaviour of nearly all existing implementations (see
Khronos bug 8463). Mesa was one of the few (perhaps the only)
exceptions to prohibit cross-linking of some GLSL versions.
Since the GLSL linkage rules were deliberately relaxed in order to
match the behaviour of existing implementations, it seems appropriate
to relax the rules in Mesa too (even though Mesa doesn't support GLSL
4.30 yet).
Note that linking ES and desktop shaders is still prohibited, as is
linking ES shaders having different GLSL versions.
Fixes piglit tests "shaders/version-mixing {interstage,intrastage}".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Identifiers with double underscores are reserved, and using them has
undefined behavior according to the C++ spec. It's unlikely to make
any difference, but...
Tested-by: Tom Stellard <thomas.stellard@amd.com>
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a corner (meaning we
only have actually 3 samples, and all 3 are on different faces) then
falling off the edge is happening _both_ on x and y axis simultaneously.
There was a noticeable performance hit in mesa's cubemap demo when seamless
filtering was forced on (just below 10 percent or so in a debug build, when
disabling all filtering hacks, otherwise it would probably be a bit more) and
when always doing the logic, hence use a branch which it only does it if any
of the pixels in a quad (or in two quads) actually hit this. With that there
was no measurable performance hit in the cubemap demo (neither in a debug nor
release buidl), but this will vary (cubemap demo very rarely hits edges).
Might also be different on other cpus, as this forces SoA sampling path which
potentially can be quite a bit slower.
Note that as for corners, this code gets all the 3 samples which actually
exist right, and the 4th texel will simply be the same as one of the others,
meaning that filter weights will be a bit wrong. This however should be
enough for full OpenGL (but not d3d10) compliance.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using atomic function for ncs is superfluous since it is
protected by a mutex anyway. Also lock the mutex only once
while retrieving the next CS for submission.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes this build error.
CC clientattrib.lo
In file included from ../../include/GL/glx.h:333,
from glxclient.h:45,
from clientattrib.c:32:
../../include/GL/glxext.h:275: error: redefinition of typedef ‘GLXContextID’
../../include/GL/glx.h:171: note: previous declaration of ‘GLXContextID’ was here
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70591
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Everything necessary for these appears to be implemented. We'll want to
add more tests to guard against bugs, but it should be functionally
complete.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
translate_sse.c contains code for msabi on x86_64, but it appears to be
untested.
Currently arguments 1 and 2 passed to the generated code are moved as 32-bit
quantities into the registers used by sysvabi, irrespective of the architecture.
Since these may be pointers, they must be moved as 64-bit quantities to avoid
truncation.
Commit f4dd099171 disabled tranlate_sse.c on MinGW
x86_64, I don't know if was due to this issue, or a different one...
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cygwin also uses the msabi calling convention on x86_64, not the sysvabi calling
convention
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
ignored, and an empty message aborts the commit.
The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation
which uses mmap() to allocate an anonymous page with execute permission, rather
than the one which just uses malloc().
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
With most of the virtual functions gone, brwInitVtbl() is now tiny.
Merging it into the caller allows us to delete the entire file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that i915 and i965 have been split, the separation between
intelDestroyContext and brw_destroy_context is kind of arbitrary.
This patch replaces the only brw->vtbl.destroy() call with the body
of brw_destroy_context (the only implementation of that virtual
function).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
dri_bo_release is a helper function that calls drm_intel_bo_unreference
but then also sets the pointer to NULL. This is unnecessary, since
brw_destroy_context is called from intelDestroyContext, which also frees
brw completely.
If you're still trying to access them, you've got bigger problems.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Having almost the entire body of the function indented one level for a
check that should never happen seems silly. Just early return.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the i915/i965 split, there's only one implementation of this
virtual function. We may as well just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the i915/i965 split, there's only one implementation of this
virtual function. We may as well just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In commit f878d20 (glsl: Update ir_variable::max_ifc_array_access
properly), I accidentally used the wrong kind of check to determine
whether the variable being accessed was an interface instance (I used
var->get_interface_type() != NULL when I should have used
var->is_interface_instance()). As a result, if an unnamed interface
block contained a struct which contained an array,
update_max_array_access() would mistakenly interpret the struct as a
named interface block and try to dereference a null
var->max_ifc_array_access.
This patch corrects the check, fixing the null dereference.
Fixes piglit test interface-block-struct-nesting.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70368
Reviewed-by: Matt Turner <mattst88@gmail.com>
In desktop GLSL, location qualifiers are case-insensitive. In GLSL
ES, they are case-sensitive. This patch handles the difference by
using a new function to match layout qualifiers,
match_layout_qualifier(), which calls either strcmp() or strcasecmp()
as appropriate.
Fixes piglit tests:
- layout-not-case-sensitive-in.geom
- layout-not-case-sensitive-max-vert.geom
- layout-not-case-sensitive-out.geom
- layout-not-case-sensitive.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes a number of piglit crashes when running on a hacked up llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This just adds the missing bits so the ubo tests don't crash.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_geometry_shader4 spec (section Geometry Shader outputs):
"The built-in special variable gl_Position is intended to hold the
homogeneous vertex position. Writing gl_Position is optional."
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2 (Paul Berry <stereotype441@gmail.com>: Split out to separate patch
(previously this was part of "glsl: add builtins for geometry
shaders.")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Exposing 10-bit color configs confuses too many applications that try to
use the chooser to pick an 8 bit config. The chooser consider an fbconfig
with more bits a better match and will thus give a 10 bit config when an
application asks for a config with GLX_RED_SIZE 1 or 8.
One key example is glxinfo, which does this, and then doesn't specify that
it needs a config where GLX_DRAWABLE_TYPE has the GLX_WINDOW_BIT set.
This way it ends up with a 10 bit config that it can't use to create a
GLX window and fails to log extensions.
This reverts commit f354bcc177.
https://bugs.freedesktop.org/show_bug.cgi?id=70557
We can't perform DCE using the liveness pass between GVN and GCM
because it relies on the correct schedule, but GVN doesn't care about
preserving correctness - it's rescheduled later by GCM.
This patch makes dce_cleanup pass perform simple DCE
between GVN and GCM instead of relying on liveness pass.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Two extra instructions in some heroesofnewerth shaders, but a win for
everything else.
total instructions in shared programs: 1531352 -> 1530815 (-0.04%)
instructions in affected programs: 121898 -> 121361 (-0.44%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 94d05bf87a as it has a
few problems:
- it breaks windows builds becuase env[LLVM_CXXFLAGS] is never set there
- it is merging not only rtti, but the whole cxxflags (defines etc)
which has proven to be a source of troubles (breaks debugging etc.)
This is required in order for clang to correctly handle the OpenCL C
barrier() builtin which has the following restrictions acording to
the OpenCL 1.1 Specification:
If barrier is inside a conditional statement, then all work-items must
enter the conditional if any work-item enters the conditional statement
and executes the barrier.
If barrier is inside a loop, all work-items must execute the barrier for
each iteration of the loop before any are allowed to continue execution
beyond the barrier.
By linking before otimizations, we can replace calls to barrier() with
calls to a target specific intrinsic which has the noduplicate attribute
This attribute prevents clang from performing optimizations which could
violate the above rules.
This attribute must be applied to the call instruction that invokes
the function, so it is not enough to add this attribute the barrier()
declaration.
As a bonus this will probably speed up compile times since we will no
longer need to run link-time optimizations.
During the recent bind_sampler_states() interface change in gallium
we changed the CSO single_sampler_done() function so that if we were
decreasing the number of sampler states bound in the driver, we'd
null-out the "extra/old" sampler states to unbind them. See commit
1e2fbf265.
However, we didn't make the corresponding fix for sampler views.
This caused an assertion to fail in the svga driver which checked
that the number of sampler views matched the number of sampler states.
This patch fixes cso_restore_sampler_views() so that it nulls-out
the extra/old sampler views if the number of new views is less than
the number of current/old views.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Use GL_MAP_INVALIDATE_RANGE, UNSYNCHRONIZED and FLUSH_EXPLICIT flags
when mapping VBOs during display list compilation. This mirrors what
we do for immediate-mode VBO building in vbo_exec_vtx_map().
This improves performance for applications which interleave display
list compilation with execution. For example:
glNewList(A);
glBegin/End prims;
glEndList();
glCallList(A);
glNewList(B);
glBegin/End prims;
glEndList();
glCallList(B);
Mesa's vbo module tries to combine the vertex data from lists A and B
into the same VBO when there's room. Before, when we mapped the VBO for
building list B, we did so with GL_MAP_WRITE_BIT only. Even though we
were writing to an unused part of the buffer, the map would stall until
the preceeding drawing call finished.
Use the extra map flags and FlushMappedBufferRange() to avoid the stall.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of checking width==height in four places, just do it in
_mesa_legal_texture_dimensions() where we do the other width, height,
depth checks. Similarly, move the check that cube map array depth is
a multiple of 6.
This change also fixes some missing cube dimension checks for the
glTexStorage[23]D() functions.
Remove width==height assertion in _mesa_get_tex_max_num_levels() since
that's called before the other size checks for glTexStorage.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
We can now add GBM support for the 10 bit/channel formats which lets us
create a gbm surface that we can use with KMS for display hardware that
support the format.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We add support for the ARGB2101010 color format to the DRI image extension,
which allows DRI loaders to create a __DRIimage with this color format.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This commit enables ARGB2101010 system framebuffers (that is, DRI drawables)
for the i965 drivers. This is done by generating DRI configs that advertise
this color format as well as teaching intelCreateBuffer to pick the right
color format when it sees such a DRI config.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This extends the common dri driver infrastructure with the ability to create
__DRIconfigs for 10 bits/channel + 2 bit alphs formats. This still has
to be supported and requested by a driver, so this doesn't enable anthing yet.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The EGLConfig doesn't have the rgba masks, only the rgba sizes. To
make sure a config is usable with a given GBM/KMS format, we need a way
to make sure the formats really match.
All callers now use the more correct rgba mask mechanism for filtering
out mathcing DRI configs. Even if depth and buffer size match, the
color component layout can be different, or in case or ARGB8888 and
ARGB2101010 the color components can even be different sizes.
Since anything that the depth check would reject is also rejected by
the rgba mask comparison, the depth parameter is redundant and not
specific enough. We should probably have removed it when the rgba
masks argument was introduced, but better late than never.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Matching on visual depth to buffer size makes 8 bpc RGBA look similar to
10 bit RGB with 2 bit alphs - both have buffer size 32. Instead, build
the rgba masks from the visual data and use that for finding matching
DRI configs.
We need to keep the special case that allows us to match 24 bit visuals
to DRI configs with buffer size 32. We do that by creating an alpha
mask of "all the non-rgb bits" for 24 bit visuals and matching a second
time with that.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Add information for RGB565 to the table of image formats so that we can
create a __DRIimage for that format. This in turn enables RGB565
wayland clients.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
With this patch Wayland clients can now ask EGL for RGB 565 format buffers
and attach them to a Wayland compositor.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471
Tested-by: Vinson Lee <vlee@freedesktop.org>
The original intent of the variable was to prevent adding
libdrm dependency for non drm drivers (swrast). This is
already handled with __NOT_HAVE_DRM_H, and with the recent
merge of the dri_util and drisw_util code this variable has
started causing build issues.
Eg. the following will fail
$ ./autogen.sh --with-dri-drivers=swrast --with-gallium-drivers=
$ make
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
When a geometry shader is active, the transform feedback primitive
type ("mode") needs to be validated against the geometry shader output
primitive type, not the primitive type passed to the glDraw*()
function.
Fixes the following piglit tests:
- glsl-1.50-geometry-primitive-types GL_LINES
- glsl-1.50-geometry-primitive-types GL_LINES_ADJACENCY
- glsl-1.50-geometry-primitive-types GL_LINE_STRIP
- glsl-1.50-geometry-primitive-types GL_LINE_STRIP_ADJACENCY
- glsl-1.50-geometry-primitive-types GL_TRIANGLES
- glsl-1.50-geometry-primitive-types GL_TRIANGLES_ADJACENCY
- glsl-1.50-geometry-primitive-types GL_TRIANGLE_FAN
Exposes previously hidden failures in the following piglit tests:
- glsl-1.50-geometry-primitive-id-restart GL_LINES other
- glsl-1.50-geometry-primitive-id-restart GL_LINES_ADJACENCY other
- glsl-1.50-geometry-primitive-id-restart GL_LINE_LOOP ffs
- glsl-1.50-geometry-primitive-id-restart GL_LINE_LOOP other
- glsl-1.50-geometry-primitive-id-restart GL_LINE_STRIP other
- glsl-1.50-geometry-primitive-id-restart GL_LINE_STRIP_ADJACENCY other
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLES other
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLES_ADJACENCY other
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_FAN ffs
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_FAN other
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP other
- glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP_ADJACENCY other
(These failures were previously hidden due to a flaw in the test: it
doesn't check for GL errors. I'll fix the test shortly).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ivy Bridge's "reorder enable" bit gives us a binary choice for the
order in which vertices from triangle strips are delivered to the
geometry shader. Neither choice follows the OpenGL spec, but setting
the bit is better, because it gets triangle orientation correct.
Haswell replaces the "reorder enable" bit with a new "reorder mode"
bit (which occupies the same location in the command packet). This
bit gives us a different binary choice, which affects both triangle
strips and triangle strips with adjacency. Setting the bit ("reorder
trailing") gives the proper order according to the OpenGL spec.
So in either case we want to set the bit.
On Ivy Bridge, fixes piglit test "triangle-strip-orientation".
On Haswell, fixes piglit tests "glsl-1.50-geometry-primitive-types
{GL_TRIANGLE_STRIP,GL_TRIANGLE_STRIP_ADJACENCY}" and
"glsl-1.50-geometry-tri-strip-ordering-with-prim-restart *".
v2: Rename the bit to "REORDER_TRAILING" for consistency with Haswell
docs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Despite the name, this field wasn't being set to the dispatch width at
all; it was always 8. The only place it was used was that the
constant buffer read length was aligned to it, and as far as I can
tell from the docs, there is no need to align this value to the
dispatch width; aligning it to a multiple of 8 is sufficient. So I've
just replaced it with a hardcoded 8.
v2: In gen6_wm_state, use brw->wm.base.push_const_size for consistency
with VS and GS state upload.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch populates the following built-in GLSL 1.50 variables based
on constants stored in ctx->Const:
- gl_MaxVertexOutputComponents
- gl_MaxGeometryInputComponents
- gl_MaxGeometryOutputComponents
- gl_MaxFragmentInputComponents
- gl_MaxGeometryTextureImageUnits
- gl_MaxGeometryOutputVertices
- gl_MaxGeometryTotalOutputComponents
- gl_MaxGeometryUniformComponents
- gl_MaxGeometryVaryingComponents
On i965/gen7, fixes all Piglit tests in "spec/glsl-1.50/built-in
constants/*" except for gl_MaxCombinedTextureImageUnits and
gl_MaxGeometryUniformComponents.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that both vec4 and fs are dynamically assigning offsets, a lot of the
code is the same.
v2: Avoid passing around the next offset through the class. (Review by
Paul)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Note that the dropped comment in brw_context.h is mostly (better written)
in brw_binding_table.c as well.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It would be nice to be able to pack our binding table so that programs
that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1
binding table entries. To do that, we need the compiled program to have
information on where its surfaces go.
v2: Rename size to size_bytes to be more explicit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
* As discussed on the mailing list,
forced no-rtti breaks C++ public
API's such as the Haiku C++ libGL.so
* -fno-rtti *can* be still set however
instead of blindly forcing -fno-rtti,
we can rely on the llvm-config
--cppflags output.
If the system llvm is built without
rtti (default), the no-rtti flag will be
present in llvm-config --cppflags
(which we pick up on)
If llvm is built with rtti
(REQUIRES_RTTI=1), then -fno-rtti is
removed from llvm-config --cppflags.
* We could selectively add / remove rtti
from various components, however mixing
rtti and non-rtti code is tricky and
could introduce missing symbols.
* This needs impact tested.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Add simple plain C routines for NV12<->YV12 and YUYV<->UYVY
conversions. The NV12->YV12 conversion is commonly used, for instance
by VLC.
Reviewed-by: Christian König <christian.koenig@amd.com>
Textures that likely reside in VRAM, are mapped for reading and
don't require direct mapping should be staged into GTT, to avoid bad
performance. This fixes readback performance of VDPAU surfaces.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This new bind flag forces linear storage, but does not have other
side effects like R600_RESOURCE_FLAG_TRANSFER.
Reviewed-by: Christian König <christian.koenig@amd.com>
Currently it's hardcoded in the shader, so every change requires
compilation of the shader variant, killing the performance
in Serious Sam 3 and probably other apps.
This patch passes alpha_ref in the user sgpr and removes it from
the shader key.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This fixes the issue when dst and src is the same reg and operation on one
channel overwrites the source for other channels, e.g.:
UMUL TEMP[2].xyz, TEMP[0].xyzz, TEMP[2].xxxx
In this example the result of the operation on channel x is written in
TEMP[2].x and then used as a second source operand for channels y and z
instead of original value in TEMP[2].x.
This patch stores the results in temp reg and moves them to
dst after performing operation on all channels.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70327
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
v2: Keep the random 32-bit only version of memcpy, since Ian says I
can't delete it without data proving it isn't useful.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
brw_context.h includes imports.h which includes compiler.h which already
defines these.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These make it easy to convert a floating point value to a fixed point
numbers. The second parameter is the number of bits used for the
fractional part of the number.
It looks like core Mesa has similar functions already, but none that
allows an arbitrary number of fractional bits. The more generic version
is probably useful to everyone.
r600g apparently has an identical copy of the S_FIXED macro, but doesn't
include this file. I'm not sure what to do about that, so I'm just
going to leave it for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This seems generally useful, so it may as well live in core Mesa.
In fact, the comment for ALIGN() in macros.h actually says to "see also"
ROUND_DOWN_TO, which...was in a driver somewhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intel_batchbuffer_init() sets up initial batchbuffer state; it seems
like a reasonable place to initialize this flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Configuring which dirty flags we want sounds like a job for
brw_init_state().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It actually just wants generation checking, and brw->gen is the usual
way of doing that. In the future, we'll also want to check brw->hw_ctx,
which isn't available from the screen.
While we're changing the function signature, convert from camel case to
our usual naming conventions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no point in having two files for context functions. This patch
moves the code from intel_context.c into brw_context.c unmodified
(other than whitespace fixes).
Right now, this looks silly; future patches will merge functions and
tidy things up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
brw_init_surface_formats already sets entries in TextureFormatsSupported
to true; it may as well take care of initializing it to false too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This flag is only used in one place, and is only set on one platform.
Just check for original Gen4 in the relevant function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This seems like a better place for it, and helps clean up
brwCreateContext (which is full of a lot of random stuff).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This was always set to false, and is only used for debugging.
To enable it, simply change the if (0) block and recompile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since each kind of device has its own brw_device_info structure, we can
simply store the URB and thread limits there. This eliminates all the
large if-ladders, and simplifies the context initialization code quite a
bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This option was useful during initial development, but it's been ages
since I've heard of anyone using it. Plus, Gen7+ mandates separate
stencil, so it was really only useful on Sandybridge anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The idea is that struct brw_device_info should store statically-known
information about hardware features. Using the new family name in the
PCI ID table, we can easily grab the right structure.
This is basically the equivalent of intel_device_info in the kernel.
This patch also makes the new structure available from intel_screen, but
nothing uses it. Right now, it looks very redundant with existing
fields, but that will change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I removed this a while ago, since we never used it, but I'm finally
resurrecting the idea in the next commits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nothing uses the #define name, and it's not terribly useful - the
numerical ID serves the same purpose. The only thing we could really do
with it is generate slightly prettier preprocessed code. But who looks
at that?
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Using a helper function clarifies the context initialization code.
I would've liked to completely centralize it, but moving the optionCache
code from intelInitExtensions into here would've required setting flags
in the context, which seems like a waste.
v2: Rebase for the introduction of disable_derivative_optimization.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that intelInitContext isn't shared between i915 and i965, the split
is fairly arbitrary. This patch moves a bunch of the basic context
creation and generation checking code up to the top-level function
(and slightly earlier).
More will follow.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that there isn't an intel_context structure, the split between
brw_context.[ch] and intel_context.[ch] is rather awkward and arbitrary.
Removing intel_context.[ch] seems desirable, but not everything really
belongs in brw_context.[ch], either.
Moving INTEL_DEBUG handling into separate intel_debug.[ch] files should
make them relatively easy to find.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
"error" is a very generic name. dri_ctx_error is the name used in
intelInitContext(), which is more specific.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Geometry shader support is now working well, and adequately piglit
tested. There are just a few piglit failures left to fix. So there's
no need for an "experimental" warning anymore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Geometry shaders were the last thing we needed to finish before
turning on GLSL 1.50 and GL 3.2 support. They are now working well,
with just a few piglit failures left to fix.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With code dump enabled LLVM may generate disassembly during compilation.
Show this disassembly when available and prefer it to SI bytecode dump.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jay Cornwall <jay@jcornwall.me>
Fix coord wrapping (and face selection too) in case of edges.
Unfortunately, the coord wrapping is way more complicated than what
the code did, as it depends on the face and the direction where the
texel falls off the face (the logic needed to get this right in fact
seems utterly ridiculous).
Also fix a bug in (y direction under/overflow) face selection.
And get rid of complicated cube corner handling. Just like edge case,
the coord wrapping was wrong and it seems very difficult to fix.
I'm near certain it can't always work anyway (though ordinary seamless
filtering on edge has actually a similar problem but not as severe)
because we don't have per-pixel face, hence could have multiple corner
texels which would make it very difficult to average the remaining texels
correctly. Hence simply pick a texel which would only have fallen off one
edge but not both instead, which is not quite accurate but actually I think
should be enough to meet OpenGL (but not d3d10) requirements.
v2: small fixes suggested by Brian, add some comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous limit of of 128*1024 was reported to cause frequent recompiles
in some apps due to shader variant thrashing on IRC in some apps leading
to noticeable lags.
Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible
to reach, since even simple fragment shaders without texturing (glxgears) used
more than twice than 128 instructions, hence the instruction limit would have
always been reached first (excluding things like trivial shaders not writing
color). Even with the new limit it is VERY likely the instruction limit is hit
first.
Should help with such lags due to recompiles (though other shader types have
their own limits, LP_MAX_SETUP_VARIANTS and DRAW_MAX_SHADER_VARIANTS, in
particular the latter seems a bit small (128)).
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes this build error.
CC imports.lo
../../src/mesa/main/imports.c: In function '_mesa_strtof':
../../src/mesa/main/imports.c:570:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'loc'
../../src/mesa/main/imports.c:570:20: error: 'loc' undeclared (first use in this function)
../../src/mesa/main/imports.c:570:20: note: each undeclared identifier is reported only once for each function it appears in
../../src/mesa/main/imports.c:572:7: error: implicit declaration of function 'newlocale'
../../src/mesa/main/imports.c:572:23: error: 'LC_CTYPE_MASK' undeclared (first use in this function)
../../src/mesa/main/imports.c:574:4: error: implicit declaration of function 'strtof_l'
../../src/mesa/main/imports.c:580:1: warning: control reaches end of non-void function
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Now that libEGL has been fixed to not leak all kinds of symbols, gbm
links to its own copy of the libwayland-drm.a helper library. That means
we can't rely on comparing the addresses of a static vtable symbol in that
library to determine if a wl_buffer is a wl_drm_buffer. Instead, we
move the vtable into the wl_drm struct and use that for comparing.
https://bugs.freedesktop.org/show_bug.cgi?id=69437
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
execinfo.h is not available on NetBSD.
Fixes this bulid error.
CC glapi_gentable.lo
glapi_gentable.c:44:22: fatal error: execinfo.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This was overriding the top-level .dir-locals.el causing some settings
(like forcing spaces instead of tabs!) to be lost.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We should be able to safely set the framebuffer state without a
fragment shader bound. bind_ps_state will take care of updating the
necessary state bits later.
v2: check in update_db_shader_control
Fixes GL2ExtensionTests/egl_image_external/TestSimpleUnassociated.test
which is part of gles2/3 conformance suite. Here image external
textures are switched to be treated the same as 2D textures. These
can be associated with the fallback texture providing fixed sample
values of (0, 0, 0, 1).
The OES_EGL_image_external spec says:
"Sampling an external texture which is not associated with any EGLImage
sibling will return a sample value of (0,0,0,1)."
"External textures cannot be used with TexImage2D, TexSubImage2D,
CompressedTexImage2D, CompressedTexSubImage2D, CopyTexImage2D, or
CopyTexSubImage2D, and an INVALID_ENUM error will be generated if
this is attempted."
And quoting Chad:
"That's enforced in _mesa_TexImage*() by calling
legal_teximage_target(), and enforced in _mesa_TexSubImage*() by
calling legal_texsubmimage_target(). Each of the
legal_tex*image_target() functions reject external textures.
Therefore, allowing GL_TEXTURE_EXTERNAL_OES in store_texsubimage()
won't violate the above spec quote.
I think it's safe to allow GL_TEXTURE_EXTERNAL_OES in
store_texsubimage(), as long as the texture has only a single
plane. Luckily, that's the only type of external textures that
Mesa currently supports."
CC: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Extend the fast texture upload from BGRA X-tiled to include RGBA,
Alpha/Luminance, and Y-tiled. Speed improvements, measured with
mesa demos teximage program, on 256 x 256 texture, in MB/s, on a
Sandy Bridge (Ivy is comparable):
before after increase
BGRA/X-tiled 3266 4524 1.39x
BGRA/Y-tiled 1739 3971 2.28x
RGBA/X-tiled 474 4694 9.90x
RGBA/Y-tiled 477 3368 7.06x
L/X-tiled 1268 1516 1.20x
L/Y-tiled 1439 1581 1.10x
v2: Cosmetic changes only: reformat and reword comments, make doxygen-friendly,
rename variables, use existing macros, add an assert.
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
This is part of the prep for megadrivers, which won't allow using a single
global symbol due to the fact that there will be multiple drivers built
into the same dri.so file. For that, we'll need screen init to take a
reference to the driver to set up this vtable.
v2: Fix two missed references to driDriverAPI.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
The intel_screen.c used to be a dispatch to one of 3 driver functions, but
was down to 1, so it was kind of a waste. In addition, it was trying to
free all of the data that might have been partially freed in the kernel
3.6 check (which comes after intelInitContext, and thus might have had
driverPrivate set and result in intelDestroyContext() doing work on the
freed data). By moving the driverPrivate setup earlier, we can use
intelDestroyContext() consistently and avoid such problems in the future.
v2: Adjust the prototype of brwCreateContext to use the proper enum
(fixing a compiler warning in some builds)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
If bufmgr didn't get created, then screen creation failed, and we never
should have got here in the first place. This was added by Chris Wilson
in 2010 with no explanation for why it would be needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965, i915, radeon, r200, swrast, and nouveau were mostly trying to do the
same logic, except where they failed to. Notably, swrast had code that
appeared to try to enable GLES1/2 but forgot to set api_mask (thus
preventing any gles context from being created), and the non-intel drivers
didn't support MESA_GL_VERSION_OVERRIDE.
nouveau still relies on _mesa_compute_version(), because I don't know what
its limits actually are, and gallium drivers don't declare limits up front
at all. I think I've heard talk about doing so, though.
v2: Compat max version should be 30 (noted by Ken)
Drop r100's custom max version check, too (noted by Emil Velikov)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only important difference was not calling drmGetVersion, and making
the swrast extension vtable. That doesn't justify duplicating the other
330 lines of code.
v2: fix the scons build (code by Emil Velikov)
v3: fix scons build with swrast-only (code by Emil Velikov)
v4: Drop the new define I added, when we already have __NOT_HAVE_DRM_H.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Looking at Lightsmark's shaders, the way we used MRFs (or in gen7's
case, GRFs) was bad in a couple of ways. One was that it prevented
compute-to-MRF for the common case of a texcoord that gets used
exactly once, but where the texcoord setup all gets emitted before the
texture calls (such as when it's a bare fragment shader input, which
gets interpolated before processing main()). Another was that it
introduced a bunch of dependencies that constrained scheduling, and
forced waits for texture operations to be done before they are
required. For example, we can now move the compute-to-MRF
interpolation for the second texture send down after the first send.
The downside is that this generally prevents
remove_duplicate_mrf_writes() from doing anything, whereas previously
it avoided work for the case of sampling from the same texcoord twice.
However, I suspect that most of the win that originally justified that
code was in avoiding the WAR stall on the first send, which this patch
also avoids, rather than the small cost of the extra instruction. We
see instruction count regressions in shaders in unigine, yofrankie,
savage2, hon, and gstreamer.
Improves GLB2.7 performance by 0.633628% +/- 0.491809% (n=121/125, avg of
~66fps, outliers below 61 dropped).
Improves openarena performance by 1.01092% +/- 0.66897% (n=425).
No significant difference on Lightsmark (n=44).
v2: Squash in the fix for register unspilling for send-from-GRF, fixing a
segfault in lightsmark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
For texturing from GRFs, we now have payloads of arbitrary sizes up to the
message length limit.
v2 (Kenneth Graunke): Rebase on intel_context -> brw_context change.
v3: Add some comment text.
v4: Change some magic 16s to BRW_MAX_MRF (noted by Ken). Leave the 11,
which is the magic "max sampler message length". BRW_MAX_MRF sizing
on the little int arrays is retained because I could see us needing to
extend in the future if we move to GRFs for FB writes (those go to at
least 12 long in a quick scan of the specs)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
Acked-by: Matt Turner <mattst88@gmail.com>
This will let us coalesce into texture-from-GRF arguments, which would
otherwise be prevented due to the live interval for the whole vgrf
extending across all the MOVs setting up the channels of the message
v2 (Kenneth Graunke): Rebase for renames.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now optimization passes will be able to look at the per-channel ranges.
v2: Rebase on various optimization pass changes.
v3 (Kenneth Graunke): Rename live_variables to live_intervals; split
introduction of invalidate_live_intervals() into a separate patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When compacting the list of VGRFs, we patch up the live interval ranges
(which are indexed by VGRF number). Unfortunately, once we make
per-component data available, this will become too complicated to
maintain. Instead, simply invalidate them.
This was pulled out of a patch by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In compute_live_intervals(), start and end are shorter names for
the virtual_grf_start and virtual_grf_end class members.
Now that the fs_live_intervals class has arrays named start and end
which are indexed by var, rather than VGRF, reusing the name is
confusing. Plus, most of the code has been factored out, so using the
long names isn't as inconvenient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is the information we'll actually use to replace the
virtual_grf_start[]/end[] arrays.
No change in shader-db.
v2 (Kenneth Graunke): Rebase; minor comment updates.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These blocks are about to grow some more code, and the indentation was
getting out of hand.
v2 (Kenneth Graunke): Rebase, minor typo fixes and style changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, this simply sets live_intervals_valid = false, but in the
future it will do something more sophisticated.
Based on a patch by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This significantly improves our handling of VGRFs of size > 1.
Previously, we only marked VGRFs as def'd if the whole register was
written by a single instruction. Large VGRFs which were written
piecemeal would not be considered def'd at all, even if they were
ultimately completely written.
Without being def'd, these were then marked "live in" to the basic
block, often extending the range to preceding blocks and sometimes
even the start of the program.
The new per-component tracking gives more accurate live intervals,
which makes register coalescing more effective.
In the future, this should help with texturing from GRFs on Gen7+.
A sampler message might be represented by a 2-register VGRF which
holds the texture coordinates. If those are incoming varyings,
they'll be produced by two PLN instructions, which are piecemeal writes.
No reduction in shader-db instruction counts. However, code which
prints the live interval ranges does show that some VGRFs now have
smaller (and more correct) live intervals.
v2: Rebase on current send-from-GRF code requiring adding extra use[]s.
v3: Rebase on live intervals fix to include defs in the end of the
interval.
v4 (Kenneth Graunke): Rebase; split off a few preparatory patches;
add lots of comments; minor style changes; rewrite commit message.
v5 (Eric Anholt): whitespace nit.
Written-by: Eric Anholt <eric@anholt.net> [v1-3]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v4]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> (v4)
num_vars was shorthand for the number of virtual GRFs. num_vgrfs is a
bit clearer. Plus, the next patch will introduce "vars" which are
distinct from vgrfs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This has no functional effect, but should make subsequent changes a
little simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From section 4.1.9 (Arrays) of the GLSL 4.40 spec (as of revision 7):
However, unless noted otherwise, blocks cannot be redeclared;
an unsized array in a user-declared block cannot be sized
through redeclaration.
The only place where the spec notes that interface blocks can be
redeclared is to allow for redeclaration of built-in interface blocks
such as gl_PerVertex. Therefore, user-defined interface blocks can
never be redeclared. This is a clarification of previous intent (see
Khronos bug 10659).
We were already preventing interface block redeclaration using the
same block name at compile time, but we weren't preventing interface
block redeclaration using the same instance name (and different block
names) at compile time. And we weren't preventing an instance name
from conflicting with a previously-declared ordinary variable.
In practice the problem would be caught at link time, but only because
of a coincidence: since ast_interface_block::hir() wasn't doing any
checking to see if the instance name already existed in the shader, it
was creating a second ir_variable in the shader having the same name
but a different type. Coincidentally, when the linker checked for
intrastage consistency of global variable declarations, it treated the
two declarations from the same shader as a conflict, so it reported a
link error.
But it seems dangerous to rely on that linker behaviour to catch
illegal redeclarations that really ought to be detected at compile
time.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch verifies that:
- The gl_PerVertex input interface block may only be redeclared in a
geometry shader, and that it may only be redeclared as gl_in[].
- The gl_PerVertex output interface block may only be redeclared in a
vertex or geometry shader, and that it may only be redeclared as a
non-array without an interface name.
- gl_PerVertex may not be redeclared as any other type of interface
block (i.e. as a uniform interface block).
As a side-effect, the code now keeps track of what the previous
declaration of gl_PerVertex was--this will be needed in future
patches.
Fixes piglit tests:
- spec/glsl-1.50/compiler/gs-redeclares-pervertex-in-with-incorrect-name.geom
- spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-as-array.geom
- spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-with-instance-name.geom
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In later patches, we'll use this in order to implement the required
behaviour that after the gl_PerVertex interface block has been
redeclared, only members of the redeclared interface block may be
used.
v2: Update the function name and comment to clarify that we aren't
actually removing the variable from the symbol table, just disabling
it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by future patches to change an ir_variable's
interface type when the gl_PerVertex built-in interface block is
redeclared.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the get_variable_being_redeclared() function so
that it no longer relies on the ast_declaration for the variable being
redeclared. In future patches, this will allow
get_variable_being_redeclared() to be used for processing
redeclarations of the built-in gl_PerVertex interface block.
v2: Also make get_variable_being_redeclared() static.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note: we need to make an exception for the gl_PerVertex interface
block, since in geometry shaders it is allowed to be redeclared with
the instance name gl_in. Future patches will make redeclaration of
gl_PerVertex work properly.
Fixes piglit test
spec/glsl-1.50/compiler/interface-block-instance-name-uses-gl-prefix.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note: we need to make an exception for the gl_PerVertex interface
block, since built-in variables are allowed to be redeclared inside
it. Future patches will make redeclaration of gl_PerVertex work
properly.
Fixes piglit tests:
- spec/glsl-1.50/compiler/interface-block-array-elem-uses-gl-prefix.vert
- spec/glsl-1.50/compiler/named-interface-block-elem-uses-gl-prefix.vert
- spec/glsl-1.50/compiler/unnamed-interface-block-elem-uses-gl-prefix.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note: we need to make an exception for the gl_PerVertex interface
block, since this is allowed to be redeclared. Future patches will
make redeclaration of gl_PerVertex work properly.
Fixes piglit test
spec/glsl-1.50/compiler/interface-block-name-uses-gl-prefix.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note: some limited amount of redeclaration is actually allowed,
provided the shader is redeclaring the built-in gl_PerVertex interface
block. Support for this will be added in future patches.
Fixes piglit tests
spec/glsl-1.50/compiler/unnamed-interface-block-elem-conflicts-with-prev-{block-elem,global}.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL reserves identifiers beginning with "gl_" or containing "__", but
we haven't been consistent about enforcing this rule. This patch
makes a new function to check whether identifier names are valid. In
the process it closes a loophole where we would previously allow
function argument names to contain "__".
v2: Rename check_valid_identifier() -> validate_identifier(). Add
curly braces in validate_identifier().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit e2660770731b018411fbe1620cacddaf8dff5287 (glsl: Keep track
of location for interface block fields), I neglected to update
glsl_type::record_key_compare to account for the fact that interface
types now contain location information. As a result, interface types
that differ only by their location information would not be properly
distinguished.
At the moment this is not a problem, because the only interface block
in which location information != -1 is gl_PerVertex, and gl_PerVertex
is always created in the same way. However, in the patches that
follow, we'll be adding new ways to create gl_PerVertex (by
redeclaring it), so we'll need location information to be handled
properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Although these interfaces can't be accessed directly by GLSL (since
they don't have an instance name), they will be necessary in order to
allow redeclarations of gl_PerVertex.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, we create just a single gl_PerVertex interface block for
geometry shader inputs. In later patches, we'll also need to create
an interface block for geometry and vertex shader outputs. Moving the
code into its own class will make reuse easier.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We use a location of -1 for variables which don't have their own
assigned locations--this includes ir_variables which represent named
interface blocks. Technically the location assigned to gl_in doesn't
matter, since gl_in is only accessed via its members (which have their
own locations). But it's nice to be consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Calling radeon_drm_cs_flush from multiple threads might cause deadlocks,
fix this by immediately signaling the semaphore after waiting for it.
This is a candidate for the stable branch(es).
Partially fixes: https://bugs.freedesktop.org/show_bug.cgi?id=70123
v2: some fixes on commit message
Signed-off-by: Christian König <christian.koenig@amd.com>
Unless the polygon fill mode is different from PIPE_POLYGON_MODE_FILL,
so checking the the polygon mode is sufficient.
Testing done: no regression in polygon-mode-offset
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
v2: also get rid of 3 helper functions no longer used.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
v2: based on Brian's feedback, clean up code a bit.
And use sign bit of major axis instead of pre-select s/t/r sign for coord
mirroring (which should be the same in the end, saves 2 ands).
Also fix two bugs with select/mirror of derivatives, the minor axes need to
use major axis sign as well (instead of major derivative axis sign), and
don't mistakenly use absolute values of major derivative and inverse major
values.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were already setting the array size of unsized arrays that appeared
inside unnamed interface blocks, but we weren't updating
ir_variable::interface_type to reflect the new array size, causing
bogus link errors.
This patch causes array_sizing_visitor to keep track of all the
unnamed interface types it sees, and the ir_variables corresponding to
each one. After the visitor runs, a new function,
fixup_unnamed_interface_types(), adjusts each unnamed interface type
to correctly correspond with the array sizes in the ir_variables.
Fixes piglit tests:
- spec/glsl-1.50/execution/unsized-in-unnamed-interface-block-gs
- spec/glsl-1.50/execution/unsized-in-unnamed-interface-block-multiple
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When multiple shaders of the same type access an interface block
containing an unsized array, we need to set the array size based on
the maximum array element accessed across all the shaders. This is
similar to what we already do with unsized arrays occurring outside of
interface blocks.
Note: one corner case is not yet addressed by these patches: the case
where one compilation unit defines an interface block containing
unsized arrays and another compilation unit defines the same interface
block containing sized arrays.
Fixes piglit test:
- spec/glsl-1.50/execution/unsized-in-named-interface-block-multiple
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Unsized arrays appearing inside named interface blocks now get a
proper size assigned by the array_sizing_visitor.
Fixes piglit tests:
- spec/glsl-1.50/execution/unsized-in-named-interface-block
- spec/glsl-1.50/execution/unsized-in-named-interface-block-gs
- spec/glsl-1.50/linker/unsized-in-named-interface-block
- spec/glsl-1.50/linker/unsized-in-named-interface-block-gs
- spec/glsl-1.50/linker/unsized-in-unnamed-interface-block-gs (*)
(*) is fixed by dumb luck--support for unsized arrays in unnamed
interface blocks will come in a later patch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch modifies update_max_array_access() so that it updates
ir_variable::max_ifc_array_access to reflect the shader's use of
arrays appearing within interface blocks.
v2: Use an ordinary function in ast_array_index.cpp rather than a
virtual function in ir_rvalue. Avoid dereferencing NULL when handling
accesses to ordinary structs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For interface blocks that contain arrays, this field will contain the
maximum element of each contained array that is accessed by the
shader. This is a first step toward supporting unsized arrays in
interface blocks.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently, when converting an access to an array element from ast to
IR, we need to see if the array is an ir_dereference_variable, and if
so update the variable's max_array_access.
When we add support for unsized arrays in interface blocks, we'll also
need to account for cases where the array is an ir_dereference_record
and the record is an interface block.
To make this easier, move the update into its own function.
v2: Use an ordinary function in ast_array_index.cpp rather than a
virtual function in ir_rvalue.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Although it's not explicitly stated in the GLSL 1.50 spec, unsized
arrays are allowed in interface blocks.
section 1.2.3 (Changes from revision 5 of version 1.5) of the GLSL
1.50 spec says:
* Completed full update to grammar section. Tested spec examples
against it:
...
* add unsized arrays for block members
And section 7.1 (Vertex and Geometry Shader Special Variables)
includes an unsized array in the built-in gl_PerVertex interface
block:
out gl_PerVertex {
vec4 gl_Position;
float gl_PointSize;
float gl_ClipDistance[];
};
Furthermore, GLSL 4.30 contains an example of an unsized array
occurring inside an interface block. From section 4.3.9 (Interface
Blocks):
uniform Transform { // API uses "Transform[2]" to refer to instance 2
mat4 ModelViewMatrix;
mat4 ModelViewProjectionMatrix;
vec4 a[]; // array will get implicitly sized
float Deformation;
} transforms[4];
This patch adds the parser rule to support unsized arrays inside
interface blocks. Later patches in the series will add the
appropriate semantics to handle them.
Fixes piglit tests:
- spec/glsl-1.50/execution/unsized-in-unnamed-interface-block
- spec/glsl-1.50/linker/unsized-in-unnamed-interface-block
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Interface declarations have two names associated with them: the block
name and the instance name. It's the block name that needs to be
passed to get_interface_instance(). This patch renames the argument
so that there's no confusion.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BLORP performs blits by drawing a rectangle with a shader that samples
from the source texture, and writes color data to the destination.
The sampler always returns 32-bit RGBA float data, regardless of the
source format's component ordering or data type. Likewise, the render
target write message takes 32-bit RGBA float data, and converts it
appropriately. So the bulk of the work is already taken care of for us.
This greatly accelerates a lot of CopyTexSubImage calls, and makes
Legends of Aethereus playable on Ivybridge. At the default settings,
LOA continually blits between SRGBA8888 (the window format) and
RGBA16_FLOAT. Since neither BLORP nor our BLT paths supported this,
it fell back to meta, spending 33% of the CPU in floorf() converting
between floats and half-floats.
v2: Use != instead of ^ (suggested by Ian). Note that only
CopyTexSubImage is affected by this patch (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The previous code for sRGB overrides assumes that the source and
destination formats are equal, other than the color space. This won't
be feasible when we add support for format conversions.
Here are a few cases, and how the old code handled them:
1. RGB8 -> SRGB8, MSAA ==> SRGB8 -> SRGB8
2. RGB8 -> SRGB8, single ==> RGB8 -> RGB8
3. SRGB8 -> RGB8, MSAA ==> RGB8 -> RGB8
4. SRGB8 -> RGB8, single ==> SRGB8 -> SRGB8
Apparently, preserving the behavior of #1 is important. When doing a
multisample to single-sample resolve, blending the samples together in
an sRGB correct fashion results in a noticably higher quality image.
It also is necessary to pass Piglit's EXT_framebuffer_multisample
accuracy color tests.
Paul, Eric, Anuj, and I talked about this, and aren't sure that it
matters in the other cases.
This patch preserves the behavior of #1, but otherwise reverts to
doing everything in linear space, changing the behavior of case #4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We could conceivably use BRW_SURFACEFORMAT_R24_UNORM_X8_TYPELESS for
Z24 source images, allowing conversions from Z24 to either Z16 or Z32F.
Unfortunately, we can't use it for destination images since it isn't
supported as a render target.
Using different formats for sources or destinations would be painful,
so for now, punt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, all that matters is that we copy the correct number of bits,
so any format that has 32-bits of data will work fine.
Once BLORP begins handling format conversions, the sampler will need to
correctly interpret the data. We don't need a depth format, but we do
need the right number of components and data type (FLOAT).
For Z32F, this means using R32_FLOAT.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, all that matters is that we copy the correct number of bits,
so any format that has 16-bits of data will work fine.
Once BLORP begins handling format conversions, the sampler will need to
correctly interpret the data. We don't need a depth format, but we do
need the right number of components and data type (UNORM).
For Z16, this means using R16_UNORM.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Once blorp gains the ability to do format conversions, it's conceivable
that the source format may be texturable but not supported as a render
target. This would break Paul's code, which assumes that it can use the
render_target_format array even for the source format.
There are three ways to convert MESA_FORMAT enums to BRW_SURFACEFORMAT
enums:
1. brw_format_for_mesa_format()
This translates the Mesa format to the most equivalent BRW format.
2. brw->render_target_format[]
This is used for renderbuffers, and handles the subset of formats
that are renderable. However, it's not always equivalent, since
it overrides a few non-renderable formats. For example, it
converts B8G8R8X8_UNORM to B8G8R8A8_UNORM so it can be rendered to.
3. translate_tex_format()
This is used for textures. It wraps brw_format_for_mesa_format(),
but overrides depth textures, and one sRGB case on Gen4.
BLORP has a fourth function, which uses brw->render_target_format[]
and overrides depth formats (differently than translate_tex_format).
This patch makes the BLORP function to use brw_format_for_mesa_format()
for textures/source data, since not everything will be a render target.
It continues using brw->render_target_format[] for render targets, since
it needs the format overrides that provides.
We don't use translate_tex_format() since the additional overrides are
not useful or simply redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we're moving towards expanding the number of subpixel
bits and the width of the variables used in the computations
we need to make this code a bit more centralized.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The code introduces two new 32bit integer multiplication opcodes which
can be used to produce correct 64 bit results. GLSL, OpenCL and D3D10+
require them. We use two seperate opcodes, because they match the
behavior of GLSL and OpenCL, are a lot easier to add than a single
opcode with multiple destinations and because there's not much (any)
difference wrt code-generation.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The UD values were getting set up as floats. This happened to work out
because they were used as the second argument where the first was a dword,
and gen6+ doesn't do source conversions. But it did trigger fulsim
warnings, and it meant if you used the push constant as the first operand
you would have been disappointed.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The EGL library has some references to x11 but it gets the link flags
from the XCB_DRI2_LIBS if and only if HAVE_EGL_PLATFORM_X11 is true.
The X11_LIBS variable was probably coming from a PKG_CHECK_MODULES (x11)
earlier in history.
If it is possible to have HAVE_EGL_DRIVER_GLX without HAVE_EGL_PLATFORM_X11
then the link flags for libX11 should be passed. However, it won't come
from X11_LIBS which is undefined.
Reported-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Use to test: --enable-gallium-egl --enable-xlib-glx --disable-dri
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
It is used for building the code in the x11 subdir.
The build does not fail on this one as LIBDRM_CFLAGS happens to have
the inludedir value as the one for X11. It will not always be the case.
The option --enable-gallium-egl is required durimg configuration.
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
OutputSurfaces have simple YCbCr rendering functionality built in,
but so far only 4:2:0 subsampling worked correctly. This fixes 4:2:2
and 4:4:4 formats.
Reviewed-by: Christian König <christian.koenig@amd.com>
It doesn't work (decodes to garbage) with most videos on UVD 3.0. Worse
yet, it often results in random memory corruption or GPU hangs. Rumor
has it only the newest UVD hardware could do it anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
The DPB size calculations seem to be off; there is various random
corruption happening, even with advanced profile. Always assuming
a minimum number of references appears to fix it, similarly to
H.264. This might overallocate the DPB. Also clean up the SPS/PPS
field setup so that it matches VC-1 specifications better.
With these changes, all advanced profile VC-1 files I could get my
hand on work fine.
Reviewed-by: Christian König <christian.koenig@amd.com>
UVD can only support NV12 in the case of hardware decoding, but we
can still use all other formats for software decoding. Use the UNKNOWN
profile to signal that we're not interesting in hardware decoding.
v2: use profile instead of entrypoint
Reviewed-by: Christian König <christian.koenig@amd.com>
which contains -Wl,-Bsymbolic. If I understand it correctly, it prevents
symbols from clashing if multiple drivers are loaded at the same time.
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Copy sechalf to the new register, otherwise we would read wrong HW registers.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When the instruction to send the sampler message is forced uncompressed or
sechalf, send SIMD8 one even in SIMD16 mode.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
SIMD8 sampler messages are allowed in SIMD16 mode, and they could not work
without BRW_COMPRESSION_2NDHALF. Later PRMs (gen5 and later) do not
explicitly state whether BRW_COMPRESSION_2NDHALF is allowed, but they do have
examples using send with SecHalf. It should be safe to assume SecHalf is
valid.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_PointSize is stored in the w component of VARYING_SLOT_PSIZ, but
the geometry shader infrastructure assumes that it should look for all
geometry shader inputs of type float in the x component. So when
compiling a geomtery shader that uses a gl_PointSize input, fix it up
during the shader prolog by moving the w component to the x component.
This is similar to how we emit fixups and workarounds for vertex
shader attributes.
Fixes piglit test spec/glsl-1.50/execution/geometry/core-inputs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This corresponds to the lowering of gl_ClipDistance to
gl_ClipDistanceMESA for vertex and geometry shader outputs. Since
this lowering pass occurs after lower_named_interface blocks, it deals
with 2D arrays (gl_ClipDistance[vertex][clip_plane]) rather than 1D
arrays in an interface block
(gl_in[vertex].gl_ClipDistance[clip_plane]).
v2 (Paul Berry <stereotype441@gmail.com>): Fix indexing order for
gl_ClipDistance input lowering. Properly lower bulk assignment of
gl_ClipDistance inputs. Rework for GLSL 1.50 style geometry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v3 (Paul Berry <stereotype441@gmail.com>): Add comments and assertions
to clarify that the 2D version of clip distance is only used for
geometry shader inputs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, builtin_variables.cpp was written assuming that we
supported ARB_geometry_shader4 style geometry shader inputs, meaning
that each built-in varying input to a geometry was supplied via an
array variable whose name ended in "In", e.g. gl_PositionIn or
gl_PointSizeIn.
However, in GLSL 1.50 style geometry shaders, things work
differently--built-in inputs are supplied to geometry shaders via a
built-in interface block called gl_in, which contains all the built-in
inputs using their usual names (e.g. the gl_Position input is supplied
to the geometry shader as gl_in[i].gl_Position).
This patch adds the necessary logic to builtin_variables.cpp to create
the gl_in interface block and populate it accordingly for geometry
shader inputs. The old ARB_geometry_shader4 style varyings are
removed, though they can easily be added back in the future if we
decide to support ARB_geometry_shader4.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adds a "location" element to struct glsl_struct_field, so
that we can keep track of the gl_varying_slot associated with each
built-in geometry shader input.
In lower_named_interface_blocks, we use this value to populate the
"location" field in the ir_variable that stores each geometry shader
input.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For a few reasons.
1: In the (current) common case, these conditionals are never true. All
we're doing by checking them is slowing down MakeCurrent. The server
does these checks already anyway.
2: GLX >= 3.0 contexts may legally be made current without a bound
framebuffer.
This does not fix piglit/glx-create-context-current-no-framebuffer, but
is a prerequisite for fixing it.
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
The new sampler bind sends us NULL samplers, so we need to count
the number of valid samplers ourselves. This fixes ~500 piglit
regressions from the sampler rework.
While we're at it, let's also support start.
In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's
physical dimensions. (Logical and physical dimensions may differ for
multisample surfaces).
However, in SURFACE_STATE, we always set Width and Height to the slice's
logical dimensions. We should do the same for 3DSTATE_DEPTH_BUFFER,
because the hw docs say so.
No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland.
v2: No Piglit regressions, for real this time.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Now, one can do the following to generate and read the i965 Doxygen:
cd $MESA_TOP/doxygen
make
firefox i965/index.html
Reviewed-by: Frank Henigman <fjhenigman@google.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The registers in the architecture register file don't share much in
common, so there's no point in grouping them together. Use the HW_REG
class instead. The vec4 backend already does this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Accidentally pushed an old version of the patch.
v2: Set destination register using brw_null_reg().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CSE would otherwise combine the two mul(8) emitted by [iu]mulExtended:
mul(8) acc0 x y
mach(8) null x y
mov(8) lsb acc0
...
mul(8) acc0 x y
mach(8) msb x y
Into:
mul(8) temp x y
mov(8) acc0 temp
mach(8) null x y
mov(8) lsb acc0
...
mov(8) acc0 temp
mach(8) msb x y
But mul(8) into the accumulator produces more than 32-bits of precision,
which is required and lost if multiplying into a general register and
moving to the accumulator.
Reviewed-by: Eric Anholt <eric@anholt.net>
These built-ins have two "out" parameters, which makes implementing them
efficiently with our current compiler infrastructure difficult. Instead,
implement them in terms of the existing ir_binop_mul IR (to return the
low 32-bits) and a new ir_binop_mul64 which returns the high 32-bits.
v2: Rename mul64 -> imul_high as suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 implements this with a single (multiple destination) instruction,
SUBB. Emitting SUBB directly from usubBorrow() would be ideal, but our
optimization passes don't know how to copy with expressions with
side-effects.
Radeon has an SUBB_UINT instruction that only generates the borrow
bit. I've chosen to go this route and implement usubBorrow() by doing the
subtraction and the borrow operations separately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
i965 implements this with a single (multiple destination) instruction,
ADDC. Emitting ADDC directly from uaddCarry() would be ideal, but our
optimization passes don't know how to copy with expressions with
side-effects.
Radeon has an ADDC_UINT instruction that only generates the carry
bit. I've chosen to go this route and implement uaddCarry() by doing the
addition and the carry operations separately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Calculates the carry out of the addition of two values and the
borrow from subtraction respectively. Will be used in uaddCarry() and
usubBorrow() built-in implementations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The only GLSL extension that is not enabled is AMD_vertex_shader_layer.
I think the standalone-compiler could enable this (as shading language
support is complete), but no driver enables it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Infer whether or not to use ES based on the GLSL version (100 or 300 are
for ES). This replaces the --glsl-es command line option. Set various
compiler limits based on the minimums required for the specified GLSL
version.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The choices aren't just 0 and 1, so using the enum names is much more
clear.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows application developers to use Mesa's compiler as a
standalone validator for their shaders.
This is mostly a revert of commit 569f0e4.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pull the data directly from the context like the other varying related
limits. The parser state shadow copies were added back when the parser
state didn't have a pointer to the context. There's no reason to do it
now days.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
gl_MaxVertexOutputVectors => ctx->Const.VertexProgram.MaxOutputComponents
gl_MaxFragmentInputVectors => ctx->Const.FragmentProgram.MaxInputComponents
v2: Add types so that the code compiles. Pointed out by Brian.
v3: Leave gl_MaxVaryingFloats et al. as-is. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v2]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v2]
Reviewed-by: Paul Berry <stereotype441@gmail.com> [v2]
Starting with OpenGL 3.2 input limits and output limits for stages may
not match. This means they need to be accounted separately.
No piglit regressions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No need to keep a copy of the message in system memory anymore,
since it should now be in GART memory on newer chips.
Signed-off-by: Christian König <christian.koenig@amd.com>
This fixes issues where get_rt_format would see a 0 format because the
nouveau_surface had not been properly initialized. Fixes crash on
supertuxkart startup (which still fails due to out-of-vram issues).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
As of ARB_gpu_shader5, textureGather doesn't always read the
post-swizzle RED channel -- so we can't just look at the red swizzle
state.
Theoretically we could only flag the quirk if *some* green swizzle is in
use, but that's probably more trouble than it's worth.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- For HSW: Select the channel based on the component selected (swizzle
is done in HW)
- For IVB: Select the channel based on the swizzle state for the
component selected. Only apply the RG32F w/a if we actually want
green -- we're about to flag it regardless of swizzle state.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- For HSW: Select the channel based on the component selected (swizzle
is done in HW)
- For IVB: Select the channel based on the swizzle state for the
component selected. Only apply the RG32F w/a if we actually want
green -- we're about to flag it regardless of swizzle state.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- gsampler2DRect support
- optional `comp` parameter
Future patches will add shadow sampler support and
textureGatherOffsets().
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_gpu_shader5 introduces new variants of textureGather* which have an
explicit component selector, rather than relying purely on the sampler's
swizzle state.
This patch adds the GLSL plumbing for the extra parameter.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Theoretically would work on Gen5 as well but requires GLSL 1.30, which
is not (yet) enabled by default there.
V2: Enable for Gen5 conditionally on GLSL version.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we special-cased textureSize() but this is the more correct
condition.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* This in essence means that Mesa would be
taking control of Haiku's OpenGL kit.
* This works by dispatching renderers from the
OpenGL add-ons directory
This patch fixes this Oracle Solaris Studio build error.
"../../src/glsl/ir_constant_expression.cpp", line 1398: Error: The function "isnormal" must have a prototype.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
All texture instructions can use offsets, not just TXF. Offsets into
the literals array were wrong, too.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so
GL_SHADER_BINARY_FORMATS should not write any data to the application
buffer.
Fixes piglit test 'arb_get_program_binary-overrun shader'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As we march over the source buffer we're uploading in pieces, we
need to memcpy from the current offset, not the start of the buffer.
Fixes graphical corruption when drawing very large vertex buffers.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
This patch adds the lrint, lrintf, llrint, and llrintf rounding utility
functions. When packing unorm depth values, we will round to nearest.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With commit cb1febb07, I have incorrectly removed HAVE_COMMON_DRI
assuming that swrast does not need to build the translations for
driconf options, as effectively swrast/drisw does not use them.
With the incoming unification work of dri and drisw, it makes
sense just to revert the offending hunk.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70057
Reported-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously, we computed dFdy() using the following instruction:
add(8) dst<1>F src<4,4,0)F -src.2<4,4,0>F { align1 1Q }
That had the disadvantage that it computed the same value for all 4
pixels of a 2x2 subspan, which meant that it was less accurate than
dFdx(). This patch changes it to the following instruction when
c->key.high_quality_derivatives is set:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
This gives it comparable accuracy to dFdx().
Unfortunately, align16 instructions can't be compressed, so in SIMD16
shaders, instead of emitting this instruction:
add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }
We need to unroll to two instructions:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }
Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
The inferface/prototype in native_wayland_bufmgr.h uses boolean/int, as
well as the rest of the file. Convert to improve consistency and to
prevent gcc compiler warnings due to type miss-match.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Clean up inconsistency in enum decoration:
- Use the undecorated enums where possible.
- MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it
has no undecorated equivalent in GL4.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new surface channel select bits allow us to avoid having to
recompile the shader for this workaround.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to use a different surface format for gather4, which is
required for R32G32_FLOAT to work on Gen7.
V4: - Only emit alternate surface state for shaders which will actually
use it.
- Pass a simple 'for_gather' flag rather than a function pointer.
The callee can decide what w/a to apply.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Worst-case is that *every* texunit uses a format that needs overriding.
V4: Place the gather slots last, so shaders which don't use gather don't
get penalized by having a huge binding table.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gather4 GREEN channel against a surface with format R32G32_FLOAT doesn't work
correctly on IVB. w/a from bspec:
- use R32G32_FLOAT_LD = 0x97 instead, for gather4 only.
- select BLUE channel to read GREEN
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V4: Only flag quirks if there are any uses of gather in the shader,
to avoid spurious recompiles just because someone happened to use
RG32F.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pretty much the same as the FS case. Channel select goes in the header,
V2: Less mangling.
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to
SHADER_OPCODE_TG4.
The usual post-sampling swizzle workaround can't work for ir_tg4,
so avoid doing that:
* For R/G/B/A swizzles use the hardware channel select (lives in the
same dword in the header as the texel offset), and then don't do
anything afterward in the shader.
* For 0/1 swizzles blast the appropriate constant over all the output
channels instead of sampling.
V2: Avoid duplicating header enabling block
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and
low-level support for emitting it via generate_tex().
V3: Updated for changes in master.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When used with a cube array in VS, failed assertion in ir_validate:
Assignment count of LHS write mask channels enabled not
matching RHS vector size (3 LHS, 4 RHS).
To fix this, swizzle the RHS correctly for the writemask.
This showed up in the ARB_texture_gather tests, which exercise cube
arrays in the VS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Consider only the top-left and top-right pixels to approximate DDX in a 2x2
subspan, unless the application requests a more accurate approximation via
GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the
new driconf option disable_derivative_optimization.
This results in a less accurate approximation. However, it improves the
performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0%
confidence) on Haswell. No noticeable image quality difference observed.
The improvement comes from faster sample_d. It seems, on Haswell, some
optimizations are introduced to allow faster sample_d when all pixels in a
subspan have the same derivative. I considered SAMPLE_STATE too, which allows
one to control the quality of sample_d on Haswell. But it gave much worse
image quality without giving better performance comparing to this change.
No piglit quick.tests regression on Haswell (tested with v1).
v2: better guess for precompile program key
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We have the destination framebuffer object passed in; there's no need to
go digging around in the context.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Using it encourages the (IMHO worrying) practice of leaving member
variables uninitialized in constructor definitions. This macro
shouldn't be necessary anymore after the last patch series fixing all
its users to initialize all member variables from the class
constructor. Remove it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of glsl_to_tgsi_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of ir_to_mesa_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of vec4_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of fs_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of fs_inst are already being initialized from its
constructor, it's not necessary to use rzalloc to allocate its memory,
and doing so makes it more likely that we will start relying on the
allocator to zero out all memory if the class is ever extended with
new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of ip_record are already being initialized from
its constructor, it's not necessary to use rzalloc to allocate its
memory, and doing so makes it more likely that we will start relying
on the allocator to zero out all memory if the class is ever extended
with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The cfg_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind. Initialize all fields from the
constructor and stop using the zeroing allocator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bblock_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind. Initialize all fields from the
constructor and stop using the zeroing allocator.
v2: Use zero initialization for numeric types instead of default construction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of ast_type_qualifier are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All member variables of ast_node are already being initialized from
its constructor, but some of its derived classes were leaving members
uninitialized -- Fix them.
Using rzalloc makes it more likely that we will start relying on the
allocator to zero out all memory if the class is ever extended with
new member variables. That's bad because it ties objects to some
specific allocation scheme, and gives unpredictable results when an
object is created with a different allocator -- Stack allocation,
array allocation, or aggregation inside a different object are some of
the useful possibilities that come to my mind.
v2: Use NULL initialization instead of default construction for pointers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The vec4_instruction object relies on the memory allocator zeroing out
its contents before it's initialized, which is quite an unusual
practice in the C++ world because it ties objects to some specific
allocation scheme, and gives unpredictable results when an object is
created with a different allocator -- Stack allocation, array
allocation, or aggregation inside a different object are some of the
useful possibilities that come to my mind. Initialize all fields from
the constructor and stop using the zeroing allocator.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The _mesa_glsl_parse_state object relies on the memory allocator
zeroing out its contents before it's initialized, which is quite an
unusual practice in the C++ world because it ties objects to some
specific allocation scheme, and gives unpredictable results when an
object is created with a different allocator -- Stack allocation,
array allocation, or aggregation inside a different object are some of
the useful possibilities that come to my mind. Initialize all fields
from the constructor and stop using the zeroing allocator.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Several C++ source files include "main/uniforms.h" from an extern "C"
block, which is both unnecessary, because "uniforms.h" already checks
for a C++ compiler and sets the right linkage, and incorrect, because
the header file includes other C++ headers ("glsl_types.h" and
"ir_uniform.h") that are supposed to get C++ linkage.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In commit 247f90c77e (i965/gs: Set
control data header size/format appropriately for EndPrimitive()), I
incorrectly numbered the DWORDs in the 3DSTATE_GS command starting
from 1 instead of starting from 0. This caused the control data
format to be programmed into the wrong DWORD, resulting in corruption
in some geometry shaders that used an output type of points.
This patch numbers the DWORDs starting from 0, as we do for all other
commands, which causes the control data format to be programmed into
the correct DWORD.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The spec doesn't say GL_INVALID_VALUE should be raised for bufSize <= 0.
In any case, memcpy(len < 0) will lead to a crash, so don't allow it.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Error checking bufSize isn't mentioned in the spec, but it is in the
man pages. However, I believe the man page is incorrect. Typically,
GL functions that take GLsizei parameters check that they're positive
or non-negative. Negative values don't make sense here.
A spec bug has been filed with Khronos/ARB.
v2: check for negative values, not <= 0.
This incorporates Vinson's change to check for a null src pointer as
detected by coverity.
Also, rename the function params to be src/dst, const-qualify src,
and use GL types to match the calling functions. And add some more
comments.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
libllvmradeon.la is available whenever NEED_RADEON_LLVM is set, using
R600_NEED_RADEON_GALLIUM is rather ambiguous and unnecessary. Drop it
in favour of NEED_RADEON_LLVM.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The mesa/drivers/dri/Makefile.am already guards the individual
targets/subdirs with HAVE_*_DRI before including them. Thus making
the additional check within each Makefile.am unnecessary.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libdricommon.la is available whenever a non swrast driver is built.
All the classic dri drivers make use of the prebuild library but all
of the gallium ones rebuild it explicitly.
While we're here gallium/{llvm,soft}pipe does not require HAVE_COMMON_DRI
thus do not set in during configure.
v2: [Emil] Add commit message and drop HAVE_COMMON_DRI from configure.ac
v3: [Emil] Rebase and resolve targets/r*/dri conflicts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
shader has already been dereferenced earlier so cannot be null here.
Fixes "Dereference before null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The block size for all formats is currently at least 1 byte. Add an
assertion for this.
This should silence several Coverity "Division or modulo by zero"
defects.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is an earlier null check for draw so draw could be null here as
well.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fix build errors.
CC surface.lo
surface.c: In function 'vlVdpVideoSurfaceGetBitsYCbCr':
surface.c:247:10: error: implicit declaration of function 'util_copy_rect' [-Werror=implicit-function-declaration]
CC output.lo
output.c: In function 'vlVdpOutputSurfaceGetBitsNative':
output.c:216:4: error: implicit declaration of function 'util_copy_rect' [-Werror=implicit-function-declaration]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fix build error.
CC device.lo
device.c: In function 'vlVdpDefaultSamplerViewTemplate':
device.c:251:4: error: implicit declaration of function 'util_format_description' [-Werror=implicit-function-declaration]
device.c:251:9: warning: assignment makes pointer from integer without a cast [enabled by default]
device.c:252:12: error: dereferencing pointer to incomplete type
device.c:252:28: error: 'UTIL_FORMAT_SWIZZLE_0' undeclared (first use in this function)
device.c:252:28: note: each undeclared identifier is reported only once for each function it appears in
device.c:254:12: error: dereferencing pointer to incomplete type
device.c:256:12: error: dereferencing pointer to incomplete type
device.c:258:12: error: dereferencing pointer to incomplete type
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This patch fixes the build error introduced with commit
81bb98e928.
CC subpicture.lo
subpicture.c: In function 'upload_sampler':
subpicture.c:181:4: error: implicit declaration of function 'util_copy_rect' [-Werror=implicit-function-declaration]
subpicture.c: In function 'XvMCClearSubpicture':
subpicture.c:304:21: error: storage size of 'uc' isn't known
subpicture.c:328:4: error: implicit declaration of function 'util_fill_rect' [-Werror=implicit-function-declaration]
subpicture.c:304:21: warning: unused variable 'uc' [-Wunused-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The svga/d3d9 convention is that pixel centers are at integer coordinates.
Fixes piglit glsl-arb-fragment-coord-conventions test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Using the map/unmap path for glTexImage is a little bit faster
than blitting. Also, this fixes about 50 assorted piglit failures
that seem to be related to the blit version of glReadPixels.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
u_rect.h was including u_surface.h just to avoid touching a bunch
of other source files after some functions were moved from u_rect.h
to u_surface.h. This patch cleans up that hack.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The format of the window system framebuffer changed from ARGB8888 to
SARGB8, but we're still supposed to render to it the same as ARGB8888
unless the user flipped the GL_FRAMEBUFFER_SRGB switch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable branches.
I believe this extension was enabled by accident. As far as I can tell,
there has never been any code in Mesa to actually support it. Not only
that, this extension is only useful in the common-lite profile, and Mesa
does the common profile.
This "fixes" the piglit test oes_matrix_get-api.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For some reason that I don't yet fully understand, Glaze does not work with
libEGL unless libEGL is linked with -Bsymbolic.[*]
Beyond that specific reason, all of the reasons for which libGL.so is linked
with -Bsymbolic, (see the commit history), should also apply here.
[*] The specific behavior I am seeing is that when Glaze calls dlopen for
libEGL.so, ifunc resolvers within Glaze for EGL functions are called before
the dlopen returns. These resolvers cannot succeed, as they need the return
value from dlopen in order to find the functions to resolve to. I don't know
what's causing these resolvers to be called, but I have verified that linking
libEGL with -Bsymbolic causes this problematic behavior to stop.
CC: "9.1 and 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the bspec documentation of the SEND instruction:
"destination region cannot cross the 256-bit register boundary."
To avoid violating this restriction when executing SIMD16 texturing
operations (such as those used by blorp), we need to ensure that the
destination of the SEND instruction doesn't exceed 256 bits in size.
An easy way to do this is to set the type of the destination register
to UW (unsigned word), since 16 unsigned words can fit inside a
256-bit register. Fortunately, this has no effect on the sampling
operation, since the sampler always infers the destination data type
from the sampler message rather than from the type of the instruction
operand.
Previously, we did this for texturing operations issued by the vec4
and fs back-ends, but not for blorp. This patch makes blorp use the
same trick.
I haven't observed any behavioural difference on actual hardware due
to this patch, but it avoids a warning from the simulator so it seems
like the right thing to do.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
We originally had a path just did the loop and called
ctx->Driver.AllocTextureImageBuffer(), which I moved into Mesa core. But
we can do better, avoiding incorrect miptree size guesses and later
texture validations by just directly allocating the miptree and setting it
to all the images.
v2: drop debug printf.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
As long as the baselevel, maxlevel still sit inside the range we had
previously validated, there's no need to reallocate the texture.
I also hope this makes our texture validation logic much more obvious.
It's taken me enough tries to write this change, that's for sure. Reduces
miptree copy count on a piglit run by 1.3%, though the change in amount of
data moved is much smaller.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Given that a teximage that calls us with this flag set will immediately
proceed to allocate the other levels, we can probably just go ahead and
allocate those levels now.
Reduces miptree copies in piglit by about .05%.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If the caller shows up with GL_BASE_LEVEL != 0, it doesn't mean that the
texture will over the course of its lifetime have that nonzero baselevel,
it means that the caller is filling the texture from the bottom up for
some reason (one could imagine demand-loading detailed texture layers at
runtime, for example). If we allocate from just the current baselevel, it
means when they come along with the next level up, we'll have to allocate
a new miptree and copy all of our bits out of the first miptree.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Let's say you started allocating your 2D texture with level 2 of a tree as
a 1x1 image. The driver doesn't know if this means that level 0 is 4x4 or
4x1 or 1x4, so we would just allocate a single 1x1 and let it get copied
in to the real location at texture validate time later.
Since this is just a temporary allocation that *will* get copied, the
extra space allocation of just taking the normal path which will happen to
producing a 4x1 level 0, 2x1 level 1, and 1x1 level 2 is the right way to
go, to reduce complexity in the normal case.
No change in miptree copies over the course of a piglit run.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This has no effect currently, because intel_finalize_mipmap_tree() always
makes mt->first_level == tObj->BaseLevel.
The change I made before to handle it
(b1080cfbdb) got very close to working, but
after fixing some unrelated bugs in the series, it still left
tex-miplevel-selection producing errors when testing textureLod(). The
problem is that for explicit LODs, the sampler's LOD clamping is ignored,
and only the surface's MIP clamping is respected. So we need to use
surface mip clamping, which applies on top of the sampler's mip clamping,
so the sampler change gets backed out.
Now actually tested with a non-regressing series producing a non-zero
computed baselevel.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We know that the object's mt is equal to the firstimage's mt because it's
gone through intel_finalize_mipmap_tree(). Saves a lookup of firstimage
on pre-gen7.
v2: Merge in the warning fix that appeared later in the series (noted by
Chad)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The function r600_choose_tiling is new and needs a review.
The only change in functionality is that it enables 2D tiling for compressed
textures on SI. It was probably accidentally turned off.
v2: don't make scanout buffers linear
More work needs to be done for this to be entirely shared with r600g.
I'm just trying to share r600_texture.c now.
The reason I put the implementation to si_descriptors.c is that the emit
function had already been there.
This doesn't fix any known issue (I haven't run piglit with this yet),
but the code was obviously completely wrong. It looks like copy-pasted from CMP.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
If the argument to emit_bool_to_cond_code() is an ir_expression, we
loop over the operands, calling accept() on each of them, which
generates assembly code to compute that subexpression. We then emit
one or two final instruction that perform the top-level operation on
those operands.
If it's not an expression (say, a boolean-valued variable), we simply
call accept() on the whole value.
In commit 80ecb8f1 (i965/fs: Avoid generating extra AND instructions on
bool logic ops), Eric made logic operations jump out of the expression
path to the non-expression path.
Unfortunately, this meant that we would first accept() the two operands,
skip generating any code that used them, then accept() the whole
expression, generating code for the operands a second time.
Dead code elimination would always remove the first set of redundant
operand assembly, since nothing actually used them. But we shouldn't
generate it in the first place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ironlake's counters are always enabled; userspace can simply send a
MI_REPORT_PERF_COUNT packet to take a snapshot of them. This makes it
easy to implement.
The counters are documented in the source code for the intel-gpu-tools
intel_perf_counters utility.
v2: Adjust for core data structure changes. Add a table mapping buffer
object offsets to exposed counters (which changes each generation).
Finally, add report ID assertions to sanity check the BO layout
(thanks to Carl Worth).
v3: Update for core BeginPerfMonitor hook changes (requested by Brian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This provides an interface for applications (and OpenGL-based tools) to
access GPU performance counters. Since the exact performance counters
available vary between vendors and hardware generations, the extension
provides an API the application can use to get the names, types, and
minimum/maximum values of all available counters. Counters are also
organized into groups.
Applications create "performance monitor" objects, select the counters
they want to track, and Begin/End monitoring, much like OpenGL's query
API. Multiple monitors can be in flight simultaneously.
v2: Pass ctx to all driver hooks (suggested by Christoph), and attempt
to fix overallocation of bitsets (caught by Christoph). Incomplete.
v3: Significantly rework core data structures. Store counters in groups
rather than in a global list. Use their array index in the group's
counter list as the ID rather than trying to store a globally unique
counter ID. Use bitsets for active counters within a group, and
also track which groups are active so that's easy to query.
v4: Remove _mesa_ prefix on static functions; detect out of memory
conditions in new_performance_monitor(); make BeginPerfMonitor hook
return a boolean rather than setting m->Active or raising an error.
Switch to GLuint/unsigned for NumGroups, NumCounters, and
MaxActiveCounters (which also means switching a bunch of temporary
variable types). All suggested by Brian Paul. Also, remove
commented out code at the bottom of the block. Finally, fix the
dispatch sanity test (noticed by Ian Romanick).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com> [v3]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is better than overriding the extension enable based on the
language version; it's robust against shaders that do:
#version 140
#extension GL_ARB_uniform_buffer_object : disable
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Explicit attribute locations are supported with GLSL 3.30, GLSL ES 3.00,
or "#extension GL_ARB_explicit_attrib_location: enable". Using a helper
function makes it easy to check for this.
This enables support in GLSL 3.30, which was previously missing.
Previously, we overrode the extension enable flag for ES 3.00. This is
not robust against a shader such as:
#version 330
#extension GL_ARB_explicit_attrib_location : disable
Disabling extensions should not remove core language functionality.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Hardware requires the magnitude of the largest component to not exceed
1; brw_cubemap_normalize ensures that this is the case.
Unfortunately, we would previously multiply the array index for cube
arrays by the normalization factor. The incorrect array index would then
cause the sampler to attempt to access either the wrong cube, or memory
outside the cube surface entirely, resulting in garbage rendering or in
the worst case, hangs.
Alter the normalization pass to only multiply the .xyz components.
Fixes broken rendering in the arb_texture_cube_map_array-cubemap piglit,
which was recently adjusted to provoke this behavior.
V2: Fix indent.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2" mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Compress empty triangles (don't emit more than one in a row) and
never emit empty triangles if we already generated a triangle
covering a non-null area. We can't skip all null-triangles
because c_primitives expects ones that were generated from vertices
exactly at the clipping-plane, to be emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to count the clipper primitives before the rasterizer
discards one it considers to be null.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to subdivide triangles if either of the dimensions is
larger than the max edge length, not when both of them are larger.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fix is at the end (TGSI_TEXTURE_SHADOWCUBE handling), but I also
restructured the code for it to be more readable.
Fixes spec/!OpenGL 3.0/sampler-cube-shadow.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This fixes some piglits, e.g:
spec/!OpenGL 3.0/required-renderbuffer-attachment-formats.
This can be ported to r600g.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Only create one screen for each winsys instance.
This helps with buffer sharing and interop handling.
v2: rebased and some minor cleanup
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 755c11dc5e.
We agreed that this is band-aid that's not very useful and
the proper solution is to rewrite the rasterization algo
so that it operates on 64 bit values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
No such argument exists since this commit:
commit 92f3fca0ea
Author: Ian Romanick <ian.d.romanick@intel.com>
AuthorDate: Sun Aug 21 17:23:58 2011 -0700
Commit: Ian Romanick <ian.d.romanick@intel.com>
CommitDate: Tue Aug 23 14:52:09 2011 -0700
mesa: Remove target parameter from dd_function_table::BufferSubData
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When subdiving a triangle we're using a temporary array to store
the new coordinates for the subdivided triangles. Unfortunately
the array used for that was not aligned properly causing
random crashes in the llvm jit code which was trying to load
vectors from it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes the MSVC build error introduced by commit
673129e0b9.
enums.c
mesa\main\enums.c(3776) : error C2143: syntax error : missing ';' before 'type'
mesa\main\enums.c(3781) : error C2065: 'elt' : undeclared identifier
mesa\main\enums.c(3781) : warning C4047: '!=' : 'int' differs in levels of indirection from 'void *'
mesa\main\enums.c(3782) : error C2065: 'elt' : undeclared identifier
mesa\main\enums.c(3782) : error C2223: left of '->offset' must point to struct/union
mesa\main\enums.c(3782) : warning C4033: '_mesa_lookup_enum_by_nr' must return a value
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Normally, LD_PRELOAD will take precedence over your own symbols, which you
want for things like malloc() in libc. But we don't have any local
symbols we would want overridden (like hash_table_insert(), for example!),
so tell the linker to resolve them internally. This also avoids calls
through the PLT.
Saves almost 100k on libdricore's size, and gets us a bunch of the
performance back that we had with non-dricore.
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
This gives the compiler the chance to inline and not export class symbols
even in the absence of LTO. Saves about 60kb on disk.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
Noticed while grepping through the code for something else.
v2: Don't convert really-runtime asserts to static asserts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
Since it's only used for debug information, we can misalign the struct and
save the disk space. Another 19k on a 64-bit build.
v2: Make a compiler.h macro to only use the attribute if we know we can.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
Now that there's no name -> enum direction, we can drop the extra strings,
and merge the offsets table and the reduced_enums table.
Between the previous commit and this one, Mesa core drops by 30k.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
It's been unused for a long time. I stopped digging through git history
as of 2009.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
Unfortunately d3d10 requires a lot higher precision (e.g.
wgf11clipping tests for it). The smallest number of precision
bits with which it passes is 8. That means that we need to
decrease the maximum length of an edge that we can handle without
subdivision by 4 bits. Abstracted the code a bit to make it easier
to change once to switch to 64bit rasterization.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes these MSVC build errors.
ir_constant_expression.cpp
src\glsl\ir_constant_expression.cpp(564) : warning C4244: '=' : conversion from 'int' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(1384) : error C3861: 'isnormal': identifier not found
src\glsl\ir_constant_expression.cpp(1385) : error C3861: 'copysign': identifier not found
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69541
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Share the winsys between different fd's if they point to the same device.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Waiting for an empty queue is nonsense and can lead to deadlocks if we have
multiple waiters or another thread that continuously sends down new commands.
Just post the cs to the queue and immediately wait for it to finish.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kill the thread only after we checked that it's not used any more, not before.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The rescale_texcoord(), if it does something, will return just the
GLSL-sized coordinate, leaving out the 3rd and 4th components where we
were storing our projected shadow compare and the texture projector.
Deref the shadow compare before using the shared rescale-the-coordinate
code to fix the problem.
Fixes piglit tex-shadow2drect.shader_test and txp-shadow2drect.shader_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69525
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Probably non-intentional, but the SURFACE_STATE setup refactoring
for buffer surfaces had missed the scs bits when creating constant
surface states.
Fixes broken GLB 2.5 on Haswell where the knight's textures are missing
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These classes declared a placement new operator, but didn't declare a
delete operator. Switching to the macro gives them a delete operator,
which probably is a good idea anyway.
This also eliminates a lot of boilerplate.
v2: Properly use RZALLOC in Mesa IR/TGSI translators. Caught by Eric
and Chad.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of our C++ classes define placement new and delete operators so we
can do convenient allocation via:
thing *foo = new(mem_ctx) thing(...)
Currently, this is done via a lot of boilerplate. By adding simple
macros to ralloc, we can condense this to a single line, making it
trivial to add this feature to a new class.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Allocate a CMASK on demand and use it to fast clear single-sample
colorbuffers. Both FBOs and window system colorbuffers are fast
cleared. Expand as needed when colorbuffers are mapped or displayed
on screen.
v2: cosmetics, move transfer expansion into dma_blit
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We must take rounding in consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS, GL_MAX_GEOMETRY_OUTPUT_VERTICES,
GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, and
GL_MAX_GEOMETRY_UNIFORM_COMPONENTS all have the same enum value and
meaning as their _ARB counterparts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The comment '# GL 3.0 / GLES3' was incorrect. The
MAX_VERTEX_OUTPUT_COMPONENTS and MAX_FRAGMENT_INPUT_COMPONENTS queries
were added in OpenGL 3.2 (with geometry shaders) and OpenGL ES 3.0.
This just fixes that comment.
v2: Add the GEOMETRY queries in the existing '# GL 3.2' section since
they have nothing to do with GLES3. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In OpenGL ES 3.0 the minimum-maximum for GL_MAX_VERTEX_OUTPUT_VECTORS is 16,
but the minimum-maximum for GL_MAX_FRAGMENT_INTPUT_VECTORS is 15.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In OpenGL ES 3.0 the minimum-maximum for GL_MAX_VERTEX_OUTPUT_VECTORS is 16,
but the minimum-maximum for GL_MAX_FRAGMENT_INTPUT_VECTORS is 15.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that MaxVaryings is > 16, VertexProgram.MaxOutputComponents,
GeometryProgram.MaxInputComponents, GeometryProgram.MaxOutputComponents,
and FragmentProgram.MaxInputComponents also need to be set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Paul Berry <stereotype441@gmail.com>
Previously gl_constants::MaxVaryingComponents was used. Now
gl_constants::VertexProgram::MaxOutputs and
gl_constants::GeometryProgram::MaxOutputs are used.
This means that st_extensions.c had to be updated to set these fields
instead of MaxVaryingComponents. It was previously the only place that
set MaxVaryingComponents.
I believe that the structure is allocated by calloc, so the value should
be initialized to zero in non-Gallium drivers before and after my
change. Right now nobody enables GL_ARB_geometry_shader4, so it's
pretty much dead code anyway.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: Zack Rusin <zackr@vmware.com>
In OpenGL 3.2 these are independently queryable. In addition, the spec
has different minimum-maximums for various values.
GL_MAX_VERTEX_OUTPUT_COMPONENTS is 64, but
GL_MAX_GEOMETRY_OUTPUT_COMPONENTS (and GL_MAX_FRAGMENT_INPUT_COMPONENTS)
is 128.
In OpenGL ES 3.0 these are also independently queryable. The spec has
different minimum-maximums for various values.
GL_MAX_VERTEX_OUTPUT_VECTORS is 16, but GL_MAX_FRAGMENT_INTPUT_VECTORS
is 15.
None of these values are used yet. I have just added space to the
structures. Future patches will add users and eventually remove some
old fields.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: Zack Rusin <zackr@vmware.com>
This was an embarassingly large amount of copy and pasted code,
and it wasn't particularly simple code either. By factoring it out
into a helper function, we consolidate the complexity.
v2: Properly NULL-check bo. Caught by Eric Anholt.
v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather
than making callers do it. This makes the buffer_size parameter
the actual size of the buffer. Suggested by Paul Berry.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was an embarassingly large amount of copy and pasted code,
and it wasn't particularly simple code either. By factoring it out
into a helper function, we consolidate the complexity.
v2: Properly NULL-check bo. Caught by Eric Anholt.
v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather
than making callers do it. This makes the buffer_size parameter
the actual size of the buffer. Suggested by Paul Berry.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The value that's split into width/height/depth needs to be the size of
the buffer minus one. This makes it consistent with the constant buffer
and shader time SURFACE_STATE setup code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes myriads of regressions since commit 169f9c030c
("i965: Add an assertion that writemask != NULL for non-ARFs.").
On Sandybridge, our control flow handling (such as brw_IF) does:
brw_set_dest(p, insn, brw_imm_w(0));
insn->bits1.branch_gen6.jump_count = 0;
This results in a IMM destination with zero for the writemask. IMM
destinations are rather bizarre, but the code has been working for ages,
so I'm loathe to change it.
Fixes glxgears on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I would use _mesa_delete_shader, but it's declared static, and we don't
really need any of the stuff in it anyway.
This fixes a memory leak caught by Valgrind.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
&a and &b are the address of the local stack variables, not the actual
structures. Instead of comparing the fields of a and b, we compared
...some stack memory.
Not a candidate for stable since GS code doesn't exist in 9.2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The code to upload the binding tables for each stage was scattered
across brw_{vs,gs,wm}_surface_state.c and brw_misc_state.c, which also
contain a lot of code to populate individual SURFACE_STATE structures.
This patch brings all the binding table upload code together, and splits
it out from the code which fills in SURFACE_STATE entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is not quite the same: brw_upload_binding_table() also has code to
early-return if there are no entries, while the existing code did not.
The PS binding table is unlikely to be empty since it will have at least
one color buffer. If it ever is empty, early returning seems wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of passing in a brw_vec4_prog_data structure, we can simply
pass the one field it needs: the number of entries in the binding table.
We also need to pass in the shader time surface index rather than
hardcoding SURF_INDEX_VEC4_SHADER_TIME.
Since the resulting function is stage-agnostic, this patch removes
"vec4_" from the name.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is probably more efficient. At any rate, it's less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The first comment was a bit stale; there are more kinds of surfaces than
textures and pull constants.
The second was a leftover "to do" comment for something I already did.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
xlib_sw_winsys.h:5:22: fatal error: X11/Xlib.h: No such file or directory
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
egl_glx.c:40:22: fatal error: X11/Xlib.h: No such file or directory
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Technically without seamless filtering enabled GL allows any wrap mode, which
made sense when supporting true borders (can get seamless effect with border
and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires
wrap modes to be ignored and it's a pain to fix up the sampler state (as it
makes it texture dependent). It is difficult to imagine a situation where an
app really wants another behavior so just cheat here. (It looks like some
graphics hw (intel) actually requires this too hence it should be safe.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This removes a lot of code, but not everything, as util_blit_pixels_tex
is still useful when one needs to override pipe_sampler_view::swizzle_?.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
By calling util_map_texcoords2d_onto_cubemap.
A new parameter for util_blit_pixels_tex is necessary, as
pipe_sampler_view::first_layer is always supposed to point to the first
face when sampling from cubemaps.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The previous names were really confusing to talk about:
- brw_fs_visitor() contained methods named emit_whatever().
- brw_fs_generator() contained methods named generate_whatever(), but
lived in brw_fs_emit.cpp.
So when someone said "the emit layer", or "emit code", we weren't sure
whether they meant the visitor's emit() functions or the generator in
brw_fs_emit.cpp.
By renaming these files, the method names, class names, and file names
all match, which is much less confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
I initially implemented frexp() as an IR opcode with a lowering pass,
but since it returns a value and has an out-parameter, it would break
assumptions our optimization passes make about ir_expressions being pure
(i.e., having no side effects).
For example, if opt_tree_grafting encounters this code:
uniform float u;
void main()
{
int exp;
float f = frexp(u, out exp);
float g = float(exp)/256.0;
float h = float(exp) + 1.0;
gl_FragColor = vec4(f, g, h, g + h);
}
it may try to optimize it to this:
uniform float u;
void main()
{
int exp;
float g = float(exp)/256.0;
float h = float(exp) + 1.0;
gl_FragColor = vec4(frexp(u, out exp), g, h, g + h);
}
Some hardware has an instruction which performs frexp(), but we would
need some other compiler infrastructure to be able to generate it, such
as an intrinsics system that would allow backends to emit specific code
for particular bits of IR.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Note the parameter name change in the int version of ir_constant, to
avoid the conflict with the loop iterator.
v2: Make analogous change to builtin_builder::imm().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These data structures are used for debug output, so it wasn't hurting
anything that there were missing bits. But it's good to keep things
up to date.
This patch also adds static asserts so that the {brw,cache}_bits[]
arrays are the proper size, so that we don't forget to add to them in
the future. Unfortunately there's no convenient way to assert that
mesa_bits[] is the proper size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the geometry shader refers to the built-in variable
gl_PrimitiveIDIn, we need to set a bit in 3DSTATE_GS to tell the
hardware to dispatch primitive ID to r1, and we need to leave room for
it when allocating registers.
Note: this feature doesn't yet work properly when software primitive
restart is in use (the primitive ID counter will incorrectly reset
with each primitive restart, since software primitive restart works by
performing multiple draw calls). I plan to address that in a future
patch series.
Fixes piglit test "spec/glsl-1.50/execution/geometry/primitive-id-in".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we previously implemented primitive restart, we didn't add cases
to brw_primitive_restart.c's can_cut_index_handle_prims() for the
primitive types that are introduced with geometry shaders. It turns
out that all of the new primitive types are supported by hardware
primitive restart.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As part of its support for geometry shaders, GL 3.2 introduces four
new primitive types: GL_LINES_ADJACENCY, GL_LINE_STRIP_ADJACENCY,
GL_TRIANGLES_ADJACENCY, and GL_TRIANGLE_STRIP_ADJACENCY.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a
correct implementation for nearest filtering, and it's way better than
using repeat wrap for instance for linear filtering (though obviously this
doesn't actually do seamless filtering).
v2: fix s/t wrap not r/s...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Specifying a miptree layout makes no sense for constant buffers.
This has no functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW is
just a #define for 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This paves the way for using gen7_upload_constant_state for PS data.
The formula is copied from gen7_wm_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
GL 3.2 requires us to support 128 varying components for geometry
shader outputs and fragment shader inputs, and 64 varying components
otherwise. But there's no hardware limitation that restricts us to 64
varying components, and core Mesa doesn't currently allow different
stages to have different maximum values, so just go ahead and enable
128 varying components for all stages. This gets us better test
coverage anyway.
Even though we are only working on GL 3.2 support for gen7 right now,
gen6 also supports 128 varying components, so go ahead and switch it
on there too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only ever did 1 URB write, since the maximum number of
varyings we support is small enough to fit in 1 URB write (when using
BRW_URB_SWIZZLE_NONE, which is what the pre-Gen7 GS always uses). But
we're about to increase the number of varying components we support
from 64 to 128.
With 128 varyings, the most URB writes we'll have to do is 2, but it's
just as easy to write a general-purpose loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "{VS,GS} URB Entry Allocation Size" fields of 3DSTATE_URB allow
values in the range 0-4, but they are U8-1 fields, so the range of
possible allocation sizes is 1-5. We were erroneously prohibiting a
size of 5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only ever did 1 or 2 URB writes, since the maximum
number of varyings we support is small enough to fit in 2 URB writes.
But GL 3.2 requires the geometry shader to support 128 output varying
components, and this could require up to 3 URB writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the SF/SBE stage is only capable of performing arbitrary
reorderings of 16 varying slots, we can't arrange the fragment shader
inputs in an arbitrary order if there are more than 16 input varying
slots in use. We need to make sure that slots 16-31 match the
corresponding outputs of the previous pipeline stage.
The easiest way to accomplish this is to just make all varying slots
match up with the previous pipeline stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The for loop was rather silly. In addition to checking brw->gen < 6
on each loop iteration, it took pains to exclude bits from
fp->Base.InputsRead that don't correspond to fragment shader inputs.
But those bits would never have been set in the first place, since the
only bits that are ever set in fp->Base.InputsRead are fragment shader
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the vertex shader output VUE map is determined solely by a
64-bit bitfield, we don't have to store it in its entirety in the
geometry shader program key; instead, we can just store the bitfield,
and let the geometry shader infer the VUE map at compile time.
This dramatically reduces the size of the geometry shader program key,
which we want to keep small since it gets recomputed whenever the
active program changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, on Gen6+, we laid out the vertex (or geometry) shader VUE
map differently depending whether user clipping was active. If it was
active, we put the clip distances in slots 2 and 3 (where the clipper
expects them); if it was inactive, we assigned them in the order of
the gl_varying_slot enum.
This made for unnecessary recompiles, since turning clipping on/off
for a shader that used gl_ClipDistance might rearrange the varyings.
It also required extra bookkeeping, since it required the user
clipping flag to be provided to brw_compute_vue_map() as a parameter.
With this patch, we always put clip distances at in slots 2 and 3 if
they are written to. do_vs_prog() and do_gs_prog() are responsible
for ensuring that clip distances are written to when user clipping is
enabled (as do_vs_prog() previously did for gen4-5).
This makes the only input to brw_compute_vue_map() a bitfield of which
varyings the shader writes to, a fact that we'll take advantage of in
forthcoming patches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, if a fragment shader accessed gl_FragCoord or
gl_FrontFacing, we would assign them their own slots in the fragment
shader input attribute array, using up space that could be made
available to real varyings. This was not strictly necessary (since
these values are not true varyings, and are instead computed from
other data available in the FS payload). But we had to do it anyway
because the SF/SBE setup code assumed that every 1 bit in the
gl_program::InputsRead bitfield corresponded to a genuine varying
variable.
Now that the SF/SBE code consults brw_wm_prog_data and only sets up
the attributes that the fragment shader actually needs, we don't have
to do this anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, the SF/SBE setup code delivered varying inputs to the FS
in the order in which they appear in the gl_program::InputsRead
bitfield, since that's what the FS expects.
When we add support for more than 64 varying components, this will no
longer always be the case, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots. So,
when there are more than 16 vec4's worth of varying inputs, the FS
will have to adjust the order its input varyings in order to partially
match the order of outputs from the geometry or vertex shader.
To allow extra flexibility in the ordering of FS varyings, this patch
causes the SF/SBE to deliver varying inputs to the FS in exactly the
order that the FS requests, by consulting brw_wm_prog_data::urb_setup
and brw_wm_prog_data::num_varying_inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always program the SF unit to start reading the vertex URB entry at
offset 1. In upcoming patches, we'll be adding FS code that relies on
this. So consistently use the constant BRW_SF_URB_ENTRY_READ_OFFSET
rather than hardcoding a 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we assumed that the number of varying inputs consumed by
the fragment shader was equal to the number of bits set in
gl_program::InputsRead. However, we'll soon be making two changes
that will cause that not to be true:
- We'll stop wasting varying input space for gl_FragCoord and
gl_FrontFacing, which aren't varyings.
- For fragment shaders that have more than 16 varying inputs, we'll
adjust the layout of the inputs to account for the fact that the
SF/SBE pipeline stage can't reorder inputs beyond the first 16; if
there are GS outputs that the FS doens't use (or vice versa) this
may cause the number of FS varying inputs to change.
So, instead of trying to guess the number of FS inputs from
gl_program::InputsRead, simply read it from
brw_wm_prog_data:num_varying_inputs, which is guaranteed to be correct
since it's populated by fs_visitor::calculate_urb_setup().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen4-5, the FS stage reads varying inputs from URB entries that
were output by the SF thread, where each register stores the
interpolation setup for two components of a vec4, therefore the FS
urb_read_length is twice the number of FS input varyings. On gen6+,
varying inputs are directly deposited in the FS payload by the SF/SBE
fixed function logic, so urb_read_length is irrelevant.
However, in future patches, it will be nice to be able to consult
brw_wm_prog_data to determine how many varying inputs the FS expects
(rather than inferring it from gl_program::InputsRead). So instead of
storing urb_read_length, we simply store num_varying_inputs in
brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the URB
read length.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At the moment, for Gen6+, the FS assumes that all varying inputs are
delivered to it in the order in which they appear in the
gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
that they are delivered in this order.
When we add support for more than 64 varying components, this will no
longer always be possible, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.
To allow extra flexibility in the ordering of FS varyings, this patch
causes the FS to advertise exactly what ordering it expects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For each sampler type, this tests that:
- The base type is GLSL_TYPE_SAMPLER.
- The dimensionality is set correctly.
- The returned data type is correct.
- The sampler_array and sampler_shadow flags are set correctly.
- sampler_coordinate_components() returns the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
It seems a user app can get us into this state, I trigger the fail
running fbo-maxsize inside virgl, it fails to create the backing
storage for the texture object, but then segfaults here when it
should fail the completeness test.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix the return type and allow src and dst types for comparison
to be separate, this at least fixes the two test cases I've written.
v2: drop the u32->s32 change
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
max_index may be 0xffffffff. The hardware does not need 1 + max_index
(although it does not hurt unless max_index wraps around to zero).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For mem->gmem we don't sample depth/stencil as it's native type. So we
need to setup the swizzle state for the sampler based on the format used
for sampling.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Needed by some games, like etuxracer and supertuxkart which use alpha
test rather than blending, to handle texture transparency.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With a debug option to force DIRECT (mainly to make it easier for
capturing cmdstream dumps). Using INDIRECT for large shaders at least
makes a noticable reduction in CPU load, which helps for CPU limited
games.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Because of how the tiling works, we can't really flush at arbitrary
points very easily. So wraparound is handled by resetting to top of
ringbuffer. Previously this would stall until current rendering is
complete. Instead cycle through multiple ringbuffers to avoid a stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Emit markers by writing to scratch registers in order to "triangulate"
gpu lockup position from post-mortem register dump. By comparing
register values in post-mortem dump to command-stream, it is possible to
narrow down which DRAW_INDX caused the lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Have a single helper that all draws come through.. mainly for a
convenient debug and instrumentation point.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The varying-out config comes from the inputs of the frag shader (so that
we aren't exporting unneeded varyinges). The varyings-count should come
from the frag shader as well, to avoid a discrepency in configuration
and resulting gpu lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These functions are defined in EXT_texture_array, which makes no
mention of what shader types they should be allowed in. At the time
EXT_texture_array was introduced, functions ending in "Lod" were
available only in vertex shaders, however this restriction was lifted
in later spec versions and extensions.
We already have the function lod_exists_in_stage() for figuring out
whether functions ending in "Lod" should be available, so just re-use
that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since BRW_MAX_WM_SURFACES is greater than BRW_MAX_VEC4_SURFACES, the
existing array isn't large enough to be used by the WM. Increasing it
will make it possible to share them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These are largely based on the similar fields in brw->wm.
v2: Add a better comment than "Scratch buffer".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Everyone at the Khronos meeting was as surprised that GLSL didn't
already support this as we were. Several vendors said they'd ship it,
but there didn't seem to be enough interest to put in the effort to make
it ARB or KHR.
v2: Fix a couple typos and rename the spec file to
EXT_shader_integer_mix.spec. Suggested by Roland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
start_instance doesn't affect gl_InstanceID.
There's no piglit test, but it's kinda obvious the code was wrong.
Reviewed-by: Christian König <christian.koenig@amd.com>
The shader is responsible for writing to streamout buffers using
the TBUFFER_STORE_FORMAT_* instructions.
The locations of some input SGPRs and VGPRs are assigned dynamically, because
the input SGPRs controlling streamout are not declared if they are not needed,
decreasing the indices of all following inputs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
So that the "init" state is always emitted first and not later in draw_vbo.
This fixes streamout where the "init" state, which disables streamout,
was emitted in draw_vbo after streamout was enabled.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This could happen if set_stream_output_targets is called twice
in a row without a draw call in between.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Paused transform feedback objects may refer to a program other than the
current program. If any active objects refer to a program, LinkProgram
must reject the request to relink.
The code to detect this is ugly since _mesa_HashWalk is awkward to use,
but unfortunately we can't use hash_table_foreach since there's no way
to get at the underlying struct hash_table (and even then, we'd need to
handle locking somehow).
Fixes the last subcase of Piglit's new ARB_transform_feedback2
api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is actually a pretty important error condition: otherwise, you
could set up transform feedback with one program, and resume it with
a program that generates a completely different set of outputs.
Fixes a subcase of Piglit's new ARB_transform_feedback2 api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The next few patches will use this for API error checking.
All of the drivers appear to CALLOC_STRUCT transform feedback objects,
so this should be properly NULL initialized on creation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a subcase of Piglit's new ARB_transform_feedback2 api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move the code back into the common UVD files since we now
have base structures for R600 and radeonsi.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes FTBFS on kfreebsd-*
Debian GNU/kFreeBSD doesn't provide getprogname() since it uses stdlib.h
from glibc. Instead it provides program_invocation_short_name from glibc.
You can find the same order in src/mesa/drivers/dri/common/xmlconfig.c
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
It was wrong for EXP.y, as we clamped the source before computing the
fractional part, and this opcode should be rarely used, so it's not
worth the hassle.
We used to pass the number of components actually used for the
coordinate (rather than padding, shadow comparitors, and projectors) by
hand, specifying it on every _texture() call.
The new helper function can just compute this, eliminating a lot of
potential mistakes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This computes the number of components necessary to address a sampler
based on its dimensionality. It will be useful for texturing built-ins.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It is planned to ship openSUSE 13.1 with -shared libs.
nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau
related targets.
This change makes it possible to easily build one shared libnouveau.so which is
then LIBADDed.
Also dlopen will be faster for one library instead of three and build time on
-jX will be reduced.
Whitespace fixes were requested by 'git am'.
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
According to GLSL, the shader may call EndPrimitive() at any point
during its execution, causing the line or triangle strip currently
being output to be terminated and a new strip to be begun.
This is implemented in gen7 hardware by using one control data bit per
vertex, to indicate whether EndPrimitive() was called after that
vertex was emitted.
In order to make this work without sacrificing too much efficiency, we
accumulate 32 control data bits at a time in a GRF. When we have
accumulated 32 bits (or when the shader terminates), we output them to
the appropriate DWORD in the control data header and reset the
accumulator to 0.
We have to take special care to make sure that EndPrimitive() calls
that occur prior to the first vertex have no effect.
Since geometry shaders that output a large number of vertices are
likely to be rare, an optimization kicks in if max_vertices <= 32. In
this case, we know that we can wait until the end of shader execution
before any control data bits need to be output.
I've tried to write the code in such a way that in the future, we can
easily adapt it to output stream ID bits (which are two bits/vertex
instead of one).
Fixes piglit tests "spec/glsl-1.50/glsl-1.50-geometry-end-primitive *".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_urb_WRITE() would always generate a URB_WRITE_HWORD
message, we always wanted to write data to the URB in pairs of varying
slots or larger (an HWORD is 32 bytes, which is 2 varying slots).
In order to support geometry shader EndPrimitive functionality, we'll
need the ability to write to just a single OWORD (16 byte) slot, since
we'll only be outputting 32 of the control data bits at a time. So
this patch adds a flag that will cause brw_urb_WRITE to generate a
URB_WRITE_OWORD message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_urb_WRITE() would unconditionally override the channel
masks in the URB_WRITE message to 0xff (indicating that all channels
should be written to the URB).
In order to support geometry shader EndPrimitive functionality, we'll
need the ability to set the channel masks programatically, so that we
can output just 32 of the control data bits at a time. So this patch
adds a flag that will prevent brw_urb_WRITE() from overriding them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The gen7 geometry shader uses a "control data header" at the beginning
of the output URB entry to store either
(a) flag bits (1 bit/vertex) indicating whether EndPrimitive() was
called after each vertex, or
(b) stream ID bits (2 bits/vertex) indicating which stream each vertex
should be sent to (when multiple transform feedback streams are in
use).
Fortunately, OpenGL only requires separate streams to be supported
when the output type is points, and EndPrimitive() only has an effect
when the output type is line_strip or triangle_strip, so it's not a
problem that these two uses of the control data header are mutually
exclusive.
This patch modifies do_vec4_gs_prog() to determine the correct
hardware settings for configuring the control data header, and
modifies upload_gs_state() to propagate these settings to the
hardware.
In addition, it modifies do_vec4_gs_prog() to ensure that the output
URB entry is large enough to contain both the output vertices *and*
the control data header.
Finally, it modifies vec4_gs_visitor so that it accounts for the size
of the control data header when computing the offset within the URB
where output vertex data should be stored.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Fixed incorrect handling of IVB/HSW differences.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This information will be useful in the i965 back end, since we can
save some compilation effort if we know from the outset that the
shader never calls EndPrimitive().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Do not attempt to share the code that uploads
3DSTATE_BINDING_TABLE_POINTERS_GS, 3DSTATE_SAMPLER_STATE_POINTERS_GS,
or 3DSTATE_GS with VS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v3: Add _NEW_TRANSFORM to gen7_gs_state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow us to reuse some code when setting up the geometry
shader stage.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa provides the wayland-egl libs and the pkgconfig file, but the headers
originate from the wayland package. Ensure everything matches, by requiring
application builds to look at the wayland headers as well.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Commit b77316ad75
st/dri: always copy new DRI front and back buffers to corresponding MSAA buffers
introduced creating a pipe_context for every call to validate, which is not required
because the callers have a context anyway.
Only exception is egl_g3d_create_pbuffer_from_client_buffer, can someone test if it
still works with NULL passed as context for validate? From examining the code I
believe it does, but I didn't thoroughly test it.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We've observed GPU hangs on Ivybridge from the following instruction:
mov(8) g115<1>.F 0D { align16 WE_normal NoDDChk 1Q };
There should be no reason to ever set the writemask on a destination
register to zero, except for perhaps the ARF NULL register.
This patch adds an assertion to enforce this for non-ARF registers.
Excluding ARFs is conservative yet should still catch the majority
of mistakes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Otherwise, coordinates with four components would result in a MOV
with a destination writemask that has no channels enabled:
mov(8) g115<1>.F 0D { align16 WE_normal NoDDChk 1Q };
At best, this is stupid: we emit code that shouldn't do anything.
Worse, it apparently causes GPU hangs (observable with Chris's
textureGather test on CubeArrays.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Cc: mesa-stable@lists.freedesktop.org
This was originally introduced by commit
ba47aabc98, but unfortunately the commit message
doesn't go into much detail about why +INF would be a problem here.
A similar issue exists for STATE_FOG_PARAMS_OPTIMIZED, but allowing infinity
there would potentially introduce NaNs where they shouldn't exist, depending
on the values of fog end and the fog coord. Since STATE_FOG_PARAMS_OPTIMIZED
is only used for fixed function (including ARB_fragment_program with fog
option), and the calculation there probably isn't very stable to begin with
when fog start and end are close together, it seems best to just leave it
alone.
This fixes piglit glsl-fs-fogscale, and a couple of Wine D3D tests. No piglit
regressions on Cayman.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
278372b47e added the uninitialized pointer
field gl_sync_object:Label. A free of this pointer, added in commit
6d8dd59cf5, resulted in a crash.
This patch fixes piglit ARB_sync regressions with swrast introduced by
6d8dd59cf5.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
These instructions will be used with immediate arguments in the upcoming
ldexp lowering pass and frexp implementation.
v2: Add vec4 support as well.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It's a ?: that operates per-component on vectors. Will be used in
upcoming lowering pass for ldexp and the implementation of frexp.
csel(selector, a, b):
per-component result = selector ? a : b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
builtin_info was originally going to be a structure containing a bunch
of information, but after various rewrites, it turned into a boolean
availability predicate.
builtin_avail is a better name than builtin_info, since it doesn't
store any information other than availability.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This creates a new replacement for the existing built-in function code.
The new module lives in builtin_functions.cpp (not builtin_function.cpp)
and exists in parallel with the existing system. It isn't used yet.
The new built-in function code takes a significantly different approach:
Instead of implementing built-ins via printed IR, build time scripts,
and run time parsing, we now implement them directly in C++, using
ir_builder. This translates to faster load times, and a much less
complex build system.
It also takes a different approach to built-in availability: each
signature now stores a boolean predicate, which makes it easy to
construct arbitrary expressions based on _mesa_glsl_parse_state's
fields. This is much more flexible than the old system, and also
easier to use.
Built-ins are also now stored in a single gl_shader object, rather
than being spread out across a number of shaders that need to be linked.
When searching for a matching prototype, we simply consult the
availability predicate. This also simplifies the code.
v2: Incorporate Matt Turner's feedback: use the new fma() function rather
than expr(). Don't expose textureQueryLOD() in GLSL 4.00 (since it
was renamed to textureQueryLod()). Also correct some #undefs.
v3: Incorporate Paul Berry's feedback: rename legacy to compatibility;
add comments to explain a few things; fix uvec availability; include
shaderobj.h instead of repeating the _mesa_new_shader prototype.
v4: Fix lack of TEX_PROJECT on textureProjGrad[Offset] (caught by oglc).
Add an out_var convenience function (more feedback by Matt Turner).
v5: Rework availability predicates for Lod functions. They were broken.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Enthusiastically-acked-by: Paul Berry <stereotype441@gmail.com>
Each ir_factory needs an instruction list and memory context in order to
be useful. Rather than creating an object and manually assigning these,
we can just use optional parameters in the constructor.
This makes it possible to create a ready-to-use factory in one line:
ir_factory body(&sig->body, mem_ctx);
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Adding new convenience emitters makes it easier to generate IR involving
these opcodes.
bitfield_insert is particularly useful, since there is no expr() for
quadops.
v2: Add fma() and rename lrp() operands to x/y/a to match the GLSL
specification (suggested by Matt Turner). Fix whitespace issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
IR builder already offers a lot of swizzling functions, such as
swizzle_xxxx, swizzle_z, or swizzle_for_size.
The swizzle_xxxx style is convenient if you statically know which
components you want. swizzle_for_size is great if you want to select
the first few components. However, if you want to select components
based on, say, a loop counter, none of those are sufficient.
IR builder actually already had support for arbitrary swizzling, but
didn't expose it. This patch exposes that API.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
dotlike() uses ir_binop_mul for scalars, and ir_binop_dot for vectors.
When generating built-in functions, we often want to use regular
multiply for scalar signatures, and dot() for vector signatures.
ir_binop_dot only works on vectors, so we have to switch opcodes,
even if the code is otherwise identical. dotlike() makes this easy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We use "ret" as the function name since "return" is a C++ keyword, and
"ir_return" is already a class name.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This adds two new signatures:
assign(lhs, rhs, condition, writemask);
assign(lhs, rhs, condition);
All the other existing APIs still exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We already have ir_expression constructors for unary and binary
operations, which automatically infer the type based on the opcode and
operand types.
These are convenient and also required for ir_builder support.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This isn't strictly necessary, since creators of ir_texture objects
should set LOD when relevant. However, it's nice to have a NULL pointer
in case they forget.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
During compilation, we'll use this to determine built-in availability.
The plan is to have a single shader containing every built-in in every
version of the language, but filter out the ones that aren't actually
available to the shader being compiled.
At link time, we don't actually need this filtering capability: we've
already imported prototypes for every built-in that the shader actually
calls, and they're flagged as is_builtin(). The linker doesn't import
any additional prototypes, so it won't pull in any unavailable
built-ins. When resolving prototypes to function definitions, the
linker ensures the values of is_builtin() match, which means that a
shader can't trick the linker into importing the body of an unavailable
built-in by defining a suspiciously similar prototype.
In other words, during linking, we can just pass in NULL. It will work
out fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We can simply call the stored predicate function. If state is NULL,
just report that the function is available.
v2: Add a comment (requested by Paul Berry).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This promises the method won't modify the contents of the object.
This allows us to call it even with a const pointer to the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
A signature is a built-in if and only if builtin_info != NULL, so we
don't actually need a separate flag bit. Making a boolean-valued
method allows existing code to ask the same question while not worrying
about the internal representation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
For the upcoming built-in function rewrite, we'll need to be able to
answer "Is this built-in function signature available?".
This is actually a somewhat complex question, since it depends on the
language version, GLSL vs. GLSL ES, enabled extensions, and the current
shader stage.
Storing such a set of constraints in a structure would be painful, so
instead we store a function pointer. When creating a signature, we
simply point to a predicate that inspects _mesa_glsl_parse_state and
answers whether the signature is available in the current shader.
Unfortunately, IR reader doesn't actually know when built-in functions
are available, so this patch makes it lie and say that they're always
present. This allows us to hook up the new functionality; it just won't
be useful until real data is populated. In the meantime, the existing
profile mechanism ensures built-ins are available in the right places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Otherwise, coordinates with four components would result in a MOV
with a destination writemask that has no channels enabled:
mov(8) g115<1>.F 0D { align16 WE_normal NoDDChk 1Q };
At best, this is stupid: we emit code that shouldn't do anything.
Worse, it apparently causes GPU hangs (observable with Chris's
textureGather test on CubeArrays.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
The change is very small. Do seamless filtering if either the context
enable is set or the sampler enable is set.
The AMD_seamless_cubemap_per_texture says:
"If TEXTURE_CUBE_MAP_SEAMLESS_ARB is emabled (sic) globally or the
value of the texture's TEXTURE_CUBE_MAP_SEAMLESS_ARB parameter is
TRUE, seamless cube map sampling is enabled..."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Appendix F.2 of the OpenGL ES 3.0.0 spec says:
"OpenGL ES 3.0 requires that all cube map filtering be
seamless. OpenGL ES 2.0 specified that a single cube map face be
selected and used for filtering."
Setting the field only in the context will work fine with sampler
objects (and drivers that support AMD_seamless_cubemap_per_texture)
because seamless filtering is used if *either* the context or the
sampler enable it:
"If TEXTURE_CUBE_MAP_SEAMLESS_ARB is emabled (sic) globally or the
value of the texture's TEXTURE_CUBE_MAP_SEAMLESS_ARB parameter is
TRUE, seamless cube map sampling is enabled..."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reported-by: Maxence Le Dore <maxence.ledore@gmail.com>
Thanked-by: Maxence Le Dore <maxence.ledore@gmail.com>
There is no GL_TEXTURE_CUBE_MAP_SEAMLESS in any version of OpenGL ES or
in any extension that applies to OpenGL ES. The same error check
already occurs for glTexParameteri.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Maxence Le Dore <maxence.ledore@gmail.com>
This aligns the gfx, compute, and dma IBs to 8 DW boundries.
This aligns the the IB to the fetch size of the CP for optimal
performance. Additionally, r6xx hardware requires at least 4
DW alignment to avoid a hw bug. This also aligns the DMA
IBs to 8 DW which is required for the DMA engine. This
alignment is already handled in the gallium driver, but that
patch can be removed now that it's done in the winsys.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
We support indirect addressing only on the vertex index, but some
shaders also use indirect addressing on attributes. This patch
adds support for indirect addressing on both dimensions inside
gs arrays.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When adding a new buffer to the beginning of the memory pool, we were
accidentally deleting the buffer that was first in the buffer list.
This was caused by a bug in the memory pool's linked list
implementation.
DPA2 is listed in the "Defeatured Instructions" section of the
965 PRM, Volume 4:
"The following instructions are removed from Gen4 implementation mainly
due to implementation cost/schedule reasons. They are candidates for
future generations."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
RSR and RSL are listed in the "Defeatured Instructions" section of the
965 PRM, Volume 4:
"The following instructions are removed from Gen4 implementation mainly
due to implementation cost/schedule reasons. They are candidates for
future generations."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a bug where if an uniform array is passed to a function the accesses
to the array are not propagated so later all but the first vector of the
uniform array are removed in parcel_out_uniform_storage resulting in
broken shaders and out of bounds access to arrays in
brw::vec4_visitor::pack_uniform_registers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dominik Behr <dbehr@chromium.org>
Haswell GT2 and GT3 require the number of vertex shader URB entries to
be at least 64, not 32.
At the moment, we always meet this requirement automatically, because
in the absence of a geometry shader, we assign all available URB space
to the vertex shader. But when we turn on support for geometry
shaders, this lower limit will become important.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch creates a new file brw_vec4_vs_visitor.cpp, to contain code
that is specific to the vertex shader. Now the organization of vertex
shader and geometry shader visitor code is symmetric: vs-specific code
is in brw_vec4_vs_visitor.cpp, gs-specific code is in
brw_vec4_gs_visitor.cpp, and code shared between vs and gs is in
brw_vec4_visitor.cpp.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow it to be shared between brw_vec4_visitor.cpp and
brw_vec4_vs_visitor.cpp (which will be created in the next patch).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The fixup code emulates non-BGRA render targets by adding an
extra instruction at the end of fragment shaders to swizzle the
output. To do this, we also swizzle the blend function. However
an oversight until now was that the writemask wasn't getting
swizzled. This patch fixes that which fixes a bunch of piglit
tests.
The old code in dri2_glx suffered from a typographical error that caused
the default version to be 2.1 instead of 1.2 (minimum required by the
Linux OpenGL ABI). drisw_glx had a similar error resulting in a default
version of 0.1.
Some driver/card combinations (r200/RV280, i915/915G) don't support
OpenGL 2.1. These create in some corner cases an indirect context
instead of a direct context when calling glXCreateContextAttribsARB().
This happens because of a bad default value. To avoid this, just used
the default value specified by the GLX_ARB_create_context specification:
"The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
case, implementations will typically return the most recent version
of OpenGL they support which is backwards compatible with OpenGL 1.0
(e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility
profile)"
Refactor all the default value setting to dri2_convert_glx_attribs, and
make sure the correct defaults are set in that one place.
Signed-off-by: Rico Schüller <kgbricola@web.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla http://bugs.winehq.org/show_bug.cgi?id=34238
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
This patch adds liveness analysis to i915g and a couple
optimizations which benefit from it. One interesting
optimization turns (fake) indirect texture accesses into direct
texture accesses (the i915 supports a maximum of 4 indirect
texture accesses). Among other things this fixes a bunch of
piglit tests.
The vertex shader color outputs (gl_FrontColor, gl_BackColor,
gl_FrontSecondaryColor, and gl_BackSecondaryColor) don't have the same
names as the matching fragment shader color inputs (gl_Color and
gl_SecondaryColor). As a result, the qualifiers on them were not being
properly cross validated.
Full spec compliance required ir_variable::used and
ir_variable::assigned be set properly. Without the preceeding patch,
which fixes the ::clone method to copy them, this will not be the case.
Fixes all of the previously failing piglit
spec/glsl-1.30/linker/interpolation-qualifiers tests.
v2: Update callers of cross_validate_types_and_qualifiers and
cross_validate_front_and_back_color. The function signature changed in
v2 of a previous patch. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47755
The new function, cross_validate_types_and_qualifiers, will have
multiple callers from this file in future commits.
v2: Don't pass the names of the producer / consumer stages to
cross_validate_types_and_qualifiers. Instead, pass the types and get
the names only in the error paths. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Changes to the grammar for GL_ARB_shading_language_420pack (commit
6eec502) moved precision qualifiers out of the type_specifier production
chain. This caused declarations such as:
struct S {
lowp float f;
};
to generate parse errors. Section 4.1.8 (Structures) of both the GLSL
ES 1.00 spec and GLSL 1.30 specs says:
"Member declarators may contain precision qualifiers, but may not
contain any other qualifiers."
So, it sure seems like we shouldn't generate a parse error. :)
Instead of type_specifier, use fully_specified_type in struct members.
However, fully_specified_type allows a lot of other qualifiers that are
not allowed on structure members, so expeclitly disallow them.
Note, this makes struct_declaration look an awful lot like
member_declaration (used for interface blocks). We may want to
(somehow) unify these rules to reduce code duplication at some point.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68753
Reported-by: Aras Pranckevicius <aras@unity3d.com>
Cc: Aras Pranckevicius <aras@unity3d.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Remap any type or severity exclusive to KHR_debug to
something suitable for ARB_debug_output
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
V3: make sure to add null terminator when setting label,
generate error when the client specifies an explicit
length that exceeds MAX_LABEL_LENGTH, set label pointer
to NULL when freed, and output correct length in
MAX_LABEL_LENGTH error message.
V2: fixed indentation of comment
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
The main GL context's swtnl_im field is the VBO module's vbo_context
structure. Using the name "swtnl" in the name is confusing since
some drivers use hardware texturing and lighting, but still rely on the
VBO module for drawing.
v2: Forward declare the type and use that instead of void *
(suggested by Eric Anholt).
v3: Remove unnecessary cast (pointed out by by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Some drawing functions take a single _mesa_prim object, while others
take an array of primitives. Both kinds of functions used a parameter
called "prim" (the singular form), which was confusing.
Using the plural form, "prims," clearly communicates that the parameter
is an array of primitives.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
can_cut_index_handle_prims() was passed an array of _mesa_prim objects
and a count, and ran a loop for that many iterations. However, it
treated the array like a pointer, repeatedly checking the first element.
This patch makes it actually check every primitive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes assertion failues in 24 piglit tests with
MESA_GL_VERSION_OVERRIDE=3.0, 12 of which are now passing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LIBGL_SHOW_FPS=1 makes GLX print FPS every second while other values do
nothing. Extend it so that LIBGL_SHOW_FPS=N will print the FPS every N
seconds.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The VBO module actually calls us with an array of _mesa_prim objects.
For example, it may break up a DrawArrays() call into multiple
primitives when primitive restart is enabled.
Previously, we treated prim like a pointer, always accessing element 0.
This worked because all of the primitive objects in a single draw call
have the same value for num_instances and basevertex.
However, accessing an array as a pointer and using the wrong object's
fields is misleading. For stylistic reasons alone, we should use the
right object.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These functions have almost identical code; the only difference is that
a few of the bits moved around. Adding a few trivial conditionals
allows the same function to work on all generations, and the resulting
code is still quite readable.
v2: Comment that the workaround flush is only necessary on SNB
(requested by Paul Berry).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes broken rendering if these MRFs contained anything other than zero.
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is a slight functionality change. Previously we would compute a
common value for num_samplers for all stages, and populate that many
entries in each stage's surf_offset table regardless of how many
samplers each stage used. Now we only populate the number of entries
in the surf_offset table corresponding to the number of samplers
actually used by the stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously these functions would accept a pointer to the binding table
and an index indicating which entry in the binding table should be
updated. Now they merely take a pointer to the binding table entry to
be updated.
This will make it easier to generalize brw_texture_surfaces to support
geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch implements pull constant upload, binding table upload, and
surface setup for geometry shaders, by re-using vertex shader code
that was generalized in previous patches.
Based on work by Eric Anholt <eric@anholt.net>.
v2: Update ditry bits for brw_gs_ubo_surfaces to account for commit
77d8fbc (mesa: add & use a new driver flag for UBO updates instead of
_NEW_BUFFER_OBJECT).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware requires that after constant buffers for a stage are
allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS}
command, and prior to execution of a 3DPRIMITIVE, the corresponding
stage's constant buffers must be reprogrammed using a
3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command.
Previously we didn't need to worry about this, because we only
programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on
startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged).
But now that we reallocate the constant buffers whenever geometry
shaders are switched on and off, we need to make sure the constant
buffers are reprogrammed.
We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to
brw->state.dirty.brw.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we would always use the same push constant allocation
regardless of what shader programs were being run: the available push
constant space was split into 2 equal size partitions, one for the
vertex shader, and one for the fragment shader.
Now that we are adding geometry shader support, we need to do
something smarter. This patch adjusts things so that when a geometry
shader is in use, we split the available push constant space into 3
nearly-equal size partitions instead of 2.
Since the push constant allocation is now affected by GL state, it can
no longer be set up by brw_upload_initial_gpu_state(); instead it must
be set up by a state atom.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is required by the internal hardware docs and the PRM. Probably
the reason we were getting away with not doing it was because we only
emitted 3DSTATE_PUSH_CONSTANT_ALLOC_PS during startup. However that's
going to change with the introduction of geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we gave all of the URB space (other than the small amount
that is used for push constants) to the vertex shader. However, when
a geometry shader is active, we need to divide it up between the
vertex and geometry shaders.
The size of the URB entries for the vertex and geometry shaders can
vary dramatically from one shader to the next. So it doesn't make
sense to simply split the available space in two. In particular:
- On Ivy Bridge GT1, this would not leave enough space for the worst
case geometry shader, which requires 64k of URB space.
- Due to hardware-imposed limits on the maximum number of URB entries,
sometimes a given shader stage will only be capable of using a small
amount of URB space. When this happens, it may make sense to
allocate substantially less than half of the available space to that
stage.
Our algorithm for dividing space between the two stages is to first
compute (a) the minimum amount of URB space that each stage needs in
order to function properly, and (b) the amount of additional URB space
that each stage "wants" (i.e. that it would be capable of making use
of). If the total amount of space available is not enough to satisfy
needs + wants, then each stage's "wants" amount is scaled back by the
same factor in order to fit.
When only a vertex shader is active, this algorithm produces
equivalent results to the old algorithm (if the vertex shader stage
can make use of all the available URB space, we assign all the space
to it; if it can't, we let it use as much as it can).
In the future, when we need to support tessellation control and
tessellation evaluation pipeline stages, it should be straightforward
to expand this algorithm to cover them.
v2: Use "unsigned" rather than "GLuint".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This paves the way for sharing the code that will set up the vertex
and geometry shader pipeline state.
v2: Rename the base class to brw_stage_state.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Defines that previously referred to VS now refer to VEC4, since they
will be shared by the user-programmable vertex shader and geometry
shader stages.
Defines that previously referred to the Gen6 geometry shader stage
(which is only used for transform feedback) are now renamed to
explicitly refer to Gen6, to avoid confusion with the Gen7
user-programmable geometry shader stage.
Based on work by Eric Anholt <eric@anholt.net>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will avoid confusion when we add geometry shaders, since these
data structures will be shared by vertex and geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Now that the name "gs" is no longer used to refer to the legacy fixed
function geometry shaders, we can use it to refer to user-defined
geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
"ff" is for "fixed function". This frees up the name "gs" to refer to
user-defined geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags
and si_emit_cache_flush emits the packets. That's it. The shared radeon code
tells us when the streamout cache should be flushed, so we have to check
the flags anyway.
There is a new atom "cache_flush", because caches must be flushed *after*
resource descriptors are changed in memory.
Functional changes:
* Write caches are flushed at the end of CS and read caches are flushed
at its beginning.
* Sampler view states are removed from si_state, they only held the flush
flags.
* Everytime a shader is changed, the I cache is flushed. Is this needed?
Due to a hw bug, this also flushes the K cache.
* The WRITE_DATA packet is changed to use TC, which fixes a rendering issue
in openarena. I'm not sure how TC interacts with CP DMA, but for now it
seems to work better than any other solution I tried. (BTW CIK allows us
to use TC for CP DMA.)
* Flush the K cache instead of the texture cache when updating resource
descriptors (due to a hw bug, this also flushes the I cache).
I think the K cache flush is correct here, but I'm not sure if the texture
cache should be flushed too (probably not considering we use TC
for WRITE_DATA, but we don't use TC for CP DMA).
* The number of resource contexts is decreased to 16. With all of these cache
changes, 4 doesn't work, but 8 works, which suggests I'm actually doing
the right thing here and the pipeline isn't drained during flushes.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
There is a new "class" si_buffer_resources, which should be good enough for
implementing any kind of buffer bindings (constant buffers, vertex buffers,
streamout buffers, shader storage buffers, etc.)
I don't even keep a copy of pipe_constant_buffer - we don't need it.
The main motivation behind this is to have a well-tested infrastrusture
for setting up streamout buffers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This streamout state code will be used by radeonsi.
There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:
pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen
The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.
Thanks to Tom Stellard for fixing the build for r600g/compute.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF. For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for
the GRF.
To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.
Not observed to fix anything yet, but likely to. Parallels the bug fix
in the previous commit, which actually does fix known failures.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the
GRF as src[1].
To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.
Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working. (The register
classes were actually sufficient, we were just interpreting an IMM as
a virtual GRF number.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
pstipple/aaline stages used PIPE_MAX_SAMPLER instead of
PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views.
Now these stages can't actually handle sampler_unit != texture_unit anyway
(they cannot work with d3d10 shaders at all due to using tex not sample
opcodes as "mixed mode" shaders are impossible) but this leads to crashes if
a driver just installs these stages and then more than PIPE_MAX_SAMPLER views
are set even if the stages aren't even used.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Turns out we don't need to do much extra work for detecting this case,
since we are guaranteed to get a empty static texture state in this case,
hence just rely on format being 0 and return all zero then.
Previously needed dummy textures (would just have crashed on format being 0
otherwise) which cannot return the correct result for size queries and when
sampling textures with wrap modes using border.
As a bonus should hugely increase performance when sampling unbound textures -
too bad it isn't a useful feature :-).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
No idea if this is working right but copied straight from llvmpipe.
(Not only does this check the so_target but also use buffer->data instead
of buffer for the mapping.)
Just trying to get rid of a segfault testing something else...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
The only reason this was needed was because the fetch texel function had to
get the (dynamic) border color, but this is now done much earlier.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing
AoS path wouldn't be all that difficult (use all the same logic as SoA) but
considered not worth it for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If the app is asking us to do GL_COMPRESSED_RGBA, then the app obviously
doesn't have pre-compressed data to hand us. So don't choose a storage
format that we won't actually be able to compress and store.
Fixes black screen in warzone2100 when libtxc_dxtn is not present. Also
66 piglit tests.
NOTE: This is a candidate for the 9.2 branch.
Reported-by: Paul Wise <pabs@debian.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You should only be flagging the formats as supported if you support them
anyway.
NOTE: This is a candidate for the 9.2 branch. (required for next commit)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Thanks to Ken for trawling through my neglected public branches and
finding the bug in this change (inside a megacommit) that made me abandon
this work.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids the need to get the inter- and intra-tile offset and adjust
our miptree info based on them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are things that happen to be occurring because of the batch flush at
the start of the blorp op (which exists to prevent batch space or aperture
space overflow), but the intention was for this sequence of state resets at
the end of blorp to be everything necessary for the next draw call.
Found when debugging the next commit, by comparing brw_new_batch() and
intel_batchbuffer_reset() to brw_blorp_exec().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The code that got replaced with map_raw didn't do the flush, but now
map_raw() is responsible for it and we don't have to worry about it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This gives us more information about why we're flushing that we can
use for handling our throttling.
v2 (Kenneth Graunke): Rebase on latest master, add missing
FLUSH_VERTICES and FLUSH_CURRENT, which fixes a regression in Glean's
polygonOffset test.
v3 (anholt): Drop FLUSH_CURRENT -- FLUSH_VERTICES is what we need, which
is "get any queued prims out of VBO and into the driver", not "update
ctx->Current so we can read it with the CPU." Also drop batch->used
check, which intel_batchbuffer_flush() does anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This was already happening because blorp happens to flush at the end of
every call, but we have been talking about removing that at some point,
and this would surely get overlooked.
v2 (Kenneth Graunke): Rebase on latest master. Note that we did remove
the other flush, and this change actually did get overlooked!
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
intel_flush() now did nothing except call through (and
intel_batchbuffer_flush() does the no-op check, too!)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
df06745c5a made it so that we didn't
allocate extra uniform space for unused clip planes, which also
incidentally made us not allocate any space at all, which we were relying
on for this no-uniforms case. Instead of putting the knowledge of this
special HW exception into the thing that normally preallocates prog_data
for us, just allocate it here.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68766
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we can have per-pixel lod we should also honor the filter per-pixel
(in fact we didn't honor it per quad neither in the multiple quad case).
Do this by running the linear path and simply beating the weights into shape
(the sample with the higher weight is the one which should have been chosen
with nearest filtering hence adjust filter weight to 1.0/0.0 based on that).
If all pixels use nearest filter (either min and mag) then still run just a
nearest filter as this is way cheaper (probably around 4 times faster for 2d,
more for 3d case) and it should be relatively rare that pixels really need
different filtering. OTOH if all pixels would require linear don't do anything
special since the linear path with filter adjustments shouldn't really be all
that much more expensive than ordinary linear, and we think it's rare that
min/mag filters are configured differently so there doesn't seem much value
in trying to optimize this further.
This does not yet fix the AoS path (though currently AoS is only used for
single quads hence it could be considered less broken, just never honoring
per-pixel filter decision but doing it per quad).
v2: simplify code a bit (unify min linear and min nearest cases)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
While a sqrt here and there shouldn't hurt much (depending on the cpu) it is
possible to completely omit it since rho is only used for calculating lod and
there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for
calculating lod this means we get a simple mul instead of sqrt (in case of
nearest mip filter in fact we don't need to replace the sqrt with something
else at all), only in some not very useful path this doesn't work (combined
brilinear calculation of int level and fractional lod, accurate rho calc but
brilinear filtering seems odd).
Apart from being faster as an added bonus this should increase our crappy
fractional accuracy of lod, since fast_log2 is only good for ~3bits and this
should increase accuracy by one bit (though not used if dimension is just one
as we'd need an extra mul there as we never had the squared rho in the first
place).
v2: use separate ilog2_sqrt function if we have squared rho.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is just preparation for per-pixel (or per-quad in case of multiple quads)
min/mag filter since some assumptions about number of miplevels being equal
to number of lods no longer holds true.
This change does not change behavior yet (though theoretically when forcing
per-element path it might be slower with different min/mag filter since the
code will respect this setting even when there's no mip maps now in this case,
so some lod calcs will be done per-element just ultimately still the same
filter used for all pixels).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes "Missing break in switch" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The downstream android kernel driver is "kgsl", the upstream drm/kms
driver is called "msm". Since libdrm_freedreno handles the differences
between the two, we need to load the same thing for either device.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to set the flag on all the .xyzw components that are written by
the instruction, not just on .x. Otherwise a later use of rN.y (for
example) will not trigger the appropriate sync bit to be set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like most/all instructions have some restrictions about const src
registers. In seems like the 2 src (cat2) instructions can take at most
one const, and the 3 src (cat3) instructions can take at most one const
in the first 2 arguments. And so on. Handle this properly now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
GLSL 1.30 doesn't allow precision qualifiers on sampler types,
but in GLSL ES, sampler types are also allowed. This seems like
an oversight (since the intention of including these in GLSL 1.30
is to allow compatibility with ES shaders).
Currently, Mesa allows "default" precision qualifiers to be set for
sampler types in GLSL (commit d5948f2). This patch makes it follow
GLSL ES rules and also allow declaring sampler variables with a
precision qualifier in GLSL 1.30 (and later). e.g.
uniform lowp sampler2D sampler;
This fixes a shader compilation error in Khronos OpenGL conformance
test "depth_texture_mipmap".
V2: Update comments.
Signed-off-by: Ian Romanick <idr@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@lists.freedesktop.org>
Cc: <mesa-stable@lists.freedesktop.org>
Previously, we allocated space in brw_vs_prog_data's params and
pull_params arrays for MAX_CLIP_PLANES vec4s---even when it wasn't
necessary.
On a 64-bit architecture, this used 0.5 kB of space (8 clip planes *
4 floats per plane * 8 bytes per float pointer * 2 arrays of pointers =
512 bytes). Since this cost was per-vertex shader, it added up.
Conveniently, we already store the number of clip plane constants in the
program key. By using that, we can allocate the exact amount of space
needed. For the common case where user clipping is disabled, this means
0 bytes.
While we're here, mention exactly what code requires this extra space,
since it wasn't obvious.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When validating draw parameters move check for 0 draw count last
(drawing with count 0 is not an error), so that other parameters (e.g.: the
primitive type) are validated and the correct errors (if applicable) are
generated.
>From the OpenGL 3.3 spec page 33 (page 48 of the PDF):
"[Regarding DrawArraysOneInstance, in terms of which other draw operations
are defined:]
If count is negative, an INVALID_VALUE error is generated."
This patch also changes the bahavior of MultiDrawElements to perform the draw
operation if some primitive's index counts are zero.
Signed-off-by: Fabian Bieler <fabianbieler@fastmail.fm>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
MAD will be generated directly from ir_triop_fma, so this assertion
checks that all ir_expressions are usable.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
MSAA was tested by one user on RS690 and it works for him with color
compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM.
Since we don't have hardware documentation about which chipsets actually have
CMASK RAM, I had to take a guess based on the presence of HiZ.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
In particular noone is interested in the vertex count, so drop that,
and also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. I am unable for now to figure
out if primitives_storage_needed in SO stats (used for d3d10) should
increase if SO is disabled, though the equivalent num_primitives_generated
used for OpenGL definitely should increase. In any case we were only counting
when SO is active both in softpipe and llvmpipe anyway so don't pretend there's
an independent num_primitives_generated counter which would count always.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just
as before, should eventually fix this by doing either separate counting for this
query or adjust the code so it always counts this even if SO is inactive depending
on what's correct for d3d10.)
Reviewed-by: Brian Paul <brianp@vmware.com>
SEQ and SNE are not native i915 instructions, so they each generate at
least 3 instructions. If both operands are uniforms or constants, we
get 5 instructions like:
U[1] = MOV CONST[1]
U[0].xyz = SGE CONST[0].xxxx, U[1]
U[1] = MOV CONST[1].-x-y-z-w
R[0].xyz = SGE CONST[0].-x-x-x-x, U[1]
R[0].xyz = MUL R[0], U[0]
This code is stupid. Instead of having the individual calls to
i915_emit_arith generate the moves to utemps, do it in the caller. This
results in code like:
U[1] = MOV CONST[1]
U[0].xyz = SGE CONST[0].xxxx, U[1]
R[0].xyz = SGE CONST[0].-x-x-x-x, U[1].-x-y-z-w
R[0].xyz = MUL R[0], U[0]
This allows fs-temp-array-mat2-index-col-wr and
fs-temp-array-mat2-index-row-wr to fit in hardware limits (instead of
falling back to software rasterization).
NOTE: Without pending patches to the piglit tests, these tests will now
fail. This is an unrelated, pre-existing issue.
v2: Copy most of the body of the commit message into comments in the
code. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The command is submitted once the event has been triggered, but it might not
have completed yet. Therefore, we have to add it to deps in order to wait on it.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
I previously fixed this partly in 9e8400f4c9,
however I didn't go far enough in testing it, now when I parse a TGSI shader
with arrays in it my iterator can see the ArrayID set to the proper value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that we use a fixed set of register classes, we can set up the
register set and conflict graphs once, at context creation, rather than
on every VS compile. This is obviously less expensive, and also what
we already do in the FS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We're soon going to be calling brw_alloc_reg_set() from outside of the
visitor, where we don't have the precomputed "max_grf" variable handy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
For now, nothing else can get allocated over them. That may change at
some point in the future.
This also means that base_reg_count can be computed without knowing the
number of registers used for the payload, which is required if we want
to allocate the register set once at context creation time.
See commit 551e1cd44f, which implemented
virtually identical code in the FS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Arrays, structures, and matrices use large VGRFs of arbitrary sizes.
However, split_virtual_grfs() breaks those down into VGRFs of size 1.
For reference, commit 5d90b98879 is the
analogous change to the FS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(From a suggestion by Francisco Jerez)
If an enum represents a bitfield of flags, e.g.:
enum E {
A = 1,
B = 2,
C = 4,
D = 8,
};
then C++ normally prohibits statements like this:
enum E x = A | B;
because A and B are implicitly converted to ints before OR-ing them,
and an int can't be stored in an enum without a type cast. C, on the
other hand, allows an int to be implicitly converted to an enum
without casting.
In the past we've dealt with this situation by storing flag bitfields
as ints. This avoids ugly casting at the expense of some type safety
that C++ would normally have offered (e.g. we get no warning if we
accidentally use the wrong enum type).
However, we can get the best of both worlds if we override the |
operator. The ugly casting is confined to the operator overload, and
we still get the benefit of C++ making sure we don't use the wrong
enum type.
v2: Remove unnecessary comment and unnecessary use of "enum" keyword.
Use static_cast.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We never noticed that this field was uninitialized because it is only
used in an error path that reports internal Mesa errors.
But it's silly to have it around anyway because &brw->ctx is
equivalent.
Should fix Coverity defect CID 1063351: Uninitialized pointer field
(UNINIT_CTOR) /src/mesa/drivers/dri/i965/brw_vec4_emit.cpp: 148
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If brwNewProgram is asked to create a program for an unrecognized
target, don't bother falling back on _mesa_new_program(). That just
hides bugs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Use assert() rather than _mesa_problem().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
330.frag is a direct copy of 150.frag.
330.glsl is 150.glsl combined with ARB_shader_bit_encoding.glsl.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These are necessary in order to compile the built-in functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
glIsQuery is supposed to return false for names returned by glGenQueries
until their first use. BeginQuery is a use, but QueryCounter is also a
use.
From the ARB_timer_query spec:
"A timer query object is created with the command
void QueryCounter(uint id, enum target);
[...] If <id> is an unused query object name, the
name is marked as used [...]"
Fixes Piglit's spec/ARB_timer_query/query-lifetime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
I assume this should have been part of commit
7727fbb7c5. This (obviously) fixes a lot tests.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Commit 53e20b8b introduced the use of a template to initialize some
common fields. Move this copying of fields to before the common vp3
fields are initialized.
Reported-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
The cmps.f.* instruction doesn't actually seem to give a float 1.0 or
0.0 output. It either needs a cov.u16f16 or add.s + sel.f16. This
makes SGT/SLT/etc more similar to CMP, so handle them in trans_cmp().
This fixes a bunch of piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems there are a number of cases where instructions have limitations
about taking reading src's from const register file, so make
get_unconst() a bit easier to use.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We probably should get rid of assert() entirely, but at this stage it is
more useful for things to crash where we can catch it in a debugger.
With compile_error() we have a single place to set an error flag (to
bail out and return an error on the next instruction) so that will be a
small change later when enough of the compiler bugs are sorted.
But re-arrange/cleanup the error/assert stuff so we at least get a dump
of the TGSI that triggered it. So we see some useful output in piglit
logs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Don't crash when no color buffer bound. Something caught when starting
to run piglit, fixes a hanful of piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Category 4 instructions (rsq, rcp, sqrt, etc) seem to be unable to take
a const register as src. In these cases we need to move the src to a
temporary gpr first.
This is the second case of such a restriction, where the instruction
encoding appears to support a const src, but in fact the hw appears to
ignore that bit. So split things out into a helper that can be re-used
for any instructions which have this limitation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Our current (rather naive) register assignment is based on mapping
different register files (INPUT, OUTPUT, TEMP, CONST, etc) based on the
max register index of the preceding file. But in some cases, the lowest
used register in a file might not be zero. In which case
file_count[file] != file_max[file] + 1.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sometimes things other than color dst need saturating, like if there is
a 'clamp(foo, 0.0, 1.0)'. So for saturated dst add the extra
instructions to fix up dst.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The 1st src to add.s needs (r) flag (repeat), otherwise it will end up:
add.s dst.xyzw, tmp.xxxx -1
instead of:
add.s dst.xyzw, tmp.xyzw, -1
Also, if we are using a temporary dst to avoid clobbering one of the src
registers, we actually need to use that as the dst for the sel
instruction.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch adds support for:
PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE
Return the values reported by the closed source driver for now.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Previously, the min/mag switchover point when using nearest/none mip
filter was effectively -0.5 which can't be right. Looks like new OpenGL
thinks it's ok if it's always 0.0 (older versions required 0.5 in some
cases), let's hope everybody else thinks that's fine too.
Refactor this slightly and get the per-quad/per-pixel min/mag decision
values further down to sampling, though still only the first component
is used yet.
While here also fix code trying to skip lod bias application etc. when
mipfilter is none, as this is still needed for determining min/mag filter.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
As of "2f142d59 build: Add --enable-gallium-osmesa flag." the pkgconfig
file from classic osmesa is no longer installed when building gallium
osmesa, so copy it to gallium osmesa and install the copy instead.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch introduces the vec4_gs_visitor class, which translates
geometry shaders from GLSL IR to back-end opcodes.
This class is derived from vec4_visitor (which is also the base class
for vec4_vs_visitor), so as a result most of the back end code is
shared. The only parts that differ are:
- Geometry shaders use a different input payload organization, since
the inputs need to match up with the outputs of the previous
pipeline stage (vec4_gs_visitor::setup_payload() and
vec4_gs_visitor::setup_varying_inputs()).
- Geometry shader input array dereferences need a special stride
computation, since all geometry shader inputs are interleaved into
one giant array (vec4_gs_visitor::compute_array_stride()).
- There are no geometry shader system values
(vec4_gs_visitor::make_reg_for_system_value()).
- At the beginning of a geometry shader, extra data in R0 needs to be
zeroed out, and a vertex counter needs to be initialized
(vec4_gs_visitor::emit_prolog()).
- When EmitVertex() appears in the shader, the current contents of
output variables need to be emitted to the URB, and the vertex
counter needs to be incremented
(vec4_gs_visitor::visit(ir_emit_vertex *)).
- When generating a URB_WRITE message to output vertex data, the
current state of the vertex counter needs to be used to store a
write offset in the message header
(vec4_gs_visitor::emit_urb_write_header()).
- The URB_WRITE message that outputs vertex data needs to be sent
using GS_OPCODE_URB_WRITE, since VS_OPCODE_URB_WRITE would overwrite
the offsets in the message header
(vec4_gs_visitor::emit_urb_write_opcode()).
- At the end of a geometry shader, the final vertex count needs to be
delivered using a URB WRITE message
(vec4_gs_visitor::emit_thread_end()).
- EndPrimitive() functionality is not implemented yet
(vec4_gs_visitor::visit(ir_end_primitive *)).
- There is no support for assembly shaders
(vec4_gs_visitor::emit_program_code()).
v2: Make num_input_vertices const. Refer to registers as rN rather
than gN, for consistency with the PRM. Fix misspelling. Improve
comment in the ir_emit_vertex visitor explaining why we emit vertices
inside a conditional. Enclose the conditional code in the
ir_emit_vertex visitor between curly braces.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by geometry shaders to implement the EmitVertex()
function, since it requires writing data to a dynamically-determined
offset within the geometry shader's URB entry.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The arguments to brw_urb_WRITE() were getting pretty unwieldy, and we
have to add more flags to support geometry shaders anyhow.
Also plumb these flags through brw_clip_emit_vue(),
brw_set_urb_message(), and the vec4_instruction class.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When I initially generalized the vec4_visitor class in preparation for
geometry shaders, I assumed that the setup_attributes() function would
need to be different between vertex and geometry shaders, but its
caller, setup_payload(), could be shared. So I made
setup_attributes() a virtual function.
It turns out this isn't true; setup_payload() needs to be different
too, since the geometry shader payload sometimes includes an extra
register (primitive ID) that has to come before uniforms.
So setup_payload() needs to be the virtual function instead of
setup_attributes().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control,
which determines the register where the hardware delivers data sourced
from the URB (push constants followed by per-vertex input data).
For vertex shaders, we always set dispatch_grf_start_reg to 1, since
R1 is always the first register available for push constants in vertex
shaders.
For geometry shaders, we'll need the flexibility to set
dispatch_grf_start_reg to different values depending on the behvaiour
of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to
set it to 2 to allow the primitive ID to be delivered to the thread in
R1.
This patch eliminates the assumption that dispatch_grf_start_reg is
always 1. In vec4_visitor, we record the regnum that was passed to
vec4_visitor::setup_uniforms() in prog_data for later use. In
vec4_generator, we consult this value when converting an abstract
UNIFORM register to a concrete hardware register. And in the code
that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the
value recorded in prog_data.
This will allow us to set dispatch_grf_start_reg to the appropriate
value when compiling geometry shaders. Vertex shaders will continue
to always use a dispatch_grf_start_reg of 1.
v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch moves the following things into brw_vec4.{cpp,h}:
- struct brw_vec4_compile
- struct brw_vec4_prog_key
- brw_vec4_prog_data_compare()
- brw_vec4_prog_data_free()
This will allow us to avoid having to include brw_vs.h in
geometry-shader-specific files.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The patch that follows will move the definition of struct
brw_vec4_prog_key from brw_vs.h to brw_vec4.h, making it necessary for
brw_vs.h to include brw_vec4.h (because brw_vs.h defines struct
brw_vs_prog_key, which contains brw_vec4_prog_key as a member). Since
brw_vs.h is included from C source files, that means that brw_vec4.h
will need to be safe to include from C. Same for brw_shader.h, since
it is included by brw_vec4.h.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is backwards from what we are going to want in the long term, which is:
- brw_vec4.h declares general-purpose vec4 infrastructure needed by
both VS and GS
- brw_vs.h includes brw_vec4.h and adds VS-specific parts.
- brw_gs.h includes brw_vec4.h and adds GS-specific parts.
Note that at the moment brw_vec.h contains a fair amount of
VS-specific declarations--I plan to address that in a later patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise any GS that requires lowering (e.g. one that uses
gl_ClipDistance as an input or output) will fail to work.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch extracts the following logic from
validate_vertex_shader_executable():
(a) Generate an error if the shader writes to both gl_ClipDistance and
gl_ClipVertex.
(b) Record whether the shader writes to gl_ClipDistance in
gl_shader_program for use by the back-end.
(c) Record the size of gl_ClipDistance in gl_shader_program for use by
transform feedback logic.
And moves it into a function that is shared between vertex and
geometry shaders.
Strictly speaking we only need to have shared logic for (b) and (c)
right now (since (a) only matters in compatibility contexts, and we're
only implementing geometry shaders in core contexts right now). But
the three are closely related enough that it seems sensible to keep
them together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
enums were being converted twice resulting in incorrect values.
The extra conversion has been removed and the redundant assert is
removed also.
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous value of (GLuint64) ~0 has some problems:
GL_MAX_SERVER_WAIT_TIMEOUT is supposed to be a GLuint64 value, but has
to be queried via GetInteger64v(), which returns a GLint64. This means
that some applications are likely to treat it as a signed integer, where
~0 means -1. Negative values are nonsensical and problematic.
When interpreted correctly, ~0 translates to about 0.58 million years,
which seems rather excessive.
This patch changes it to 0x1fff7fffffff, which is about 1.11 years.
This is still plenty long, and is the same as both an int64 and uint64.
Applications that accidentally store it in a 32-bit int/unsigned also
get a non-negative value, which is again the same as both int and
unsigned. This value was suggested by Ian Romanick.
v2: Add the ULL prefix on the constant (suggested by Ian).
Fixes Piglit's spec/!OpenGL 3.2/get-integer-64v.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
_mesa_meta_begin() sets up an orthographic project and initializes the
viewport based on the current drawbuffer's width and height. This is
likely the window size, since it occurs before the meta operation binds
any temporary buffers.
decompress_texture_image needs the viewport to be the size of the image
it's trying to draw. Otherwise, it may only draw part of the image.
v2: Actually set the projection properly too.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68250
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Mak Nazecic-Andrlon <owlberteinstein@gmail.com>
Fixes inconsistent failure of gles2conform/GL2Tests/glUniform/glUniform.test
under gnome-shell. What follows is a description of the bug and its fix.
When intel_update_renderbuffers() allocates a miptree for a winsys
renderbuffer, it propagates the renderbuffer's format to become also the
miptree's format.
If the winsys color buffer format is SARGB, then, in the first call to
eglMakeCurrent, intel_gles3_srgb_workaround() changes the renderbuffer's
format to ARGB. That is, it changes the format from sRGB to non-sRGB.
However, it changes the renderbuffer's format *after*
intel_update_renderbuffers() has allocated the renderbuffer's miptree.
Therefore, when eglMakeCurrent returns, the miptree format (SARGB)
differs from the renderbuffer format (ARGB).
If the X server reallocates the color buffer,
intel_update_renderbuffers() will create a new miptree for the
renderbuffer. The new miptree's format (ARGB) will differ from old
miptree's format (SARGB). This mismatch between old and new miptrees
causes bugs.
Fix the bug by moving intel_gles3_srgb_workaround() to occur *before*
intel_update_renderbuffers().
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67934
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Except for explicit derivs with cube maps which are very bogus anyway.
Just like explicit lod this is only used if no_quad_lod is set in
GALLIVM_DEBUG env var.
Minification is terrible on cpus which don't support true vector shifts
(but should work correctly). Cannot do the min/mag filter decision (if
they are different) per pixel though, only selecting different mip levels
works.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
block size depth is always 1 even for compressed formats (unless someone
invents true 3d compressed formats at least which we can't represent).
Nearest (and soa) path had it right.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We have set up 3DSTATE_SBE (or 3DSTATE_SF on GEN6) in
ilo_shader_select_kernel_routing(). There is no need to pass the last shader
stage to the GPE function.
The Gallium implementation is apparently not ready for regular
consumption, so as much as I hate adding more build-time options, here's
another.
Acked-by: Brian Paul <brianp@vmware.com>
The variable means that UBO qualifiers are allowed in a particular
context (e.g., not allowed in a struct field declaration), rather than a
particular set of UBO qualifiers are valid.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This was invaluable when debugging the global copy propagation
algorithm. We may as well commit it in case someone needs to print
out the sets in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The (complicated!) math is all identical, there's just minimal differences how
sign bit is calculated plus there's an additional subtraction for the argument
going into the polynomial for cos.
The logic stays 100% the same (with a small exception, sign bit calculation for
sin is minimally simplified, applying sign mask after xoring the arguments
instead of applying it to each argument).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
IVB/BYT also has the same L3 cacheability control in MOCS as HSW,
so let's make use of it.
pts/xonotic and pts/reaction @ 1920x1080 gain ~4% on my IVB GT2. Most
other things show less gains/no regressions, except furmark which
loses some 10 points.
I didn't have a BYT at hand for testing.
v2: Don't check (brw->gen == 7) in gen7 functions. (chadv)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Just spotted these unpopulated MOCS fields when comparing the code
against BSpec. Set the MOCS to the same as everywhere else in Haswell:
L3-cacheable.
v2: Annotate state packet fields (chadv).
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The NVIDIA driver doesn't expose them, and piglit's
arb_texture_compression-invalid-formats expects them to not be there.
This, with the previous commit, fixes piglit
arb_texture_compression-invalid-formats.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
This is required by the spec, and it's a bit tricky because the default
precision is scoped. As a result, I'm slightly abusing the symbol
table.
Fixes piglit no-default-float-precision.frag tests and the piglit
default-precision-nested-scope-0[1234].frag tests that are currently on
the piglit mailing list for review.
On IRC I got confirmation from cwabbot that ARM (Mali T6xx and T400)
enforces this requirement and from kusma that NVIDIA (Tegra2) enforces
this requirement. We should be safe from regressing shipping
applications.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
We never noticed this before because we previously didn't enfoce GLSL ES
fragement shader requirements that precision be defined. There may also
have been some interaction here with the addition of
GL_ARB_shading_language_420pack, but it doesn't appear to me that it
added any new bugs (just perhaps uncovered some old ones).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Going to need this soon (not going to bother with avx2 intrinsics at this time
but don't want to do workarounds for true vector shifts if llvm itself can use
them just fine and won't need the gazillion instruction emulation).
Not really tested other than my cpu returns 0 for these features...
(I have no idea if llvm actually would emit avx2/xop instructions neither...)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Need to check the wrap mode of the actually used coords not a fixed 2.
While checking more than necessary would only potentially disable aos and
not cause any harm I'm pretty sure for 3d textures it could have caused
assertion failures (if s,t coords have simple filter and r not).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Turns out it is actually very complicated to figure out what a format really
is wrt range, as using channel information for determining unorm/snorm etc.
doesn't work for a bunch of cases - namely compressed, subsampled, other.
Also while here add clamping for uint/sint as well - d3d10 doesn't actually
need this (can only use ld with these formats hence no border) and we could
do this outside the shader for GL easily (due to the fixed texture/sampler
relation) do it here too just so I can forget about it.
v2: move border color clamping out of fetch texel. Also change it to clamp
the whole border vector at once (and use vectorized load of border color),
which saves a couple of instructions - needs some different handling of
mixed signed/unsigned formats so skip the per channel stuff and just derive
this from first channel except for special formats.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's a new debug value used to disable per-quad lod optimizations
in fragment shader (ignored for vs/gs as the results are just too wrong
typically). Also trying to detect if a supplied lod value is really a
scalar (if it's coming from immediate or constant file) in which case
sampler code can use this to stay on per-quad-lod path (in fact for
explicit lod could simplify even further and use same lod for both
quads in the avx case but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.
v2: fix comments, prettify.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The rules were writing files to e.g. util/u_indices_gen.py, but in an
out-of-tree build this directory doesn't exist in the build directory. So,
create the directories just in case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ross Burton <ross.burton@intel.com>
The LLVM R600 backend currently always uses separate VGPRs for these.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68162
(Centroid interpolation is identical to center interpolation without
multisampling, so the shader hardware was only pre-loading one set of
interpolation coefficients, and the pixel shader code was using
uninitialized values as the centroid interpolation coefficients)
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Now that we have the number of samplers available, we don't need to
iterate over all 16. This should be particularly helpful for vertex
shaders.
v2: Use the correct shader program (caught by Paul Berry).
This needs to initialize the exact same set of sampler swizzles as
the actual key setup, or else we end up doing recompiles due to some
being XYZW and others being 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The first field of a record in a UBO has the aligment of the record
itself.
Fixes piglit vs-struct-pad, fs-struct-pad, and (with the patch posted to
the piglit list that extends the test) layout-std140.
NOTE: The bit of strangeness with the version of visit_field without the
record_type poitner is because that method is pure virtual in the base
class. The original implementation of the class did this to ensure
derived classes remembered to implement that flavor. Now they can
implement either flavor but not both. I don't know a C++ way to enforce
that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68195
Cc: "9.2 9.1" mesa-stable@lists.freedesktop.org
The outer-most record is passed into the visit_field method for
the first field. In other words, in the following structure:
struct S1 {
vec4 v;
float f;
};
struct S {
S1 s1;
S1 s2;
};
uniform Ubo {
S s;
};
s.s1.v would get record_type = S (because s1.v is the first non-record
field in S), and s.s2.v would get record_type = S1. s.s1.f and s.s2.f
would get record_type = NULL becuase they aren't the first field of
anything.
This new overload isn't used yet, but the next patch will add several
uses.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: "9.2 9.1" mesa-stable@lists.freedesktop.org
For some reason, we didn't use this information even though the VS
backend has computed it (albeit poorly) for ages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Unlike the FS, the VS backend already computed the binding table size.
However, it did so poorly: after compilation, it looked to see if any
pull constants/textures/UBOs were in use, and set num_surfaces to the
maximum surface index for that category. If the VS only used a single
texture or UBO, this overcounted by quite a bit.
The shader time surface was also noted at state upload time (during
drawing), not at compile time, which is inefficient. I believe it also
had an off by one error.
This patch computes it accurately, while also simplifying the code.
It also renames num_surfaces to binding_table_size, since num_surfaces
wasn't actually the number of surfaces used. For example, a VS that
used one UBO and no other surfaces would have set num_surfaces to
SURF_INDEX_VS_UBO(1) == 18, rather than 1. A bit of a misnomer there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Computing the minimum size was easy, and done at compile-time for no
extra overhead here. Making the binding table smaller wastes less batch
space.
Adding the CACHE_NEW_WM_PROG dirty bit isn't strictly necessary, since
other atoms depend on it and flag BRW_NEW_SURFACES. However, it's best
to add it for clarity and safety. It shouldn't add any new overhead.
v2: Use binding_table_size, rather than max_surface_index.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
By tracking the maximum surface index used by the shader, we know just
how small we can make the binding table.
Since it depends entirely on the shader program, we can just compute
it once at compile time, rather than at binding table emit time (which
happens during drawing).
v2: Store binding_table_size, rather than max_surface_index, for
consistency with the VS (which needs to be able to represent 0
surfaces).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
SURF_INDEX_DRAW() has been the identity function since the dawn of time,
and both the shader code and binding table upload code relied on that,
simply using X rather than SURF_INDEX_DRAW(X).
Even if that continues to be true, using the macro clarifies the code.
The comment about draw buffers needing to be first in order for
headerless render target writes to work turned out to be wrong; with
this change, SURF_INDEX_DRAW can be changed to arbitrary indices and
everything continues working.
The confusion was over the "Render Target Index" field in the FB write
message header. If it were a binding table index, then RT 0 would have
to be at index 0 for headerless FB writes to work. However, it's
actually an index into the blend state table, so there's no problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Paul Berry <stereotype441@gmail.com>
Now that we have the number of samplers available, we don't need to
iterate over all 16. This should be particularly helpful for vertex
shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, we computed sampler counts when generating the SAMPLER_STATE
table. By computing it earlier, we should be able to shorten a bunch of
loops.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This allows us to avoid uploading the VS sampler state table if only the
fragment program changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now, each shader stage has a sampler state table that only refers to the
samplers actually used by that problem. This should make the VS table
non-existant or very small.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Also upload separate sampler default/texture border color entries.
At the moment, this is completely idiotic: both tables contain exactly
the same contents, so we're simply wasting batch space and CPU time.
However, soon we'll only upload data for textures actually /used/ in
a particular stage, which will usually make the VS table empty and
very likely eliminate all redundancy. This is just a stepping stone.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Like the previous patch, this simply pushes direct access to brw->wm up
one level in the call chain. Rather than passing the whole array, we
just pass a pointer to the correct spot in the array, similar to what we
do for the actual sampler state structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When we begin uploading separate sampler state tables for VS and FS,
we won't be able to use &brw->wm.sdc_offset[ss_index]. By passing it in
as a parameter, we push the problem up to the caller.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Currently, we only have a single sampler state table shared among all
stages, so we just copy wm.sampler_count into vs.sampler_count.
In the future, each shader stage will have its own SAMPLER_STATE table,
at which point we'll need these separate sampler counts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I believe the data flow analysis actually works now, and it should be
safe to re-enable global copy propagation. It even does things now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since the initial value for livein is an overestimation (0xffffffff),
it's extremely likely that it will shrink, which means we can't simply
OR in new bits - we need to fully recompute it based on the current
liveout values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since we start with an overestimation of livein (0xffffffff), successive
steps can actually take away values. This means we can't simply OR in
new liveout values; we need to recompute it from scratch at each
iteration of the fixed-point algorithm.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The starting block always has livein = 0 and liveout = copy. Since we
start with real data, not estimates, there's no need to refine it with
the fixed point algorithm.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The previous commit properly initialized liveout. This previous
(and incorrect) initialization is no longer necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, livein was initialized to 0 for all blocks. According to
the textbook, it should be the universal set (~0) for all blocks except
the one representing the start of the program (which should be 0).
liveout also needs to be initialized to COPY for the initial block.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
According to page 360 of the textbook, the proper formula for liveout
is:
CPout(n) = COPY(i) union (CPin(i) - KILL(i))
Previously, we omitted COPY.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Excluding the existing liveout bits is a deviation from the textbook
algorithm. The reason for doing so was to determine if the value
changed, which means the fixed-point algorithm needs to run for another
iteration.
The simpler way to do that is to save the value from step (N-1) and
compare it to the new value at step N.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the "COPY" set from Muchnick's textbook, which is necessary
to do the dataflow algorithm correctly.
v2: Simplify initialization based on Paul Berry's observation that
out_acp contains exactly what needs to be in the COPY set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Although this function currently only initializes the KILL set, it will
soon initialize other data flow sets as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To compute the actual liveout/livein data flow values, we start with
some initial values and apply a fixed-point algorithm until they settle.
Previously, we iterated through all blocks, updating both liveout and
livein together in one pass. This is awkward, since computing livein
for a block requires knowing liveout for all parent blocks. Not all
of those parent blocks may have been processed yet.
This patch separates the two. First, we update liveout for all blocks.
At iteration N of the fixed-point algorithm, this uses livein values
from iteration N-1. Secondly, we update livein for all blocks. At
step N, this uses the liveout information we just computed (in step N).
This ensures each computation has a consistent picture of the data,
rather than seeing an random mix of data from steps N-1 and N depending
on the order of the blocks in the CFG data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This variable indicates that the fixed-point algorithm made changes to
the data at this step, so it needs to run for another iteration.
"progress" seems a nicer name for that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The fixed-point algorithm needs to run at least once, so a do-while loop
is more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The dataflow analysis used for global copy propagation is severely
broken, and I believe it doesn't actually do anything. Fixing it will
require a lot of changes, each of which might break things.
Once all the fixes land, we can re-enable this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Any decent compiler will do this for us, although doing this
will make grepping through the code alot easier.
v2: In both mixer and query interface
v3: rebase
Reviewed-by: Christian König <christian.koenig@amd.com> [v1]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Code should loop through and cleanup the three (VL_NUM_COMPONENTS) idct
buffers, rather than doing the first one three times.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Free any allocated memory and return BadAlloc if create_video_buffer()
has failed to create a buffer.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Not seen in the wild yet, but seems like a reasonable thing to do.
[suggested by Christian]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
I was looking into some minor 422 issues/discrepencies I noticed long
ago using vdpau on my rv790.
I noticed that there is code that is halving height rather than width -
422 is full height AFAIK.
Making the changes below doesn't actually make any noticable difference
to what I was looking into.
Maybe there are more but here's three I've found so far
Reviewed-by: Christian König <christian.koenig@amd.com>
We are getting close to the maximum number of BRW_NEW_* bits that can
be stored in brw->state.dirty.brw without overflowing 32 bits, and
geometry shaders are going to add more. Add a STATIC_ASSERT so that
we will be alerted when we need to switch to 64 bits.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were asserting that each driver specified an NConfigOptions
exactly equal to the number of options they supplied, leading to frequent
bugs when people would forget to adjust the value when adjusting driver
options. Instead, just overallocate the table by a bit and leave sanity
checking to the assert in findOption().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Consistently using a "The ___ driver hook." line at the the top of each
function's comment block makes it easy to see at a glance what function
is being implemented.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This code upload performs batched uploads via a BO. By moving it out to
a separate file, intel_buffer_objects.c only provides the core buffer
object functionality.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
GL_APPLE_object_purgeable creates a mechanism for marking OpenGL objects
as "purgeable" so they can be thrown away when system resources become
scarce. It specifically applies to buffer objects, textures, and
renderbuffers.
The intel_buffer_objects.c file provides core functionality for GL
buffer objects, such as MapBufferRange and CopyBufferSubData. Having
texture and renderbuffer functionality in that file is a bit strange.
The 2010 copyright on the new file is because Chris Wilson first added
this code in January 2010 (commit 755915fa).
v2: Actually remember to call the new dd table setup function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The surface allocator understands the scanout flag just fine.
This seems to improve performance for Ubuntu Unity on top of st/xorg
and it fixes the cursor.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This started as an attempt to add support for MSAA texture transfers and
MSAA depth-stencil decompression for the DB->CB copy path.
It has gotten a bit out of control, but it's for the greater good.
Some changes do not make much sense, they are there just to make it look
like the other driver.
With a few cosmetic modifications, r600_texture.c can be shared with
a symlink.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is glBlitFramebuffer support for MSAA surfaces as required by GL 3.0
and texturing as required by GL 3.2 and GL_ARB_texture_multisample.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is basic MSAA support which should work with most apps.
Some features are missing, those will be implemented by other commits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).
The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).
This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
"radeonsi: add FMASK texture binding slots and resource setup"
CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.
v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later)
- turn the bool CP DMA parameters into self-explanatory flags
- add a nice simple API for packet emission to radeon_winsys.h
- use 256 contexts, 128 causes texture corruption in openarena
It shouldn't be necessary to call radeon_winsys::cs_flush() from
radeonsi_launch_grid(), because the state tracker is responsible for
flushing the pipeline at the appropriate time. The current behavior is
also wrong, because radeonsi_launch_grid() submits packets to the
compute ring, but when the state tracker calls pipe->flush() everything
is submitted to the graphics ring. This has the potential to create a
race condition.
The downside of removing this flush is that the compute dispatch packets
will be sent to the graphics ring rather than the compute ring.
In the future we will need to come up with a way to detect 'compute'
command streams and submit them to the appropriate ring.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Previously, INTEL_DEBUG=bat would dump messages like:
intel_mipmap_tree.c:1643: Batchbuffer flush with 456b used
This only reported the space used for command packets, and didn't
report any information on the space used for indirect state.
Now it dumps:
intel_context.c:366: Batchbuffer flush with 6128b (pkt) + 4288b (state)
= 10416b (31.8%)
This conveniently shows the breakdown of space used for packets vs.
state, as well as the percentage of batchbuffer space.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We emit these before configuring depth in the normal path, or actually
using the depth buffer in BLORP - we just failed to emit them when
disabling depth altogether.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We emit these before configuring depth in the normal path, or actually
using the depth buffer in BLORP - we just failed to emit them when
disabling depth altogether.
On Sandybridge, this also requires the post_sync_nonzero flush.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.
That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).
For example, whereas bitcast_f2u(abs(float)) might result in a register
argument such as
(abs)g2.2<0,1,0>UD
v2: Set interfered = true and break in register_coalesce instead of
returning false.
Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
Necessary to avoid combining a bitcast and a modifier into a single
operation. Otherwise if safe, the MOV should be removed by
copy-propagation or register coalescing.
With this and the next patch, there are only four changes in shader-db:
all a single extra instruction. The code does something like
mov a.w, -b.x
and copy propagation doesn't work because it only handles no-op
swizzles. Seems acceptable, given the known limitation of our copy
propagation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
Currently single sample scaled blits with GL_LINEAR filter falls
back to meta path. Patch removes this limitation in BLORP engine
and implements single sample scaled blit with bilinear filter.
No piglit, gles3 regressions are observed with this patch on Ivybridge.
V2: Use "sample" message to utilize the linear filtering functionality
built in to hardware.
V3: Define a bool variable (bilinear_filter) to handle the conditions
for GL_LINEAR blits.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
New function clamp_tex_coords() clamps the texture coordinates
to texture boundaries. This function will also be utilized later
for the BLORP implementation of single-sample scaled blit with
bilinear filter.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When we talk about both multi-sample and single-sample scaled blits,
rect_grid_{x1, y1} are more appropriate variable names as compared
to sample_grid_{x1, y1}. There are no functional changes in this patch.
It just prepares for the BLORP implementation of single-sample scaled
blit with bilinear filter.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch fixes a case of framebuffer blitting with renderbuffer
as color attachment and GL_LINEAR filter. Meta implementation of
glBlitFrambuffer() converts source color buffer to a texture and
uses it to do the scaled blitting in to destination buffer. Using
the exact source rectangle to create the texture does incorrect
linear filtering along the edges. This patch makes the changes to
extend the texture edges by one pixel in x, y directions. This
ensures correct linear filtering.
It fixes failing piglit fbo-attachments-blit-scaled-linear test.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
h264/mpeg4 remain disabled for pre-nvc0, there's some minor
bug/difference which causes the decoding to hang after some frames.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The error code was changed from INVALID_VALUE to INVALID_OPERATION
in OpenGL 3.3. We should also generate an error when size is BGRA
and normalized is FALSE.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sandybridge is the only platform that supports an IF instruction
with an embedded comparison. In this case, we need to emit a CMP
to go along with the SEL.
Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16,
fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and
isinf-and-isnan fs_fbo.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68086
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: lu hua <huax.lu@intel.com>
If clipdistance for one of the vertices is nan (or inf) then the
entire primitive should be discarded.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9)
and definitely required by d3d10.
This actually doesn't do it pre-filter but more "in-filter" as otherwise
need to push the comparisons even further down into fetch code and this
also trivially allows using a somewhat cheaper lerp.
Doing it pre-filter would actually have some performance advantage for UNORM
formats (because the comparisons should be done in texture format, we'd only
need to convert the shadow ref coord to texture format once, but in turn would
save converting the per-sample texture values to floats) but this gets a bit
messy as this has implications for border color handling as well (which needs
to be done prior to depth comparisons, hence would also need to convert border
color to texture format too or use some other tricks like doing separate border
color / shadow ref comparison and simply using that result directly when doing
border replacement).
Should make no difference for nearest filtering, and performance for linear
filtering should be mostly the same too (essentially have one more comparison
instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add
special "lerp" which all in all shouldn't be much of a difference).
v2: get rid of old code completely
Reviewed-by: Zack Rusin <zackr@vmware.com>
E.g. the Source engine seems to always write to gl_ClipVertex, but normally
doesn't enable any GL_CLIP_DISTANCEn states. This change removes some
irrelevant parts from the generated vertex shader code in such cases.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use new mask with combined s_pass values and orig_mask values
for zpass/zfail stencil ops, otherwise both the sfail op and one of
zpass/zfail op are applied (probably not hit in most tests because
some of the ops tend to be KEEP usually).
Note: this is a candidate for the 9.2 branch.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Should get rid of some float-to-int conversions (with negation).
No piglit regressions (with llvmpipe).
v2: fix bogus formatting spotted by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
This also allows people who don't want to install the binary blobs
required for VP2 to still get MPEG decoding.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's no need to use a clip flag for NEGW on these gens, so
no reason we can't just enable 8 planes.
V2: - Bump (and document!) MAX_VERTS in the clip code.
- Fix clip flag masks in the clip unit state and in the shader
prolog
- Move this to the end of the series for less breakage.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This does the same thing as we do for triangle clipping -- select the
appropriate source (either dot(hpos,fixed plane) or a clipdistance
slot).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Nothing in the clipper uses gl_ClipVertex any more, so we don't care
where it is.
V2: Don't bother fishing out the clipvertex offset either.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Soon the dp4 is only going to be used for fixed clip planes.
V2: Remove old inaccurate comment about the behavior of this function;
add a better explanation above.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2: - Use the new VS_OPCODE_UNPACK_FLAGS_SIMD4X2 to correctly split the
flags for the two vertices being processed together.
- Don't apply bogus masking of clip flags. The set of plane enables
aren't included in the shader key, and we wouldn't want the
recompiles anyway.
V3: - Tidy up spurious instructions, name temps properly.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V2] Reviewed-by: Paul Berry <stereotype441@gmail.com>
Splits the bottom 8 bits of f0.0 for further wrangling
in a SIMD4x2 program. The 4 bits corresponding to the channels in each
program flow are copied to the LSBs of dst.x visible to each flow.
This is useful for working with clipping flags in the VS.
V3: - Fixup immediate types
- Teach scheduler about the hidden dep on flags
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
V2: Reviewed-by: Paul Berry <stereotype441@gmail.com>
We're about to have an instruction that depends on the flags but isn't
predicated. This lays the groundwork.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Previously we had disabled interpolation of the clip distances as a
special case, since they were unused.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need to produce clip flags for the vertex header on Gen4/5, so
clip plane lowering has to be done before we try to emit the flags/psiz
attribute.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2: We don't particularly care where they fall in the VUE map, as long
as they are allocated somewhere, and occupy two contiguous slots. Don't
fiddle with the SF layout at all -- there's no need.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes build error introduced with commit
d1ba1055d9.
CC nouveau_video.lo
nouveau_video.c: In function 'nouveau_screen_get_video_param':
nouveau_video.c:866:33: error: 'screen' undeclared (first use in this function)
nouveau_video.c:866:33: note: each undeclared identifier is reported only once for each function it appear
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.
v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use fixed location for the shadow reference value I
want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)
v2: fix up soa split path (unreachable currently but still...)
Reviewed-by: Zack Rusin <zackr@vmware.com>
We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.
Reviewed-by: Zack Rusin <zackr@vmware.com>
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Calling the prepare outputs cleans up the slot assignments
for outputs, unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, this
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In commit 8fc41df (glsl: Modify ir_set_program_inouts to handle
geometry shaders), when attempting to pattern match the "foo" part of
expressions such as:
foo[i][j]
foo[i]
I incorrectly called as_dereference_variable() on the subexpression
foo[i] instead of foo. As a result, the pattern never matched, so
ir_set_program_inouts would fall back on marking the entire variable
as used, rather than just the portion indexed by the array.
This didn't result in incorrect behaviour, but it could have resulted
in inefficiency by causing the back-end to allocate resources for
unused parts of an input or output array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds the level query support to the video decoders
and uses some more reasonable defaults.
v2: (ck) add commit message
Reviewed-by: Christian König <christian.koenig@amd.com>
Previously we would emit a warning for empty declarations like
float;
We would also emit the same warning for things like
highp float;
However, this second case is most likely the application trying to set
the default precision. This makes the compiler generate a stronger
warning with some suggestion of a fix.
It really seems like this should be an error. I'll bet that 100% of the
time someone writes 'highp float;' the actually meant 'precision highp
float;'. Alas, both AMD and NVIDIA accept this syntax, and the spec
doesn't explicitly forbid it.
This makes piglit's precision-05.vert generate the following warnings:
0:12(11): warning: empty declaration with precision qualifier, to set the default precision, use `precision lowp float;'
0:13(12): warning: empty declaration with precision qualifier, to set the default precision, use `precision mediump int;'
v2: Add { } around a one-line if body and fix a comment. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Previously, we were accidentally calling
handle_geometry_shader_input_decl() on non-input interface block
declarations, resulting in bogus error checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, if a geometry shader input was declared as a non-array, we
would flag the proper compiler error, but then before we got a chance
to report it to the client, handle_geometry_shader_input_decl() would
assertion fail.
With this patch, handle_geometry_shader_input_decl() ignores
non-arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the arrays to the new header brw_multisample_state.h, which will be
shared with Broadwell code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Place each array in the brw namespace by renaming it:
sample_positions_4x -> brw_multisample_positions_4x
sample_positions_8x -> brw_multisample_positions_8x
This prepares for moving the arrays to a header shared by gen6 and gen8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
GLSL ES does not allow unsized arrays, and GLSL ES 1.00 does not allow
array initializers. However, GLSL ES 3.00 allows array initializers,
and the initializer can explicitly size the array. The specification
even includes some examples of this:
float x[] = float[2] (1.0, 2.0); // declares an array of size 2
float y[] = float[] (1.0, 2.0, 3.0); // declares an array of size 3
float a[5];
float b[] = a;
Move the unsized array check to after the initializer has been
processed. If the array is still unsized, generate the error. This
should have no effect in GLSL ES 1.00 because, as previously mentioned,
array initializers are not allowed.
Fixes piglit "glsl-es-3.00 compiler array-sized-by-initializer.vert".
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
The functional change is that now invalidate_framebuffer is called if
the texture is actually detached from one of the currently bound FBOs.
Previously this was only done for renderbuffers.
The remaining changes make the texture delete path look more similar to
the renderbuffer delete path. This includes adding relevant spec
quotations to justify the behavior.
Fixes piglit fbo-incomplete "delete texture of bound FBO" test.
v2: Move 'fb->Attachment[i].Texture == att' check from previous patch to
this patch... where it was intended to be in the first place. Noticed
by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Also add a return value indicating whether any work was done.
This will be used by the next patch.
v2: Move 'fb->Attachment[i].Texture == att' check to the next
patch... where it was intended to be in the first place. Noticed by
Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Looks like the same issue that was seen with MULADD in trans slot on
R7xx also affects MULADD_IEEE (maybe all OP3 instructions and MULADD is
just a most frequently used?). So the workaround is to not allow affected
instructions to be placed into the trans slot.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Also while here add a bunch of other forgotten (integer) instructions to
tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing
away unused input components), though it may still be incomplete.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Newer graphic languages don't want messy float mask results but instead true
"boolean" mask results for float comparisons. Otherwise just need to convert
the floats back to integers. Need to keep the old opcodes however due to both
legacy (gl and d3d9) needing them and because older hw can't really deal with
integers. These new FSEQ/FSGE/FSLT/FSNE opcodes are part of integer API and
hence must be supported if a driver claims to support glsl 1.30 (or
PIPE_SHADER_CAP_INTEGERS).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Because we must maintain an exec_mask even if there's currently nothing
on the mask stack, we can still have an exec_mask at the end of the program.
Effectively, this mask should be set back to default when returning from main.
Without relying on END/RET opcode (I think it's valid to have neither) it is
actually difficult to do this, as there doesn't seem any reasonable place to
do it, so instead let's just say the exec_mask is invalid outside main (which
it really is effectively).
The problem is that geometry shader called end_primitive outside the shader
(in the epilogue), and as a result used a bogus mask, leading to bugs if we
had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
the mask combining function when called from outside the shader.
Reviewed-by: Zack Rusin <zackr@vmware.com>
The code was quite weird, the second comparison was in fact a complete no-op
and we can also do the comparison with the vector directly instead of scalar,
which should not also be faster but it is way more obvious how that mask
is actually going to look like.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Instead of reducing masks to 0/1 simply use the mask directly as -1.
Also use some signed comparison instead of unsigned (as far as I understand
these values have to be (very) small and signed means llvm doesn't have to
apply additional logic to do the unsigned comparisons the cpu can't do).
Saves a couple of instructions in some test geometry shader here.
v2: that was a bit to much optimization, don't skip combining the masks...
Reviewed-by: Zack Rusin <zackr@vmware.com>
CMP instructions use BRW_ARF_NULL as a destination. Prior to this
patch, dump_instruction() decoded the destination as "???".
Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This resulted in printouts like:
246: cmp.cmod.f0.0
???, vgrf152, 0.000000f, (null),
With this patch, CMP is properly printed on one line.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Many GLSL shaders contain code of the form:
x = condition ? foo : bar
The compiler emits an ir_if tree for this, since each subexpression
might be a complex tree that could have side-effects and short-circuit
logic operations.
However, the common case is to simply pick one of two constants or
variable's values---which is exactly what SEL is for. Replacing IF/ELSE
with SEL also simplifies the control flow graph, making optimization
passes which work on basic blocks more effective.
The shader-db statistics:
total instructions in shared programs: 1655247 -> 1503234 (-9.18%)
instructions in affected programs: 949188 -> 797175 (-16.02%)
2,970 shaders were helped, none hurt. Gained 181 SIMD16 programs.
This helps Valve's Source Engine games (max -41.33%), The Cave
(max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%),
Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7
(max -1.94%).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The instruction
(+f0.0) SEL dst, src0, src1
will write either src0 or src1 to dst, depending on the predicate.
Unlike most predicated instructions, it always writes to dst.
fs_inst::is_partial_write() is supposed to return true if the whole
register is guaranteed to be written. The !inst->predicated check makes
sense for most instructions, which might not write the whole register,
but SEL is a special case.
This caused live interval analysis to ignore the destination of
predicated SEL instructions when computing "def" information.
Requires the previous commit to avoid regressions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The existing inst->is_partial_write() already disallows predicated
instructions, so this has no functional change. However, it's worth
doing explicitly since the CSE pass does not consider the flag register.
This means it could blindly factor out operations that use the same
sources, but which have different condition codes set.
This prevents a regression in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Usually, the driver creates both 8-wide and 16-wide variants of every
fragment shader. When 16-wide compilation fails, it logs a performance
warning explaining why only an 8-wide program exists.
However, when there are pull parameters, the driver won't even bother
trying the 16-wide compile (since it would fail). In this case, it
failed to emit a performance warning, leaving no explanation for the
missing 16-wide program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix ilo_gpe_init_view_surface_for_buffer to allow buffer to be NULL, and add
ilo_gpe_set_view_surface_bo to set it later. This allows us to set up
SURFACE_STATE early for constant buffers backed by user buffers.
In finalize_index_buffer(), when the current index buffer was destroyed due to
u_upload_data(), it may happen that the new index buffer is at the same
address as the old one. Comparing the pointers to the two buffers could fail
to work, and 3DSTATE_INDEX_BUFFER would be incorrectly skipped.
Holding a reference to the current index buffer before calling u_upload_data()
should fix the problem.
Makes this flag appear in the output for INTEL_DEBUG=state
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
INTEL_DEBUG=vue now emits a listing of each slot in the VUE map,
and the corresponding interpolation mode.
V2: Fix whitespace issues.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Specifically, must return 0 for non-existent mip levels (and non-existent
textures which is an unsolved problem) for everything but total mip count.
Reviewed-by: Zack Rusin <zackr@vmware.com>
GLSL 1.50 incorporates the functionality of the
ARB_fragment_coord_conventions extension, so we need to make this
functionality available even if the extension isn't enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
From section E.1 (Profiles and Deprecated Features of OpenGL 3.0)
of the OpenGL 3.0 spec:
"LineWidth is not deprecated, but values greater than 1.0
will generate an INVALID VALUE error"
From context it is clear that values greater than 1.0 should only
generate an INVALID VALUE error in a forward-compatible context.
The code was correctly quoting this spec text, but it was disallowing
all line widths in forward-compatible contexts, instead of just widths
greater than 1.0.
This patch introduces the correct check, so that setting a line width
of 1.0 or less is permitted.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can't be injecting the primitive id's in the pipeline because
by that time the primitives have already been decomposed. To
properly number the primitives we need to handle the adjacency
primitives by hand. This patch moves the prim id injection into
the original primitive assembler and completely removes the
useless pipeline stage.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without reseting the vertex id, with primitives where the same
vertex is used with different primitives (e.g. tri/lines strips)
our vbuf module won't re-emit those vertices with the changed
primitive id. So lets reset the vertex id whenever injecting
new primitive id to make sure that the vertex data is correctly
emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before inserting new front face and prim id outputs cleanup
the old extra outputs, otherwise our cache will use previous
output slots which will break as soon as outputs of the current
shader don't match the last.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
libEGL was incorrectly exporting *all* symbols, public and private.
This patch adds -fvisibility=hidden to libEGL's linker flags to ensure
that only symbols annotated with __attribute__((visibility("default")))
get exported.
Sanity-checked with libEGL's builtin DRI2 driver and the i965 DRI driver
by running Piglit on X/EGL and by running weston-gears on Weston as an
X client.
Sanity-checked with libEGL's Gallium driver (which is not built-in) and
the swrast Gallium driver by running es2gears_x11.
Kristian reviewed the symbol diff in `nm libEGL.so`.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: Ian Romanick <idr@freedesktop.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is wrong both for OpenGL and d3d. (In fact clamping is a side effect
of converting to depth format, so this should really do quantization too
at least in d3d10 for the comparisons to be truly correct.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.
Reviewed-by: Zack Rusin <zackr@vmware.com>
This opcode is quite problematic in tgsi, while it tries to mirror
d3d10 resinfo it can't really do what's stated there due to missing
the crazy return type modifiers. Hence specify this is ignored along
with the swizzle.
(Other options would be to have multiple opcodes or specify the ret
type modifier maybe in dst_reg as there's padding bits left there but
it is the only instruction allowing this.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
For d3d10 and ARB_robust_buffer_access_behavior, we are required to return
0 for out-of-bounds coordinates (for which we can just enable the code already
there was just disabled). Additionally, also need to return 0 for
out-of-bounds mip level and out-of-bounds layer. This changes the logic
so instead of clamping the level/layer, an out-of-bound mask is computed
instead in this case (actual clamping then can be omitted just like with
coordinates, since we set the fetch offset to zero if that happens anyway).
Reviewed-by: Zack Rusin <zackr@vmware.com>
While so far this only causes some harmless test failures, there's lots more
cpus with DAZ. All 64bit capable ones can do it (particularly relevant for
AMD cpus as they supported sse3 very very late) but if really necessary we
can check support for that for real with some more magic.
(In fact just about ANY cpu with sse2 can support DAZ, I believe the only
exception are first gen P4 (Willamette) and from those only early steppings
which can't do it it's almost like intel forgot to add it... - a real pity
though docs say you can't just try to set it as they will throw a GPF.)
While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672
it does not fix it. Most likely the tests need fixing as I don't think
there's any guarantee about denorm handling in the reference math library
functions if the flags aren't set to standard values. Nevertheless enabling
DAZ on all cpus which can do it should be the right thing to do.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be much faster, seems to work in softpipe.
While here (also it's now disabled) fix up the pow factor - the former value
is what is in GL core it is however not actually accurate to fp32 standard
(as it is 1.0/2.4), and if someone would do all the accurate math there's no
reason to waste 8 mantissa bits or so...
v2: use real table generating function instead of just printing the values
(might take a bit longer as it does calculations on some 3+ million floats
but much more descriptive obviously).
Also fix up another inaccurate pow factor (this time in the python code) -
wondering where the couple one bit errors came from :-(.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Since Wayland 1.2, struct wl_buffer and a few functions are deprecated.
References to wl_buffer are replaced with wl_resource and some getter
functions and calls to deprecated functions are replaced with the proper
new API. The latter changes are related to resource versioning.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
A listener is added just after the interface is bound, in
registry_handle_global().
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
The helper provides a series of functions to easy the implementation
of the WL_bind_wayland_display extension on different platforms. But
even with the helpers there was still a bit of duplicated code between
platforms, with the drm authentication being the only part that
differs.
This patch changes the bufmgr interface to provide a self contained
object with a create function that takes a drm authentication callback
as an argument. That way all the helper functions are made static and
the "_helper" suffix was removed from the sources file name.
This change also removes the mix of Wayland client and server code in
the wayland drm platform source file. All the uses of libwayland-server
are now contained in native_wayland_drm_bufmgr.c.
Changes to the drm platform are only compile tested.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Since llvm -3.4svn r187618, TargetOptions doesn't provide
RealignStack, so only enable it with llvm<3.4
This option must now be specified using function attributes, see LLVM
commit r187618
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This command reads a value from memory and writes it to a register (the
opposite of MI_STORE_REGISTER_MEM). It's only available on Gen7+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This prevents a crash in a future patch.
_mesa_initialize_context() creates a default transform feedback object
by calling the NewTransformFeedbackObject() driver hook. Eventually,
we'll want to subclass that and allocate a buffer object. This means
passing brw->bufmgr to drm_intel_alloc_bo(), and crashing if it isn't
initialized yet.
The buffer manager is actually already initialized; we just hadn't
copied the pointer from intel_screen to intel_context quite early
enough.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Gen7+ supports four transform feedback streams. Using a function-like
macro makes it easy to access them by stream number or loop over them.
"GEN7_" prefixes are more common than "_IVB" suffixes, so use that.
Gen6 only supports a single stream, so the single #define should be
fine. However, SO_NUM_PRIM_STORAGE_NEEDED was a poor name. For one,
the word "NUM" doesn't appear in the actual name of the register.
It's also confusingly generic, as it doesn't exist on Gen7+. Add a
"GEN6_" prefix for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Gen7+ supports four transform feedback streams. Using a function-like
macro makes it easy to access them by stream number or loop over them.
"GEN7_" prefixes are more common than "_IVB" suffixes, so we use that.
Gen6 only supports a single stream, so the single #define should be
fine. However, SO_NUM_PRIMS_WRITTEN was confusingly generic, as it
doesn't exist on Gen7+. Add a "GEN6_" prefix for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously only the slice of a 3D texture was validated in the FBO
completeness check. This fixes the failure in the 'invalid layer of an
array texture' subtest of piglit's fbo-incomplete test.
v2: 1D_ARRAY textures have Depth == 1. Instead, compare against Height.
v3: Handle CUBE_MAP_ARRAY textures too. Noticed by Marek.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" mesa-stable@lists.freedesktop.org
This fixes the segfault in the 'invalid slice of 3D texture' and
'invalid layer of an array texture' subtests of piglit's fbo-incomplete
test.
The 'invalid layer of an array texture' subtest still fails.
v2: Fix off-by-one comparison error noticed by Chris Forbes. Also,
1D_ARRAY textures have Depth == 1. Instead, compare against Height.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Cc: "9.1 9.2" mesa-stable@lists.freedesktop.org
Allow user-generated names for glBindFramebufferEXT on desktop GL.
Disallow its use altogether for core profiles.
Names bound with glBindFramebuffer in desktop OpenGL are still
(incorrectly) shared across the share group instead of being
per-context. This gets us a bit closer to being strictly conformant.
v2: Disallow glBindFramebufferEXT in 3.1 by not installing it in the
dispatch table. Suggested by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Cc: mesa-stable@lists.freedesktop.org
Allow user-generated names for glBindRenderbufferEXT on desktop GL.
Disallow its use altogether for core profiles.
v2: Disallow glBindRenderbufferEXT in 3.1 by not installing it in the
dispatch table. Suggested by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Cc: mesa-stable@lists.freedesktop.org
These are only used on Gen4-5. Why waste the 8kB of space?
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If gs is null, then freeing state->shader.tokens would result in a null
dereference.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes "Uninitialized pointer read" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
unfilled_stage::face_slot is of type int.
Fixes "Unsigned compared against 0" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
pp_free_bos dereferences ppq without a null check.
Fixes "Dereference before null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
the issue is that stream output is run before the pipeline, which
means that unless we decompose the primitives before the so
then things crash. we could convert the entire stream output
code into a pipeline stage but it will take a bit, so for now
fix the crashes by simply re-adding the old input assembler
which is run before the SO.
Signed-off-by: Zack Rusin <zackr@vmware.com>
we used to have a face primitive assembler that we ran after if
the gs was missing but we had adjacency primitives in the pipeline,
lets convert it to a pipeline stage, which allows us to use it
to inject outputs (primitive id) into the vertices. it's also
a lot cleaner because the decomposition is already handled for us.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Inject front face only if the fragment shader uses it and
propagate through all channels because otherwise we'll
need to figure out the exact swizzle that the fs expects and
it's just simpler to make sure all the components within
the front face register are correctly set.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace "fulldecl->Semantic.Name/Index" with semName/semIndex.
Simplify if/else logic for TGSI_FILE_OUTPUT code.
Remove old comment.
Fix indentation.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously we would mark a renderbuffer as needing a depth resolve.
But, to support layered rendering, we need to look at the attachment
instead, since the attachment knows if layered rendering is being
used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This function is needed to support layered rendering. With
layered rendering, the attachment stores the state of whether
layered rendering is being used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When layered rendering is being used, we should not set
FORCE_ZERO_RTAINDEX in the clip state to allow render target
array values other than zero to be used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This restriction was related to programming the offset fields
of the depth buffer packet. We are now setting these offsets
to 0 now, so this restriction should no longer be required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously we would always find the 2D sub-surface of interest,
and then program the surface to this location. Now we always
program the 3DSTATE_DEPTH_BUFFER at the start of the surface.
To select the lod/slice, we utilize the lod & minimum array
element fields.
As part of this change, we must revert 1f112ccf:
Revert "i965/gen7: Align all depth miplevels to 8 in the X direction."
We also must disable brw_workaround_depthstencil_alignment for
gen >= 7. Now the hardware will handle alignment when rendering
to additional slices/LODs.
v2:
* Merge with recent MOCS changes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
For gen >= 7, we will use the lod/minimum-array-element fields to
support layered rendering. This means that we must restrict
the depth & stencil attachments to match in various more retrictive
ways. (Now the width, height, depth, LOD and layer must match)
The reason width, height, and depth must match is that the hardware
has a single set of width, height, and depth settings (in
3DSTATE_DEPTH_BUFFER) that affect both the depth and stencil buffers.
Since these controls determine the miptree layout, they need to be
set correctly in order for lod and minimum-array-element to work
properly. So the only way rendering can work is if the width,
height, and depth match.
In the future, if this restriction proves to be a problem (say
because some crucial client application relies on rendering to
different levels/layers of stencil and depth buffers), then we can
always work around the restriction by copying depth and/or stencil
data to a temporary buffer prior to rendering (much in the same way
that brw_workaround_depthstencil_alignment() does today for
gen < 7), but hopefully that won't be necessary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When performing hiz ops, we must ensure that the region sizes
have an 8 aligned width and 4 aligned height. We can tweak the
size for blorp hiz operations at LOD 0, but for the others we
can't. Therefore, we disable hiz for these miplevels if they
don't meet the size alignment requirements.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In a future pass this will allow us to exit-early from this
routine to disable it for gen >= 7.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Some videos specify mb_adaptive_frame_field_flag instead of
field_pic_flag. This implies that the pic height needs to be halved, and
this field needs to be passed to the VP engine.
Cc: "9.2" mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
clipping would drop the extra outputs because it always
used the number of standard vertex shader outputs, without
geometry shader or extra outputs. The commit makes sure
that clipping with geometry shaders which have more outputs
than the current vertex shader and with extra outputs correctly
propagates the entire vertex.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines', unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix it allow draw module to inject a fake face semantic
into outputs from which the backends can figure out the original
frontfacing info of the primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Draw sometimes injects extra shader outputs (aa points, lines or
front face), unfortunately most of the pipeline and llvm code
didn't handle them at all. It only worked if number of inputs
happened to be bigger or equal to the number of shader outputs
plus the extra injected outputs. In particular when running
the pipeline which depends on the vertex_id in the vertex_header
things were completely broken. The patch adjust the code to
correctly use the total number of shader outputs (the standard
ones plus the injected ones) to make it all stop crashing and
work.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of using the magical 4 use the above computed
vertex size. Doesn't change the behavior, just makes the code
a bit cleaner.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
when dumping shader outputs it's nice to have the integer
values of the outputs, in particular because some values
are integers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Adding code to detect the usage of prim id and front face
semantics in fragment shaders.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
we forgot to add ucmp to the list of opcodes, so it was never
generated for ureg.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The spec says that front-face is true if the value is >0 and false
if it's <0. To make sure that we follow the spec, lets just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive num for frontface and
negative for backface.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We'll need proper values for max_gs_threads when we eventually support
geometry shaders. Also, we initialize it for every other platform.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Commit 2548092ad8 switched the sense of interpolation qualifier
checks in order to permit them on geometry shader in/out variables.
In doing so, it accidentally allowed interpolation qualifiers to be
applied to ordinary variables and function parameters.
Fixes a regression in Piglit's local-smooth-01.frag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Commit 7cfefe6965 introduced a check for whether linked->Type equals
GL_GEOMETRY_SHADER. However, linked may be NULL due to an earlier error
condition.
Since the entire function after the error path is (or should be) guarded
by linked != NULL checks, we may as well just return early and remove
the checks.
Fixes crashes in 9 Piglit tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2:
- upon success close the given file descriptors
v3:
- use specific entry for dma buffers instead of the basic for
primes, and enable the extension based on the availability
of the hook
v4 (Chad):
- use ARRAY_SIZE
- improve the comment about the number of file descriptors
- in case of invalid format report EGL_BAD_ATTRIBUTE instead
of EGL_BAD_MATCH
- take into account specific error set by the driver.
v5:
- fix error handling
v6 (Chad):
- fix invalid plane count checking
v7 (Chad):
- fix indentation and reset loop counter before checking
for excess attributes
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Memory originating outside mesa stack is meant to be for reading
only. In addition, the restrictions imposed by the image external
extension should apply. For example, users shouldn't be allowed
to generare mip-trees based on these images.
v2 (Chad): document using full extension names, fix the comment
style itself and emit description of error
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
As specified in:
http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
Checking for the valid fourcc values is left for drivers avoiding
dependency to drm header files here.
v2: enforce EGL_NO_CONTEXT
v3: declare the extension as EGL (not GLES)
v4: do not update eglext.h manually but rely on update from
Khronos instead
v5: (Eric) report invalid context as EGL_BAD_PARAMETER instead of as
EGL_BAD_CONTEXT
v6: (Chad) fix the checking for valid hints. Before all values were
rejected.
v7: (Chad) comment style change from
/**
* Multi-
* line
into
/* Multi-
* line
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: do not break ABI, but instead introduce new entry point for
dma buffers and bump up the dri-interface version to eight
v3 (Chad): allow the hook to specify an error originating from the
driver. For now only unsupported format is considered.
I thought about rejecting the hints also as they are
addressing only YUV sampling which is not supported at
the moment but then thought against it as the spec is
not saying one way or the other.
v4 (Eric, Chad): restrict to rgb formatted only
v5: rebased on top of i915/i965 split
v6 (Chad): document using full extension name
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Geometry shader support in the Mesa front end is still fairly
preliminary. Many features are untested, and the following things are
known not to work:
- The gl_in interface block
- The gl_ClipDistance input
- Transform feedback of geometry shader outputs
- Constants that are new in GLSL 1.50 (e.g. gl_MaxGeometryInputComponents)
This isn't a problem, since no back-end drivers currently enable
geometry shaders. However, to make sure no one gets the wrong
impression, emit a nasty warning to let the user know that geometry
shader support isn't complete.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.8.1 (Input Layout Qualifiers) of the GLSL 1.50 spec
contains some tricky rules for how the sizes of geometry shader input
arrays are related to the input layout specification. In essence,
those rules boil down to the following:
- If an input array declaration does not specify a size, and it
follows an input layout declaration, it is sized according to the
input layout.
- If an input layout declaration follows an input array declaration
that didn't specify a size, the input array declaration is given a
size at the time the input layout declaration appears.
- All input layout declarations and input array sizes must ultimately
match. Inconsistencies are reported as soon as they are detected,
at compile time if the inconsistency is within one compilation unit,
otherwise at link time.
- At least one compilation unit must contain an input layout
declaration.
(Note: the geom_array_resize_visitor class was contributed by Bryan
Cain <bryancain3@gmail.com>.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GLSL ES 3.00 spec:
"All indexes used to index a uniform block array must be constant
integral expressions."
Similar text exists in GLSL specs since 1.50.
When we implemented this, the only type of interface block supported
by Mesa was uniform blocks, so we required all indexes used to index
any interface block to be constant integral expressions.
Now that we are adding interface block support for GLSL 1.50, we need
a more specific check.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This gets piglit's geometry-basic test running.
TODO: Still need to validate that the GS layout qualifiers don't get used
in places they shouldn't (like an interface block, or a particular shader
input or output)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Next step is to validate them at link time.
v2 (Paul Berry <stereotype441@gmail.com>): Don't attempt to export the
layout qualifiers in the event of a compile error, since some of them
are set up by ast_to_hir(), and ast_to_hir() isn't guaranteed to have
run in the event of a compile error.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v3 (Paul Berry <stereotype441@gmail.com>): Use PRIM_UNKNOWN to
represent "not set in this shader".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Limited semantic checking (compatibility between declarations, checking
that they're in the right shader target, etc.) is done.
v2: Remove stray debug printfs.
v3 (Paul Berry <stereotype441@gmail.com>): Process input layout
qualifiers at ast_to_hir time rather than at parse time, since certain
error conditions depend on the relative ordering between input layout
qualifiers, declarations, and calls to .length().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We do some tests of qualifiers using a union containing an int and the
struct full of bitfields, so make sure the bitfields don't spill
outside the int.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From section 2.15 (Geometry Shaders) the OpenGL 3.2 spec:
A program object that includes a geometry shader must also include
a vertex shader; otherwise a link error will occur.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_geometry_shader4 spec Errors:
"The error INVALID_VALUE is generated by ProgramParameteriARB if <pname>
is GEOMETRY_VERTICES_OUT_ARB and <value> is negative."
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In geometry shaders, outputs are consumed at the time of a call to
EmitVertex() (as opposed to all other shader types, where outputs are
consumed when the shader exits). Therefore, when packing geometry
shader output varyings using lower_packed_varyings, we need to do the
packing at the time of the EmitVertex() call.
This patch accomplishes that by adding a new visitor class,
lower_packed_varyings_gs_splicer, which is responsible for splicing
the varying packing code into place wherever EmitVertex() is found.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies lower_packed_varyings to store the packing code it
generates in a temporary exec_list, and then splice that list into the
shader's main() function when it's done. This paves the way for
supporting geometry shader outputs, where we'll have to splice a clone
of the packing code before every call to EmitVertex().
As a side benefit, varying packing code is now emitted in the same
order for inputs and outputs; this should make debug output a little
easier to read.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since geometry shader inputs are arrays (where the array index
indicates which vertex is being examined), varying packing needs to
treat them differently.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 4.3.4 (Inputs) of the GLSL 1.50 spec:
Geometry shader input variables get the per-vertex values written
out by vertex shader output variables of the same names. Since a
geometry shader operates on a set of vertices, each input varying
variable (or input block, see interface blocks below) needs to be
declared as an array.
Therefore, the element type of each geometry shader input array should
match the type of the corresponding vertex shader output.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The documentation for gl_shader_program.Geom and gl_geometry_program
says that the former is copied to the latter at link time, but this
wasn't happening. This patch causes _mesa_ir_link_shader() to perform
the copy, and updates comment accordingly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch creates a single function to copy the the UsesClipDistance
flag from gl_shader_program.Vert to gl_vertex_program. Previously
this logic was duplicated in the i965-specific function
brw_link_shader() and the core mesa function _mesa_ir_link_shader().
This logic will have to be expanded to support geometry shaders, and I
don't want to have to update it in two separate places.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds all of the parsing and semantics for GLSL 150 style
geometry shaders.
v2 (Paul Berry <stereotype441@gmail.com>): Add a few missing calls to
get_pipeline_stage(). Fix some signed/unsigned comparison warnings.
Fix handling of NULL consumer in assign_varying_locations().
v3 (Bryan Cain <bryancain3@gmail.com>): fix indexing order of 2D
arrays. Also, allow interpolation qualifiers in geometry shaders.
v4 (Paul Berry <stereotype441@gmail.com>): Eliminate
get_pipeline_stage()--it is no longer needed thanks to 030ca23 (mesa:
renumber shader indices according to their placement in pipeline).
Remove 2D stuff. Move vertices_per_prim() to ir.h, so that it will be
accessible from outside the linker. Remove
inject_num_vertices_visitor. Rework for GLSL 1.50.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v5 (Paul Berry <stereotype441@gmail.com>): Split out
do_set_program_inouts() argument refactoring to a separate patch.
Move geom_array_resizing_visitor to later in the series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no reason to be clever about this. By making separate
allocations for vertex and fragment shaders, we'll allow geometry
shaders to be added without introducing any complication.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Paul Berry <stereotype441@gmail.com>): Account for rework of
builtin_variables.cpp. Use INTERP_QUALIFIER_FLAT for gl_PrimitiveID
so that it will obey provoking vertex conventions. Convert to GLSL
1.50 style geometry shaders.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v3 (Paul Berry <stereotype441@gmail.com>): Be less obscure about
setting interpolation field of gl_Primitive variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These correspond to the EmitVertex and EndPrimitive functions in GLSL.
v2 (Paul Berry <stereotype441@gmail.com>): Add stub implementations of
new pure visitor functions to i965's vec4_visitor and fs_visitor
classes.
v3 (Paul Berry <stereotype441@gmail.com>): Rename classes to be more
consistent with the names used in the GL spec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we assumed that the only way Mesa would expose geometry
shader support was via the ARB_geometry_shader4 extension. But this
extension has some extra complications over GL 3.2 (interactions with
compatibility-only features, and link-time initialization of the
constant gl_VerticesIn). So we want to allow for the possibility of
supporting GL 3.2 (with GLSL 1.50 style geometry shaders) even if
ctx->Extensions.ARB_geometry_shader4 is false.
This patch adds a new function, _mesa_has_geometry_shaders(), which
returns true if either ARB_geometry_shader4 is supported or the GL
version is at least 3.2 desktop. Since compute_version() only enables
GL 3.2 functionality when GLSL 1.50 support is present, a sufficient
way for a back-end to advertise geometry shader support is to set
ctx->Const.GLSLVersion >= 150.
v2: Remove unnecessary ctx->Const.GeometryShaders150 constant.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't just use a ".glsl" file since the Lod variants are only
available in vertex and geometry shaders, while the bias variants are
only available in the fragment shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 586b4b5 (glsl: Also update implicit sizes of varyings at link
time) extended update_array_sizes() to apply to both uniforms and
shader ins/outs. However, doing creates problems for geometry
shaders, because update_array_sizes() assumes that variables with
matching names in different parts of the pipeline should have the same
sizes. With the addition of geometry shaders, this is no longer true
(e.g. both vertex and geometry shaders have a gl_ClipDistance output
variable, but there's no reason these variables should have the same
sizes).
The original reason for commit 586b4b5 (avoid problems with
gl_TexCoord being 0 length) has since been addressed by commit 6f53921
(linker: Ensure that unsized arrays have a size after linking). So go
ahead and switch update_array_sizes() back to only acting on uniforms.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
According to GLSL, indexing into an array or matrix with an
out-of-range constant results in a compile error. However, indexing
with an out-of-range value that isn't constant merely results in
undefined results.
Since optimization passes (e.g. loop unrolling) can convert
non-constant array indices into constant array indices, it's possible
that ir_set_program_inouts will encounter a constant array index that
is out of range; if this happens, just mark the whole array as used.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code in ir_set_program_inouts that marks just a portion of a
variable as used (rather than the whole variable) only works on a few
kinds of indexing operations:
- Indexing into matrices
- Indexing into arrays of matrices, vectors, or scalars.
Fortunately these are the only kinds of indexing operations that we
expect to see; everything else is either handled by a
previously-executed lowering pass or prohibited by GLSL.
However, that could conceivably change in the future (the GLSL rules
might change, or we might modify the lowering passes). To avoid
mysterious bugs in the future, let's have ir_set_program_inouts report
an assertion failure if it ever encounters an unexpected kind of
indexing operation (and in release builds, fall back to just marking
the whole variable as used).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch extracts the functions mark_whole_variable() and
try_mark_partial_variable() from the ir_set_program_inouts visitor
functions. This will make the code easier to follow when we add
geometry shader support.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our previous justification for leaving this function out of glsl_type
was that it implemented counting rules that were specific to GLSL
1.50. However, these counting rules also describe the number of
varying slots that Mesa will assign to a varying in the absence of
varying packing. That's useful to be able to compute from outside of
the linker code (a future patch will use it from
ir_set_program_inouts.cpp). So go ahead and move it to glsl_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow us to add geometry shader support without having to
add another boolean argument.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
llvm shifts are undefined for shift counts exceeding (or matching) bit width,
so need to apply a mask for the tgsi shift instructions.
v2: only use mask for the tgsi shift instructions, not for the build shift
helpers. None of the internal callers need this behavior, and while llvm can
optimize away the masking for constants there are legitimate cases where it
might not be able to do so even if we know that shift count must be smaller
than type width (currently all such callers do not use the build shift
helpers).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
c shifts are undefined for shift counts exceeding (or matching) bit width,
so need to apply a mask (on x86 it actually would usually probably work as
shifts do masking on int domain shifts - unless some auto-vectorizer would
come along at last as simd domain does not mask the shift count).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously, nothing was said what happens with shift counts exceeding
bit width of the values to shift. In theory 3 behaviors are possible:
1) undefined (classic c definition)
2) just shift out all bits (so result is zero, or -1 potentially for ashr)
3) mask the shift count to bit width - 1
API's either require 3) or are ok with 1). In particular, GLSL (as well as a
couple uninteresting legacy GL extensions) is happy with undefined, whereas
both OpenCL and d3d10 require 3). Consequently, most hw also implements 3).
So, for simplicity we just specify that 3) is required rather than saying
undefined and then needing state trackers to work around it.
Also while here specify shift count as a vector, not scalar. As far as I
can tell this was a doc bug, neither state trackers nor drivers used scalar
shift count.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This hasn't done anything in a long time, and it's only used in a couple
places...which means we couldn't use it without doing a bunch of work
anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if TEXTURE_IMMUTABLE_FORMAT was TRUE, the levels were allowed to
be set like usual, but ARB_texture_storage states:
> if TEXTURE_IMMUTABLE_FORMAT is TRUE, then level_base is clamped to the range
> [0, <levels> - 1] and level_max is then clamped to the range [level_base,
> <levels> - 1], where <levels> is the parameter passed the call to
> TexStorage* for the texture object
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Corey Richardson <corey@octayn.net>
This patch ensures that integers will pass through unscathed. Doing
(useless) computations on them is risky, especially when their bit
patterns correspond to values like inf or nan.
[V1-2]: Signed-off-by: Olivier Galibert <galibert at pobox.com>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Adds support for interpolating noperspective varyings linearly in screen
space when clipping.
Based on Olivier Galibert's patch from last year:
http://lists.freedesktop.org/archives/mesa-dev/2012-July/024341.html
At this point all -fixed and -vertex interpolation tests work.
V5: Add brw_clip_compile.has_noperspective_shading rather than another
key flag.
V6: Real bools.
[V1-2]: Signed-off-by: Olivier Galibert <galibert at pobox.com>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only gave special treatment to the builtin color varyings.
This patch adds support for arbitrary flat-shaded varyings, which is
required for GLSL 1.30.
Based on Olivier Galibert's patch from last year:
http://lists.freedesktop.org/archives/mesa-dev/2012-July/024340.html
V5: Move key.do_flat_shading to brw_clip_compile.has_flat_shading
V6: Real bools.
[V1-2]: Signed-off-by: Olivier Galibert <galibert at pobox.com>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the SF only handled the builtin color varying specially.
This patch generalizes that support to cover user-defined varyings,
driven by the interpolation mode array set up alongside the VUE map.
Based on the following patches from Olivier Galibert:
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024335.html
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024339.html
With this patch, all the GLSL 1.3 interpolation tests that do not clip
(spec/glsl-1.30/execution/interpolation/*-none.shader_test) pass.
V5: Move key.do_flat_shading to brw_sf_compile.has_flat_shading; drop
vestigial hunks.
V6: Real bools.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The interpolation map (in brw->interpolation_mode) is a new auxiliary
structure alongside the post-GS VUE map, which describes the
interpolation modes for each VUE slot, for use by the clip and SF
stages.
This patch introduces a new state atom to compute the interpolation map,
and adjusts the program keys for the clip and SF stages, but it is not
actually used yet.
[V1-2]: Signed-off-by: Olivier Galibert <galibert at pobox.com>
V3: Updated for vue_map changes, intel -> brw merge, etc. (Chris Forbes)
V4: Compute interpolation map as a new state atom rather than tacking it
on the front of the clip setup
V5: Rework commit message, make interpolation_mode_map a struct.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We recently proposed a new syntax for stable-patch nominations such as:
CC: "9.2 and 9.1" <mesa-stable@lists.freedesktop.org>
and this has already appeared in the wild.
So we extend the regular expression to pick this up as well.
MP_TEMP_SIZE must be aligned to 0x8000, while TEMP_SIZE on NVE4_3D
must be aligned to 0x20000, so perform both alignments to be sure
we allocate enough space (actually the bo will most likely use 128
KiB pages and not aligning to that would be a waste anyway).
Cc: "9.2" mesa-stable@lists.freedesktop.org
YYLEX_PARAM is no longer supported as of Bison 3.0. Instead, the Bison
developers recommend using %lex-param.
%lex-param takes a type and variable name, similar to %parse-param,
so you can't pass an arbitrary expression like state->scanner. But Flex
insists on passing the actual scanner object, not an arbitrary object
like state.
To solve this, the parser defines a wrapper lex() function which accepts
"state," and calls Flex's lex() function with state->scanner.
Fixes the build with Bison 3.0. Also works with Bison 2.7.1.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67354
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Cc: "9.2" mesa-stable@lists.freedesktop.org
YYLEX_PARAM is no longer supported as of Bison 3.0. Instead, the Bison
developers recommend using %lex-param.
%lex-param takes a type and variable name, similar to %parse-param,
so you can't pass an arbitrary expression like state->scanner. But Flex
insists on passing the actual scanner object, not an arbitrary object
like state.
To solve this, the parser defines a wrapper lex() function which accepts
"state," and calls Flex's lex() function with state->scanner.
Fixes the build with Bison 3.0. Also works with Bison 2.7.1.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67354
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Cc: "9.2" mesa-stable@lists.freedesktop.org
I had removed it in commit 1e7776ca2b
because it was obviously wrong -- why do we care whether the server is a
version that emits events, if we're not watching for the server's events,
anyway? And why would you only invalidate on a server that emits
invalidate events, when the comment said to emit invalidates if the server
*doesn't*? Only, I missed that we otherwise don't flag that our buffers
might have changed at swap time at all, so the driver was only checking
for new buffers when triggered by the Viewport hack. Of course you don't
expect Viewport to be called after a swap.
So, this is effectively a revert of the previous commit, except that I
dropped the check for only emitting invalidates on a new server -- we
*always* need to invalidate if we're doing a SwapBuffers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63435
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 and 9.2" <mesa-stable@lists.freedesktop.org>
Previously we were using truncation, which gives the correct result
only for numbers in [0.5-1.0] range (because there's no mantissa bits
to do any rounding there).
This is frequently hit (and probably only used there) when converting
fragment depth to depth format (d24s8 etc.) or otherwise dealing with
depth format.
v2: as spotted by Jose, get rid of extra type (src_type is already unsigned).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code that checks if some texture target is valid for
glGetTexLevelParameter*() was not programmed to check for multisampling
proxy textures. This made it impossible(?) to use the proxy textures
for their intended purpose as glGetTexLevelParameter*() would just fail
on you.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
glTexStorage*() functions make textures immutable. This carries on to
proxy textures. Error checking in texture storage functions prevents
proxy textures from working after first time because internally, they
became immutable.
This commit makes the error checking ignore the immutability flag when
working with proxy textures.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
When working with the glTexStorage*() functions, the error checking
checks that a non-default (i.e., non-zero) texture is currently bound.
However, this check made glTexStorage*() functions fail with proxy
textures when the default texture is bound. Proxy textures do not care
about the current texture bindings so for them this check should not
be done.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
The function _mesa_get_tex_max_num_levels() is supposed to calculate
the number of mipmap levels but it was not written to handle proxy
textures, at best returning a maximum of 1 mipmap level. Because of
this, at least glTexStorage*() calls would incorrectly fail when used
with proxy textures with more than one mipmap level.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
Free all our temporary buffers in one place at the end of the
function. Fixes memory leak detected by Coverity.
Note: This is a candidate for the 9.x branches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The variable 'usage' was being used for two different things.
Sometimes for PIPE_USAGE_x and other times for PIPE_TRANSFER_x.
This renames usage to access when we're talking about PIPE_TRANSFER_x
flags. Plus, add a bunch of comments to remind us what's going on.
Also, use unsigned for PIPE_TRANSFER_x bitmask to be consistent with
other places. And add a missing const qualifier.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We should probably be using map()/unmap() when accessing resource
data, but this is a little better.
v2: assert that the resource is not a display target, per Jose.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This was never a problem since the Mesa state tracker always gives
us a user-space constant buffer with buffer_offset=0. But if another
state tracker ever gave us a "HW" constant buffer with non-zero
buffer_offset we'd mis-render.
Also, use the correct buffer size. And move an assertion to the
top of the function.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The cap means _can_ accept user-space constant buffers; it doesn't
mean _only_ accepts user-space constant buffers.
v2: also update the PIPE_CAP_USER_VERTEX_BUFFERS and
PIPE_CAP_USER_INDEX_BUFFERS descriptions as well. Per Jose.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
vblank_mode is read by dri_util.c and falls under the "dri2" driver name,
which is not connected to the actual Mesa/Gallium driver in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
PP saves current states to cso_context and then util_blit_pixels does
the same. cso_context doesn't like that and the original state is not
correctly restored.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Surprisingly all drivers supporting MSAA can already do this (r300g and r600g
for sure) and I think Christoph wanted to have this feature for his Nouveau
drivers anyway.
We recently adopted a new convention that patches can be nominated for the
stable branch by including a line in the commit message as follows:
CC: mesa-stable@lists.freedesktop.org
This is a convenient syntax as "git send-email" will notice this line and
automatically copy the resulting patch email to the mesa-stable mailing list.
Here we extend the regular expression in the get-pick-list.sh script to also
notice this pattern, (as well as the traditional "NOTE: This patch is a
candidate..." form.
The linker_error() function sets prog->LinkStatus to false. There's
no reason for the caller of linker_error() to also do so.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We're now emitting this error from a point where we have easy access
to the name of the block that failed to match, so go ahead and include
that in the error message, as we do for intrastage interface block
mismatches.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch changes link_shaders() so that it sets prog->LinkStatus to
true when it starts, and then relies on linker_error() to set it to
false if a link failure occurs.
Previously, link_shaders() would set prog->LinkStatus to true halfway
through its execution; as a result, linker functions that executed
during the first half of link_shaders() would have to do their own
success/failure tracking; if they didn't, then calling linker_error()
would add an error message to the log, but not cause the link to fail.
Since it wasn't always obvious from looking at a linker function
whether it was called before or after link_shaders() set
prog->LinkStatus to true, this carried a high risk of bugs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we failed to link (which is correct), but we did not output
an error message, which could have been confusing for users.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A comment in link_intrastage_shaders(), and an if-test that followed
it, seemed to indicate that link_uniform_blocks() would return a
negative value in the event of an error. But this is not the
case--all error checking has already been performed by
validate_intrastage_interface_blocks(), and link_uniform_blocks() can
only return unsigned values.
So get rid of the if-test and change the return type of
link_intrastage_shaders() to clarify that it can only return unsigned
values.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To have non-static buffers in local memory, it is necessary to pass them
as arguments to the kernel.
For r600, the correct lds size must be set to the SQ_LDS_ALLOC register.
The correct size is the clover size plus the size reported by the
compiler.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Here is an updated patch with no line wrapping and respecting 80-column limit (for my changes).
v2: Tom Stellard
- Create global arguments for constant buffers so we don't break
r600g.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
For GLSL programs, enabledTargets can have more than one bit set. For
example, a shader that uses sampler2D and samplerCube uniforms will have
both TEXTURE_2D_BIT and TEXTURE_CUBE_BIT set.
The code that sets _ReallyEnabled already handles this, selecting the
"highest priority" texture target. We should simply use that.
Fixes new Piglit test incomplete-textures-of-multiple-types.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62698
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than having to keep track of all the build systems and their respecitve
definition of the mesa version, use a single top file VERSION. Every build
system is responsible for reading/parsing the file and using it
v2:
* remove useless bulletpoint from the documentation, suggested by Matt
* "Androing is Linux. Use '/' in stead of '\'", spotted by Chad V
* use cleaner code to get the version in scons, suggested by Chad V
v3:
* ensure leading and trailing whitespace characters are stripped while parsing
* android: handle GNU shell commands approapriately
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We already skip this for API_OPENGL_CORE; ES2+ is very similar.
The primary user of the swrast context is GL_SELECT and GL_FEEDBACK,
which have never existed in ES.
This saves approximately 18MB of memory in GLBenchmark 2.7 Egypt (ES2).
No regressions in es3conform on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Certain extensions only add functionality to particular shader stages.
(For example, ARB_draw_instanced only adds variables to the vertex
shader stage.)
Previously, we only allowed such extensions to be enabled in the shader
stages where they're useful. However, I've never found any text which
mandates that behavior; in my opinion, you should be able to turn on
extensions in any shader stage, even if they have no effect.
Fixes Piglit tests glslparsertest/glsl2/draw_buffers-05.vert and
ARB_draw_instanced/preprocessor/feature-macro-enabled.frag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29185
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
EGL_KHR_surfaceless_context extension allows contexts to be made current
without a default winsys fbo. This extension specifies what ES 1.1 and
2.0 should do (the ES 3.0 spec already does).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 8c3d3622d9 introduced a new assertion,
but since it causes lp_test_conv failures remove it again and let's hope
we don't really hit bugs caused by the potentially bogus code (it is possible
the assert() caught some cases which work correctly too).
The second 'const' says that the pointer itself is constant. This in
unenforcible in C++, so GCC emits a warning (see) below for each of
these functions in every file that includes glsl_types.h. It's a lot of
warning spam.
../../../src/glsl/glsl_types.h:176:58: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
The majority of calls to _mesa_glsl_error(), _mesa_glsl_warning(), and
_mesa_glsl_parse_state::check_version() use a message that begins with
a lower case letter and ends without a period. This patch makes all
messages follow that convention.
Also, error/warning messages shouldn't end in '\n', since
_mesa_glsl_msg() automatically adds '\n' at the end of the message.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just like the UNORM case we need to use round to nearest, not trunc.
(There's also another problem, we're using the formula for SNORM->float
which will produce a value below -1.0 for the most negative value which
according to both OpenGL and d3d10 would need clamping. However, no actual
failures have been observed due to that hence keep cheating on that.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I am not able to find _any_ rounding behavior specified for OpenGL for
float to half-float conversions. However, it is specified for fp11/fp10
which suggests round to next finite value but round-to-zero would also
be allowed, but finite values must not be flushed to infinity in either
case.
Hence I believe it makes sense to do the same for half-floats too.
We could probably also use round-to-zero consistently, which is in fact
required by d3d10 (but it doesn't seem to matter much).
Does not match the mesa core function doing the same though (which is
saying it was built to match intel gpus which I don't believe for a
second as it would cause failures in d3d10, moreover the PRM (for
ivy bridge, not listed in older manuals) while not specifying rounding
behavior clearly states finite numbers are never flushed to infinity).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
unlike OpenGL, the texel swizzle is embedded in the instruction, so honor
that.
(Technically we now execute both the sampler_view swizzle and the
per-instruction swizzle but this should be quite ok.)
v2: add documentation note as it's not obvious.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL_EXT_framebuffer_object differs from GL_ARB_framebuffer_object in ways
that we can't and don't implement in core profiles. Exposing it is a
lie, so we shouldn't do that.
It's possible the some other GL_EXT_framebuffer_* extensions should be
disabled, but it's not quite so clear cut.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If any component used the ZERO or ONE swizzle, its corresponding member
in the `swizzle` array would never be initialized. We *mostly* got away
with this, except when that memory happened to contain a value that
clobbered another channel when combined using BRW_SWIZZLE4().
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes the dri2 opening to check if DRI_PRIME is set,
and picks the correct drm device path to open, this along
with a change to libvdpau allows vdpauinfo to work at least,
Martin Peres tested with nouveau, and there seems to be a
further issue with final displaying, it only works sometimes,
but this patch is at least necessary to help debug further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67283
Tested-by: Armin K. <krejzi@email.com>
This reverts commit c9db037dc9.
Eric believes that the viewport hacks are still necessary for EGL;
invalidate events aren't hooked up properly.
This commit caused a regression where EFL applications wouldn't show
anything other than window decorations; GLBenchmark also showed issues.
The revert had conflicts due to the intel_context/brw_context merge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66606
Cc: mesa-stable@lists.freedesktop.org
Wrapping every character of an email address in <em> looks bizarre, and
makes it impossible to read the text. Apparently Brian did this in 2003
to try and obfuscate email addresses and avoid spam.
Of course, mesa-*@lists.freedesktop.org are public mailing lists and
trivial to find on the internet. So obfuscation buys us nothing
(assuming the <em> technique even works at all, which I doubt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
LOLed-at-by: Matt Turner :)
Bump major version, as the change to require explicit
xa_context_flush(), the addition of the handle-type parameter to
xa_surface_handle(), and change of surface to ref/unref will require a
minor change in DDX.
For freedreno DDX, we have to create the scanout GEM bo in a special way
(until we have our own KMS/DRM kernel driver.. and even then for
phones/tablets you probably need to use the android drivers if you don't
want to port the lcd panel driver support). The easiest way to handle
this is let the DDX create the scanout bo, and then create the xa
surface from that.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
TargetOptions::NoFramePointerElimNonLeaf was removed in LLVM 3.4
r187093.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The is_loop_terminator() function was asserting that the following
kind of if statement could never occur:
if (...) { } else { }
(presumably based on the assumption that such an if statement would be
eliminated by previous optimization stages). But that isn't the
case--it's possible that previous optimization stages might simplify
more complex code down to this empty if statement, in which case it
won't be eliminated until the next time through the optimization loop.
So is_loop_terminator() needs to handle it. Fortunately it's easy to
handle--it's not a loop terminator because it does nothing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64330
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Two callers of brw_search_cache() weren't initializing that function's
inout_offset parameter: brw_blorp_const_color_params::get_wm_prog()
and brw_blorp_const_color_params::get_wm_prog().
That's a benign problem, since the only effect of not initializing
inout_offset prior to calling brw_search_cache() is that the bit
corresponding to cache_id in brw->state.dirty.cache may not be set
reliably. This is ok, since the cache_id's used by
brw_blorp_const_color_params::get_wm_prog() and
brw_blorp_blit_params::get_wm_prog() (BRW_BLORP_CONST_COLOR_PROG and
BRW_BLORP_BLIT_PROG, respectively) correspond to dirty bits that are
not used.
However, failing to initialize this parameter causes valgrind to
complain. So let's go ahead and fix it to reduce valgrind noise.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66779
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The linker matches up variables in interface blocks according to their
block name and variable name. When support for interface block arrays
was added in commit d6863acb, we renamed variables appearing in
interface blocks so that their name included the array size. For
example, in a block like this:
out foo {
float bar
} baz[3];
The variable "bar" would get renamed to "bar[3]".
This is unnecessary, and leads to problems in supporting geometry
shaders, since geometry shaders require vertex shader outputs which
are non-arrays to be linked up to geometry shader inputs which are
arrays.
This patch makes the behaviour of interface block arrays the same as
simple non-array interface blocks; in both cases, the variables
contained within them are not renamed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
vertex id has to be unaffected by the start index (i.e. when calling
draw arrays with start_index = 5, the first vertex_id has to still
be 0, not 5) and it has to be equal to the index when performing
indexed rendering (in which case it has to be unaffected by the
index bias). This fixes our behavior.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The instance id system value always starts at 0, even if the
specified start instance is larger than 0. Instead of implicitly
setting instance id to instance id plus start instance and then
having to subtract instance id when computing the buffer offsets
lets just set instance id to the proper instance id. This fixes
instance id computation and cleansup buffer offset computation.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are earlier returns for PIPE_FUNC_NEVER and PIPE_FUNC_ALWAYS. The
switch value of 'func' cannot be either of those values.
Fixes "Logically dead code" defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Looks like a thinko, "Hey, constant buffers can be at most 64 KiB
in size, offset can't be larger." But it can, of course.
I think piglit lacks a test for UBO and BindBufferRange that
tests if it actually works.
Since disabling denorms in draw_vbo() we require the util_cpu_caps to be
initialized there. Hence add another util_cpu_detect() call in
draw_create_context() which should ensure this.
(There is another call in draw_get_option_use_llvm() which only gets called
with x86 (not x86_64) but calling it always there wouldn't help since it most
likely wouldn't get called when compiling without llvm, so leave it alone
there.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=66806.
(Because util_cpu_caps wasn't initialized when first calling util_fpstate_get()
hence it returning zero, but it would later get initialized by rtasm translate
code hence when draw call returned it unmasked all exceptions by calling
util_fpstate_set(). This was happening only with DRAW_USE_LLVM=0 or not
compiling with llvm, otherwise the llvm init code was calling it on time too.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
The codeword must be unsigned (otherwise will shift in 1's from above when
merging low/high parts so some texels decode wrong).
This also affects gallium's util/u_format_rgtc.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
supports these extensions, but the OS doesn't, issuing these insns will
trigger an undefined opcode exception.
In addition to the AVX cpuid bit we also need to:
* test cpuid for OSXSAVE support
* XGETBV to check if the OS saves/restores AVX regs on context switches
See "Detecting Availability and Support" at
http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Same as log2_safe, which means that it can handle infs, 0s and
nans.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Only the floating point operarators change everything else
is the same so it makes sense to share the code.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
sin/cos for anything not finite is nan and everything else has
to be between [-1, 1].
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
That means that if input is:
* - less than zero (to and including -inf) then NaN will be returned
* - equal to zero (-denorm, -0, +0 or +denorm), then -inf will be returned
* - +infinity, then +infinity will be returned
* - NaN, then NaN will be returned
It's a separate function because the checks are a little bit costly
and in most cases are likely unnecessary.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
exp(0) has to be exactly 1, exp(-inf) has to be 0, exp(inf) has
to be inf and exp(nan) has to be nan, this fixes all of those
cases.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Both D3D10 and OpenCL say that if one the inputs is nan then
the other should be returned. To preserve that behavior
the patch fixes both the sse and the non-sse paths in both
functions and adds helper code for handling nans.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's not the first time that, due to missing build dependencies or
incomplete commits, we end up with a broken libGL.so that's missing
symbols, causing all tests to fail catastrophically.
Instead try to catch this sort of issues earlier.
Almost all of the functions between the ARB and the EXT share the same
GLX protocol because the functionality is, essentially, identical.
However, there are some differences between the extensions:
- In the ARB extension, names must come from glGenBuffers.
- In the ARB extension, framebuffer objects are not shared (but they are
in the EXT).
For these reasons, glBindFramebuffer and glBindRenderbuffer have
different GLX protocol opcodes than their EXT counterparts. Currently
these functions alias each other in the dispatch table. This makes it
impossible to be truly spec conformant.
This patch enables fixing the conformance issue by splitting
glBindFramebuffer / glBindFramebufferEXT and glBindRenderbuffer /
glBindRenderbufferEXT into separate dispatch table entries.
Patches will be available shortly to:
- Fix the conformance issue.
- Stop advertising the EXT in OpenGL 3.1 (or core profiles).
HOWEVER, this does represent a compatibility break between the loader
(libGL or the Xserver GLX module) and the driver. Mesa drivers compiled
without this change will request a single dispatch table entry for
glBindFramebuffer and glBindFramebufferEXT. Since the updated loader
has different entries for each, the request will fail, and the driver
will die in a fire.
Drivers built with the change should continue to load fine on loaders
without the change. In this case, the driver will separately ask for
entries for glBindFramebuffer and glBindFramebufferEXT, and the loader
will tell it the same location. Since the loader in the server's GLX
module is not (yet) updated, this should not be a problem. We also do
not advertise the ARB extension from the server, so, again, this should
not be a problem for the server.
HOWEVER, this means that DRI1 drivers (remember mga_dri.so?) will no
longer load with libGL build hereafter. That means this patch will need
to be back ported to the 8.0 branch.
v2 (idr): Added missing GLX protocol opcodes for the EXT functions and
corrected the opcodes for the ARB functions. Updated GLX indirect_api
unit test and dispatch sanity unit test.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Bartosz Zawistowski <bartosz.l.zawistowski@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Any driver that supports GLSL 1.30 should be able to handle this
extension, as it's entirely implemented in the GLSL compiler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
While all the work is in the shared GLSL compiler, this extension
requires GLSL 1.30, which is currently only supported on Gen6+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
layout(binding = N) is equivalent to calling glUniformBlockBinding(_,N).
This currently only handles the GLSL 1.40 case - no interface names, no
arrays of uniform blocks. This is okay since we don't yet support GLSL
1.50, and don't expose ARB_shading_language_420pack in ES 3.0.
v2: Move into the other function; use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Without an instance name, there is no ir_variable representing the
actual uniform block declaration. When the linker goes to set uniform
initializers, it only sees the members as ir_variables; never the block.
So, unfortunately, the members need to know about the binding.
There has to be a better way to do this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Normally, uniform array variables are initialized by array literals.
That is, val->type->array_elements >= storage->array_elements.
However, samplers are different. Consider a declaration such as:
layout(binding = 5) uniform sampler2D[3];
The initializer value is a single integer (5), while the storage has 3
array elements. The proper behavior here is to increment one for each
element; they should be initialized to 5, 6, and 7.
This patch introduces new code for sampler types which handles both
arrays of samplers and single samplers correctly.
v2: Move into the other function; use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Sampler uniforms and uniform blocks do not have a var->constant_value.
Instead, they have an integer var->binding value.
This makes extending set_uniform_initializer() somewhat problematic: it
assumes that there is an ir_constant * which represents the initializer,
and that it's safe to dereference that without any NULL checks.
Instead, this patch creates an analogous function for binding
qualifiers, and calls one or the other as appropriate.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is existing code to handle sampler uniform initializers. Prior to
GLSL 4.20's "binding" keyword, sampler uniforms don't have initializers
at all, so this is somewhat surprising.
The existing code is broken into two cases: one where both the variable and
initializer are arrays, and a second where the variable and initializer are
scalars.
The first case should never occur, since array-typed initializers do not
exist for sampler uniforms. Even with the binding keyword, the
initializer is a single integer which represents the texture unit to use
for the first array element.
The second is apparently used for some fixed-function code.
v2: Rewrite the commit message - suggested by Paul.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
All compilation units need to agree on the binding point, if they
specify one at all.
v2: Use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Rather than creating a new "binding" field in ir_variable, we reuse
constant_value since the linker code for handling uniform initializers
uses that.
Since UBOs and samplers can't otherwise have initializers/constant
values, there shouldn't be a conflict.
v2: Propagate the new binding variable around too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These are not used yet, but they exist and are copied appropriately.
v2: Add an explicit "int binding" variable rather than reusing
constant_value, as suggested by Paul Berry.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The "binding" qualifier only applies to UBO blocks and samplers, along
with arrays of those types. (It would also apply to images and atomic
counters, but we don't support those yet.)
This also validates sampler bindings against the maximum number of
texture units, and UBO bindings against the number of uniform buffer
binding points.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Nothing actually uses this yet.
v2: Remove >= 0 checks. They'll be handled in later validation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The idea of this code is to disallow layout(...) sections with the
deprecated "varying" or "attribute" keywords, unless a few select
extensions are enabled which allow a more relaxed check.
In order to detect a layout(...) section, the code checks for a number
of layout qualifiers. However, it failed to check for all of them,
which could lead to layout(...) not being detected when it should.
By replacing this with has_layout(), we properly check for all layout
qualifiers, and also guarantees that new qualifiers added in the future
will not be forgotten.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were already semi-relaxed, since the storage qualifier rule
already skipped when 420pack was enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GL_ARB_shading_language_420pack extension/GLSL 4.20 split centroid
off into a new category, "auxiliary storage qualifiers," and allow these
to be placed anywhere in the series. So we have to stop recognizing
"centroid in"/"centroid out"/"centroid varying" in the grammar and get
more creative.
The same approach used before works here, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is necessary for the parser to be able to accept precision
qualifiers not immediately adjacent to the type, such as "const highp
inout float foo".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Currently, we store precision in ast_type_specifier, rather than
ast_type_qualifier. This works because precision is the last qualifier,
and immediately adjacent to the type.
Default precision statements (such as "precision highp float") are
represented as ast_type_specifier objects, with a boolean to indicate
that it's a default precision statement rather than an ordinary type.
ast_type_specifier::precision will be moving to ast_type_qualifier soon,
in order to support arbitrary qualifier ordering. However, we still
need to store a "this is a precision statement" flag /and/ the default
precision in ast_type_specifier.
This patch changes the boolean into a new field, default_precision.
If default_precision != ast_precision_none, it's a precision statement
with the specified precision. Otherwise, it's an ordinary type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes the complier accept both "const in" and "in const".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will make it easy to support both "const in" and "in const", as
required by GLSL 4.20/ARB_shading_language_420pack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
"Parameter direction qualifier" is a new term I invented just now; it's
not part of any GLSL specification.
This paves the way handling multiple parameter qualifiers, in any order,
as required by GLSL 4.20/ARB_shading_language_420pack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of ast_type_qualifier is simply a bitfield (represented as a
structure of unsigned:1 bits in a union with an unsigned). However, it
also contains ARB_explicit_attrib_location's location/index fields.
In the past, this has worked by simply returning the layout qualifier's
ast_type_qualifier and merging the other bits into it. However, that's
not obvious until you break it by switching $1 and $2.
Using merge_qualifier() copies them appropriately, and also properly
overrides layout qualifiers. It also checks for duplicate qualifiers,
which renders some of the checks in the previous patch unnecessary.
However, those checks provide better error messages, such as "Duplicate
interpolation qualifier", rather than just "duplicate qualifier".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes the compiler accept invariant, storage, layout, and
interpolation qualifiers in any order when ARB_shading_language_420pack
is enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GL_ARB_shading_language_420pack extension/GLSL 4.20 allow qualifiers
to be specified in (basically) any order. In order to support this, we
can't hardcode the ordering restrictions in the grammar.
This patch alters the grammar to accept invariant, storage, layout, and
interpolation qualifiers in any order, but adds C code to enforce the
ordering requirements. In the 420pack case, we should be able to simply
skip the error checks.
As a bonus, this also lets us generate decent error messages, rather
than Bison's awful "unexpected TOKEN" errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
"Auxiliary storage qualifiers" is the new term given to "centroid",
"patch", and "sample" by GLSL 4.20/GL_ARB_shading_language_420pack.
Even though we only support "centroid", it's useful to add this now
so that all auxiliary storage qualifiers get handled in the right places
once they're eventually supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it easy to check if any storage qualifiers are set.
"centroid" is not considered a storage qualifier. In the old language
rules, you can't specify "centroid" by itself; it's always "centroid
in", "centroid out", or "centroid varying." So one of the other storage
qualifiers will always be set; there's no need to specifically check for
centroid.
In the new 4.20 rules, centroid is an auxiliary storage qualifier, not a
storage qualifier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it easy to check if any layout qualifiers are set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All four URB packets need to be programmed together in order for the GPU
state to be valid. Putting them in separate BEGIN..ADVANCE blocks is
risky: if we're nearing the end of a batch, the batch could be flushed
inbetween two of the commands, causing the URB programming to be split
into two batchbuffers.
This -might- be okay with hardware contexts, but it offers no advantages
over keeping them together, and has a potential for hangs.
Putting them into a single BEGIN..ADVANCE block ensures they'll be kept
in the same batch, which seems wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
In blorp, change only the PS packet, because the VS packet is disabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Some render types, such as floating-point, aren't valid with EGL.
Return NULL in those cases to drop them.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Mark __DRI_ATTRIB_FLOAT_MODE as deprecated, and introduce new flags to
__DRI_ATTRIB_RENDER_TYPE for float modes. Both signed float
(fbconfig_float) and unsigned (packed_float) are introduced. The old
attribute should be set for both float modes.
v2 (idr): Require that the render mode from the DRI attributes matches the
render mode of the config exactly. This is the behavior of the old code.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that init_fbconfig_for_chooser sets correct value of
drawableType for visual configs and fbconfigs.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Correctly handle the value of renderType in GLX context. In case of the
value being incorrect, context creation fails.
v2 (idr): indirect_create_context is just a memory allocator, so don't
validate the GLX_RENDER_TYPE there. Fixes regressions in several
GLX_ARB_create_context piglit tests.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2 (idr): Open-code the check for GLX_RENDER_TYPE.
dri2_convert_glx_attribs can't be called from here because that function
only exists in direct-rendering builds. Also add a stub version of
indirect_create_context_attribs to tests/fake_glx_screen.cpp to prevent
'make check' regressions.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Set the correct values of renderType in glXCreateContext and
init_fbconfig_for_chooser.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Correctly handle the value of renderType and drawableType in
fbconfig. Modify glXInitializeVisualConfigFromTags to read the parameter
value, or detect it if it's not there.
v2 (idr): If there was no GLX_RENDER_TYPE property, set the type based
purely on the rgbMode as the previous code did. It is impossible for
floatMode to be set at this point, so we can't have a float config. The
previous code regressed a large number of piglit GLX tests because those
tests don't set GLX_RENDER_TYPE in the glXChooseConfig call. Restoring
the old behavior for that case fixes those regressions.
Also fix handling of GLX_DONT_CARE for GLX_RENDER_TYPE. Fixes a
regression in glx-dont-care-mask.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that context creation routines are provided with the value of
RENDER_TYPE retrieved from GLX attribs.
v2 (idr): Minor formatting changes. Change type of
dri2_convert_glx_attribs render_type parameter to uint32_t to silence
some GCC warnings.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that renderType property value is stored in GLX context while
it's being created. Further patches will be provided to make the value
correspond to fbconfig's renderType.
v2 (idr): Move a hunk from the next patch to this patch to prevent a
build break.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The L3 controls are identical on all platforms, but LLC differs:
- Ivybridge has a "cache in LLC" flag
- Baytrail has no LLC, but instead has a snoop bit:
"data accesses in this page must be snooped in the CPU caches."
- Haswell has writeback/uncached flags for LLC and eLLC (eDRAM).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The current gen_matypes logic assumes that the host compiler will produce
information that is useful for the target compiler. Unfortunately, this
is not the case whenever cross-compiling.
When we detect that we're cross-compiling and using GCC, use the target
compiler to produce assembly from the gen_matypes.c source, then process
it with a shell script to create a usable header. This is similar to how
the linux kernel creates its asm-offsets.c file.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This is required in case a wrapper or symlink is used. This patch
has also been sent upstream, awaiting moderation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andreas Oberritter <obi@saftware.de>
Adds the dependencies of builtin_compiler as sources when cross
compiling instead of using libtool to share compilation with src/glsl.
The builtin_compiler executable is built for the host when cross
compiling so it doesn't make sense to share compilation with src/glsl
built for the target in this case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44618
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Adds H.264 and MPEG2 codec support via VP2, using firmware from the
blob. Acceleration is supported at the bitstream level for H.264 and
IDCT level for MPEG2.
Known issues:
- H.264 interlaced doesn't render properly
- H.264 shows very occasional artifacts on a small fraction of videos
- MPEG2 + VDPAU shows frequent but small artifacts, which aren't there
when using XvMC on the same videos
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use grep -w instead of the empty string escape sequences
which are less portable. Makes the grep tests
function as intended on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fixes this build error on OpenBSD 5.3.
In file included from ../../src/mesa/main/ff_fragment_shader.cpp:53:
./../glsl/ir_optimization.h:64: error: comma at end of enumerator list
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fixes these build errors on OpenBSD 5.3.
In file included from ../../src/mesa/main/errors.h:47,
from ../../src/mesa/main/imports.h:41,
from ../../src/mesa/main/ff_fragment_shader.cpp:32:
../../src/mesa/main/mtypes.h:3286: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3296: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3303: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3356: error: comma at end of enumerator list
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Use "or" instead of "add" (this is a classic select sequence, which at
least newer llvm versions can actually recognize (3.2+?), and the "add"
might prevent that - and we really don't want an add instead of an or with
avx if it isn't recognized (even without avx logic ops might be cheaper)).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Instead of just ignoring the srgb/linear conversions, simply call the
corresponding conversion functions, for all of pack/unpack/fetch,
both for float and unorm8 versions (though some don't make a whole
lot of sense, i.e. unorm8/unorm8 srgb/linear combinations).
Refactored some functions a bit so don't have to duplicate all the code
(there's a slight change for packing dxt1_rgb, as there will now be
always 4 components initialized and sent to the external compression
function so the same code can be used for all, the quite horrid and
ad-hoc interface (by now) should always have worked with that).
Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but also it tends to increase register usage, so overall positive
effect on performance has to be proven by real benchmarks yet.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Every function but the above four uses explicitly sized types for their
src and dst arguments. Even fetch_rgba_{s,u}int follows the convention.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular
JIT has bit-rotted badly on ARM and doesn't exist on AArch64.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
Historically, we indented grammar production rules with a single 8-space
tab, but code inside of blocks used Mesa's 3-space indents.
This meant when editing code, you had to use an 8-space tab for the
first level of indentation, and 3-spaces after that. Unless you
specifically configure your editor to understand this, it will get the
indentation wrong on every single line you touch, which quickly devolves
into a colossal waste of time.
It's also inconsistent with every other file in the entire project.
This patch removes all tabs and moves to a consistent 3-space indent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When working on a parser, it's very easy to accidentally introduce
new shift/reduce conflicts. Failing the build guarantees they'll
be noticed and fixed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The single remaining shift/reduce conflict was the classic ELSE problem:
292 selection_rest_statement: statement . ELSE statement
293 | statement .
ELSE shift, and go to state 479
ELSE [reduce using rule 293 (selection_rest_statement)]
$default reduce using rule 293 (selection_rest_statement)
The correct behavior here is to shift, which is what happens by default.
However, resolving it explicitly will make it possible to fail the build
on new errors, making them much easier to detect.
The classic way to solve this is to use right associativity:
http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html
Since there is no THEN token in GLSL, we need to fake one. %right THEN
creates a new terminal symbol; the %prec directive says to use the
precedence of that terminal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
opt_return_value was not initialized if mode != ast_return.
Fixes "Uninitialized pointer field" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This should fix missing symbols in a osmesa built against shared glapi
osmesa build. All opengl exports were missing that are defined in the
static glapi, so link against both to fix this.
This is a candidate for the stable series.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47824
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
We always emit U,V,R coordinates for this message, but the sampler gets
very angry if we pass garbage in the R coordinate for at least some
texture formats.
Fill the remaining coordinates with zero instead.
Fixes broken rendering on GM45 in Source games, and in VDrift.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65236
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_tex_layout.c sets up the align_w/h fields, and has all the
appropriate spec references already.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The Sandybridge code had a citation for the range of the "Maximum Number
of Threads" field, and the Ivybridge code just mentioned the "BSpec" in
general. That's documented in the obvious place, so people can find it
without a spec reference.
The real value of the comment is to say "we tried zero, and it exploded,
so program it to a valid number even if pixel shading is off."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Unfortunately, the workaround text never made it into the Sandybridge
PRM, so we still have to refer to the BSpec.
It also wasn't obvious why we needed this workaround at all, since we
don't currently do VS passthrough - but BLORP can turn off the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, the Ivybridge PRM can't be cited, as it is missing the relevant
text for some reason. However, the Sandybridge PRM has the text Chad
originally quoted, and the modern BSpec has the same text.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
I cut and pasted these comments from the Gen4 code during Ivybridge
enabling, and didn't understand what they meant at the time.
The data cache is NOT the same as the sampler cache on Ivybridge.
The sampler cache has L1 and L2 caches in addition to the L3 cache,
while data port messages to the "data cache" hit L3 directly.
This means that the sampler domain is technically wrong, but we stopped
caring about read/write domains quite a while ago. The kernel just
flushes all the caches at the end of each batchbuffer, and our render to
texture code flushes the sampler caches when necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Presumably, this comment exists to justify the usage of
I915_GEM_DOMAIN_SAMPLER for this relocation. At one point, this was
necessary to ensure that the right flushing was done to keep caches
coherent. These days, the kernel just flushes everything, so I don't
think it matters.
Still, the comment is interesting, so leave it in place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The Ivybridge PRM adds new SFIDs and lists them in a different volume
than Sandybridge, so it's worth adding a reference.
I also removed the BSpec reference, as the section it referred to
was moved somewhere, and I couldn't find it. This leaves one Haswell
SFID without a citation, but we can add one once the PRMs are out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.
This commit throws that code away and uses a real index buffer instead.
https://bugs.freedesktop.org/show_bug.cgi?id=66558
Cc: mesa-stable@lists.freedesktop.org
_mesa_ast_set_aggregate_type walks through declarations initialized with
C-style aggregate initializers and stops when it runs out of LHS
declarations or RHS expressions.
In the example
vec4 v = {{{1, 2, 3, 4}}};
_mesa_ast_set_aggregate_type would not recurse into the subexpressions
(since vec4s do not contain types that can be initialized with an
aggregate initializer) to set their <constructor_type>s. Later in ::hir
we would dereference the NULL pointer and segfault.
If <constructor_type> is NULL in ::hir we know that the LHS and RHS
were unbalanced and the code is illegal.
Arrays, structs, and matrices were unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we had a separate function for setting up the built-in
variables for each combination of shader stage and GLSL version
(e.g. generate_110_vs_variables to generate the built-in variables for
GLSL 1.10 vertex shaders). The functions called each other in ad-hoc
ways, leading to unexpected inconsistencies (for example,
generate_120_fs_variables was called for GLSL versions 1.20 and above,
but generate_130_fs_variables was called only for GLSL version 1.30).
In addition, it led to a lot of code duplication, since many varyings
had to be duplicated in both the FS and VS code paths. With the
advent of geometry shaders (and later, tessellation control and
tessellation evaluation shaders), this code duplication was going to
get a lot worse.
So this patch reworks things so that instead of having a separate
function for each shader type and GLSL version, we have a function for
constants, one for uniforms, one for varyings, and one for the special
variables that are specific to each shader type.
In addition, we use a class, builtin_variable_generator, to keep track
of the instruction exec_list, the GLSL parse state, commonly-used
types, and a few other variables, so that we don't have to pass them
around as function arguments. This makes the code a lot more compact.
Where it was feasible to do so without introducing compilation errors,
I've also gone ahead and introduced the variables needed for
{ARB,EXT}_geometry_shader4 style geometry shaders. This patch takes
care of everything except the GS variable gl_VerticesIn, the FS
variable gl_PrimitiveID, and GLSL 1.50 style geometry shader inputs
(using the gl_in interface block). Those remaining features will be
added later.
I've also made a slight nomenclature change: previously we used the
word "deprecated" to refer to variables which are marked in GLSL 1.40
as requiring the ARB_compatibility extension, and are marked in GLSL
1.50 onward as requiring the compatibilty profile. This was
misleading, since not all deprecated variables require the
compatibility profile (for example gl_FragData and gl_FragColor, which
have been deprecated since GLSL 1.30, but do not require the
compatibility profile until GLSL 4.20). We now consistently use the
word "compatibility" to refer to these variables.
This patch doesn't introduce any functional changes (since geometry
shaders haven't been enabled yet).
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Rename "typ" -> "type". Add blank line between inline functions
and declarations in builtin_variable_generator class. Use the
standard comment "/* FALLTHROUGH */" for compatibility with static
code analysis tools.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In certain rare cases (such as those involving dereference of a
literal constant array of structs),
flatten_named_interface_blocks_declarations's rvalue visitor may be
invoked on an ir_dereference_record whose variable_referenced() method
returns NULL.
Check for this case to avoid a segfault.
Prevents crashes in piglit tests
{vs,fs}-deref-literal-array-of-structs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Vertex shader inputs are not allowed to be arrays until GLSL 1.50. We
were accidentally enabling them for GLSL 1.40 (although we haven't
written any tests for them, so it's not clear whether they actually
work).
NOTE: although this is a simple bug fix, it probably isn't sensible to
cherry-pick it to stable release branches, since its only effect is to
cause incorrectly-written shaders to fail to compile.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes undefined results if a back color is written, but the
corresponding front color is not, and only backfacing primitives are
drawn. Results are still undefined if a frontfacing primitive is drawn,
but that's OK.
The other reasonable way to fix this would have been to just pick
the one color slot that was populated, but that dilutes the value of
the tests.
On Gen6+, the fixed function clipper and triangle setup already take
care of this.
Fixes 11 piglits:
spec/glsl-1.10/execution/interpolation/interpolation-none-gl_Back*Color-*
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some lame compilers can't do exp2f() and as far as I can tell they can't do
exp2() (with doubles) neither so instead of providing some workaround for
that (wouldn't actually be too bad just replace with pow) and since it is
used with a constant only just use the precalculated constant.
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
srgb-to-linear is using 3rd degree polynomial for now which should be _just_
good enough. Reverse is using some rational polynomials and is quite accurate,
though not hooked into llvmpipe's blend code yet and hence unused (untested).
Using a table might also be an option (for srgb-to-linear especially).
This does not enable any new features yet because EXT_texture_srgb was already
supported via util_format fallbacks, but performance was lacking probably due
to the external function call (the table used by the util_format_srgb code may
not be all that much slower on its own).
Some performance figures (taken from modified gloss, replaced both base and
sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge,
the numbers aren't terribly accurate):
normal gloss, aos, 8-wide: 47 fps
normal gloss, aos, 4-wide: 48 fps
normal gloss, forced to soa, 8-wide: 48 fps
normal gloss, forced to soa, 4-wide: 47 fps
patched gloss, old code, soa, 8-wide: 21 fps
patched gloss, old code, soa, 4-wide: 24 fps
patched gloss, new code, soa, 8-wide: 41 fps
patched gloss, new code, soa, 4-wide: 38 fps
So there's a performance hit but it seems acceptable, certainly better
than using the fallback.
Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will
continue to use the old util_format fallback, because I can't be bothered
to write code for formats noone uses anyway (as decoding is done as part of
lp_build_unpack_rgba_soa which can only handle block type width of 32).
Compressed srgb formats should get their own path though eventually (it is
going to be expensive in any case, first decompress, then convert).
No piglit regressions.
v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also
since keeping both linear to srgb functions for now make sure both are
compiled (since they share quite some code just integrate into the same
function).
v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb
path.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We had to disable fast rsqrt before because it wasn't precise enough etc.
However in situations when we know we're not going to need more precision
we can still use a fast rsqrt (which can be several times faster than
the quite expensive sqrt). Hence introduce a new helper which does exactly
that - it is probably not useful calling it in some situations if there's
no fast rsqrt available so make it queryable if it's available too.
v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation,
let rsqrt use fast_rsqrt.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gl_TexCoord was deprecated in GLSL 1.30. In GLSL 1.40 it was marked
as ARB_compatibility-only, and in GLSL 1.50 and above it was marked as
only appearing in the compatibility profile. It has never appeared in
GLSL ES.
However, Mesa erroneously included it in all desktop versions of GLSL,
even versions 1.40 and 1.50 (which do not currently support the
compatibility profile). This patch makes gl_TexCoord available in the
compatibility profile (and GLSL versions 1.30 and prior) only.
NOTE: although this is a simple bug fix, it probably isn't sensible to
cherry-pick it to stable release branches, since its only effect is to
cause incorrectly-written shaders to fail to compile.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and
can be eliminated in a release build in gen6_pipeline_end(). Move the call
into the assert().
The AC_CHECK_FILE macro can't be used for cross compiling as it will
result in "error: cannot check for file existence when cross compiling".
Replace it with the AS_IF macro.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jonathan Liu <net147@gmail.com>
GLSL spec says that rsq is undefined for src<=0, but the D3D10
spec says it needs to be a NaN, so lets stop taking an absolute
value of the source which completely breaks that behavior. For
the gl program we can simply insert an extra abs instrunction
which produces the desired behavior there.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional
kill (if any src component < 0). The later was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.
This patch renames both opcodes:
TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)
Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.
I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
KILP is really unconditional fragment kill.
We've had KIL and KILP transposed forever. I'll fix that next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code happened to work in the past since the (scalar) src args
effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so
whether you grab the X or Y component doesn't really matter. Just
fixing the code to make it look right.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This update fixes the problem with duplicated typedefs for
GLclampf and GLclampd in the previous version.
It also changes some parameter types for glDebugMessageCallbackARB()
and glTransformFeedbackVaryingsEXT().
Note we should someday update the glapi-gen code so that it
understands void pointer parameters. Currently, the Python code
only understands "GLvoid *" but not "void *". Luckily, the
compilers don't seem to complain about mixing GLvoid and void.
If the size argument isn't a multiple of four, we would have read/
copied uninitialized memory.
Fixes an issue reported by Myles C. Maxfield <myles.maxfield@gmail.com>
They are a non-standard GCC extension that's not widely supported by
other C/C++ compilers.
Use a dynamic array instead.
Trivial. Should fix the MSVC build.
Required by GL_ARB_shading_language_420pack.
Parts based on work done by Todd Previte and Ken Graunke, implementing
basic support for C-style initializers of arrays.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will be used in a later commit to differentiate between a structure type
declaration and a variable declaration of a struct type. I.e., the
difference between
struct S { float x; }; (is_declaration = true)
and
S s; (is_declaration = false)
Also note that is_declaration = true for
struct S { float x; } s;
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will be used in a future commit. An ast_type_specifier is stored (rather
than an ast_struct_specifier) with the idea that we may have more
general uses for this in the future. struct names are prefixed with
'#ast.' to avoid collisions with the glsl_types in the symbol table.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code float a[2] = float[2]( 3.4, 4.2, 5.0 ); previously generated
this:
error: array constructor must have at least 2 parameters
when in fact it requires exactly two.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
libglslcore.la and libglcpp.la that are built with builtin_compiler are also
linked to by drivers not using libdricore. Since there is no public symbol in
them, it is better to mark all symbols hidden.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We mark ARB_uniform_buffer_object as enabled under ES 3 since it
contains that functionality, which tricked the compiler into tokenizing
"row_major".
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds support for some math optimizations that are generally
considered unsafe, that's why they are currently disabled for compute
shaders.
GL requirements are less strict, so they are enabled for
for GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
'sbsafemath' option in R600_DEBUG allows to disable them.
v2 - always set proper src vector size for transformed instructions
- check for clamp modifier in the expr_handler::fold_assoc
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel >
Pixel Backend > MCS Buffer for Render Target(s) [DevIVB+]:
[DevHSW:GT3]: Clear rectangle must be aligned to two times
the number of pixels in the table shown below...
Observed no piglit, gles3conform regressions with this patch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65744
It seems __builtin_ia32_ldmxcsr is only available on gcc and only when
-msse is used. xmmintrin.h/pmmintrin.h provide portable intrinsics, but
these too are only available with gcc when -msse/-msse3 are set.
scons build always sets -msse on x86 builds, but autotools doesn't seem
to.
We could try to get this working on gcc x86 without -msse by emitting
assembly, but I believe that in this day and age we really should be
building Mesa with -msse and -msse2.
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few cpu's it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Things worked out in the past because both brw and intel share the same
memory address (by virtue of intel being the first member of brw).
However, brw is what actually gets rzalloc'd (brw_context.c:285), so
freeing that seems safer and more obvious.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
These files have forward declarations for intel_context. This makes
brw_context available in the same places without further #include
monkeying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
brw_context.h includes intel_context.h, but additionally makes the
brw_context structure available. Switching this allows us to start
using brw_context in more places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
brwCreateContext() has a lot of random things to do. Factoring out the
part that initializes ctx->Const values and shader compiler options
makes the main function a bit easier to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Technically, needs_ff_sync was set on Gen5+, but it was only consulted
in the clipper threads and quad/lineloop decomposition code, which are
both Gen4-5 only. So in reality it only identified Ironlake.
The named flag doesn't really clarify things, and seems like overkill.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
perf_debug() doesn't add a newline for you; without this, all the
INTEL_DEBUG=perf output was jumbled together.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Resolves the following gcc warning
opt_flip_matrices.cpp:84:32: warning: unused variable 'deref'
v2: keep the variable, but wrap it in a ifndef NDEBUG block
(suggested by Ian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Resolves the following gcc warnings
warning: 'iface_type_name' may be used uninitialized in this function
warning: 'var_mode' may be used uninitialized in this function
Note: The variables are initialised to UNKNOWN and ir_var_auto
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
driver->ProgramStringNotify is only called for ARB programs, fixed
function vertex programs, and ir_to_mesa (which isn't used by the i965
back-end). Therefore, even after geometry shaders are added,
brwProgramStringNotify should only ever be called with a target of
GL_VERTEX_PROGRAM_ARB or GL_FRAGMENT_PROGRAM_ARB.
This patch adds an assertion to clarify that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's done automatically for vertex buffers, but not for constant buffers,
textures, and colorbuffers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This should increase performance if constant uploads are done with the CP DMA,
because only the cache that needs to be flushed is flushed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
also flushing any cache in evergreen_emit_cs_shader seems to be superfluous
(we don't flush caches when changing the other shaders either)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
1. flush SH with read caches
2. add flag for DB flushes
3. add flag for CB flushes
v2: flush all CBs, remove redundant emit_state variable.
v3: Marek: also set the new flags in r600_context_flush, the CP dma functions,
and texture_barrier, and rename them
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The winsys should do this, because it measures how much time we spend
in buffer_map doing synchronization, which can be viewed with the gallium
HUD.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.
This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Include src0 alpha in the RT write message when using MRT, so it is used
for the alpha test instead of the normal per-RT alpha value.
Fixes broken rendering in Dota2 under Wine [FDO #62647].
No Piglit regressions on Ivybridge.
V2: reuse (and simplify) existing sample_alpha_to_coverage flag in
the FS key, rather than adding another redundant one.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewd-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62647
NOTE: This is a candidate for the stable branches.
The logic for choosing number of lods was bogus.
(The code should ultimately handle the case of only one lod even with multiple
quads but currently can't.)
It is perfectly valid for the swizzle to be bigger than 2. For example the
texel offsets could be
SAMPLE ..., IMM[0].zzz
What is not correct is for chan_index to be bigger than 2.
Trivial.
Shaders need a lot of work still. Basic stuff generally works, so this
is basically just fine for gnome-shell, OA etc at this point.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The assertion was always broken but the code unused until enabling the
per-element lod code. Fixes piglit texelFetch vs isampler1D and similar
tests (only run with GL 3.0 version override).
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at least
if they are used from vertex or geometry shaders (so doesn't apply to lod
bias) this doesn't just affect neighboring pixels.
Some code was already there to handle this so fix it up and enable it.
There will no doubt be a performance hit unfortunately, we could do better
if we'd knew we had a real vector shift instruction (with variable shift
count) but this requires AVX2 on x86 (or a AMD Bulldozer family cpu).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them neither.
Likewise, the size query is still broken just the same.
v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The semantics for overflow detection are a bit tricky with
indexed rendering. If the base index in the elements array
overflows, then the index of the first element should be used,
if the index with bias overflows then it should be treated
like a normal overflow. Also overflows need to be checked for
in all paths that either the bias, or the starting index location.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The comparison, incorrectly, was greater-than-or-equal to
elt max.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The texture alignment unit functions are called from brw_tex_layout.c,
so it makes sense to put them there. Since the only caller of
intel_get_texture_alignment_unit() is in brw_tex_layout.c, it could be
made into a static function. However, this patch instead simply folds
it into the caller, as it's only two lines anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
intel_miptree_create_layout() calls intel_get_texture_alignment_unit()
and then immediately calls brw_miptree_layout(). There are no other
callers.
intel_get_texture_alignment_unit() populates the miptree's alignment
unit fields, which are used by brw_miptree_layout() to determine where
to place each miplevel. Since brw_miptree_layout() needs those to be
present, it makes sense to have it initialize them as the first step.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The driver is compiled in C99 mode, so this is not a problem. It's
slighlty tidier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This uses Doxygen style for the file comments, and generally makes it
more consistent with the rest of the driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that both 2DArray and Cube layouts are taken care of by helper
functions, it's easy to just call the right function for each
generation. This is a little cleaner than falling through.
This also reworks the comments. Referencing "Volume 1" of the BSpec
isn't very helpful, since that's only available inside Intel, and it
doesn't even use volume numbers. Also, "Ironlake...finally" sounds a
bit strange considering that almost all hardware uses the 2D array
approach. At this point, Gen4 is the only special case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
maxBatchSize was only ever initialized to BATCH_SZ, and a few places
used BATCH_SZ directly anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_annotate_aub() is the only implementation of this function, so it
makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_debug_batch() is the only implementation of this function, so it
makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_render_target_supported() is the only implementation of this
function, so it makes sense to just call it directly.
Rather than adding an #include of brw_wm.h, this patch moves the
prototype to brw_context.h. Prototypes seem to be in rather arbitrary
places at the moment, and either place seems as good as the other.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_is_hiz_depth_format() is the only implementation of this function,
so it makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These functions translate GLenum comparison operations into the hardware
enumerations. They should never be passed something other than a GL
comparison operator, or something is very broken.
Assertions seem more appropriate than fprintf.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Both intel_context.h and brw_defines.h have #defines for comparison
functions, stencil ops, blending logic ops, and blending factors.
They're exactly the same values, so it makes sense to pick one.
brw_defines.h is the logical place for this kind of stuff, so this patch
converts intel_state.c to use the set defined there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The __DRI_USE_INVALIDATE extension was added in May 11th, 2010 by commit
4258e3a2e1. At this point, it's unlikely that anyone's using the
right mix of new and old components to hit this path. Deleting it
removes an untested code path and cleans up the driver a bit.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Keith Packard <keithp@keithp.com>
This wasn't called from anywhere; presumably it was used to examine
brw_regs when debugging shader assembly. However, it prints registers
in a different notation than brw_disasm.c which everyone is used
to...which means I doubt anyone will want to use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having a header file for a single prototype seems rather excessive.
Plus, the actual function is in brw_clear.c, not intel_clear.c, so
there isn't even the .c/.h filename symmetry one might expect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used for BOs backed by system memory on i915. With that
gone, there's nothing that even sets source to non-zero, so this is
purely dead code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit cf31a19300 removed support for BOs
backed by system memory, as it was only useful for i915. However, it
removed a little too much code: intel_bufferobj_buffer() used to call
intel_bufferobj_alloc_buffer(), and after that commit, it didn't.
This led to NULL pointer dereferences in several test cases, such as
es3conform's transform_feedback_state_variables test.
This commit restores the allocation, preserving the original behavior.
It may not be the cleanest approach, but tidying should come later.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66432
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This eliminates built-in varyings such as gl_Color, gl_SecondaryColor,
gl_TexCoord, and gl_FogFragCoord if they are unused by the next stage or
not written at all (e.g. gl_TexCoord elements). The gl_TexCoord array is
broken down into separate vec4s if needed.
v2: - use a switch statement in varying_info_visitor::visit(ir_variable*)
- use snprintf
- disable the optimization for GLES2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This ensures that inter-shader outputs and inputs are properly eliminated
across 3 or more shader stages. The behavior is unchanged with 2 or less
shader stages.
For example, elimination of unused FS inputs causes elimination of matching
GS outputs, which causes elimination of the GS inputs that were needed for
evaluation of the eliminated GS outputs, which causes elimination of
matching VS outputs. An unused FS input is all that's needed to trigger
this chain reaction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
See my explanation in mtypes.h.
v2: don't do this in gallium
v3: also updated the comment at the gl_shader_type definition
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds texture() for isamplerCubeArray and usamplerCubeArray,
which were entirely missing.
It also makes texture() with a LOD bias fragment shader specific. The
main GLSL specification explicitly says that texturing with LOD bias
should not be allowed for vertex shaders.
Affects Piglit's ARB_texture_cube_map_array/compiler/tex_bias-01.vert.
which tries to use bias in a vertex shader. Currently, it expects this
to pass (so this patch regresses the test), but I've sent a patch to
reverse the expected behavior (so this patch would fix the updated test):
http://lists.freedesktop.org/archives/piglit/2013-June/006123.html
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If reg->Register.Indirect is true then the immediate is not truly a
constant LLVM expression.
There is no performance regression in using LLVMBuildBitCast, as it will
fallback to LLVMConstBitCast internally when the argument is a constant.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Current implementation of ext_framebuffer_multisample_blit_scaled in
i965/blorp uses nearest filtering for multisample scaled blits. Using
nearest filtering produces blocky artifacts and negates the benefits
of MSAA. That is the reason why extension was not enabled on i965.
This patch implements the bilinear filtering of samples in blorp engine.
Images generated with this patch are free from blocky artifacts and show
big improvement in visual quality.
Observed no piglit and gles3 regressions.
V3:
- Algorithm used for filtering assumes a rectangular grid of samples
roughly corresponding to sample locations.
- Test the boundary conditions on the edges of texture.
V4:
- Clip texcoords and use conditional MOVs.
- Send texture dimensions as push constants.
- Remove the optimization in case of scaled multisample blits.
V5:
- Move mcs_fetch() inside the 'for' loop after computing pixel coordinates.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were incorrectly computing the buffer offset when using the
instances. The buffer offset is always equal to:
start_instance * stride + (instance_num / instance_divisor) *
stride
We were completely ignoring the start instance quite
often producing instances that completely wrong, e.g. if
start instance = 5, instance divisor = 2, then on the first
iteration it should be:
5 * stride, not (5/2) * stride as we'd have currently, and if
start instance = 1, instance divisor = 3, then on the first
iteration it should be:
1 * stride, not 0 as we'd have.
This fixes it and adjusts all the code to the changes.
Signed-off-by: Zack Rusin <zackr@vmware.com>
clipper invocations are computed earlier (of course
before the emittion) so this code was adding bogus
numbers to already computed clipper invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Integers could easily overflow is the starting instance
was large enough. Instead of letting bogus counts through
set the instance to max if it overflown and let our
regular buffer overflow computation handle it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Our buffer overflow arithmetic was susceptible to integer
overflows which was the buffer overflow logic to break.
Lets use the llvm overflow intrinsics to check for integer
overflows while computing the stride/needed buffer size.
Signed-off-by: Zack Rusin <zackr@vmware.com>
We weren't taking into account the size of element
that is to be fetched, which meant that it was possible
to overflow the buffer reads if the stride was very
close to the end of the buffer, e.g. stride = 3, buffer
size = 4, and the element to be read = 4. This should
be properly detected as an overflow.
Signed-off-by: Zack Rusin <zackr@vmware.com>
In the generic Unix case use the "unsigned long" type instead of 32-bit
integers so that the type sizes are consistant on 64-bit machines between X11
and not-X11.
Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
eglplatform.h defaults to X11 on Unix unless told otherwise, so if we're doing a
build without any X11 support tell it so that we don't try including headers
that don't exist.
Also set GL_PC_FLAGS so that the definition is in egl.pc, so that applications
using EGL don't try to pull in X11 headers on systems where EGL was configured
without X11 support.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64959
Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The only reason the checks existed were paranoia, when I first
wrote the code I wasn't sure it was correct. Now that I am,
the asserts triggered when XBMC was dropping frames, so remove it.
NOTE: This is a candidate for the 9.1 branch.
The assembly parser can be used to load r300 assembly dumps
and run them through any of the r300 compiler passes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a CMASK is required, we have to be able to create a clear
color value for the format and the texture mustn't contain multiple
images. Technically, it should be possible to support array textures
and cubemaps if all images are attached to the framebuffer,
but this does not appear to be common.
v2: fix fast clear check
v3: Marek: - disable fast clear with 128-bit formats, which are unsupported
- set tex->dirty_level_mask in r600_clear, so that the driver knows
the resource must be decompressed/expanded
- return early from r600_clear if there's nothing else to do
Signed-off-by: Marek Olšák <maraeo@gmail.com>
b04a295a4a removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
not GetQuery time which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
While i915 does have hardware contexts in hardware, we don't expect there
to ever be SW support for it (given that support hasn't even made it back
to gen5 or gen4).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Every driver left in Mesa that enables one also enables the other.
There's no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In Mesa, this extension is implemented purely in software. Drivers may
*optionally* provide optimized paths. If a driver enables,
GL_ARB_texture_multisample, it gets GL_ARB_texture_storage_multisample
for free.
NOTE: This has the side effect of enabling the extension in Gallium
drivers that enable GL_ARB_texture_multisample.
v2 (Ken): Still prevent multisample texture targets in TexParameter for
implementations that don't support multisampling.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In Mesa, this extension is implemented purely in software. Drivers may
*optionally* provide optimized paths.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
v2: Minor whitespace tidying (suggested by Brian).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This extension just provides some of the most basic software framework
for GLSL. Without GL_ARB_vertex_shader or GL_ARB_fragment_shader,
applications still cannot use GLSL. There's no value in
conditionalizing support for this extension.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This extension just provides some of the most basic software framework
for GLSL. Without GL_ARB_vertex_shader or GL_ARB_fragment_shader,
applications still cannot use GLSL. There's no value in
conditionalizing support for this extension.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit bab755a made the implementation a no-op, and it was only ever
enabled by software rasterizers.
v2: Move the spec into docs/specs/OLD since it's now obsolete
(squashed patch from Andreas Boll)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_enable_sw_extensions enables all the extensions (and more) that
the others enable. Also, don't duplicate the DXTn checks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This copy of the source file is only used for GEN >= 4, so extensions
that are enabled for GEN >= 4 are always enabled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This copy of the source file is only used for GEN >= 4, so extensions
that are enabled for GEN >= 3 are always enabled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This copy of the source file is only used for GEN <= 3, so remove the
dead code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_wm_surface_state.c has gotten rather large and unwieldy. At this
point, it consists of two separate portions:
1. Surface format code
This includes the giant table of surface formats and what features
they support on each generation, as well as the code to translate
between Mesa formats and hardware formats.
This is used across all generations.
2. Binding table (SURFACE_STATE) related code.
This is the code to generate SURFACE_STATE entries for renderbuffers,
textures, transform feedback buffers, constant buffers, and so on, as
well as the code to assemble them into binding tables.
This is only used on Gen4-6; gen7_surface_state.c has Gen7+ code.
Since the two are logically separate, and one is reused on every
generation while the other is not, it makes a lot of sense to split
them out. It should also make finding code easier.
No code is changed by this patch. I simply copied the file then deleted
portions of both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On CIK, DB switches back to using per-surface tiling
parameters rather than the tile index used on SI.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assertions are not sufficient to check for null pointers as they don't
show up in release builds. So, return ZeroVec/dummyReg instead of NULL
pointer in get_{src,dst}_register_pointer(). This should calm down the
warnings from static analysis tool.
Note: This is a candidate for the 9.1 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If allocation fails in intel_miptree_create_layout(), don't proceed to
dereference the miptree. Return an early NULL.
Fixes static analysis error reported by Klocwork.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This was just ignored (unless for some reason like unfilled polys draw was
handling this).
I'm not convinced of that code, putting the float for the clamp in the key
isn't really a good idea. Then again the other floats for depth bias are
already in there too anyway (should probably have a jit_context for the
setup function), so this is just a quick fix.
Also, the "minimum resolvable depth difference" used isn't really right as it
should be calculated according to the z values of the current primitive
and not be a constant (of course, this only makes a difference for float
depth buffers), at least for d3d10, so depth biasing is still not quite right.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If there are queries active the opaque optimization reseting the bin needs to
be disabled.
(Not really tested since the bug was discovered by code inspection not
an actual test failure.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes segfaults observed when enabling the post processing
features. When the format is not supported, or a texture cannot be
created, the code must gracefully handle failure and report the error to
the calling code for proper failure handling.
To accomplish this the following changes were made to the filters.h
prototypes:
- bool return for pp_init_func
- Added pp_free_func for filter specific resource destruction
Fixes segfaults from backtraces:
* util_destroy_blit
pp_free
* u_transfer_inline_write_vtbl
pp_jimenezmlaa_init_run
pp_init
This patch also uses tgsi_alloc_tokens to allocate temporary tokens in
pp_tgsi_to_state, instead of allocating the array on the stack. This
fixes the following stack corruption segfault in pp_run.c:
* _int_free
aaline_delete_fs_state
pp_free
Bug Number: 1021843
Reviewed-by: Brian Paul <brianp@vmware.com>
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries
still active at the end of a scene, which is also why I cheat a bit
and limit the amount of simultaneously active queries to (arbitrary)
16 (simplifies things because don't have to deal with a real list
that way). I can't think of a reason why you'd really want large
numbers of overlapping/nested queries so it is hopefully fine.
(This only affects queries which need to be binned.)
v2: don't copy remainder of array when deleting an entry simply replace
the deleted entry with the last one (order doesn't matter).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously lp_rast_begin_query commands were always inserted into each bin,
and re-issued if the scene was restarted, while lp_rast_end_query commands
were executed for each still active query at the end of tile rasterization.
Also, the ps_invocations and vis_counter were set to zero when the respective
command was encountered.
This however cannot work for multiple queries of the same type (note that
occlusion counter and occlusion predicate while different type were also
affected).
So, change the logic to always set the ps_invocations and vis_counter to zero
at the start of tile rasterization, and then use "start" and "end" per-thread
query values when encountering the begin/end query commands instead, which
should work for multiple queries of the same type. This also means queries do
not have to be reissued in a new scene, however they still need to be finished
at end of tile rasterization, so a list of queries still active at the end of
a scene needs to be maintained.
Also while here don't bin the queries which don't do anything in rasterization.
(This change does not actually handle multiple queries of the same type yet,
as the list of active queries is just a simple fixed array and setup can still
only have one query active per type.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Now that i915's forked off, they don't need to live in a shared directory.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Adam Jackson <ajax@redhat.com>
(and I hear second hand that idr is OK with it, too)
Of this 15000 lines of code in intel/, we've identified 4000 lines that
are trivially unnecessary for i915, and another 1000 that are pointless for
i965, and expect to find more as time goes on. Split the i915 driver off,
so that we can continue active development on i965 without worrying about
breaking i915.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Adam Jackson <ajax@redhat.com>
(and I hear second hand that idr is OK with it, too)
This has the (intended!) side effect that vertex shader inputs and
fragment shader outputs will appear in the IR in the same order that
they appeared in the shader code. This results in the locations being
assigned in the declared order. Many (arguably buggy) applications
depend on this behavior, and it matches what nearly all other drivers
do.
Fixes the (new) piglit test attrib-assignments.
NOTE: This is a candidate for stable release branches (and requires the
previous commit to prevent a regression in OpenGL ES 2.0 conformance
test stencil_plane_operation).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The checks to determine when the data can be uploaded in an interleaved
fashion can be tricked by certain data layouts. For example,
float data[...];
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 16, &data[0]);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 16, &data[4]);
glDrawArrays(GL_POINTS, 0, 1);
will hit the interleaved path with an incorrect size (16 bytes instead
of 32 bytes). As a result, the data for attribute 1 never gets
uploaded. The single element draw case is the only sensible case I can
think of for non-interleaved-that-looks-like-interleaved data, but there
may be others as well.
To fix this, make sure that the end of the element in the array being
checked is within the stride "window." Previously the code would check
that the begining of the element was within the window.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The 20130624 version of glext.h changed this to match the
glMultiDrawElements() function which already had the extra const
qualifier.
Fixes warnings/errors that seem to vary from one compiler to the next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
vec4_visitor::generate_code() switches on vec4_instruction::opcode and
calls into the brw_eu_emit.c layer to generate code for some of them.
It then has a default case which calls generate_vec4_instruction() to
handle the rest...which switches on opcode and handles the rest of the
cases.
The split apparently is that generate_code() handles the actual hardware
opcodes (BRW_OPCODE_*) while generate_vec4_instruction() handles the
virtual opcodes (SHADER_OPCODE_* and VS_OPCODE_*). But this looks
fairly arbitrary, and it makes more sense to combine the two switches.
This patch moves the cases from generate_code() into the helper function
so that generate_code() isn't as large.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 526ffdfc03 attempted to generalize
the source register type assertions to allow D and UD. However, the
src1 and src2 assertions actually checked src0.type against D and UD due
to a copy and paste bug.
It also began setting the source and destination register types based on
dest.type, ignoring src0/src1/src2.type completely. BFE and BFI2 may
actually pass mixed D/UD types and expect them to be ignored, which is
arguably a bit sloppy, but not too crazy either.
This patch simply removes the source register assertions as those values
aren't used anyway. It also clarifies the comment above the block that
sets the register types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit 526ffdfc03 relaxed the type
assertions in brw_alu3 to allow D/UD types (required by BFE and BFI2).
This lost us the strict type checking for MAD and LRP, which require
all four types to be float.
This patch adds a new ALU3F wrapper which checks these once again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Over the last few years, the compiler has grown to support 7 different
language versions and 6 extensions that add new built-in types. With
more and more features being added, some of our core code has devolved
into an unmaintainable spaghetti of sorts.
A few problems with the old code:
1. Built-in types are declared...where exactly?
The types in builtin_types.h were organized in arrays by the language
version or extension they were introduced in. It's factored out to
avoid duplicates---every type only exists in one array. But that
means that sampler1D is declared in 110, sampler2D is in core types,
sampler3D is a unique global not in a list...and so on.
2. Spaghetti call-chains with weird parameters:
generate_300ES_types calls generate_130_types which calls
generate_120_types and generate_EXT_texture_array_types, which calls
generate_110_types, which calls generate_100ES_types...and more
Except that ES doesn't want 1D types, so we have a skip_1d parameter.
add_deprecated also falls into this category.
3. Missing type accessors.
Common types have convenience pointers (like glsl_type::vec4_type),
but others may not be accessible at all without a symbol table (for
example, sampler types).
4. Global variable declarations in a header file?
#include "builtin_types.h" in two C++ files would break the build.
The new code addresses these problems. All built-in types are declared
together in a single table, independent of when they were introduced.
The macro that declares a new built-in type also creates a convenience
pointer, so every type is available and it won't get out of sync.
The code to populate a symbol table with the appropriate types for a
particular language version and set of extensions is now a single
table-driven function. The table lists the type name and GL/ES versions
when it was introduced (similar to how the lexer handles reserved
words). A single loop adds types based on the language version.
Explicit extension checks then add additional types. If they were
already added based on the language version, glsl_symbol_table simply
ignores the request to add them a second time, meaning we don't need
to worry about duplicates and can simply list types where they belong.
v2: Mark uvecs and shadow samplers as ES3 only, and 1DArrayShadow as
unsupported in ES entirely. Add a touch more doxygen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using a random glsl_type convenience pointer as an array is a really bad
idea, for all the reasons mentioned in the previous commit.
The new glsl_type::bvec() function is simpler anyway.
Prevents breakage in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, vector types are linked together closely: the glsl_type
objects for float, vec2, vec3, and vec4 are all elements of the same
array, in that exact order. This makes it possible to obtain vector
types via pointer arithmetic on the scalar type's convenience pointer.
For example, float_type + (3 - 1) = vec3.
However, relying on this is extremely fragile. There's no particular
reason the underlying type objects need to be stored in an array. They
could be individual class members, possibly with padding between them.
Then the pointer arithmetic would break, and we'd get bad pointers to
non-heap allocated data, causing subtle breakage that can't be detected
by valgrind. Cue insanity.
Or someone could simply reorder the type variables, causing us to get
the wrong type entirely. Also cue insanity.
Writing this explicitly is much safer. With the new helper functions,
it's a bit less code even.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch introduces new functions to quickly grab a pointer to a
vector type. For example:
glsl_type::bvec(4) returns glsl_type::bvec4_type
glsl_type::ivec(3) returns glsl_type::ivec3_type
glsl_type::uvec(2) returns glsl_type::uvec2_type
glsl_type::vec(1) returns glsl_type::float_type
This is less wordy than glsl_type::get_instance(GLSL_TYPE_BOOL, 4, 1),
which can help avoid extra word wrapping.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In glapi_priv.h we always need the typedef for the GLclampx type
since GL_OES_fixed_point is now defined in glext.h but the
GLclampx type is not. GLclampx is not used by anything in glext.h
but we need it for GL ES dispatch.
This is a huge patch because the structure of the file has been
changed.
The following extensions are new, however:
GL_AMD_interleaved_elements
GL_AMD_shader_trinary_minmax
GL_IBM_static_data
GL_INTEL_map_texture
GL_NV_compute_program5
GL_NV_deep_texture3D
GL_NV_draw_texture
GL_NV_shader_atomic_counters
GL_NV_shader_storage_buffer_object
GL_NVX_conditional_render
GL_OES_byte_coordinates
GL_OES_compressed_paletted_texture
GL_OES_fixed_point
GL_OES_query_matrix
GL_OES_single_precision
And these extensions were removed:
GL_FfdMaskSGIX
GL_INGR_palette_buffer
GL_INTEL_texture_scissor
GL_SGI_depth_pass_instrument
GL_SGIX_fog_scale
GL_SGIX_impact_pixel_texture
GL_SGIX_texture_select
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This prevents trampling beyond the end of the command stream during flushes.
NOTE: This is a candidate for the stable branches.
Reported-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
max_threads cannot be greater than 28. It is either 21 or 28.
Fixes "Logically dead code" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
We want to access the user buffer, if available, when primitive restart is
enabled and the restart index/primitive type is not natively supported.
And since we are handling index buffer uploads in the driver with this change,
we can also work around misalignment of index buffer offsets.
Rename ilo_finalize_states() to ilo_finalize_3d_states(), and bind
pipe_draw_info to the context when it is called. This saves us from having to
pass pipe_draw_info around in several places.
The polygon offset math used for triangles by the WM is "OffsetUnits * 2 *
MRD + OffsetFactor * m" where 'MRD' is the minimum resolvable difference
for the depth buffer (~1/(1<<16) or ~1/(1<<24)), 'm' is the approximated
slope from the GL spec, and '2' is this magic number from the original
i965 code dump that we deviate from the GL spec by because "it makes glean
work" (except that it doesn't, because of some hilarity with 0.5 *
approximately 2.0 != 1.0. go glean!).
This clipper code for unfilled polygons, on the other hand, was doing
"OffsetUnits * garbage + OffsetFactor * m", where garbage was MRD in the
case of 16-bit depth visual (regardless the FBO's depth resolution), or
128 * MRD for 24-bit depth visual.
This change just makes the unfilled polygons behavior match the WM's
filled polygons behavior.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no reason to care about the window system visual's depth for
handling polygon offset in an FBO, and it could only lead to pain.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The separate function for the fallback checks wasn't particularly
clarifying things, so I put the improved checks in the caller. (Note that
the dropped _mesa_update_state() had already happened once at the start of
the caller)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think we've all added instrumentation at one point or another to see
what's being called in blorp. Now you can quickly get output like:
Testing glCopyPixels(depth).
intel_hiz_exec depth clear to mt 0x16d9160 level 0 layer 0
intel_hiz_exec depth resolve to mt 0x16d9160 level 0 layer 0
intel_hiz_exec hiz ambiguate to mt 0x16d9160 level 0 layer 0
intel_hiz_exec depth resolve to mt 0x16d9160 level 0 layer 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 551c991606 tried to avoid spilling
registers that were trivially colorable. But since we do optimistic
coloring, the top of the stack also contains nodes that are not trivially
colorable, so we need to consider them for spilling (since they are some
of our best candidates).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58384
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63674
NOTE: This is a candidate for the 9.1 branch.
It should never happen, but it does, and at this point, you're going to
_mesa_problem() and abort() (unless it's just in precompile). Give the
developer something to look at.
Some shells does not set variables sequentially in a statement i.e. "a=X
b=${a}" won't set "b" to "X" but empty value.
This patch introduce ";" to make sure "mo" is set properly before "lang"
assignment.
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=471302
The last piece of code with an effect was flagging _NEW_BUFFERS. Only,
that is already flagged from everything that calls this function: Mesa GL
state updates flag it before even calling down into the driver, and the
calls from the DRI2 window system framebuffer update path end up flagging
it as part of the ResizeBuffers() hook.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The computed fields are updated appropriately as part of the normal draw
call path due to _NEW_BUFFERS being set.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For winsys FBOs, the bounds are appropriately updated immediately upon
_mesa_resize_framebuffer(). For user FBOs, they're updated as part of the
normal draw path state update due to _NEW_BUFFERS having been flagged.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Of the places noting a _NEW_DEPTH dependency, all were already checking
for _NEW_BUFFERS if appropriate.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2/3 packets depending on Stencil._Enabled already checked for
_NEW_BUFFERS, so just add _NEW_BUFFERS to the remaining one.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The viewport (ctx->Viewport._WindowMap) doesn't change with drawable size
changes, and we update scissor (ctx->DrawBuffer->_Xmin and friends) on
_NEW_BUFFERS in things like brw_sf_state.c.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Things like brw_sf.c that need to know about orientation are already
recomputing on _NEW_BUFFERS.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_resize_framebuffer(), the default value of the ResizeBuffers hook,
already checks for a window system framebuffer and walks the renderbuffers
calling AllocStorage().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_resize_framebuffer(), the default value of the ResizeBuffers hook,
already checks for a window system framebuffer and walks the renderbuffers
calling AllocStorage().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This existed to tell the core not to call GetBufferSize, except that even
if you didn't set it nothing happened because nobody had a GetBufferSize.
v2: Remove two more instances of setting the field (from Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only the GDI driver set it to non-NULL any more, and that driver has a
Viewport hook that should keep it limping along as well as it ever has.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
commit 26d86d26f9 added
gl_shader_program::UniformLocationBaseScale. According to the code
comments in that commit, UniformLocationBaseScale "must be >=1".
UniformLocationBaseScale is of type unsigned. Coverity reported a "Macro
compares unsigned to 0" defect as well.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function does array bounds checking. Note, this exposes a
bug in the svga_mark_surface_dirty() function: we're calling
svga_age_texture_view() with a texture slice instead of mipmap
level. This can lead to a failed assertion. That'll be fixed next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
- use mgwhelp -- the successor for bfdhelp which does not have a hard
dependency on BFD, and works on 64bits.
- use a macro instead of hand-typing to dispatch DbgHelp functions
- dump line numbers
- dump module names when symbols are not available
- support 64bits.
- add comments
Reviewed-by: Brian Paul <brianp@vmware.com>
Because our code couldn't handle it we were skipping rendering
if we detected overflows. According to the spec we should
still render but with all 0 vertices, which is what the llvm
code already does. So for the llvm paths lets enable processing
even if an overflow condition has been detected.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before we could easily overflow if start+count>max integer. To
avoid it we can just iterate over the count. This makes sure
that we never crash, since most of the overflow conditions
is already handled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make pass_render_condition() available for blitter, and check for render
condition in (and only in) clear(), clear_render_target(), and
clear_depth_stencil().
Add ilo_shader_select_kernel_routing() to construct 3DSTATE_SBE. It is called
in ilo_finalize_states(), rather than in create_fs_state(), as it depends on
VS/GS and rasterizer states.
With this change, ilo_shader_internal.h is no longer needed for
ilo_gpe_gen6.c.
This allows us to remove ilo_shader_internal.h from ilo_gpe_gen7.c. The
unfinished code in 3DSTATE_DS, 3DSTATE_HS, and INTERFACE_DESCRIPTOR_DATA are
partly or entirely removed.
The unmodified pipe_stream_output_info describes its outputs as if they are in
TGSI_FILE_OUTPUT. Remap the register indices to where they appear in the VUE.
TGSI_SEMANTIC_PSIZE needs a little care because it is at the W channel.
When a new VS kernel is generated, a newly added function,
ilo_gpe_init_vs_cso(), is called to construct 3DSTATE_VS command in
ilo_shader_cso. When the command needs to be emitted later, we copy the
command from the CSO instead of constructing it dynamically.
Add ilo_shader_get_type() to query the type (PIPE_SHADER_x) of the shader.
Add ilo_shader_get_kernel_offset() and ilo_shader_get_kernel_param() to query
the cache offset and various kernel parameters of the selected kernel.
Add ilo_shader_select_kernel() to replace the dependency table,
ilo_shader_variant_init(), and ilo_shader_state_use_variant().
With the changes, we no longer need to include ilo_shader_internal.h in
ilo_state.c.
Replace ilo_shader_state_create() by
ilo_shader_create_vs()
ilo_shader_create_gs()
ilo_shader_create_fs()
ilo_shader_create_cs()
Rename ilo_shader_state_destroy() to ilo_shader_destroy(). The old
ilo_shader_destroy() is renamed to ilo_shader_destroy_kernel().
To prevent segfaults in the AA line module, the code will check for a
valid pointer to the aaline_stage in the draw context.
Fixes segfault from backtrace:
* aaline_stage_from_pipe
aaline_delete_fs_state
Reviewed-by: Brian Paul <brianp@vmware.com>
swrastGetImage rounds the pitch up to 4 bytes for compatibility reasons
that are explained in drisw_glx.c:bytes_per_line, so drisw_update_tex_buffer
must do the same.
Fixes window skew seen while running firefox over vnc on a 16-bit screen.
NOTE: This is a candidate for the stable branches.
[ajax: fixed typo in comment]
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Squashed commit of the following:
commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix big-endian machines
This adds a bit-shift count to the format table, and adds the concept of
vector or bitwise alignment on gathers.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
llvmpipe: Fix convert_to_blend_type on big-endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
util: Convert color pack to packed formats
This fixes them on big-endian.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
graw-xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
format: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/dri: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gbm: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
tests: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 13:07:37 2013 -0400
gallium: Document packed formats
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallium: Introduce 32-bit packed format names
These are for interacting with buffers natively described in terms of
bit shifts, like X11 visuals:
uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);
Define these in terms of (endian-dependent) aliases to the array-style
format names.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <ajax@redhat.com>
Date: Mon Jun 3 12:10:32 2013 -0400
gallium: Document format name conventions
v2:
- Fix a channel name thinko (Michel Dänzer)
- Elaborate on SCALED versus INT
- Add links to DirectX and FOURCC docs
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallivm: Remove all notion of byte-swapping
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
The result isn't always 0 in this case (depends on query type),
so instead of special casing this just use the ordinary path (should result
in correct values thanks to initialization in query_begin/end), just
skipping the fence wait.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes the bytestream parsing of mpeg-1 stream, but still leaves
open a number of issues with the interpretation:
- IDCT mismatch control is not correct for MPEG-1.
- Slices do not have to start and end on the same horizontal row of macroblocks.
- picture_coding_type = 4 (D-pictures) is not handled.
- full_pel_*_vector is not handled.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
The results of a bary.f do not appear to be immediatley available, but
there is no explicit sync bit. Instead the compiler must just ensure
that there are a minimum number of instructions following the bary
before use of the result of the bary. We aren't clever enough for that
so just throw in some nop's.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we are accumulating result into tmp.x, and need a mov to final
destination, we want to move the .x component into all of the components
enabled from the read dest's writemask, ie. we want:
MOV dst.xyzw tmp.xxxx
rather than:
MOV dst.xyzw tmp.xyzw
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This code had no relation to ir_to_mesa.cpp, since it was also used by
intel and state_tracker, and most of it was duplicated with the standalone
compiler (which has periodically drifted from the Mesa copy).
v2: Split from the ir_to_mesa to shaderapi.c changes.
Acked-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was nothing ir_to_mesa-specific about this code, but it's not
exactly part of the compiler's core turning-source-into-IR job either.
v2: Split from the ir_to_mesa to glsl/ commit, avoid renaming the sh
variable.
Acked-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I noticed this while trying to merge code with the builtin compiler, which
does set it.
Note that this causes two regressions in piglit in
default-precision-sampler.* which try to link without a vertex or fragment
shader, due to being run under the desktop glslparsertest binary (using
ARB_ES3_compatibility) that doesn't know about this requirement.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were duplicating this code all over the place, and they all would need
updating for the next set of shader targets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We have ir->print() to do the old declaration of a visitor and having the
IR accept the visitor (yuck!). And now you can call _mesa_print_ir()
safely anywhere that you know what an ir_instruction is.
A couple of missing printf("\n")s are added in error paths -- when an
expression is handed to the visitor, it doesn't print '\n' (since it might
be a step in printing a whole expression tree).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No more forgetting to #include "ir_print_visitor.h" when doing temporary
debug code, or forgetting and leaving it in after removing your temporary
debug code. Also, available from C code so you don't need to move the
caller to C++ just to call it (see also: ir_to_mesa.cpp).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Based from the code from the good old python state tracker.
Extremely handy to diagnose regressions in state trackers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Not used yet but there's a couple of places in llvmpipe which should use this
(occlusion count is currently very inefficent if there's no cpu popcnt
instruction).
Handle PIPE_QUERY_GPU_FINISHED and PIPE_QUERY_TIMESTAMP_DISJOINT, and
also fill out the ps_invocations and c_primitives from the
PIPE_QUERY_PIPELINE_STATISTICS (the others in there should already
be handled). Note that ps_invocations isn't pixel exact, just 16 pixel
exact but I guess it's better than nothing.
Doesn't really seem to work correctly but there's probably bugs elsewhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The driver can do render_condition but wasn't handling the occlusion
and so_overflow predicates (though the latter might not work yet due
to gs support).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The semantics didn't really make sense, not really matching neither d3d9
(though the docs are all broken there) nor d3d10. So make it match d3d10
semantics, which actually gives meaning to the "disjoint" part.
Drivers are fixed up in a very primitive way, I have no idea what could
actually cause the counter to become unreliable so just always return
FALSE for the disjoint part.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Move some functions from the svga_tgsi_insn.h header into the
svga_tgsi_insn.c file since they're only used there. Plus, add
comments and fix formatting.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The new code makes the shader cache manages all shaders and be able to upload
all of them to a caller-provided bo as a whole.
Previously, we uploaded only the bound shaders. When a different set of
shaders is bound, we had to allocate a new kernel bo to upload if the current
one is busy.
When doing blit using the 3D engine, the rasterizer cso may be NULL.
Ported from nvc0 commit 8aa8b0539.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We need to set up a handler for the global_remove event that gets sent
out when a global gets removed. Without the handler we end up calling
a NULL pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=65910
NOTE: This is a candidate for the stable branches.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When rendering to a texture with BaseLevel set, the miptree may be laid
out such that BaseLevel is in level 0 of the miptree (to avoid wasting
memory on unused levels between 0 and BaseLevel-1). In that case, we
have to shift our render target's level down to the appropriate level of
the smaller miptree.
The WebGL test in combination with a meta code relating to
glGenerateMipmap also triggered a similar failure scenario.
This GPU hang regression was introduced by c754f7a8.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65324
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This reverts commit 41966fdb3b.
While it's a lot cleaner it causes regressions because
the draw interface is always called from the draw functions
of the drivers (because the buffers need to be mapped) which
means that the stream output buffers endup being cleared on
every draw rather than on setting.
Signed-off-by: Zack Rusin <zackr@vmware.com>
honor render_condition for clear_render_target and clear_depth_stencil.
Also add minimal support for occlusion predicate, though it can't be active
at the same time as an occlusion query yet.
While here also switchify some large if-else (actually just mutually
exclusive if-if-if...) constructs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For conditional rendering this makes it possible to skip rendering
if either the predicate is true or false, as supported by d3d10
(in fact previously it was sort of implied skip rendering if predicate
is false for occlusion predicate, and true for so_overflow predicate).
There's no cap bit for this as presumably all drivers could do it trivially
(but this patch does not implement it for the drivers using true
hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL
functionality).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add ilo_gpe_init_zs_surface() to construct
3DSTATE_DEPTH_BUFFER
3DSTATE_STENCIL_BUFFER
3DSTATE_HIER_DEPTH_BUFFER
at surface creation time. This allows fast state emission in draw_vbo().
This gets us support for blitting to attachment types other than
textures.
v2: fix up comments from review by Kenneth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Now any caller (such as glCopyPixels()) can benefit from it, and it only
changes the correct subset of the destination instead of a whole teximage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Apparently we don't have any piglit tests for this, because it would have
assertion failed in a debug build, or just rendered wrong in a non-debug
build if the destination wasn't covering whole tiles.
v2: Use the new macros.
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We're going to add more BCS_SWCTRL setup instances soon, and you have to
be careful to have the set and restore atomic with the rendering that's
done, so that our state doesn't leak out to other rendering processes.
v2: Rewrite the patch to have batch begin/advance macros so that magic
numbers don't get sprinkled around (and so you don't mix up your
do-I-need-to-reset vs what-do-I-reset-to logic, which I nearly did in
the next patch when first writing it)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Intel had brokenness here, and I'd like to continue moving Mesa toward
hiding 1D_ARRAY's ridiculousness inside of the core, like we did with
MapTextureImage. Fixes copyteximage 1D_ARRAY on intel.
There's still an impedance mismatch in meta when falling back to read and
texsubimage, since texsubimage expects coordinates into 1D_ARRAY as
(width, slice, 0) instead of (width, 0, slice).
v2: Fix offset of scanline reads from the source. (Thanks Brian!), replace
dd.h comment with Paul's text and replace early exit with an assert.
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
I noticed this code didn't work as advertised while doing some passing around
of TGSI shaders and trying to reparse them, and things failing.
This seems to fix it here for at least the small test case I hacked into a
graw test.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 1f82bf12ed inadvertently broke it, checking for __IEEE_FLOAT on all
Alpha machines instead of only on VMS as before.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Sven Joachim <svenjoac@gmx.de>
Fixes window skew seen while running gnome on a 16-bit screen over vnc.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Fixes a crash seen while running gnome on a 16-bit screen over vnc.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
byteswap.h and bswap_32 aren't portable, replace them with calls to
gallium's util_bswap32 as suggested by Mark Kettenis. Lets these files
build on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
gl can use elts without setting indices, in which case
our eltMax was set to 0 and always invoking the overflow
condition. So by default set eltMax to maximum, it will
be curbed by draw_set_indexes (if it ever comes) and if
not then it will let gl's glVertexPointer/glDrawArrays
work correctly. Fixes piglit's
triangle-rasterization-overdraw test.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Moves clearing of the draw so target buffers to the draw
module. They had to be cleared in the drivers before
which was quite messy.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Port BLT code in ilo_blit.c to BLT-based blitting methods of ilo_blitter. Add
BLT-based clears. The latter is verifed with util_clear(), but it is not in
use yet.
Primitive restart with an arbitrary cut index was first supported as of
Haswell. It's very doubtful that they'd take that away in future
hardware, so we may as well alter the check now.
The PRM suggests a larger layout, mostly to support having
gl_ClipDistance[] somewhere predictable for the fixed-function clipper
-- but it didn't actually arrive in Gen5.
Just use the same layout for both Gen4 and Gen5.
No Piglit regressions.
Improves performance in CS:S Video Stress Test by ~3%.
V2: - Remove now-useless function for determining the SF URB read offset
- Remove now-unused BRW_VARYING_SLOT_POS_DUPLICATE
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Required by ARB_shading_language_420pack. Note that the 420pack spec
incorrectly specifies their values as (Min, Max) = (-7, 8) when they
should be (-8, 7) as listed in the GLSL 4.30 and ESSL 3.0 specs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Assert that we do not support user vertex/index/constant buffers. Issue a
warning when a sampler view is created for a resource without
PIPE_BIND_SAMPLER_VIEW.
The temporary texture should have either PIPE_BIND_RENDER_TARGET or
PIPE_BIND_DEPTH_STENCIL set in addition to PIPE_BIND_SAMPLER_VIEW.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
There are strict limits on those registers. Define the maximums
and use them instead of magic numbers. Also allows us to add
some extra sanity checks.
Suggested by Brian.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't need the clamped variable, because we can just
return early. We should also do the regular culling after
the distance culling passes.
All spotted by Brian.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a resource is busy and is mapped with
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE, the underlying bo is replaced. We need
to mark states affected by the resource dirty.
With this change, we no longer have to emit vertex buffers and index buffer
unconditionally.
Even with hardware contexts, since we do not pin resources, we have to re-emit
the states so that the resources are referenced (by cp->bo) and their offsets
are updated in case they are moved. This also allows us to elimiate cp flush
in is_bo_busy().
Problem: The IEEE float optimized version of UNCLAMPED_FLOAT_TO_UBYTE
in macros.h computed incorrect results for inputs in the range
0x3f7f0000 (=0.99609375) to 0x3f7f7f80 (=0.99803924560546875)
inclusive. 0x3f7f7f80 is the IEEE float value that results in 254.5
when multiplied by 255. With rounding mode "round to closest even
integer", this is the largest float in the range 0.0-1.0 that is
converted to 254 by the generic implementation of
UNCLAMPED_FLOAT_TO_UBYTE. The IEEE float optimized version
incorrectly defined the cut-off for mapping to 255 as 0x3f7f0000
(=255.0/256.0). The same bug was present in the function
float_to_ubyte in u_math.h.
Fix: The proposed fix replaces the incorrect cut-off value by
0x3f800000, which is the IEEE float representation of 1.0f. 0x3f7f7f81
(or any value in between) would also work, but 1.0f is probably
cleaner.
The patch does not regress piglit on llvmpipe and on i965 on sandy
bridge.
Tested-by Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Page flipping generates an invalidate event every frame, causing reallocations
of all private resources (MSAA and depth-stencil).
Reusing the resources may improve performance (especially under memory
pressure).
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to use pipe->blit, not resource_copy_region, so that the read buffer
is resolved if it's multisampled. I also removed the CPU-based copying,
which just did format conversion (obsoleted by the blit).
Also, the layer/slice/face of the read buffer is taken into account (this was
ignored).
Last but not least, the format choosing is improved to take float and integer
read buffers into account.
Reviewed-by: Brian Paul <brianp@vmware.com>
There were 2 issues with it:
- resource_copy_region doesn't allow different sample counts of both src
and dst, which can occur if we blit between a window and a FBO, and
the window has an MSAA colorbuffer and the FBO doesn't.
(this was the main motivation for using pipe->blit)
- blitting from or to a non-zero layer/slice/face was broken, because
rtt_face and rtt_slice were ignored.
blit_copy_pixels is now used even if the formats and orientation of
framebuffers don't match.
Reviewed-by: Brian Paul <brianp@vmware.com>
We did downsample (=resolve) MSAA resources to make ReadPixels work with MSAA
GLX visuals, which was enough for read-only color-only transfers.
This commit makes write color transfers and depth-stencil transfers work
in a similar manner. It does downsampling in transfer_map and upsampling
in transfer_unmap.
Reviewed-by: Brian Paul <brianp@vmware.com>
There isn't any difference between 32_FLOAT and 32_*INT in vertex fetching.
Both of them don't do any format conversion.
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we would generate uniform locations as (slot << 16) +
array_index. We do this to handle applications that assume the location
of a[2] will be +1 from the location of a[1]. This resulted in every
uniform location being at least 0x10000. The OpenGL 4.3 spec was
amended to require this behavior, but previous versions did not require
locations of array (or structure) members be sequential.
We've now encountered two applications that assume uniform values will
be "small." As far as we can tell, these applications store the GLint
returned by glGetUniformLocation in a int16_t or possibly an int8_t.
THIS BEHAVIOR IS NOT GUARANTEED OR IMPLIED BY ANY VERSION OF OpenGL.
Other implementations happen to have both these behaviors (sequential
array elements and small values) since OpenGL 2.0, so let's just match
their behavior.
Fixes "3D Bowling" on Android.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
This is used by _mesa_uniform_merge_location_offset and
_mesa_uniform_split_location_offset to determine how the base and offset
are packed. Previously, this value was hard coded as (1U<<16) in those
functions via the shift and mask contained therein. The value is still
(1U<<16), but it can be changed in the future.
The next patch dynamically generates this value.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
Use new util_fill_box helper for util_clear_render_target.
(Also fix off-by-one map error.)
v2: handle non-zero z correctly in new helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds code to place mcs_state into INTEL_MCS_STATE_RESOLVED
for miptrees that are capable of supporting fast color clears. This
will have no effect on buffers that don't undergo a fast color clear;
however, for buffers that do undergo a fast color clear, an MCS
miptree will be allocated (at the time of the first fast clear), and
will be used thereafter.
Reviewed-by: Eric Anholt <eric@anholt.net>
In certain circumstances the memory region underlying a miptree is
shared with other miptrees, or with other code outside Mesa's control.
This happens, for instance, when an extension like GL_OES_EGL_image or
GLX_EXT_texture_from_pixmap extension is used to associate a miptree
with an image existing outside of Mesa.
When this happens, we need to disable fast color clears on the miptree
in question, since there's no good synchronization mechanism to ensure
that deferred clear writes get performed by the time the buffer is
examined from the other miptree, or from outside of Mesa.
Fortunately, this should not be a performance hit for most
applications, since most applications that use these extensions use
them for importing textures into Mesa, rather than for exporting
rendered images out of Mesa. So most of the time the miptrees
involved will never experience a clear.
v2: Rework based on the fact that we have decided not to use an
accessor function to protect access to the region.
Reviewed-by: Eric Anholt <eric@anholt.net>
Resolve color buffers that have been fast-color cleared:
1. before texturing from the buffer (brw_predraw_resolve_buffers())
2. before using the buffer as the source in a blorp blit
(brw_blorp_blit_miptrees())
3. before mapping the buffer's miptree (intel_miptree_map_raw(),
intel_texsubimage_tiled_memcpy())
4. before accessing the buffer using the hardware blitter
(intel_miptree_blit(), do_blit_bitmap())
v2: Rework based on the fact that we have decided not to use an
accessor function to protect access to the region.
Reviewed-by: Eric Anholt <eric@anholt.net>
We already had code in intel_downsample_for_dri2_flush() for
downsampling front and back buffers when multisampling was in use.
This patch extends that function to perform fast color clear resolves
when necessary.
To account for the additional functionality, the function is renamed
to simply intel_resolve_for_dri2_flush().
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch implements the "render target resolve" blorp operation.
This will be needed when a buffer that has experienced a fast color
clear is later used for a purpose other than as a render target
(texturing, glReadPixels, or swapped to the screen). It resolves any
remaining deferred clear operation that was not taken care of during
normal rendering.
Fortunately not much work is necessary; all we need to do is scale
down the size of the rectangle primitive being emitted, run the
fragment shader with the "Render Target Resolve Enable" bit set, and
ensure that the fragment shader writes to the render target using the
"replicated color" message. We already have a fragment shader that
does that (the shader that we use for fast color clears), so for
simplicity we re-use it.
Reviewed-by: Eric Anholt <eric@anholt.net>
The fragment shaders that to do color clears will be re-used to
perform so-called "render target resolves" (the resolves associated
with fast color clears). To prepare for that, this patch expands the
class hierarchy for blorp params by adding
brw_blorp_const_color_params (which will be used for all blorp
operations where the fragment shader outputs a constant color).
Some other data structures and functions were also renamed to use
"const_color" nomenclature where appropriate.
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we defer allocation of the MCS miptree until the time of the
fast clear operation, this patch also implements creation of the MCS
miptree.
In addition, this patch adds the field
intel_mipmap_tree::fast_clear_color_value, which holds the most recent
fast color clear value, if any. We use it to set the SURFACE_STATE's
clear color for render targets.
v2: Flag BRW_NEW_SURFACES when allocating the MCS miptree. Generate a
perf_debug message if clearing to a color that isn't compatible with
fast color clear. Fix "control reaches end of non-void function"
build warning.
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen7+, MCS buffers are used both for compressed multisampled color
buffers and for "fast clear" of single-sampled color buffers.
Previous to this patch series, we didn't support fast clear, so we
only used MCS with multisampled bolor buffers.
As a first step to implementing fast clears, this patch modifies the
code that sets up SURFACE_STATE so that it configures the MCS buffer
whenever it is present, regardless of whether we are multisampling or
not.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch includes code to update the fast color clear state
appropriately when rendering occurs. The state will also need to be
updated when a fast clear or a resolve operation is performed; those
state updates will be added when the fast clear and resolve operations
are added.
v2: Create a new function, intel_miptree_used_for_rendering() to
handle updating the fast color clear state when rendering occurs.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch ifdefs out intel_mipmap_tree::mcs_mt when building the i915
(pre-Gen4) driver (MCS buffers aren't supported until Gen7, so there
is no need for this field in the i915 driver). This should make it a
bit easier to implement fast color clears without undue risk to i915.
Reviewed-by: Eric Anholt <eric@anholt.net>
When processing a buffer received from the X server,
intel_process_dri2_buffer() examines intel_region::name to determine
whether it's received a brand new buffer, or the same buffer it
received from the X server the last time it made a request.
However, this didn't work properly, because in the call to
intel_miptree_create_for_dri2_buffer(), we create a fresh intel_region
object to represent the buffer, and this was causing us to forget the
buffer's previous name.
This patch fixes things by copying over the region name when creating
the fresh intel_region object.
At the moment, this is just a minor performance optimization.
However, when fast color clears are added, it will be necessary to
ensure that the fast color clear state for a buffer doesn't get
discarded the next time we receive that buffer from the X server.
Reviewed-by: Eric Anholt <eric@anholt.net>
There is really nothing in struct intel_bo, and having it alias drm_intel_bo
makes the winsys impose almost zero overhead.
We can make the overhead gone completely by making the functions static
inline, if needed.
The motivation is to kill tiling and pitch in struct intel_bo. That requires
us to make tiling and pitch not queryable, and be passed around as function
parameters.
See two commits ago for the rationale. This allows us to delete the
whole gen7_cc_state.c file.
This does move these commands before the depth stall flushes from
brw_emit_depthbuffer, which may be a problem. The documentation for
3DSTATE_DEPTH_BUFFER mentions that depth stall flushes are required
before changing any depth/stencil buffer state, but explicitly lists
3DSTATE_DEPTH_BUFFER, 3DSTATE_HIER_DEPTH_BUFFER, 3DSTATE_STENCIL_BUFFER,
and 3DSTATE_CLEAR_PARAMS. It does not mention this particular packet
(_3DSTATE_DEPTH_STENCIL_STATE_POINTERS).
No observed Piglit regressions on Sandybridge or Ivybridge.
Together with the last two commits, this makes a cairo-gl benchmark
faster by 0.324552% +/- 0.258355% on Ivybridge. No statistically
significant change on Sandybridge. (Thanks to Eric for the numbers.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we would:
1. Emit the new indirect state.
2. Flag CACHE_NEW_BLEND_STATE.
3. Rely on later state atoms to notice CACHE_NEW_BLEND_STATE and emit a
pointer to the new indirect state.
This is rather cumbersome: it requires two state atoms instead of one,
and there's a strict ordering dependency in the list. Plus, the code
gets spread across two functions (or even files in the case of Gen7+).
Gen7+ has a packet to update just the blend state pointer, so it makes a
lot of sense to simply emit that right away. Gen6 has a combined packet
which updates blending, the color calculator, and depth/stencil state;
however, each can still be modified independently.
This drops the Gen6 micro-optimization where we tried to only emit one
packet that changed all three states. State updates are pretty cheap.
CACHE_NEW_BLEND_STATE is no longer necessary, so drop it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Works similarly to clip distance. If the cull distance is negative
for all vertices against a specific plane then the primitive
is culled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cull distance is analogous to clip distance. If a register is
given this semantic, then the values in it are assumed to be a
float32 distance to a plane. Primitives will be completely
discarded if the plane distance for all of the vertices in
the primitive are < 0.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to figure out the number of invocations of the clipper
before the emit, because in the emit we are after clipping
where the number of primitives will be equal to number of clipper
invocations minus the clipped primitives. So our computations
were always off by the number of clipped primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Draw depended on clip_plane_enable being set in the rasterizer
to use clipdistance registers for clipping. That's really
unfriendly because it requires that rasterizer state to have
variants for every shader out there. Instead of depending on
the rasterizer lets extract the info from the available state:
if a shader writes clipdistance then we need to use it and we
need to clip using a number of planes equal to the number
of writen clipdistance components. This way clipdistances
just work.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
we were always fetching the info from the vertex shader, but if
geometry shader is present it should be used as the source of
that info.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes piglit texture-packed-formats regression. We need to implement
more XBGR formats here eventually, but many are UINT/SINT formats
which swrast doesn't handle yet anyway (integer textures).
Bugzilla https://bugs.freedesktop.org/show_bug.cgi?id=64935
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Set env var RADEON_VA=0 to disable VM on Cayman/Trinity.
Useful for debugging.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Having figured out what was going on with piglit fbo-depth copypixels
GL_DEPTH_COMPONENT32F (falling all the way back to swrast on CopyPixels to
a float depth buffer), I'm not inclined to fix the problem currently but
it seems worth saving someone else the debug time.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We do a lot of multiplies by 3 or 4 for skinning shaders, and we can avoid
the sequence if we just move them into the right argument of the MUL.
On pre-IVB, this means reliably putting a constant in a position where it
can't be constant folded, but that's still better than MUL/MACH/MOV.
Improves GLB 2.7 trex performance by 0.788648% +/- 0.23865% (n=29/30)
v2: Fix test for pre-sandybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
This is a trivial port of 1d6ead3804 from
the FS.
No significant performance difference on trex (misplaced the data, but it
was about n=20).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is different from how we do it in the FS - we are using MAD even when
some of the args are constants, because with the relatively unrestrained
ability to schedule a MOV to prepare a temporary with that data, we can
get lower latency for the sequence of instructions.
No significant performance difference on GLB2.7 trex (n=33/34), though it
doesn't have that many MADs. I noticed MAD opportunities while reading
the code for the DOTA2 bug.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
draw_vertex_buffer declared the size field to be a size_t, but the LLVM
code used an int32 instead. This caused problems on big-endian 64-bit
targets, because the first 32-bit chunk of the 64-bit size_t was always 0.
In one sense size_t seems like a good choice for a size, so one fix
would have been to try to get the LLVM code to use the equivalent of
size_t too. However, in practice, the size is taken from things like ~0
or width0, both of which are int-sized, so it seemed simpler to make the
size field int-sized as well.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Without this, llvmpipe ends up giving a zero size to all uncompressed textures
on non-x86 systems, since align() cannot handle a 0 alignment.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
lp_build_add and lp_build_sub have fallback code for cases
that cannot be handled by known intrinsics. For UNORM formats,
this code was using modulo rather than saturating arithmetic.
This fixes some rendering issues for a gnome session on System z.
It also fixes various piglit tests on z, such as
spec/ARB_color_buffer_float/GL_RGBA8-render.
The patch deliberately doesn't tackle the more complicated
SNORM case.
Tested against piglit on x86_64 and System z with no regressions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Now that Gen6+ relies on hardware contexts, we don't need to record an
occlusion query value at the end of each batch. That means we no longer
need to reserve space for the absurd number of PIPE_CONTROLs required to
do that on Sandybridge.
See commit 4e087de51a, which bumped this
up to 60 bytes. This is not quite a revert, as it uses 24 bytes instead
of 16, and saves the comments. As far as I can tell, the old value of
16 bytes was just wrong, so we shouldn't go back to that.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We always allocate the maximum amount of space and never change it, so
it makes sense to do it once. Programming it on startup also lets us
skip re-programming it from BLORP.
This removes a tiny amount of overhead from our drawing loop.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This removes a tiny bit of code from our drawing loop.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we emit invariant state at startup (and never select the media
pipeline), the 3D pipeline will always already be selected, even if BLORP
is the first operation. So this is unnecessary.
v2: Fix unused variable warning (intel_context is no longer used).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have hardware contexts, we can safely initialize our GPU
state once at startup, rather than needing a state atom with the
BRW_NEW_CONTEXT flag set.
This removes a tiny bit of code from our drawing loop.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code already returned a boolean; this just clarifies that.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We already implemented this for ES3, so we just need to turn it on.
Fixes 6 Piglit tests:
spec/glsl-1.50/compiler/built-in-functions/determinant-mat[234].{vert,frag}
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Page 17 of the GLSL 1.50.11 specification states:
"There is a built-in macro definition for each profile the
implementation supports. All implementations provide the following
macro:
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only supported "#version 150". This patch recognizes
"compatibility" to give the user a more descriptive error message.
Fixes Piglit's version-150-core-profile test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If we didn't successfully parse the #version line, there's no point in
continuing with parsing and compiling: it's already failed.
Furthermore, it can actually be harmful: right after handling #version,
we call _mesa_glsl_initialize_types(), which checks state->es_shader and
language_version. If it isn't valid, it hits an assertion failure.
Fixes Piglit's "invalid-version-es." When processing "#version 110 es",
our code set state->es_shader and state->language_version = 110. It
then properly determined that this was invalid and flagged an error.
Since we continued anyway, we hit the assertion mentioned above.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The GPU (at least a3xx, but I think also a2xx) can render directly to
memory, bypassing tiling. Although it can't do this if blend, depth,
and a few other features of the pipeline are enabled. This direct
memory mode can be faster for some sorts of operations, such as simple
blits. In particular, this significantly speeds up XA by avoiding to
pull the entire dest pixmap into GMEM, render tiles, and write it all
back out again. This should also speed up resource copy-region and
blit.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The adreno a3xx GPU is found in newer snapdragon devices, such as the
nexus4. The a3xx is GLESv3 and OpenCL capable, although that is not
enabled yet in gallium.
Compared to a2xx, it introduces an entirely new unified shader ISA, and
re-shuffles all or nearly all of the registers. The good news is that
(for the most part) the registers are more orthogonal, not combining
unrelated state in a single register. And that there is a lot more
flexibility, so we don't need to patch and re-emit the shader like we
did on a2xx.
The shader compiler is currently quite dumb, there would be a lot of
room for improvement with an optimizing pass. Despite that, with the
a320 in my nexus4 it seems to be ~2-3x faster compared to the a220 in my
HP touchpad.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We use 128bit vector interleave for untwiddling in the blend code (with
256bit vectors). llvm generates terrible code for this for some reason,
so instead of generating a shuffle for 2 128bit vectors use a
extract/insert shuffle instead (it only seems to matter we're not using
128bit wide vectors for the shuffle). This decreases instruction count of
the blend code generated for a rgba8 render target without blending from
169 to 113 with llvm 3.1 and from 136 to 114 in llvm 3.2/3.3, and I got
a ~8% (llvm 3.1) and ~5% (3.2/3.3) performance improvement in gears.
(The generated code is still not terribly good as we could actually avoid
the interleaving completely but llvm can't know this.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Probably due to CRLF endings, the discovery of python import statements
was not working on Windows builds, causing incremental builds to often
fail unless one wiped out the build directory.
NOTE: This is a candidate for stable branches.
We flush pending rendering before running CopySubBuffer, which
ensures that the right bits get to the screen.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
The coordinates need to be inverted between glX and gallium.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Just like we produce from inside the Intel driver, this can help provide
information quickly about FBO incompatibility problems (particularly when
using apitrace replay).
Currently, in driver-marked incompleteness cases, you'll get both the
driver message and the core message on Intel. Until the other drivers are
fixed to produce output, I think this is better than not putting in a
message for driver-marked incomplete.
Reviewed-by: Brian Paul <brianp@vmware.com>
When a fake front buffer is in use, if we request the front buffer
(using screen->dri2.loader->getBuffersWithFormat()), the X server
copies the real front buffer to the fake front buffer and returns the
fake front buffer. We sometimes make redundant requests for the front
buffer (due to using a single counter to track invalidates for both
the front and back buffers), so there's a danger of pending front
buffer rendering getting overwritten when the redundant front buffer
request occurs.
Previous to this patch, intel_update_renderbuffers() worked around
that problem by sometimes doing intel_flush() and intel_flush_front()
before calling intel_query_dri2_buffers(). But it only did the
workaround when the front buffer was bound for drawing; it didn't do
it when the front buffer was bound for reading.
This patch moves the workaround code to intel_query_dri2_buffers(), so
that it happens in exactly the circumstances where it is needed.
This should fix some of the sporadic failures in Piglit tests
fbo-sys-blit and fbo-sys-sub-blit.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The patch that follows will fix a bug that prevents
intel_flush_front() from being called often enough. In doing so, it
will create a situation where intel_flush_front() is called during the
initial call to glXMakeCurrent(). In this circumstance,
ctx->DrawBuffer hasn't been initialized yet and is NULL. Fortunately,
intel->front_buffer_dirty is false, so intel_flush_front() doesn't
actually need to do anything. To avoid a segfault, swap the order of
terms in intel_flush_front()'s if statement.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
piglit OpenGL ES 3.0/minmax now passes. This was also one of the subcase
failures in OpenGL 3.2/minmax (and still is, because our value is too low
for 3.2, but at least we report what it is).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part of fixing piglit OpenGL ES 3.0/minmax.
v2: s/_gles3/_es3/ in extra name, for consistency (review by Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Part of fixing piglit OpenGL ES 3.0/minmax.
v2: s/_gles3/_es3/ in extra name, for consistency (review by Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
These functions must clear all bound layers, not just the first.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Believe it or not but these two are actually the first two functions which
really belong in this file nowadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly just make sure the layer parameter gets passed through to the right
places (and get clamped, can do this at setup time), fix up clears to
clear all layers and disable opaque optimization. Luckily don't need to
touch the jitted code.
(Clears invoked via pipe's clear_render_target method will not work however
since the pipe_util_clear function used for it doesn't handle clearing
multiple layers yet.)
v2: per Brian's suggestion, prettify var initialization and add some comments,
add assertion for impossible layer specification for surface.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Transfers always use z/depth for layers no matter if it's a 1d or 2d array
texture, we don't follow OpenGL's crazyness there. Luckily this appears to
only be a doc bug, everyone doing the right thing already.
While here also document z/depth parameter for cube map arrays.
v2: fix typo spotted by Eric Anholt
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Checking if array_size is greater than 1 is not enough for single-layered
array textures.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Moved draw_arrays() to st_draw_feedback.c and removed draw_arrays_instanced().
draw_arrays() was used by nobody else. Now there's just one "draw" entrypoint
into the draw module.
Signed-off-by: Brian Paul <brianp@vmware.com>
This change came from the discovery that the STATIC_ASSERT to check that
the number of register file strings didn't actually work.
Similar changes could be made for the other string arrays in tgsi_string.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Removes the special-case suppression of gl_ClipVertex in the VUE map.
Also calculate vertex outcodes for user clip planes based on
gl_ClipVertex if written; otherwise gl_Position.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When clipping triangles against a user clip plane, and gl_ClipVertex
is provided in the vertex, use it instead of hpos.
TODO: A similar change should be made at some point for line clipping.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Introduce ilo_surface_cso and initialize it in create_surface(). With the
change, we can emit SURFACE_STATE directly from the CSO and remove
emit_surf_SURFACE_STATE(). We do not deal with depth/stencil surfaces yet.
Introduce ilo_cbuf_cso and initialize it in set_constant_buffer(). As
ilo_view_surface is embedded in ilo_cbuf_cso, switch to emit_SURFACE_STATE()
for constant buffers and remove emit_cbuf_SURFACE_STATE().
Introduce ilo_view_cso and initialize it in create_sampler_view(). Add
emit_SURFACE_STATE() to GPE, which can emit SURFACE_STATE from
ilo_view_surface.
Moving the work to create time reduces the work at emit time.
Saves time overall as create work is only done once.
Fix compiler warning in gen7_pipeline_sol.
[olv: remember pipe_alpha_state instead of pipe_depth_stencil_alpha_state in
ilo_dsa_state]
Introduce ilo_ve_cso and initialize it in create_vertex_elements_state().
This commit goes a step further by setting up mappings from HW VB to PIPE VB,
which we failed to do previously. That allows us to support instanced
rendering.
Remove hiz and dsa from the parameters. We would know whether HiZ buffer
exists from ilo_texture once it is supported. DSA state should not affect
3DSTATE_DEPTH_BUFFER.
Introduce ilo_sampler_cso and initialize it in create_sampler_state(). This
saves us from having to perform CPU-intensive calculations to construct
hardware sampler states in draw_vbo().
This allows us to memcpy() the state in draw_vbo(). Add ilo_init_states() and
ilo_cleanup_states() that are called when contexts are created and destroyed
respectively, and properly set the initial scissor state in ilo_init_states().
Introduce ilo_viewport_cso and initialize it in set_viewport_states(). This
saves us from having to perform CPU-intensive calculations to construct
hardware viewport states in draw_vbo().
Define and use
struct ilo_sampler_state;
struct ilo_view_state;
struct ilo_cbuf_state;
struct ilo_resource_state;
struct ilo_global_binding;
in ilo_context.
We were counting uniforms located in UBOs against the default uniform
block limit, while not doing any counting against the specific combined
limit.
Note that I couldn't quite find justification for the way I did this, but
I think it's the only sensible thing: The spec talks about components, so
each "float" in a std140 block would count as 1 component and a "vec4"
would count as 4, though they occupy the same amount of space. Since GPU
limits on uniform buffer loads are surely going to be about the size of
the blocks, I just counted them that way.
Fixes link failures in piglit
arb_uniform_buffer_object/maxuniformblocksize when ported to geometry
shaders on Paul's GS branch, since in that case the max block size is
bigger than the default uniform block component limit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Putting the human readable device names directly in the PCI ID list
consolidates things in one place. It also makes it easy to customize
the name on a per-PCI ID basis without a huge code explosion.
Based on a patch by Kristian Høgsberg.
v2: Fix 830M/845G names and #undef CHIPSET (caught by Emit Velikov).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch unifies mesa's PACKAGE_VERSION on autotools, scons and
Android build systems.
Current behaviour is:
- Autotools uses 9.2.0 as PACKAGE_VERSION
- Scons and Android use 9.2-devel as PACKAGE_VERSION
With this patch all three build systems use 9.2.0-devel as
PACKAGE_VERSION.
Reviewed-by: Brian Paul <brianp@vmware.com>
RADEON_GEM_WAIT_IDLE is declared DRM_IOW but mesa
uses it with drmCommandWriteRead instead of drmCommandWrite
which leads to the ioctl being unmatched and returning an
error on at least OpenBSD.
Problem originally noticed in libdrm by Mark Kettenis.
Dave Airlie pointed out that mesa has the same issue.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
When building dri-swrast, use gallium_check_st to set HAVE_COMMON_DRI.
Commit 07f2dee7 added setting of HAVE_COMMON_DRI in gallium_check_st.
But the dri-swrast case did not use gallium_check_st.
So dri/common was still not built.
v2: set HAVE_COMMON_DRI=yes instead of using gallium_check_st
NOTE: This is a candidate for the 9.1 branch.
(Depends on 7de78ce5 and 07f2dee)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61821
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
The main change is to use MCJIT rather than the old JIT, which will never
be supported for System z. The endianness part is by example since the
patch was tested on a glibc system.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This was always doing per-pixel alignment which isn't necessary, except
for the buffer case (due to the per-element offset). The disabled code
for calculating it was incorrect because it assumed that always the full
block would be fetched, which may not be the case, so fix this up.
The original code failed for instance for r10g10b10a2 the alignment would
have been calculated as 4 (block_width) * 4 (bytes) so 16, but the actual
fetch may have only fetched 2 values at a time, hence only alignment 8 -
it is unclear what exactly would happen in this case (alignment larger
than size to fetch).
So just use the (already calculated) fetch size instead and get alignment
from that which should always work, no matter if fetching 1,2 or 4 pixels.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For rendering to buffers, we cannot have any y alignment.
So make sure that tile clear commands only clear up to the fb width/height,
not more (do this for all resources actually as clearing more seems
pointless for other resources too). For the jit fs function, skip execution
of the lower half of the fragment shader for the 4x4 stamp completely,
for depth/stencil only load/store the values from the first row
(replace other row with undef).
For the blend function, also only load half the values from fs output,
replace the rest with undefs so that everything still operates on the
full 4x4 block to keep code the same between 4x1 and 4x4 (except for
load/store of course which also needs to skip (store) or replace these
values with undefs (load))., at the cost of slightly less optimal code
being produced in some cases.
Also reduce 1d and 1d array alignment too, because they can be handled the
same as buffers so don't need to waste memory.
v2: don't try to run special blend code for 4x1, (very) slightly less
complexity if we just use the same code as for 4x4 which may or may not
make it easier to optimize in the future (as we care a lot more about 4x4
performance than 1d).
v2: don't use undef values for unused fs src outputs with llvm 3.1 as it
apparently can trigger a bug in llvm.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some parameters were used inconsistently, for instance not using
block_width/block_height/block_size for deferring number of pixels
but rather relying on guesses from the number of fragment shaders etc,
so fix this up (no actual change in behavior since the block size stays
fixed). (Though most of the code would work with different block_height,
with three exceptions, one being the hacked r11g11b10 conversions and
twiddle code which only work with block_height 2 not 1, and the last
one being blend vector type not being 128bit wide.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and
1x8f->1x8ub cases, there might be legitimate reasons why we don't have
enough input vectors for a full destination vector, and using pack
intrinsics should still be much better than using generic conversion
(it looks like convert_alpha from the blend code might hit this though
I suspect it could be avoided).
v2: add another test vector format to lp_test_conv so this gets tested.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code was designed to handle no-op concat but failed (unless the
caller was using same pointer for src and dst).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We've never properly supported more than one address register. There
isn't even a field in prog_src_register or prog_dst_register to indicate
which address register to use if RelAddr!=0.
In the state tracker, clamp MaxAddressRegs against MAX_PROGRAM_ADDRESS_REGS
since many gallium drivers do support more.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65226
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Blorp and the hardware blitter can't be used to implement
CopyTexSubImage when the image type is 1D_ARRAY, because of a
coordinate system mismatch (the Y coordinate in the source image is
supposed to be matched up to the Z coordinate in the destination
texture).
The hardware blitter path (intel_copy_texsubimage) contained a perf
debug warning for this case, but it failed to actually fall back. The
blorp path didn't even check.
Fixes piglit test "copyteximage 1D_ARRAY".
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 045612c (intel: Add an assert for glCopyTexSubImage() being
called on MSAA buffers) added an assertion to intel_copy_texsubimage()
to make sure that multisampling was not in use, based on the
assumption that glCopyTexSubImage() can't legally be used with
multisampling.
However, there is one case where glCopyTexSubImage() can legally be
used with multisampling: when the source buffer is a multisampled
window system buffer. If the source and destination color formats
don't match, the blorp path will fail, so intel_copy_texsubimage()
will be called. In this case, we need intel_copy_texsubimage() to
return false so that we fall back to meta to do the copy. (The
multisampled source buffer won't cause a problem for the meta path,
because it uses glReadPixels, which forces a multisample resolve).
It's still safe to assert that the destination image is
single-sampled, because it's not legal to call glCopyTexSubImage() on
multisampled textures.
Fixes some failures with piglit tests "copyteximage
{1D,2D,CUBE,RECT,2D_ARRAY}" (with "samples=..." argument).
Reviewed-by: Eric Anholt <eric@anholt.net>
Okay I now understand why Frank would want to run away, this is
my attempt at fixing the CVE out of bounds access to constants
outside the range. This attempt converts any illegal constants
to constant 0 as per the GL spec, and is undefined behaviour.
A future patch should add some debug for users to find this out,
but this needs to be backported to stable branches.
CVE-2013-1872
v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.
v3: don't fail piglit, the whole 8/16 dispatch stuff was over
my head, and I spent a while figuring it out, but this one is
definitely safe, one piglit pass extra on my Ironlake.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were copying the source stencil data onto the destination depth data.
Fixes piglit copyteximage other than 1D_ARRAY.
v2: Fix unintentional dropping of the "don't double-copy for packed
depth/stencil" check. While blorp is only supported on separate
stencil hardware at the moment, hopefully that will change soon.
Review by Jordan.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When making v2 of da2880bea0, I carefully
checked all of the calls in that commit to see that I'd updated them, but
forgot to update the new calls in the later commits such as
.e845c5cf7abce55759501a473459aff3bf25c9ca. As a result, we were getting Y
tiled temporaries even though the whole point of the temporary was to
untile!
The steady state of the intro scene of lightsmark goes from 13 to 17 fps.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65154
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
To trigger the bug, it suffices to have a line-continuation followed by
a newline and then a non-line-continuation backslash.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This loop-control condition with a post-decrement operator would lead to
an underflow of collapsed_newlines. This in turn would cause a subsequent
execution of the loop to labor inordinately trying to return the loop-control
variable to a value of 0 again.
Fix this by dis-intertwining the test and the decrement.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65112
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a gl_client_array is created with glColorPointer,
gl_client_array::Normalized is true. This caused the translation from the
gl_client_array's type to a BRW_SURFACEFORMAT to assertion fail.
Fixes the spinning cube's color in Android 4.2's ApiDemos.apk,
"Graphics > OpenGL ES".
Fixes assertion failure in mesa-demos/src/egl/opengles1/tri_x11 on Haswell
and Ivybridge:
brw_draw_upload.c:287: get_surface_type: Assertion `0' failed.
No Piglit regressions on Haswell.
Note: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42182
Issue: AXIA-2954
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It was changed from 0 to allow shader outputs at 0 that are
different from position.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than pointing the surface_state directly at a single
sub-image of the texture for rendering, we now point the
surface_state at the top level of the texture, and configure
the surface_state as needed based on this.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We were crashing when GL_READ_BUFFER == GL_NONE. Check for NULL
pointers and reorganize the code. The spec doesn't say which error
to generate in this situation, but NVIDIA raises GL_INVALID_OPERATION.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65173
NOTE: This is a candidate for the stable branches.
Tested-by: Vedran Rodic <vrodic@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, on the second call to GenerateMipmap we were enabling two
vertex arrays for the current vertex array object, rather than
the private generate-mipmap vertex array object. This caused
things to blow up elsewhere.
This patch moves the array enables into the block where the
generate-mipmap vertex array object is created, as we do in
the setup_ff_generate_mipmap() function.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60518
NOTE: This is a candidate for the stable branches.
Tested-by: core13@gmx.net
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since pipe_surface already has all the necessary fields no interface
changes are necessary except adding a new shader semantic value
(TGSI_SEMANTIC_LAYER).
(Note that what GL knows as "gl_Layer" variable d3d10 is naming
"RENDER_TARGET_ARRAY_INDEX".)
v2: drop cap bit (just tied to geometry shader), add docs.
Surprising this bug survived so long, we were missing a clamp (in the
linear filtering version).
(Valgrind complained a lot about invalid reads with piglit texwrap,
I've also seen spurios failures in this test which might have
happened due to this. Valgrind probably didn't complain before the
alignment reduction in llvmpipe to 4x4 since the test is using tiny
textures so the reads were still always well within allocated area.)
While here, also do an effective clamp (after half subtraction)
of [0,length-0.5] instead of [0, length-1] which saves an instruction
(the filtering weight could be different due to this, but only if
both texels point to the same max texel so it doesn't matter).
(Both changes are borrowed from PIPE_TEX_CLAMP_TO_EDGE case.)
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
One of the assertion made no sense for buffer rendertargets
(due to the union), so drop it. (The same assertion is present already in
the path for texture surfaces later.).
v2: make assertion completely accurate (suggested by Jose).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
brw->ib.type is reset to -1 at the start of each batch. If there's no
index buffer, it won't get updated to a sensible value, resulting in
_mesa_primitive_restart_index's "Invalid index buffer type" assertion
tripping.
Fixes a regression since 7c87a3b5da.
NOTE: This is a candidate for the 9.1 branch (and should be squashed).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65195
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The overallocation was very bad especially for things like 1d array
textures which got blown up by a factor of 64. (Even ordinary smallish
2d textures benefit a lot from this, a mipmapped 64x64 rgba8 texture
previously used 7*16kB = 112kB instead of now ~22kB.)
4x4 is chosen because this is the size the jit functions run on, so
making it smaller is going to be a bit more complicated.
It is actually not strictly 4x4 pixel, since we'd want to avoid situations
where different threads are rendering to the same cacheline so we keep
cacheline size alignment in x direction (often 64bytes).
To make this work introduce new task width/height parameters and make
sure clears don't clear the whole tile if it's a partial tile. Likewise,
the rasterizer may produce fragments outside the 4x4 blocks present in a
tile, so don't call the jit function for them.
This does not yet fix rendering to buffers (which cannot have any y
alignment at all), and 1d/1d array textures are still overallocated by a
factor of 4.
v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers
allocated (needed in case we render to them).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These were mostly just a waste of memory and cache pressure, and were
really only used for debugging.
This change reduces instruction count (as measured by callgrind's Ir
event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20).
Signed-off-by: Adam Jackson <ajax@redhat.com>
This fixes the following build errors on powerpc:
CC glapi_dispatch.lo
In file included from glapi_dispatch.c:90:0:
../../../../../src/mapi/glapi/glapitemp.h:1640:1: error: no previous
prototype for 'glReadBufferNV' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:4198:1: error: no previous
prototype for 'glDrawBuffersNV' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6377:1: error: no previous
prototype for 'glFlushMappedBufferRangeEXT'
[-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6389:1: error: no previous
prototype for 'glMapBufferRangeEXT' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6401:1: error: no previous
prototype for 'glBindVertexArrayOES' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6413:1: error: no previous
prototype for 'glDeleteVertexArraysOES' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6433:1: error: no previous
prototype for 'glGenVertexArraysOES' [-Werror=missing-prototypes]
../../../../../src/mapi/glapi/glapitemp.h:6445:1: error: no previous
prototype for 'glIsVertexArrayOES' [-Werror=missing-prototypes]
NOTE: This is a candidate for the 9.0 and 9.1 branches.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
clientDriverNameLength is a CARD32 and needs to be bounds checked before
adding one to it to come up with the total size to allocate, to avoid
integer overflow leading to underallocation and writing data from the
network past the end of the allocated buffer.
NOTE: This is a candidate for stable release branches.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
busIdStringLength is a CARD32 and needs to be bounds checked before adding
one to it to come up with the total size to allocate, to avoid integer
overflow leading to underallocation and writing data from the network past
the end of the allocated buffer.
NOTE: This is a candidate for stable release branches.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For formats such as GL_COMPRESSED_SRGB_S3TC_DXT1_EXT we need to
have both the GL_EXT_texture_sRGB and GL_EXT_texture_compression_s3tc
extensions. This patch adds the missing check for the later.
Found when checking out https://bugs.freedesktop.org/show_bug.cgi?id=65173
NOTE: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When we've changed draw_find_shader_output to return -1 instead
of 0 on non found attribs we broke the default behavior of
draw, which was to always redirect those to the first (0th) slot.
To preserve that behavior if draw_emit_vertex_attr notices a
mismatched vertex attrib, it just redirects it to the first slot
(instead of trying to use negative index in an array).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In traditional multisampled framebuffer rendering, color samples must be
explicitly resolved via BlitFramebuffer before doing the scaled blitting
of the framebuffer. So, scaled blitting of a multisample framebuffer
takes two separate calls to BlitFramebuffer.
This patch implements the functionality of doing multisampled scaled
resolve using just one BlitFramebuffer call. Important changes involved
in this patch are listed below:
- Use float registers to scale and offset texture coordinates.
- Change offset computation to consider float coordinates.
- Round the scaled coordinates down to nearest integer.
- Modify src texture coordinates clipping to account for scaling..
- Linear filter is not yet implemented in blorp. So, don't use
blorp engine to do single sampled scaled blitting.
V3: Fix nearest filtering issue in scaled blits. Makes failing piglit
fbo-blit-stetch test and framebuffer_blit_functionality_magnifying_blit.test
in gles3 CTS pass.
Observed no piglit, gles3 CTS regressions on sandybridge & ivybridge with
this patch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These changes are required to implement scaled blitting in blorp
in my next patch.
No regressions observed in piglit quick-driver.tests with this patch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This reverts commit 98dfd59a04.
The patch was clearly not Piglit tested, as it caused at least 225
tests to start crashing with assertion failures. That was before my
desktop tanked and the test run died completely.
Remove hash function on shader variants. Nature of variants limits them to a
small number and thus its more efficient to just do a memory compare of the
actual shader structures rather than compute and compare hashes.
This is my attempt at fixing this as the CVE is making RH security team
care enough to make me look at this. (please upstream, security fixes are
more important than whatever else you are doing, if for no other reason than
it saves me having to fix stuff I've no real clue about).
Since Frank's original fix was denied, here is my attempt to just
alias all constants that are out of bounds < 0 or > nr_params to constant 0,
hopefully this provides the undefined behaviour idr requires..
CVE-2013-1872
v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Set fs_visitor::params_remap to NULL in the constructor.
This variable was potentially tested in fs_visitor::remove_dead_constants()
before being set.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Viewport index should only be used on a per primitive basis, so
instead of fetching it from each vertex, potentially making each
vertex in a primitive use a different viewport index, which is
obviously broken, make sure that we only fetch from the first
vertex in the primitive making the viewport index the same
for the entire primtive.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to clamp to make sure invalid shader doesn't crash our
driver. The spec says to return 0-th index for everything that's
out of bounds.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the viewport index is larger than the PIPE_MAX_VIEWPORTS,
then the first (0-th) viewport should be used.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
draw_find_shader_output like most of the code in draw used to
depend on position always being at output slot 0. which meant
that any other attribute being at 0 could signify an error.
unfortunately position can be at any of the output slots, thus
other attributes can occupy slot 0 and we need to mark the ones
which were not found by something else. This commit changes
draw_find_shader_output so that it returns -1 if it can't
find the given attribute and adjust the code that depended
on it returning >0 whenever it correctly found an attrib.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Largely related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This adds support for multiple viewports to the draw module.
Multiple viewports depend on the presence of geometry shaders
which can write the viewport index.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gallium supported only a single viewport/scissor combination. This
commit changes the interface to allow us to add support for multiple
viewports/scissors.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's incorrect and isn't used any longer.
v2: Actually flush vertices/flag _NEW_TRANSFORM on RestartIndex change.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_PRIMITIVE_RESTART_FIXED_INDEX is only supposed to apply to
glDrawElements*. This code is for legacy drawing paths and display
lists, so it shouldn't apply.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The derived _RestartIndex field is an attempt to support both
GL_PRIMITIVE_RESTART and GL_PRIMITIVE_RESTART_FIXED_INDEX (part of ES
3.0). Gallium drivers don't appear to support ES 3.0 yet, so they don't
need to use it. Plus, it's broken and going to go away soon.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Pre-Haswell hardware doesn't support an arbitrary restart index, and
instead compares the index buffer value against 0xFF for byte-size
buffers, 0xFFFF for short-size buffers, or 0xFFFFFFFF for unsigned
integer buffers.
OpenGL allows the restart index to be an arbitrary unsigned integer.
When comparing against byte/short types, the index buffer value should
be promoted to a full 32-bit integer before doing the comparison. The
restart index is /not/ supposed to be masked to byte/short size.
This means that with certain restart indexes, the comparison should
always fail. For example, a restart index of 0xF000FFFF should never
match any byte/short index buffer values due to the extra high bits.
We must not enable hardware primitive restart in such a case. For now,
fall back to software primitive restart as it's the simplest fix. In
the future, we could detect restart indexes that will never match and
skip both hardware and software primitive restart.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code that updates the ctx->Array._RestartIndex derived state mashed
it to 0xFFFFFFFF when GL_PRIMITIVE_RESTART_FIXED_INDEX was enabled
regardless of the index buffer type. It's supposed to be 0xFF for byte,
0xFFFF for short, or 0xFFFFFFFF for integer types.
The new _mesa_primitive_restart_index() helper gets this right.
The hardware appears to compare against the full 32-bit value some of
the time, causing primitive restart not to occur when it should. The
fact that it works some of the time is rather frightening.
Fixes sporadic failures in the ES 3 instanced_arrays_primitive_restart
conformance test when run in combination with other tests.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gets the correct restart index for unsigned byte/short types when
using GL_PRIMITIVE_RESTART_FIXED_INDEX.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The derived state approach currently used (_RestartIndex) doesn't work:
in the GL_PRIMITIVE_RESTART_FIXED_INDEX case, the restart index depends
on the index buffer's data type, and that isn't known until draw time.
The existing code also fails to obey the GL 4.3 rules which say that
FIXED_INDEX takes precedence over normal primitive restart.
This helper function correctly determines the restart index, and will
replace the derived state.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The derived _PrimitiveRestart enable flag combines the PrimitiveRestart
and PrimitiveRestartFixedIndex enable flags. However, DrawArrays is not
supposed to do FixedIndex restart:
From the OpenGL 4.3 Core specification, section 10.3.5 (page 302):
"If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not
taking a type parameter, including all of the *Draw* commands other
than *DrawElements*."
The OpenGL ES 3.0 specification agrees by omission:
"When DrawElements, DrawElementsInstanced, or DrawRangeElements
transfers a set of generic attribute array elements to the GL..."
Notably, DrawArrays is not included in the list of draw calls that
take PRIMITIVE_RESTART_FIXED_INDEX into consideration.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously it would assertion fail in debug builds (though the correct
value was returned in a non-debug build). Marking it as a candidate for
stable even though it has no current consumers in the stable branches, in
case one shows up in a later backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were expanding the live range too far, breaking register_coalesce_2()
and compute_to_mrf() on 16-wide shaders. Turning it back on improves
GLB2.7 performance by 0.239355% +/- 0.0850649% (n=398). shader-db stats
are:
total instructions in shared programs: 1627211 -> 1609262 (-1.10%)
instructions in affected programs: 450351 -> 432402 (-3.99%)
While 33 new 16-wide shaders are gained, 70 are lost. Despite that,
tropics (the app that lost the most 16-wide) shows a .41% +/- .16%
(n=7/8, first-run outlier removed) performance improvement on my HSW.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scheduler didn't know about uniform-type accesses, and if a uniform
access was last in a 16-wide, we'd walk off the end of the array. This
never happened, because we'd never coalesce out all the GRFs, due to a bug
to be fixed in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 and radeon use ra_set_node_reg() to force payload registers to
specific registers while exposing those registers to the allocator still.
We were treating those register nodes as unsuccessfully allocated in the
ra_simplify() step, leading to walking the registers again to do
optimistic coloring even if there was nothing left ot do.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since the introduction of default-to-SARGB8 window system framebuffers,
non-blorp hardware lost blit acceleration for these two paths between the
window system and ARGB8888 textures. Since we shouldn't be doing any
conversion anyway, just compatibility-check the linear variants of the
formats.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61954
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
r600g needs it too, so add ipo in the common radeon_llvm_check().
radeonsi compiled and linked, but it failed at dynamic link time
with a missing symbol.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Eliminate the rest of the no longer needed layout logic.
(It is possible some code could be simplified a bit further still.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Since the glBitmap() MRT change, it's unused. There was basically no way
to responsibly use this function since MRT was introduced.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
Any 32-bit format got ARGB8888 handling (including, say, GL_RG1616), and
anything else got 16-bit (including, say, GL_R8), which could potentially
hang the GPU by writing out of bounds.
NOTE: This is a candidate for the stable branches.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
We'd only hit color buffer 0 even if multiple draw buffers were bound.
NOTE: This is a candidate for the stable branches.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
This will ensure that we have resolves if we ever extend this to
glTexSubImage(), and fixes missing image start offset handling.
The texture buffer alloc ended up getting moved up, because we want to
look at the format of the image's actual mt to see if we'll end up
blitting the right thing, in the case of packed depth/stencil uploads.
This is the last caller of intelEmitCopyBlit() on a miptree-wrapped BO.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
The previous code was missing depth resolves, that had only been prevented
due to no blitting of Y tiling. The pair of flip args in the new blit
function means that we can just drop the pack->Invert fallback.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
I needed to do this for the PBO blit cases to use intel_miptree_blit().
But this also actually partially fixes a bug in EGLImage handling: We
can't share regions across contexts, because regions have a refcount that
isn't protected by a mutex, and different contexts can be simulataneously
accessed from multiple threads. Now we just need to get regions out of
__DRIImage. There was also a missing use of image->offset in the EGLImage
renderbuffer storage code.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
In a bit of debug code, we no longer have the inter-slice x/y to print.
But I think the level/slice is more useful in this case for looking at
what's getting mapped, especially given that INTEL_DEBUG=blit will tell
you the other value.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
While this is a bit more CPU work, it also is less code to handle this
path, and fixes problems with 32k-pitch textures and missing resolves.
v2: Add error checking in new code.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Acked-by: Paul Berry <stereotype441@gmail.com>
For a blit-uploaded temporary, it's faster on current hardware to memcpy
the data into a linear CPU mapping than to go through the GTT.
v2: Turn the not-fully-supported mask into 3 supported enum values.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v2)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v2)
This is just in case someone else trips over this due to our weird reuse
of this code in glBlitFramebuffer().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I think we've measured no performance difference from this in the past,
except that the blorp code can do things like multisample resolves.
Prevents piglit regression in the next commit when a testcase started
trying to do a multisampled resolve through the old glCopyTexSubImage()
path.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
We were protected for a long time by the fact that depth was Y tiled and
you couldn't blit Y. Now that we can blit Y, we were failing to resolve
depth in glCopyPixels().
Note in the comment about swrast, that the swrast map path does resolves
appropriately already.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I had previously asserted that it was hard to write a useful, simpler
blit function, but I think this might be it.
This has the side effect of extending the 32k pitch check to a few more
places that were missing it.
v2: Update comment for being moved inside intel_miptree_blit().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Right now, the callers in i965 don't expect a nonzero page offset to
actually occur (since that's being handled elsewhere), but it seems
like a trap to leave it this way.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
It appears that `sizeof(Class::member)` is either non-standard or
merely unsupported in MSVC.
So use `sizeof(instance->member)` instead, which is guaranteed to work
everywhere.
Also promote the assert to a static assert.
Trivial.
The problem is the sampler units are allocated from the same pool for all
shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
shader samplers start at index 12, leaving only 4 sampler units
for the fragment shader. The main cause is probably the fact that samplers
(texture unit -> sampler unit mapping, etc.) are tracked globally
for an entire program object.
This commit adapts the GLSL linker and core Mesa such that the sampler units
are assigned to sampler uniforms for each shader stage separately
(if a sampler uniform is used in all shader stages, it may occupy a different
sampler unit in each, and vice versa, an i-th sampler unit may refer to
a different sampler uniform in each shader stage), and the sampler-specific
variables are moved from gl_shader_program to gl_shader.
This doesn't require any driver changes, and it fixes piglit/max-samplers
for gallium and classic swrast. It also works with any number of shader
stages.
v2: - converted tabs to spaces
- added an assertion to _mesa_get_sampler_uniform_value
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
to match the size of ctx->Texture.Unit, and it will also fix
piglit/max-samplers with the following commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some Gallium drivers were crashing, because the array was not large enough.
v2: clamp the per-shader maximum in st/mesa, then sum them all up
NOTE: This is a candidate for the stable branches.
Set up CB_SHADER_MASK register according to pixel shader exports, and enable
some minimal state for colour buffer 1 in case dual source blending is used.
TGSI_TEXTURE_BUFFER is one-dimensional. Assert that exec_tex() is never
called with TGSI_TEXTURE_BUFFER.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch improves handling of unconditional KILL instructions inside
the conditional blocks, uncovering more opportunities for if-conversion.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
PRED_SET instructions that update exec mask should be scheduled immediately
prior to the "if-then-else" block, because any instruction that is
inserted after alu clause with PRED_SET and before conditional block is
also conditionally executed by hw (exec mask is already updated at that
moment).
Propbably it's better to make PRED_SET a part of conditional
"if-then-else" block in the IR to handle this more cleanly,
but for now this temporary solution should prevent the problem.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
For compute shaders we need to let the backend know that
GPRs 0 and 1 are preloaded with some compute-specific input
values, otherwise any use of these regs without previous
definition is considered as undefined value and usually
is simply replaced with 0.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
This allows GVN rewrite pass to propagate non-const (register)
values to FETCH source operands, helping to eliminate unnecessary
copies in some cases.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We have to assume that all GPRs in compute shader can be indirectly
addressed because LLVM backend doesn't provide any indirect array info.
That's why for compute shaders GPR array is created that covers all used
GPRs (0..r600_bytecode::ngpr-1), but this seriously restricts register
allocation in sb.
This patch checks for actual use of indirect access in the shader and
if it's not used then GPR array is not created, so that regalloc is not
unnecessarily restricted.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
It turns out the MI_LOAD_REGISTER_IMM approach doesn't work on Haswell,
and regressed essentially all the transform feedback Piglit tests.
This morally reverts eaa6fbe6d5. However,
the code is still simpler than it was. On BeginTransformFeedback, we
simply flush the batch and set the SOL reset flag so that the next batch
will start with zeroed offsets. There's still no software counting.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64887
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Verify that interface blocks match when linking separate shader
stages into a program.
Fixes piglit glsl-1.50 tests:
* linker/interface-blocks-vs-fs-member-count-mismatch.shader_test
* linker/interface-blocks-vs-fs-member-order-mismatch.shader_test
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Verify that interface blocks match when combining compilation
units at the same stage. (For example, when merging all vertex
shaders.)
Fixes piglit glsl-1.50 test:
* linker/interface-blocks-multiple-vs-member-count-mismatch.shader_test
v5 (Ken): Rename to link_interface_blocks.cpp and drop the separate .h
file for consistency with other linker code. Remove "ok" variable.
Fold cross_validate_interface_blocks into its caller.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
With this change we now support interface block arrays.
For example, cases like this:
out block_name {
float f;
} block_instance[2];
This allows Mesa to pass the piglit glsl-1.50 test:
* execution/interface-blocks-complex-vs-fs.shader_test
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Convert interface blocks with instance names into flat
interface blocks without an instance name.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For interface blocks, there are three separate namespaces for
uniform, input and output blocks.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously uniform blocks allowed for the 'uniform' keyword
to be used with members of a uniform blocks. With interface
blocks 'in' can be used on 'in' interface block members and
'out' can be used on 'out' interface block members.
The basic_interface_block rule will verify that the same
qualifier type is used with the block and each member.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
An interface block member may specify the type:
in {
in vec4 in_var_with_qualifier;
};
When specified with the member, it must match the same
type as interface block type.
It can also omit the qualifier:
uniform {
vec4 uniform_var_without_qualifier;
};
When the type is not specified with the member,
it will adopt the same type as the interface block.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Interface blocks in GLSL 150 allow an instance name to be used.
v2:
* use state->check_version
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously only 'uniform' was allowed for uniform blocks.
Now, in/out can be parsed, but it will only be allowed for
GLSL >= 150.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enables guardband clipping when the viewport covers the entire render
target.
No piglit regressions on Ironlake.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Relaxes the validation of
OPTION ARB_precision_hint_{nicest,fastest};
to allow duplicate options. The spec says that both /nicest/ and
/fastest/ cannot be specified together, but could be interpreted
either way for respecification of the same option.
Other drivers (NVIDIA etc) accept this, and at least one Unity3D game
expects it to succeed (Kerbal Space Program).
V2: Add spec quote.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
need_flush was uninitialized if hw3d->new_batch was true.
Fixes "Uninitialized scalar variable" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
'type' was not fully initialized when calling lp_build_context_init.
Fixes "Uninitialized scalar variable" defect reported by Coverity.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Initially we had NUM_TEX_TILE_ENTRIES of 50, however this was using too much
memory (mostly because the tile cache is operating on fixed max current
sampler views which could be fixed but that's another topic). So it was
decreased to 4. However this is a ridiculously low number which can't
actually really work (the number of tiles needed for as little as
a single quad with linear_mipmap_linear is 2 to 8 for a 2d texture, and
4 to 16 for a 3d texture), as it just about guarantees there will be
cache thrashing sometimes (just about always for 3d textures in fact, since
while there are 4 entries the cache is direct mapped).
So increase that number to 16 (which is still on the low side for direct
mapped cache though I guess using something like 4-way associativity would
be more effective than increasing this further) which has at least some good
chance to avoid thrashing. Since we don't want to increase memory requirements
however in turn decrease the tile size accordingly from 64 to 32 (as a bonus
point this also decreases the cost of texture thrashing which might still
happen sometimes).
I've seen performance improvement in the order of factor ~200 (specifically,
drawing the first frame from the replay from bug 41787 needs "only" ~10s
instead of ~30min, meaning I can actually compare the output with other
drivers...) with this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This optimization disabled mask checks if the shader is simple enough.
While this should work correctly, the problem is that it can hide real issues
because shaders in practice are usually complex enough (8 instructions or 1
texture is already enough) so this doesn't get used, whereas dumbed-down
tests which should hit all the same code paths suddenly do something quite
different. This was the reason that bug 41787 could not be easily tracked as
stencil test not working correctly (piglit would in fact have failed some
tests without that optimization).
So disable it for now, it's unclear if it's much of a win in any case.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We actually did early depth/stencil test and late depth/stencil write even
when the shader could kill the fragment (alpha test or discard). Since it
matters for the new stencil value if the fragment is killed by depth/stencil
test or by the shader (in which case it will not reach the depth/stencil
test) this simply cannot work (we also would possibly skip writing the new
stencil value due to mask checks but this is a secondary issue).
So use late depth test / late depth write instead in this case.
(No piglit changes as it doesn't seem to hit such bogus early depth test
/ late depth write path.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We did mask checks between depth/stencil testing and depth/stencil write.
This meant that if the depth/stencil test killed off all fragments we never
actually wrote the new stencil value. This issue affected all early/late
test/write combinations.
So move the mask check after depth/stencil write (for early depth test,
could do the same for late depth test but might not be worth it at that
point so just skip it there).
This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.
Piglit does not hit this issue because of the simple_shader optimization
in generate_fs_loop() which means we're skipping the mask checks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was meant to disable some code which isn't needed when depth/stencil
isn't written. However, there's more code which wouldn't be needed in that
case so having the condition there was just odd (llvm will drop all the code
anyway).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* We generate a static library for Haiku
Gallium targets as our port system combines
the compiled rendering code into a modular
ar for each module (for example, our port
system combines llvm libsoftpipe.a libllvmpipe.a
into a single ar for the Haiku build system.
I'd like the Gallium hgl target scons build
system to do this some day, however how is
beyond me at the moment. This is a first step.
Set lod/layer related fields of 3DSTATE_DEPTH_BUFFER. Since we always point
to a single level/layer, those fields are always zero and this commit
effectively makes no change.
While at it, make it easier to disable manual slice offset calculation.
The view extent was set to be the same as the depth while it should be set to
the number of layers. It makes a difference for 3D textures.
Also use this as a chance to clean up the code.
No need to emit 3DSTATE_SO_BUFFER and 3DSTATE_SO_DECL_LIST when SO is
disabled. As the implicit flush done by the commands is also gone, emit an
explicit flush.
Most of the work in BeginTransformFeedback is only necessary on Gen6.
We may as well just skip it on Gen7+.
v2: Add an intel->gen == 6 assert.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we have hardware contexts, we don't need to continually
reprogram the GS_SVBI_INDEX registers. They're automatically saved and
restored with the context, so they can just increment over time. We
only need to reset them when starting transform feedback.
There's also no reason to delay until the next drawing operation; we can
just emit the packet immediately. However, this means we must drop the
initialization in brw_invariant_state, as BeginTransformFeedback may
occur before the first drawing in a context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
EXT_transform_feedback isn't yet supported on Gen4-5, so none of this
query code is actually used. This also means we can remove some of the
surrounding support code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Failing to get a hardware context now means failing to load the driver,
so this code will never get hit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
and add assertions to prevent buffer overflow. This fixes corruption
of the si_shader struct.
NOTE: This is a candidate for the 9.1 branch.
[ Cherry-pick of r600g commit da33f9b919 ]
Reviewed-by: Marek Olšák <maraeo@gmail.com>
- recent gdb handles DWARF fine (tested both with version
7.1.90.20100730 from mingw-w64 project, and 7.5-1 from mingw project)
- http://people.freedesktop.org/~jrfonseca/bfdhelp/ was updated to
handle DWARF
- stabs requires ugly hacks to prevent compilation failures
- mixing stabs/dwarf prevents proper backtraces (which is inevitable,
given that the MinGW C runtime is pre-built with DWARF)
For example, without this change I get:
(gdb) bt
#0 _wassert (_Message=0xf925060 L"Num < NumOperands && \"Invalid child # of SDNode!\"",
_File=0xf60b488 L"llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534)
at ../../../../mingw-w64-crt/misc/wassert.c:51
#1 0x0368996b in _assert (_Message=0x39d7ee4 "Num < NumOperands && \"Invalid child # of SDNode!\"",
_File=0x39d7e94 "llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534)
at ../../../../mingw-w64-crt/misc/wassert.c:44
#2 0x00000004 in ?? ()
#3 0x00000004 in ?? ()
#4 0x0f60b488 in ?? ()
#5 0x00000000 in ?? ()
While with this change I get:
(gdb) bt
#0 _wassert (_Message=0xfb982e8 L"Num < NumOperands && \"Invalid child # of SDNode!\"",
_File=0xefbcb40 L"llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534)
at ../../../../mingw-w64-crt/misc/wassert.c:51
#1 0x039c996b in _assert (_Message=0x3d17f24 "Num < NumOperands && \"Invalid child # of SDNode!\"",
_File=0x3d17ed4 "llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534)
at ../../../../mingw-w64-crt/misc/wassert.c:44
#2 0x033111cc in getOperand (Num=4, this=<optimized out>)
at llvm/include/llvm/CodeGen/SelectionDAGNodes.h:534
#3 getOperand (i=4, this=<optimized out>)
at llvm/include/llvm/CodeGen/SelectionDAGNodes.h:779
#4 llvm::SelectionDAG::getNode (this=0xf00cb08, Opcode=79, DL=..., VT=..., N1=..., N2=...)
at llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2859
#5 0x03377b20 in llvm::SelectionDAGBuilder::visitExtractElement (this=0xfb45028, I=...)
at llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2803
[...]
Reviewed-by: Brian Paul <brianp@vmware.com>
Emit XY_SRC_COPY_BLT to do the job. Since ETC1 textures cannot be mapped for
reading, as is required by util_copy_resource_region, this fixes copying of
ETC1 textures.
The problem with cp hooks is that when we switch from 3D ring to 2D ring, and
when there are active queries, we will emit 3D commands to 2D ring because
the new-batch hook is called.
This commit introduces the idea of cp owner. When the cp is flushed, or when
another owner takes place, the current owner is notified, giving it a chance
to emit whatever commands there need to be. With this mechanism, we can
resume queries when the 3D pipeline owns the cp, and pause queries when it
loses the cp. Ring switch will just work.
As we still need to know when the cp bo is reallocated, a flush callback is
added.
Now that we have hardware contexts and can use MI_STORE_REGISTER_MEM,
we can use the GPU's pipeline statistics counters rather than going out
of our way to count primitives in software.
Aside from being simpler, this also paves the way for Geometry Shaders,
which can output an arbitrary number of primitives on the GPU. It will
also allow us to use hardware primitive restart when these queries are
in use.
The GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query is easy: it
corresponds to the SO_NUM_PRIMS_WRITTEN/SO_NUM_PRIMS_WRITTEN0_IVB
counters.
The GL_PRIMITIVES_GENERATED query is trickier. Gen provides several
statistics registers which /almost/ match the semantics required:
- IA_PRIMITIVES_COUNT
The number of primitives fetched by the VF or IA (input assembler).
This undercounts when GS is enabled, as it can output many primitives.
- GS_PRIMITIVES_COUNT
The number of primitives output by the GS. Unfortunately, this
doesn't increment unless the GS unit is actually enabled, and it
usually isn't.
- SO_PRIM_STORAGE_NEEDED*_IVB
The amount of space needed to write primitives output by transform
feedback. These naturally only work when transform feedback is on.
We'd also have to add the counters for all four streams.
- CL_INVOCATION_COUNT
The number of primitives processed by the clipper. This doesn't work
if the GS or SOL throw away primitives for rasterizer discard.
However, it does increment even if the clipper is in REJECT_ALL mode.
Dynamically switching between counters would be painfully complicated,
especially since GS, rasterizer discard, and transform feedback can all
be switched on and off repeatedly during a single query.
The most usable counter is CL_INVOCATION_COUNT. The previous two
patches reworked rasterizer discard support so that all primitives hit
the clipper, making this work.
v2: Occlusion query bug fixes removed and squashed in earlier patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This has more of a negative impact than the previous patch, as on Gen6
passing primitives through to the clipper means we actually have to make
the GS thread write them to the URB.
I don't see another good solution though, and rasterizer discard is not
the most common of cases, so hopefully it won't be too terrible.
v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags;
remove the rasterizer_discard field from brw_gs_prog_key.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In order to implement the GL_PRIMITIVES_GENERATED query in a sane
fashion on our hardware, we can't discard primitives until the clipper.
The patch after next explains the rationale.
By setting the clipper to REJECT_ALL mode, all primitives get thrown away,
so rendering is still appropriately disabled.
This may negatively impact performance in the rasterizer discard case,
but it's unclear how much and this hasn't been observed to be a
bottleneck in any application we've looked at. The clipper is the very
next stage in the pipeline, so I don't think it will be terrible.
v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We don't currently use the clipper statistics, but we'll soon use
CL_INVOCATIONS_COUNT to implement the GL_PRIMITIVES_GENERATED query.
The number of primitives generated is not supposed to be altered during
operations such as glGenerateMipmap.
Prevents spec/EXT_transform_feedback/generatemipmap prims_generated
from breaking when we start using pipeline statistics registers to
implement the GL_PRIMITIVES_GENERATED query in a few commits.
v2: Use the BRW_NEW_META_IN_PROGRESS flag for correct state handling.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will allow us to disable statistics during meta operations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Hardware contexts greatly simplify the query object code. The pipeline
statistics counters get saved and restored with the context, which means
that we don't need to worry about other workloads polluting them.
This means that we can simply write a single pair of values (one at
BeginQuery and one at EndQuery) rather than a series of pairs. This
also means we don't need to worry about the BO getting full. We also
don't need to delay BO allocation and starting snapshot until the first
draw.
The generation split here is a little off: technically, Ironlake can also
support hardware contexts. However, the kernel currently doesn't, and
even if it were to do so someday, we'd need to wait a while before
bumping the kernel requirement to take advantage of it.
v2: Incorporate Paul's feedback.
- Clarify which functions are Gen4/5-only via assertions and comments.
- Change how driver hook initialization happens.
- Update comments.
- Squash a bug fix from a later commit here where it belongs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Acked-by: Paul Berry <stereotype441@gmail.com>
BLORP is used for operations like glClear, glCopyTexImage, and
glBlitFramebuffer which aren't supposed to contribute fragments toward
occlusion queries.
This prevents Piglit tests from breaking in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Hardware contexts are necessary to reasonably support OpenGL 3.2.
In particular, we currently maintain software counters for transform
feedback buffer offsets and counters, which relies on knowing the number
of primitives generated. Geometry shaders violate that assumption.
At the time of writing, Debian has moved to Kernel 3.8, which means most
people probably have a newer kernel by now. It's also worth noting that
this patch won't land until Mesa 10 which is currently targeted for
September. By that point, even more people will have a newer kernel.
Also, don't bother trying to allocate contexts on pre-Gen6, as it
currently will always fail, and if this changes in the future, we'll
need to reevaluate our hw_ctx/gen checks.
This patch leaves the code for flagging BRW_NEW_CONTEXT on new
batchbuffers if hw_ctx == NULL since that still occurs pre-Gen6.
Also remove the Gen7+ check for kernel 3.3, since it's now redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kernel 3.3 introduced the SOL reset execbuf parameter, needed for GL 3.0
on Ivybridge. Bumping the requirement will give an obvious error
message rather than simply reporting GL 2.1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_link_shader() unconditionally calls lower_vector_insert() with true
as the second parameter. This means that both constant and variable
indexed expressions will get lowered, so we should never see this in the
backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
do_vec_index_to_swizzle() should remove any vector extract operations
with a constant index. It's unconditionally called from
do_common_optimization().
do_vec_index_to_cond_assign() should remove the rest, and it is
unconditionally called from brw_link_shader(). This means that we
should never see ir_binop_vector_extract in the backend.
Silences compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we can handle it both for sampling and as depth/stencil enable it.
Passes nearly all additional piglit tests which are now performed, with two
exceptions (one being a framebuffer blit which fails for all other formats
including stencil too as we don't support stencil blits, the other reporting
a unexpected GL error so doesn't look to be llvmpipe's fault).
We need to split up the depth and stencil values in this case, and there's
some new logic required to handle float depth and stencil simultaneously.
Also make sure we get the 64bit zs clear values and masks propagated
correctly.
We do rendering to linear color buffers for quite some time, and since
switching to linear depth buffers all the tiled/linear logic was unused.
So get rid of (most) of it - there's still some LAYOUT_NONE things and
late allocation of resources which probably could be simplified.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code avoided first_layer parameter in the sampler interface (and needing
to do another calculation at runtime) by fixing up the base texture pointer
instead. Unfortunately, this didn't actually work as we have mip-first
texture layout so fixing up the base ptr by a fixed amount is very wrong if
there are mipmaps present. The wrong offsets caused misrendering and crashes.
Fix this by just adjusting the individual mip level offsets instead.
Spotted by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Since we can only sample either depth or stencil but not both only load
the required bits which makes things a bit easier (it requires special
handling since the format doesn't fit into 32bit).
The logic for deciding if depth or stencil should be sampled is a bit odd,
but seems to be what other drivers and statetrackers do: if it's a format with
both depth and stencil (or just with depth) then sample depth, for sampling
stencil a sampler view format with only stencil is required.
Also while here fix up stencil sampling for other formats as well, though
this isn't supported by mesa (ARB_stencil_texturing), and while blits would
use it they don't work neither since they'd also need stencil export.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I don't know what this code was trying to do but whatever it was it couldn't
have worked since negation of integer boolean inputs while not specified as
outright illegal (not yet at least) won't do anything since it doesn't affect
the result of comparison with zero at all. In fact it looks like the whole
instruction can just be omitted.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Now that the rb has a reference to the teximage, we didn't need anything
else out of the attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We keep having to pass the attachments around with our gl_renderbuffers
because that's the only way to find what the gl_renderbuffer actually
refers to. This is a step toward removing that (though drivers still need
the Zoffset as well).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the opportunity that radeon and intel drivers rely on for flushing
render targets that may get reused as textures. Before EGL, that only
happened for GL_TEXTURE attachments.
Fixes piglits:
KHR_gl_renderbuffer_image/renderbuffer-texture
OES_EGL_image/renderbuffer-texture
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This change was meant as a stepping stone to use PMADDUBSW SSSE3
instruction, but actually this refactoring by itself yields a 10%
speedup on texture intensive shaders (e.g, Google Earth's ocean water
w/o S3TC on a Ivy Bridge machine), while giving yielding exactly the
same results, whereas PMADDUBSW only gave an extra 5%, at the expense of
2bits of precision in the interpolation.
I belive that the speedup of this change comes from the reduced register
pressure (as 8.8 fixed point intermediates take twice the space of 8bit
unorm).
Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code
substantially -- it's no longer necessary to have code duplicated for
low and high register halfs.
Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never
executed (as it is faster on AVX to split the 256bit wide texture
computation into two 128bit chunks, in order to leverage integer
opcodes). This path might be useful in the future, so in order to
verify this change did not break that path I had to apply this change:
@@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
/*
* we only try 8-wide sampling with soa as it appears to
* be a loss with aos with AVX (but it should work).
* (It should be faster if we'd support avx2)
*/
- if (num_quads == 1 || !use_aos) {
+ if (/* num_quads == 1 || ! */ use_aos) {
if (num_quads > 1) {
if (mip_filter == PIPE_TEX_MIPFILTER_NONE) {
LLVMValueRef index0 = lp_build_const_int32(gallivm, 0);
/*
and then run texfilt mesademo:
LP_NATIVE_VECTOR_WIDTH=256 ./texfilt
Ran whole piglit without regressions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The SROA and function inliner passes are espically important, because
they optimize away unsupported features: functions and indirect
private memory access.
When an application is using PBOs, we attempt to use the BLT engine to
perform ReadPixels. If that fails due to some restrictions, it's useful
to raise a performance warning.
In the non-PBO case, we always use a CPU mapping since getting the data
into client memory requires a CPU-side copy. This is a very common case,
so raising a performance warning is annoying. In particular, apitrace's
image dumping code hits this path, causing it to print hundreds of
thousands of performance warnings via ARB_debug_output. This tends to
obscure actual errors or other important messages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When intel_finalize_mipmap_tree() calls intel_miptree_copy_teximage()
to reassemble a depth miptree that has been broken apart into pieces
(to deal with misalignment of levels/layers within the miptree), it
just copies the depth data, not the HiZ data. This is reasonable,
since the alignment restrictions of HiZ are a large part of the reason
why the miptree had to be broken apart in the first place. However,
in order for the depth copy to be sufficient, we need to do a depth
resolve first, to make sure any deferred depth writes that are in the
HiZ buffer get performed.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=64662 and
https://bugs.freedesktop.org/show_bug.cgi?id=64659.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Whether HiZ is enalbed or not, separate stencil is supported and enforced on
GEN7+. Now that we support separate stencil resources, we know how to emit
3DSTATE_STENCIL_BUFFER.
For allocations, we need to support stencil-only and separate stencil
resources. For mapping, we need to support software tiling and
packing/unpacking for separate stencil resources.
There were 2 issues with it:
1) The texture format which should be used for texturing was only set
in gl_texture_image::TexFormat, which wasn't used for sampler views.
2) Textures are sometimes reallocated under some circumstances
in st_finalize_texture, which is unacceptable if the texture comes
from a window system.
The issues are resolved as follows:
1) If surface_based is true (texture_from_pixmap, etc.), store the format
in a new variable st_texture_object::surface_format.
2) Don't reallocate a surface-based texture in st_finalize_texture.
Also don't use st_ChooseTextureFormat is st_context_teximage, because
the format is dictated by the caller.
This fixes the glx-tfp piglit test.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This fixes and enables texturing with compressed MSAA colorbuffers
on Evergreen and Cayman. For the first time, multisample textures work
on Cayman.
This requires the libdrm flag RADEON_SURF_FMASK.
v2: require libdrm_radeon 2.4.45
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This should have no change on driver operation, but it means that when you
wonder why some format isn't supported natively, you can just look at the
table above, instead of wondering if maybe there's an appropriate entry in
the surface formats table that is already supported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would expand it to RGBA_FLOAT16. This format now comes out
as framebuffer incomplete, but it seems worth the memory savings if that's
what people are asking for (and GL3 does list it under "texture-only"
color formats)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next commit introduces what is apparently our first one, which tripped
over this in glReadPixels.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step on the way to removing some of our code for forcing alpha
to 1, but I want easy bisecting so I'll add groups of formats separately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers now clamp this to the appropriate range for the bound
stencil buffer when emitting stencil state.
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Clamps the stencil reference value to the range representable in the
currently-bound draw framebuffer's stencil attachment.
V2: Add spec quote.
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Always check if a bo is busy in choose_transfer_method() since we always need
to map it in either map() or unmap(). Also determine how a bo is mapped in
choose_transfer_method().
The indices are not consecutive when using the geometry shader,
which means we were extracting non existing values. Create
an array of linear indices and always use it instead of the passed
indices. Found by Jose.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Pass in the size of the index buffer, when available, and use it
to handle out of bounds conditions. The behavior in the case of
an overflow needs to be the same as with other overflows in the
vertex processing pipeline meaning that a vertex should still
be generated but all attributes in it set to zero.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
the number of vertices to fetch doesn't necessarily equal the
total number of input vertices, e.g. we might want to fetch
a single vertex but then draw it twice. Lets use the correct
number of input vertices in the statistics.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We would crash when stride was bigger than the size of the buffer.
The correct behavior is to just fetch zero's in this case.
Unfortunatly with user_buffer's there's no way to validate the size
because currently we're just not getting it. Adjust the draw interface
to pass the size along the mapped buffer, which works perfectly
for buffer backed vertex_buffers and, in future, it will allow
us to plumb user_buffer sizes through the same interface.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The support is analogous to the way we handle indirect addressing
in temporaries, except that we don't have to worry about storing
(after declarations) and thus we'll able to keep using the old
code when indirect addressing isn't used. In other words we're
still using constants directly, unless the instruction has
immediate register with indirect addressing.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Static initialization of internal libstdc++ data related to iostream
causes segfaults with some apps.
This patch replaces all uses of std::ostream and std::ostringstream in sb
with custom lightweight classes.
Prevents segfaults with ut2004demo and probably some other old apps.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Parsing and ir construction is required for optimization only,
it's unnecessary if we only need to print shader dump.
This should make new disassembler more tolerant to any new
features in the bytecode.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Resource mapping is distinct from resource allocation, and is going to get
more and more complex. Move the related functions to a new file to make the
separation clear.
Mesa's extension table incorrectly lists this GL_OES_texture_npot as
ES2-only. It's also an ES1 extension. This patch adds ES1 to the
extensions API mask.
From the GL_OES_texture_npot spec:
OpenGL ES 1.0 or OpenGL ES 2.0 is required. This extension is
written against OpenGL ES 1.1.12 and OpenGL ES 2.0.25.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the places that used to generate array derefeneces of
vectors have been changed to generate either ir_binop_vector_extract or
ir_triop_vector_insert (or both), remove all support for dealing with
this deprecated construct.
As an added safeguard, modify ir_validate to reject ir_dereference_array
of a vector.
v2: Convert tabs to spaces. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Like with type conversions on out parameters, some extra copies need to
occur to handle these cases. The fundamental problem is that
ir_binop_vector_extract is not an lvalue, but out and inout parameters
must be lvalues. A previous patch delt with a similar problem in the
LHS of ir_assignment.
v2: Convert tabs to spaces. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Variable indexing into vectors using ir_dereference_array is being
removed, so this lowering pass has to generate something different.
v2: Convert tabs to spaces. Suggested by Eric.
v3: Simplify code slightly by assuming that elements of
gl_ClipDistanceMESA will always be vec4. Suggested by Paul.
v4: Fairly substantial rewrite based on the rewrite of "glsl: Convert
lower_clip_distance_visitor to be an ir_rvalue_visitor"
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
ir_call was changed long ago to be a statement rather than an
expression. That makes this comment no longer valid.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Right now the lower_clip_distance_visitor lowers variable indexing into
gl_ClipDistance into variable indexing into both the array
gl_ClipDistanceMESA and the vectors of that array. For example,
gl_ClipDistance[i] = f;
becomes
gl_ClipDistanceMESA[i >> 2][i & 3] = f;
However, variable indexing into vectors using ir_dereference_array is
being removed. Instead, ir_expression with ir_triop_vector_insert will
be used. The above code will become
gl_ClipDistanceMESA[i >> 2] =
vector_insert(gl_ClipDistanceMESA[i >> 2], i & 3, f);
In order to do this, an ir_rvalue_visitor will need to be used. This
commit is really just a refactor to get ready for that.
v4: Split the least amount of refactor from the rest of the code
changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now ir_dereference_array of a vector will never occur in the RHS of an
expression.
v2: Add back the { } around the if-statement body to make it more
readable. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ast_array_index code can't know whether to generate an
ir_binop_vector_extract or an ir_triop_vector_insert. Instead it will
always generate ir_binop_vector_extract, and the LHS and RHS have to be
re-written.
v2: Convert tabs to spaces. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will eventually replace do_vec_index_to_cond_assign. This lowering
pass is called in all the places where do_vec_index_to_cond_assign or
do_vec_index_to_swizzle is called.
v2: Use WRITEMASK_* instead of integer literals. Use a more concise
method of generating broadcast_index. Both suggested by Eric.
v3: Use a series of scalar compares instead of a single vector compare.
Suggested by Eric and Ken. It still uses 'if (cond) v.x = y;' instead
of conditional assignments because ir_builder doesn't do conditional
assignments, and I'd rather keep the code simple.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lower ir_binop_vector_extract with a non-constant index to a series of
conditional moves. This is exactly like ir_dereference_array of a
vector with a non-constant index.
v2: Convert tabs to spaces. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lower ir_binop_vector_extract with a constant index to a swizzle. This
is exactly like ir_dereference_array of a vector with a constant index.
v2: Convert tabs to spaces. Suggested by Eric.
v3: Correctly call convert_vector_extract_to_swizzle in
ir_vec_index_to_swizzle_visitor::visit_enter(ir_call *ir). Suggested by
Ken.
v4: Use CLAMP instead of MIN2(MAX2()). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use a first function that extract the vector being indexed and the index
from the deref. Call the second function that does the real work.
Coming patches will add a new ir_expression for variable indexing into a
vector. Having the lowering pass split into two functions will make it
much easier to lower the new ir_expression.
v2: Convert tabs to spaces. Suggested by Eric.
v3: Move some bits from a later patch back to this patch so that it
actually compiles. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new opcode is used to generate a new vector with a single field from
the source vector replaced. This will eventually replace
ir_dereference_array of vectors in the LHS of assignments.
v2: Convert tabs to spaces. Suggested by Eric.
v3: Add constant expression handling for ir_triop_vector_insert. This
prevents the constant matrix inversion tests from regressing. Duh.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new opcode is used to get a single field from a vector. The field
index may not be constant. This will eventually replace
ir_dereference_array of vectors. This is similar to the extractelement
instruction in LLVM IR.
http://llvm.org/docs/LangRef.html#extractelement-instruction
v2: Convert tabs to spaces. Suggested by Eric.
v3: Add array index range checking to ir_binop_vector_extract constant
expression handling. Suggested by Ken.
v4: Use CLAMP instead of MIN2(MAX2()). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b765740 (glsl: Pass struct shader_compiler_options into
do_common_optimization.) added a new parameter to
do_common_optimization() but didn't update test_optpass.cpp, causing
"make check" to break.
This patch makes the proper updates to test_optpass.cpp so that the
build succeeds again.
This pass flips (matrix * vector) operations to (vector *
matrixTranspose) for certain built-in matrices (currently
gl_ModelViewProjectionMatrix and gl_TextureMatrix).
This is equivalent, but results in dot products rather than multiplies
and adds. On some hardware, this is more efficient.
This pass is conditionalized on ctx->mvp_with_dp4, the flag drivers set
to indicate they prefer dot products.
Improves performance in Lightsmark by 1.01131% +/- 0.162069% (n = 10)
on a Haswell GT2 system. Passes Piglit on Ivybridge.
v2: Use struct gl_shader_compiler_options instead of plumbing through
another boolean flag for this purpose.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Doing matrix multiplies with DP4s is fewer instructions than MUL/ADD,
especially since we don't support MAD in the vertex shader.
Not observed to improve performance in any fixed function applications,
but is useful for the next patch.
I've left this unset for the fragment shader because the scalar backend
can't use DP4 and does have MAD support.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This flag essentially tells the compiler whether it prefers
dot products or multiply/adds for matrix operations. As such,
ShaderCompilerOptions seems like the right place for it.
This also lets us specify it on a per-stage basis. This patch makes all
existing users set the flag for the Vertex Shader stage only, as it's
currently only used for fixed-function vertex programs. That will
change soon, and I wanted to preserve the existing behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
do_common_optimization may need to make choices about whether to emit
certain kinds of instructions. gl_context::ShaderCompilerOptions
contains exactly that information, so it makes sense to pass it in.
Rather than passing the whole array, pass the structure for the stage
that's currently being worked on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't include shaderobj.h from the standalone utilities, so we
unfortunately have to copy this function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Marek added these new formats in commit f9fa725690, but
without comments relating to the packing. Sometimes the naming is
confusing, so these comments are helpful in determining whether two
formats are compatible.
The new comments are based on my reading of format_unpack.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
- don't reference a buffer for a local variable
(that's never useful unless it can be the only reference to the buffer)
- check if the buffer is not NULL
- set buffer_size as specified with BindBufferRange
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: move the flagging from intel_bufferobj_data to intel_bufferobj_alloc_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If Const.CheckArrayBounds is false, the only code using _MaxElement is
glDrawRangeElements, so I changed it and explained in the code why
_MaxElement is not very useful there.
BTW, the big magic number was copied to the letter
from _mesa_update_array_max_element.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The limits should not be different and OpenGL requires both to be at least 32,
which is also the maximum limit on radeon.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Const.MaxTextureImageUnits -> Const.FragmentProgram.MaxTextureImageUnits
Const.MaxVertexTextureImageUnits -> Const.VertexProgram.MaxTextureImageUnits
etc.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Shaders are unified on most hardware (= same limits in all stages).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were not allowed to say the "GT3" name, but we really needed to
have the PCI IDs because too many people had such machines, so we had
to make the GT3 machines work as GT2.
Let's just say that GT2_PLUS was a short for GT2_PLUS_1 :)
NOTE: This is a candidate for stable branches.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Haswell's GT3 variant offers 32kB of URB space for push constants, while
GT1 and GT2 match Ivybridge, providing 16kB. Update the code to reserve
the full 32kB on GT3.
v2: Specify push constant size correctly. I thought GT3 reinterpreted
the value as multiples of 2kB, but it doesn't. You simply have to
program an even number.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the same change as the previous commit to the FS. A very few VSes
are regressed by 1 or 2 instructions, which look recoverable with a bit
more dead code elimination.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we would sometimes not consider a write to a register to
extend the end of the interval, nor would we consider a read before a
write to extend the start. This made for a bunch of complicated logic
related to how to treat the results when dead code might be present.
Instead, just extend the interval and fix dead code elimination to know
how to remove it.
Interestingly, this actually results in a tiny bit more optimization:
total instructions in shared programs: 1391220 -> 1390799 (-0.03%)
instructions in affected programs: 14037 -> 13616 (-3.00%)
v2: Fix a theoretical problem with the simd16 workaround if dst == src,
where we would revert the bump of the live range.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Now tells Gallium that ilo supports primitive restart.
Updated ilo_draw_vbo to be able to check that the indexed
primitive being rendered can actually be supported in HW. If not,
will break up into individual prims similar to what Mesa does.
[olv: a minor fix after rebasing and formatting]
Before, if we failed to allocate the index buffer we'd silently
return from st_draw_vbo() without drawing anything. We should
raise GL_OUT_OF_MEMORY to give some indication that something went
wrong.
Note: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
It helps a bit with vertex shader performance on i915g
(a couple percent faster with openarena).
I have tried most other passes, and they weren't showing
any measurable improvement. Note that my vertex shaders
didn't have loops, so maybe the loop optimizations could
still be useful in the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unfortunately the surface formats table is now splattered across multiple
chapters. All surface format enums from brw_defines.h are present, but
only support for them that is mentioned in the public specs is included
here.
v2 (from Ken): Mark R32G32B32A32_SFIXED as unsupported on Ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Do not propagate a copy if source and destination are identical.
Otherwise code like
MOV TEMP[0].xyzw, TEMP[0].wzyx
MOV TEMP[1].xyzw, TEMP[0].xyzw
is changed to
MOV TEMP[0].xyzw, TEMP[0].wzyx
MOV TEMP[1].xyzw, TEMP[0].wzyx
This fixes Piglit test shaders/glsl-copy-propagation-self-2 for drivers that
use Mesa IR.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Fabian Bieler <fabianbieler@fastmail.fm>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Do not propagate a copy if source and destination are identical.
Otherwise code like
MOV TEMP[0].xyzw, TEMP[0].wzyx
MOV TEMP[1].xyzw, TEMP[0].xyzw
is changed to
MOV TEMP[0].xyzw, TEMP[0].wzyx
MOV TEMP[1].xyzw, TEMP[0].wzyx
This fixes Piglit test shaders/glsl-copy-propagation-self-2 for gallium drivers.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Fabian Bieler <fabianbieler@fastmail.fm>
Reviewed-by: Brian Paul <brianp@vmware.com>
The constant packets for gen6 are too small for gen7, and while IVB seems
happy with them HSW blows up. Fix it by emitting the correct packets on
gen7, for all stages.
v2: Include the packets instead of just skipping them.
NOTE: This is a candidate for the stable branches.
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit EGL_BAD_CONTEXT if the user passes a context to
eglCreateImageKHR(type=EGL_ANDROID_image_native_buffer).
From the EGL_ANDROID_image_native_buffer spec:
* If <target> is EGL_NATIVE_BUFFER_ANDROID and <ctx> is not
EGL_NO_CONTEXT, the error EGL_BAD_CONTEXT is generated.
Note: This is a candidate for the stable branches.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This basically reverts commit
2acc719374.
With the previous change, we're not batchbuffer limited any
longer. So we actually start seeing a performance difference
between X and Y tiling. X tiling is funny because it is
faster for screen-aligned quads but slower in games. So let's
use Y tiling which is 10% faster overall.
Now that we don't throttle at every batchbuffer, we can shrink
the size of batchbuffers to achieve early flushing. This gives
a significant speed boost in a lot of games (on the order of
20%).
It should be TGSI_TYPE_UNSIGNED, not TGSI_TYPE_FLOAT.
Fixed also gallivm not_emit_cpu() to use uint build context.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Move the body of tgsi_opcode_infer_dst_type() to a new helper function,
tgsi_opcode_infer_type(), and call the helper function from
tgsi_opcode_infer_dst_type(). The diff looks complicated simply because the
code is moved around.
A following commit will make tgsi_opcode_infer_src_type() call
tgsi_opcode_infer_type().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reorder opcodes by their assigned numbers. This makes it easier to see the
differences between tgsi_opcode_infer_src_type() and
tgsi_opcode_infer_dst_type().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Make use of tgsi_util_get_texture_coord_dim() to replace the big switch table.
There is a subtle difference with this change. When TXP is used with an array
texture, the layer is now also projected. This behavior matches the TGSI doc.
Since GLSL does not allow TXP on an array texture, I am not sure which
behavior is correct or preferred.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
This util function returns the dimension of the texture coordinates for a
texture target, and the location of the shadow reference value.
For example, when the texture target is TGSI_TEXTURE_SHADOW2D, the dimension
of the texture coordinates is 2, and the location of the ref value is 2
(that is, the Z channel).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Adds the remaining integer opcodes, and some opcodes are moved to more
appropriate places, along with getting rid of the (already nearly empty)
ps_2_x section. Though the CAP bits for some of these are still a bit in
the air so the documentation isn't quite as watertight as is desirable.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We can replace CNDxx with MOV (and possibly eliminate after
propagation) in following cases:
If src1 is equal to src2 in CNDxx instruction then the result doesn't
depend on condition and we can replace the instruction with
"MOV dst, src1".
If src0 is const then we can evaluate the condition at compile time and
also replace it with MOV.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Use the same limit for kcache constants in alu group on r6xx as on other
chips (two const pairs). Relaxing this will require additional checks to
make sure that all 4 consts in the group come from 2 kcache sets (clause
limit), probably without noticeable improvements of shader performance.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Everyone was doing effectively the same thing, except for some funky code
reuse in Intel, and swrast mistakenly recomputing _BaseFormat instead of
using the texture's _BaseFormat. swrast's sRGB handling is left in place,
though it should be done by using _mesa_get_render_format() at render time
instead (as-is, it will miss updates to GL_FRAMEBUFFER_SRGB).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're looking for the logical width of our level, which is what
image->Width2/Height2 is. The previous code relied on MSAA textures being
only level 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was some comment about trying to avoid marking resolves in
updownsample, but if the downsample is never actually rendered to, then
the required resolve tracked in the downsample will never be executed, so
who cares?
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Only lower bitfieldInsert to BFM+BFI (and don't lower
bitfieldExtract at all) since three-source instructions are now
usable in the vertex shader.
v3: Lower bitfield_insert in the same pass with everything else, since
it doesn't produce any instructions to be lowered (the other two
lowering passes that were in a previous iteration of this series
emitted subtractions which needed to be lowered).
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
v2: Rebase on LRP addition.
Use fix_3src_operand() when emitting BFE and BFI2.
Add BFE and BFI2 to is_3src_inst check in
brw_vec4_copy_propagation.cpp.
Subtract result of FBH from 31 (unless an error) to convert
MSB counts to LSB counts
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Don't bother scalarizing ir_binop_bfm, since its results are
identical for all channels.
v2: Subtract result of FBH from 31 (unless an error) to convert
MSB counts to LSB counts.
v3: Use op0->clone() in ir_triop_bfi to prevent (var_ref
channel_expressions) from appearing multiple times in the IR.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
Specifically
bfe - for bitfieldExtract()
bfi1 and bfi2 - for bitfieldInsert()
bfrev - for bitfieldReverse()
cbit - for bitCount()
fbh - for findMSB()
fbl - for findLSB()
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Also update asserts to allow BFE and BFI2, which take (unsigned)
doubleword arguments.
v2: Allow BRW_REGISTER_TYPE_UD for src1 and src2 as well.
Assert that src2.type (instead of src0.type) matches dest.type since
it's the primary argument and src0 and src1 might correctly have
different types.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
i965/Gen7+ and Radeon/Evergreen+ have bfm/bfi instructions to implement
bitfieldInsert() from ARB_gpu_shader5.
v2: Add ir_binop_bfm and ir_triop_bfi to st_glsl_to_tgsi.cpp.
Remove spurious temporary assignment and dereference.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Order bits from LSB end (31 - count) for ir_unop_find_msb.
v3: Add ir_triop_bitfield_extract as an exception to the op[0]->type ==
op[1]->type assertion in ir_constant_expression.cpp.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
This library is very small, so there is not much to gain from building
it as a shared library. Also, when linking statically with LLVM, a
shared libradeonllvm exports LLVM symbols and creates problems when
used with other shared objects that also link statically to LLVM.
Reviewed-by: Mathias.Froehlich@web.de
The LLVM C API is considered stable and should never change, so it
is much more desirable to use than the LLVM C++ API, which is constantly in
flux.
v2:
- Split target initialization and lookup into separate functions
Reviewed-by: Mathias.Froehlich@web.de
This does not solve all of the problems with using LLVM in a
multithreaded enivronment, but it should help in some cases.
Reviewed-by: Mathias.Froehlich@web.de
This leads to crashes when multiple threads try to compile compute
shaders in the same time.
Fixes a crash in bfgminer when using more than one thread.
Of the 3 controls in the extension, one was kept in GL core and the other
two were explicitly deprecated and the reasonable default behavior was
encoded in the spec. By not exposing the extension, we avoid shader
recompiles when switching between float and unorm color buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This cleans up some funny-looking code in some unigine shaders I was
looking at. Also slightly helps on planeshift and a few shaders in an
upcoming Valve release.
total instructions in shared programs: 1653715 -> 1653587 (-0.01%)
instructions in affected programs: 16550 -> 16422 (-0.77%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It should be unsigned, not enum pipe_flush_flags.
Fixed a build error:
src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]
v2: replace all occurrences of enum pipe_flush_flags by unsigned
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
[olv: document the parameter now that the type is unsigned]
Improves glb2.7 performance at a misaligned size by 2.3% +/- 0.7% (n=11).
The workaround was to avoid bad primitive/surface sizes, but that's worked
around as of a14dc4f92c. (One might note
that pre-gen7 we don't know that the right half of an 8x4 at the right
edge is actually our pixels, but we're already clobbering those pixels for
depth resolves anyway and more work would be required to avoid that).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
A surprising number of apps and benchmarks have poor code like this:
glBegin(GL_LINE_STRIP);
glVertex(v1);
glVertex(v2);
glEnd();
// Possibly some no-op state changes here
glBegin(GL_LINE_STRIP);
glVertex(v3);
glVertex(v4);
glEnd();
// repeat many, many times.
The above sequence can be converted into:
glBegin(GL_LINES);
glVertex(v1);
glVertex(v2);
glVertex(v3);
glVertex(v4);
glEnd();
Similarly for GL_POINTS, GL_TRIANGLES, etc.
Merging was already implemented for GL_QUADS in the display list code.
Now other prim types are handled and it's also done for immediate mode.
In one case:
before after
-----------------------------------------------
number of st_draw_vbo() calls: 141 45
number of _mesa_prims issued: 7520 632
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallium lies. buffer_size is not actually buffer_size but available
size, which is 'buffer_size - buffer_offset' so by adding buffer
offset we'd incorrectly compute overflow.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In ureg src registers could have an indirect register that was
either a temp or an addr register, while dst registers allowed
only addr. That made moving between them a little difficult so
make them behave the same way and allow temp's and addr registers
as indirect files for both (tgsi supports it, just ureg didn't).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
A lot of them were missing. Others were moved from the Compute ISA
to a new Integer ISA section as that seemed more appropriate.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Eliminating this we no longer need to copy between linear and swizzled layout.
This is probably not quite ideal since it's a bit more work for now, could do
some optimizations by moving depth testing outside the fragment shader loop
(but tricky for early depth test as we don't have neither the mask nor the
interpolated z in the right order handy).
The large amount of tile/untile code is no longer needed will be deleted
in next commit.
No piglit regressions.
v2: change a forgotten LAYOUT_NONE to LAYOUT_LINEAR.
v3: fix (bogus) uninitialized variable warnings, add comments, fix a bad type
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Assigning a struct only copies the members - any padding is left as is.
Thus this code:
struct foo_t foo;
foo = bar;
leaves the padding of foo intact, ie uninitialized random garbage.
This patch fixes constant shader recompiles by initializing the struct
to zero. For completeness, memcpy is used to copy the key to the shader
struct.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
One build system for linux/unix only drivers should be enough.
Additionally the nouveau target was disabled anyway.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
post_scheduler clears interference set for reallocatable values when
the value becomes live first time, and then updates it to take into
account modified order of operations, but this was not handled properly
if the value appears first time as a source in copy operation.
Fixes issues with webgl demo: http://madebyevan.com/webgl-water/
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
New disassembler is not completely isolated yet from further processing
in r600g/sb that is not required for printing the dump, so it has higher
probability to fail in case of any unexpected features in the bytecode.
This patch adds "sbdisasm" flag for R600_DEBUG that allows to use new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.
If shader optimization is enabled, new disassembler is used by default.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Mesa build is too complex to rely on successful builds. On refactorings
it is always a good idea to use git grep to prevent missing cases:
$ git grep u_assembled_primitive
src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c: u_assembled_primitive(in_prim);
The differences from the previous releases that affect st/egl are
- logging macros are prefixed with an 'A'
- dequeueBuffer() and enqueueBuffer() require an additoinal argument for
fence fd, acquired from libsync
Additionally, include gralloc_drm.h with extern "C".
The function returns the number of reduced/tessellated primitives for the
given vertex count.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
Switch to '>=' for comparisons, and it becomes obvious that the comparison for
PIPE_PRIM_QUAD_STRIP was wrong.
Add minimum vertex count check for PIPE_PRIM_LINE_LOOP. Return 1 for
PIPE_PRIM_POLYGON with 3 vertices.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
As a side effect, primitives with adjacency are now correctly validated.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
Move together (or add) functions to decompose/reduce/assemble a primitive,
give them consistent names, and document them. Add u_prim_vertex_count() so
that the vertex count information can be used elsewhere.
u_assembled_primitive() will be removed in a folow-on commit.
[olv: fix a warning when -Wold-style-declaration is enabled]
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
While this is ignorant of dependency control, it's still good for a 0.39%
+/- 0.08% performance improvement on GLBenchmark 2.7 (n=548)
v2: Rewrite as a subclass of the base class for the FS instruction
scheduler, inheriting the same latency information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These will get virtualized as we add VS scheduling support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I need this so I can look at vec4 and fs registers' files from the same
.cpp file without namespaces. As far as I can tell we never rely on the
particular numerical values of the files, though I thought it sounded like
a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will free instruction scheduling to make better choices. No
statistically significant performance difference on GLB2.7 (n=93).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
instead of crashing just fill zeros at the input slots that don't
match, that's the mandated behavior and it avoids debug asserts.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's valid because we reuse certain arithmetic operations
for both signed and unsigned types (e.g. uadd, umad, which
have a bit unfortunate naming)
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GPU apparently goes looking for constants even though there are no
shader stages enabled, and gets stuck because we haven't told it there are
no constants to collect. If any other user of the 3D pipeline had run
(even the Render accel of the X server!) since power on, then the in-GPU
constant buffers would have been set up with some contents we didn't use,
and we would succeed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56416
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dave Airlie <airlied@redhat.com>
NOTE: This is a candidate for the stable branches.
We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.
The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Remove all the glDraw* functions from the GLvertexformat structure.
The point of that dispatch struct is to handle all the functions which
dispatch differently depending on whether we're inside glBegin/End.
glDraw* are never allowed inside glBegin/End so we can remove those
entries.
This simplifies the code paths and gets rid of quite a bit of code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If the currently compiled primitive state is PRIM_UNKNOWN we should
not return true from _mesa_inside_dlist_begin_end(). This lets us
simplify the calls to that function.
Note, the call to _mesa_inside_dlist_begin_end() in vbo_save_EndList()
should have probably been checking for PRIM_UNKNOWN too, but it wasn't.
So there's no code change change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is set during context creation/initialization. We know we're
not inside glBegin/glEnd at this point so use PRIM_OUTSIDE_BEGIN_END.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The old code didn't make sense. The clause in question did the
same thing as the next else-if clause. If we're already executing
a glBegin/End pair and we're starting a new primitive, that's an
error.
Fixes more failures in piglit gl-1.0-beginend-coverage test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Functions like glDrawArrays, glDrawElements, etc. are illegal between
glBegin/glEnd and should generate GL_INVALID_OPERATION.
Fixes several piglit gl-1.0-beginend-coverage failures.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The _save_OBE_DrawArrays/Elements/RangeElements() functions are
called when building a display list and we know we're outside
glBegin/End.
We shouldn't call the normal _mesa_validate_DrawArrays/Elements()
functions here because those functions only work properly in immediate
mode or during dlist execution. At dlist compile time, we can't call
_mesa_update_state(), etc. and examine the current state since it won't
apply when the list is executed later.
Fixes several failures in piglit's gl-1.0-beginend-coverage test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If we're in GL_COMPILE_AND_EXECUTE mode and inside glBegin, calling
glEndList() should generate an error.
Fixes a failure in piglit's gl-1.0-beginend-coverage test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The old code was hard to understand and not entirely correct.
Note that PRIM_INSIDE_UNKNOWN_PRIM is no longer set anywhere so
we'll be able to remove that next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
...in terms of new _mesa_is_valid_prim_mode(). We need a mode validater
function that doesn't depend on current state for the display list code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These values pertain to display lists, and the new types of geometry
shader primitives can be used in display lists.
And add new PRIM_MAX constant for follow-on changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This removes the test for _mesa_inside_dlist_begin_end().
If ctx->Driver.CurrentSavePrimitive==PRIM_UNKNOWN (the initial value),
_mesa_inside_dlist_begin_end() will, confusingly, return TRUE.
So we didn't set the ctx->ListState.Current.ShadeModel value and it
remained in its indeterminate state.
This didn't effect correctness, but it defeated the intended optimization
of dropping redundant glShadeModel() state changes in order to
coalesce sequences of drawing commands.
Verified with new piglit gl-1.0-dlist-shademodel test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Without this patch, radeon_uvd failed to find the libdrm includes:
In file included from radeon_uvd.c:48:
../../winsys/radeon/drm/radeon_winsys.h:44:35: error:
libdrm/radeon_surface.h: No such file or directory
Signed-off-by: Lauri Kasanen <cand@gmx.com>
When checking framebuffer completeness, we test each attachment.
We verify that all attachments are consistent in terms of layers.
1. They must all be layered, or all non-layered
2. If they are layered, they must match in depth
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If glFramebufferTexture is used, then the framebuffer attachment is
layered.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With glFramebufferTexture, a renderbuffer may support
all layers of the texture, so we need the depth of the
renderbuffer to check for consistency which is required
for framebuffer completeness.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This like the fifth attempt to fix the issue.
Also with the new "validating" flag, we can set recalculate_inputs to FALSE
earlier in vbo_bind_arrays, because _mesa_update_state won't change it.
NOTE: This is a candidate for the stable branches.
v2: fixed a typo
Reviewed-by: Brian Paul <brianp@vmware.com>
The shadow comparitor needs to be loaded into the Z component of the
last DWord.
Fixes es3conform's shadow_execution_vert and oglconform's
shadow-grad advanced.textureGrad.1D tests on Haswell.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the Ivybridge PRM, Volume 4 Part 1, page 130, in the
section for the sample_d message: "The r coordinate contains the faceid,
and the r gradients are ignored by hardware."
This doesn't match GLSL, which provides gradients for all of the
coordinates. So we would need to do some math to compute the face ID
before using sample_d. We currently don't have any code to do that.
However, we do have a lowering pass that converts textureGrad to
textureLod, which solves this problem. Since textureGrad on three
components is sufficiently obscure, it's not a performance path.
For now, only handle samplerCubeShadow; we need tests for samplerCube
and samplerCubeArray.
Fixes es3conform's shadow_comparison_frag test on Haswell.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Consider the following shader:
vec4 f(vec4 v) { return v; }
vec4 f(vec4 v);
The prototype exactly matches the signature of the earlier definition,
so there's absolutely no point in it. However, it doesn't appear to
be illegal. The GLSL 4.30 specification offers two relevant quotes:
"If a function name is declared twice with the same parameter types,
then the return types and all qualifiers must also match, and it is the
same function being declared."
"User-defined functions can have multiple declarations, but only one
definition."
In this case the same function was declared twice, and there's only one
definition, which fits both pieces of text. There doesn't appear to be
any text saying late prototypes are illegal, so presumably it's valid.
Unfortunately, it currently triggers an assertion failure:
ir_dereference_variable @ <p1> specifies undeclared variable `v' @ <p2>
When we process the second line, we look for an existing exact match so
we can enforce the one-definition rule. We then leave sig set to that
existing function, and hit sig->replace_parameters(&hir_parameters),
unfortunately nuking our existing definition's parameters (which have
actual dereferences) with the prototype's bogus unused parameters.
Simply bailing out and ignoring such late prototypes is the safest
thing to do.
Fixes Piglit's late-proto.vert as well as 3DMark/Ice Storm for Android.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
The three users of GALLIUM_PIPE_LOADER_LIBS (OpenCL, gallium-gbm,
gallium tests) don't appear to need libws_xlib.la.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It guarded the function prototype of pipe_loader_sw_probe, whose use (in
pipe_loader.c) and definition (in pipe_loader_sw.c) were not guarded.
Both are built into libpipe_loader.la if HAVE_LOADER_GALLIUM, which is
enable_gallium_loader in configure.ac.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The number of samples is already available in the miptree data
structure, so there's no need to pass it in.
I suspect this may fix a subtle bug because in one case
(intel_renderbuffer_update_wrapper) we were always passing zero for
num_samples, even though the buffer in question was not guaranteed to
be single-sampled. But I wasn't able to find a failing test case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Technically it's legal for geometry shader to not emit any
vertices. It's silly, but perfectly legal, so lets make draw
stop crashing if it happens.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The upside is less CPU overhead in fiddling with GL error handling, the
ability to use the constant color write message in most cases, and no GLSL
clear shaders appearing in MESA_GLSL=dump output. The downside is more
batch flushing and a total recompute of GL state at the end of blorp.
However, if we're ever going to use the fast color clear feature of CMS
surfaces, we'll need this anyway since it requires very special state
setup.
This increases the fail rate of some the GLES3conform ARB_sync tests,
because of the initial flush at the start of blorp. The tests already
intermittently failed (because it's just a bad testing procedure), and we
can return it to its previous fail rate by fixing the initial flush.
Improves GLB2.7 performance 0.37% +/- 0.11% (n=71/70, outlier removed).
v2: Rename the key member, use the core helper for sRGB, and use
BRW_MASK_* enums, fix comment and indentation (review by Paul).
v3: Rewrite a comment, drop a silly temporary variable (review by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: const-qualify ctx, and add a comment about the function (recommended
by Brian and Kenneth).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Improves GLB2.7 performance 0.13% +/- 0.09% (n=104/105, outliers removed).
More importantly, once color glClear()s are done through blorp in the next
commit, this reduces regression in GLES3 conformance tests that rely on
queueing up many glClear()s and having the GPU report being still busy in
an ARB_sync query after that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Collects various statistical information for each shader
and total stats for contexts.
Printed with R600_DEBUG=sb,sbstat
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In some cases we use value::gvn_source field to link values that
are known to be equal before gvn pass (e.g. results of DOT4 in different
slots of the same alu group), but then source value may become dead later
and this confuses further passes.
This patch resets value::gvn_source to NULL in the dce_cleanup pass
if it points to dead value.
Fixes segfault during shader optimization with ETQW.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
It's not a complete register pressure tracking, yet it helps to prevent
register allocation problems in some cases where they were observed.
The problems are uncovered by false dependencies between fetch instructions
introduced by some recent changes in TGSI and/or default backend.
Sometimes we have code like this:
...
SAMPLE R5.xyzw, R5.xyzw
... store R5.xyzw somewhere
MOV R5.x, <next x coord>
MOV R5.y, <next y coord>
SAMPLE R5.xyzw, R5.xyzw
... <may be repeated a lot of times>
With 2D resources, z and w in SAMPLE src reg aren't used and can be simply
masked, but shader backend doesn't have this information, so it's
considered as data dependency by optimization algorithms.
This results in more clean shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The remaining bits happen to do nothing that
_swrast_span_render_start()/finish() don't do.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
It's not really span code ever since we stopped using spans for S8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the case of renering to windows in X, we would render to stale buffers
(or not render at all!) if you hit a MapRenderbuffer as the first thing
done to your window after new buffers are ready to be collected in DRI2.
I think this also covers the weird comment about irb->mt being missing
sometimes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I always forget how we do this for compressed textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that everything goes through ImageSlices[], we can rely on the
driver's existing texture mapping function.
A big block of code goes away on Radeon that looks like it was to deal with
the validate that happened at SpanRenderStart, which no longer occurs since we
don't need validation for the MapTextureImage hook.
v2: Rewrite comment about ImageSlices, fix duplicated swImages, touch up
unmap loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is trying to deal with providing a map in the case that
AllocTexImageBuffer was called, which is hooked up to the swrast variant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
MapTextureImage has the exact same logic, except it can also handle
swrast-allocated buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
For hardware drivers with pitch alignment requirements, a
non-power-of-two-sized texture format won't end up being an integer number
of pixels per row. Also, avoids having to change our units between
MapTextureImage's rowStride and swrast's RowStride.
This doesn't fully convert the compressed texel fetch path, but does make
sure we don't drop any bits (not that we'd expect to).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a step toward allowing drivers to use their normal mapping paths,
instead of requiring that all slice mappings come from an aligned offset
from the first slice's map.
This incidentally fixes missing slice handling in FXT1 swrast.
v2: Use slice height helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Move slice height calculation to a helper function (recommeded by Brian).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This function going to get used a lot more in upcoming patches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Improves GLB2.7 trex performance 1.01985% +/- 0.721366% on my IVB (n=10)
and by 3.38771% +/- 0.584241% (n=15) on my HSW, due to a 32x32 ARGB8888
cubemap going from untiled to tiled.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For this multi-page single statement, my thought the end was to that the
next block was mis-indented, rather than that the dropped indentation
actually indicated the end of the loop.
In almost all of our cases, getters that are turned on for only some API
variants will have an extension listed as one of the things that can
enable it, and thus api_check gets set. For extra_gl30_es3 (used for
NUM_EXTENSIONS, MAJOR_VERSION, MINOR_VERSION) on a GL 2.1 context, though,
we would check twice, not find either one, but never actually throw the
error.
While we may provide the extension, we need to tell applications that they
can't actually use it:
An implementation can either set QUERY_COUNTER_BITS_ARB to the
value 0, or to some number greater than or equal to n. If an
implementation returns 0 for QUERY_COUNTER_BITS_ARB, then the
occlusion queries will always return that zero samples passed the
occlusion test, and so an application should not use occlusion
queries on that implementation.
These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.
This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.
This also makes is_control_flow() available in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
only report overflow for missing targets if they're actually being
used. if the targets are missing but are not being used by any
slot in the stream output declaration we should correctly just
ignore them.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.
Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistant values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.
Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
We were crashing if one of the buffers wasn't set, we should
just treat it as an overflow. It's useful when using so
statistics because it allows one to figure out how much data
would be generated by so without actually writing any of it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
we weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first vertex/primitive/pixel in the SoA structure
and not correctly fetching from all structures.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We already hold the variable, just weren't providing access
to it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We weren't taking the buffer offset, destination offset or the
stride into consideration so we were frequently writing into
an overflown buffer.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This was a very serious bug. We were always doing the viewport
transformations on the first output of the vertex shader. That means
that every application that was storing position in anything but
OUT[0] was outputing untransformed vertices and had broken output
for whatever it was storing at OUT[0]. Correctly take into
consideration where the vertex position is actually stored.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There can be more stream output decls than shader outputs because
individual components from them can be split and distributed
among different so buffers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before, we'd incorrectly generate an error if we we tried to
replace a non-4x4 block near the edge of a NPOT compressed texture.
For example, if the dest image was 15 texels wide and xoffset=12
and width=3 we'd incorrectly generate GL_INVALID_OPERATION.
Verified with new tests added to piglit s3tc-errors test.
Note: This is a candidate for the stable branches.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
patch fixes a crash that happens if glTexSubImage2D is called with a
negative xoffset.
NOTE: This is a candidate for stable branches.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
After more thought/discussion, it seems it is better to handle this sort
of stuff in the state tracker.
So this reverts commit 12096f334b, except the
variant->key -> key shorthands.
This is a simple shader compiler that performs almost zero optimizations. The
generated code is usually much larger comparing to that generated by i965.
The generated code also requires many more registers.
Function-wise, it lacks register spilling and does not support most TGSI
indirections. Other than those, it works alright.
Courtesy of clang:
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1483:10: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1487:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1491:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
Only 13 affected programs in shader-db, but they were all helped.
total instructions in shared programs: 368877 -> 368851 (-0.01%)
instructions in affected programs: 1576 -> 1550 (-1.65%)
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions have a vertical stride overloaded to 4, which
prevents directly using vec4 uniforms as arguments. Instead we need to
insert a MOV instruction to do the replication for the three-source
instruction.
With this in place, we can use three-source instructions in the vertex
shader. While some thought needs to go into deciding whether its better
to use a three-source instruction rather than a sequence of equivalent
instructions (when one or more sources are uniforms or immediates), this
will allow us to skip a lot of ugly lowering code and use the BFE and
BFI2 instructions directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
This move the tracing timeout and printing into winsys and add
an debug environement variable for it (R600_DEBUG=trace_cs).
Lot of file touched because of winsys API changes.
v2: Do not write lockup file if ib uniq id does not match last one
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
There was a lot of code in evergreen_compute_internal.c that was not
being used at all and most of it was duplicating code from other parts
of the driver.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't support pre-2.6 kernels anyway - the install docs say 2.6.28
for DRI - and apparently this confuses ld.so's sorting when multiple
libGLs are installed. Just remove it.
Note: this is a candidate for the stable branches.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
New textures or vertex buffers don't always require patching and
re-emitting the shaders. So do a better job of figuring out when we
actually have to patch the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are
due to FBO-rendering size predictions). We currently expose
GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm
about to send a patch for removing that silly extension in that case.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a bug in an app/stater-tacker causes vertex buffer to fetch vertex
elements that are not bound, simply return zeros instead of crashing.
Reviewed-by: Brian Paul <brianp@vmware.com>
clang is supports most gcc options / extensions, with a some exceptions.
The biggest advantage of using clang is that compilation times are much
short.
One can tell scons to use clang when building by invoking it as
CC=clang CXX=clang++ scons libgl-xlib
From low to high bits, the sample positions are packed y0,x0,y1,x1...
Fixes arb_texture_multisample-sample-position piglit.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were assigning incorrect const register for immediates, and
potentially writing immediate const to the wrong location. This fixes
an incorrect-rendering bug with xonotic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Set a few extra registers to make sure we are in proper state for
clearing. And also add some debug options to mark all state dirty in
clear and gmem operations to aid in debugging.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There is a bit we need to set for 2D vs 3D fetch, to tell the hw whether
there are two or there valid input components.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The previous approach of using the dst register as an intermediate
temporary doesn't work in a lot of cases. For example, if the dst
register is the same as one of the src registers.
For now, just simplify it and always allocate a new register to use as
an intermediate. In some cases this will result in more registers used
than required. I think the best solution would be to implement an
optimization pass to reduce the number of registers used, which would
also solve the problem we have now of not being able to use GPRs that
are assigned for TGSI_FILE_INPUT.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Opps, didn't notice that I had left it stubbed out.
Also, make things fail a bit more gracefully when things go wrong.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Really this should be set based on buffer format, not on color vs
depth/stencil. Probably there should be more formats that set the bit
as we add support for more render target formats.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Lighter weight then using streamout. Only evergreen
and newer asics support embedded data as src with
CP DMA.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Unlike GEN6, the bits of entry count are distributed like this
width = (entry_count & 0x0000007f); /* bits [6:0] */
height = (entry_count & 0x001fff80) >> 7; /* bits [20:7] */
depth = (entry_count & 0x7fe00000) >> 21; /* bits [30:21] */
The maximum entry count is still limited to 2^27.
This was noted while going over the PRM. No test is impacted, because
1<<20 (the bit that moved) is much larger than GL_UNIFORM_BLOCK_MAX_SIZE,
GL_MAX_TEXTURE_BUFFER_SIZE, or MAX_*_UNIFORM_COMPONENTS.
v2: Explain more in the commit message (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
The inverse repeat count should taks up bits 31:15 and is in U1.16. Fixes
the "Restarting lines within a single Begin/End block" subtest of piglit
linestipple, and gets the other failing subtests much closer to passing.
v2: Rewrite commit message with more detailed piglit info (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the only kind of ir_jump that would terminate a basic
block was "return". However, the other possible types of ir_jump
("break", "continue", and "discard") should terminate a basic block
too. This patch modifies basic block analysis so that it terminates a
basic block on any type of ir_jump, not just ir_return.
Fixes piglit test dead-code-break-interaction.shader_test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Running piglit with this was causing all sort of weird stuff happening
to my desktop (Chromium webpages become blank, Qt Creator flickered,
etc). I tracked this down to shared memory segment leakage when GL is
not shutdown properly. The segments can be seen running `ipcs` and
looking for nattch==0.
This changes fixes this by calling shmctl(IPC_RMID) soon after creation
(which does not remove the segment immediately, but simply marks it for
removal when no more processes are attached).
This matches src/mesa/drivers/x11/xm_buffer.c behaviour.
v2:
- move shmctl(IPC_RMID) after XShmAttach() for *BSD, per Chris Wilson
- remove stray debug printfs, spotted by Ian Romanick
NOTE: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to handle the leading vertex information when
assembling primitives for the geometry shader otherwise
the resulting triangles will have vertices at incorrect
input locations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The commit below exposed a bug in dri2_add_config.
commit 3998f8c6b5
Author: Ralf Jung <post@ralfj.de>
Date: Tue Apr 9 14:09:50 2013 +0200
egl/x11: Fix initialisation of swap_interval
This little code snippet near the bottom of dri2_add_config,
if (double_buffer) {
...
conf->base.MinSwapInterval = dri2_dpy->min_swap_interval;
conf->base.MaxSwapInterval = dri2_dpy->max_swap_interval;
}
it never did what it claimed to do. The assignment never changed the value
of conf->base.MaxSwapInterval, because dri2_dpy->max_swap_interval was,
until the above exposing commit, unitialized here. That is,
conf->base.MaxSwapInterval was 0 before and after assignment. Ditto for
the min swap interval.
Above the troublesome code snippet, the call to _eglFilterArray rejects
the config as unmatching if its swap interval bounds differ from the base
config's. Before the exposing commit, at the call to _eglFilterArray, the
swap interval bounds were always [0,0], and hence no config was rejected
due to swap interval.
After the exposing commit, _eglFilterArray incorrectly rejected some
configs, which prevented dri2_egl_config::dri_double_config from getting
set for the rejected config, which resulted in a NULL pointer getting
passed into dri2CreateNewDrawable, and then segfault.
The solution: set the swap interval bounds before _eglFilterArray.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63447
Tested-by: Lu Hua <huax.lu@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The previous commit introduced extra words, breaking the formatting.
This text transformation was done automatically via the following shell
command:
$ git grep 'THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY' | sed 's/:.*$//' | xargs -I {} sh -c 'vim -e -s {} < vimscript2
where 'vimscript2' is a file containing:
/THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY/;/^ *$/ !fmt -w 78 -p '// '
:wq
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous commit introduced extra words, breaking the formatting.
This text transformation was done automatically via the following shell
command:
$ git grep 'THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY' | sed 's/:.*$//' | xargs -I {} sh -c 'vim -e -s {} < vimscript
where 'vimscript' is a file containing:
/THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY/;/\*\// !fmt -w 78 -p ' * '
:wq
Reviewed-by: Brian Paul <brianp@vmware.com>
This brings the license text in line with the MIT License as published
on the Open Source Initiative website:
http://opensource.org/licenses/mit-license.php
Generated automatically be the following shell command:
$ git grep 'THE AUTHORS BE LIABLE' | sed 's/:.*$//g' | xargs -I '{}' \
sed -i 's/THE AUTHORS/THE AUTHORS OR COPYRIGHT HOLDERS/' {}
This introduces some wrapping issues, to be fixed in the next commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
See previous commit for the rationale. These weren't caught by the
automatic conversion due to the "OR IBM" addition.
Reviewed-by: Brian Paul <brianp@vmware.com>
Generated automatically be the following shell command:
$ git grep 'BRIAN PAUL BE LIABLE' | sed 's/:.*$//g' | xargs -I '{}' \
sed -i 's/BRIAN PAUL/THE AUTHORS/' {}
The intention here is to protect all authors, not just Brian Paul. I
believe that was already the sensible interpretation, but spelling it
out is probably better.
More practically, it also prevents people from accidentally copy &
pasting the license into a new file which says Brian is not liable when
he isn't even one of the authors.
Reviewed-by: Brian Paul <brianp@vmware.com>
There used to be a derived state _ClampReadColor, so setting _NEW_COLOR
made sense. The state is gone now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't want to set the flag for Gallium.
I think only swrast needs the flag to be set for occlusion queries.
v2: fix stats_wm updates in i965
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The functions don't affect driver state. There is no code that would rely
on vertices being flushed prior to changing the states, and no code that
would check for _NEW_STENCIL before using the states.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
both functions don't change the framebuffer in any way
(if mesa_meta is not used)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver checks the flag. Nobody uses it.
I also removed the FLUSH_VERTICES calls, because PixelStorei has no effect
on rendering.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
_NEW_TRANSFORM_FEEDBACK is not used by core Mesa, so it can be removed.
Instead, an new private flag is added to i965 to serve the same purpose.
If you're new to this:
* When creating a context. you can set private dirty flags
in gl_context::DriverFlags, eg.:
ctx->DriverFlags.NewStateX = BRW_NEW_STATE_X;
* When StateX is changed, core Mesa does:
ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
* When you have to draw, read and clear ctx->NewDriverState.
* Pros: not touching NewState, the driver decides the mapping between
GL states and hw state groups, unlimited number of flags in core Mesa
(still limited number of flags in the driver though)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
because the code looks at the visual if there is a depth or stencil buffer
before enabling depth or stencil, respectively.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Squashed commit of the following:
commit 04c5fa2cbb8e89d6f2fa5a75af1cca03b1f6b852
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue Apr 23 17:37:18 2013 +0100
gallium: s/lower_left_origin/bottom_edge_rule/
commit 4dff4f64fa83b9737def136fffd161d55e4f1722
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue Apr 23 17:35:04 2013 +0100
gallium: Move diagram to docs.
commit 442a63012c8c3c3797f45e03f2ca20ad5f399832
Author: James Benton <jbenton@vmware.com>
Date: Fri May 11 17:50:55 2012 +0100
gallium: Replace gl_rasterization_rules with lower_left_origin and half_pixel_center.
This change is necessary to achieve correct results when using OpenGL
FBOs.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This fixes a crash when a resource cannot be mapped to the CPU's address space
because it's too big.
This puts a global pipe_context in r600_screen, which is guarded by a mutex,
so that we can use pipe_context when there isn't one around.
Hopefully our multi-context support is solid.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
NOTE: This is a candidate for the 9.1 branch.
Although this might be useful for ARB_clear_buffer_object,
I need it for initializating resources in r600g.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: comment cleanups
NOTE: This is a candidate for the 9.1 branch.
Number of vertices to fetch doesn't always equal the number of input
vertices. To correctly compute the number if IA primitives we need
to use the total number of input vertices, not only those that
need to be fetched.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
TGSI geometry shader input declerations are of the IN[][2] format
and the dimensions of the array have to be deduced from the input
primitive property.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We want to be able to reset certain parts of the pipeline,
in particular the input primitive index, but only either with
seperate invocations of the draw_vbo or new instances. In all
other cases (e.g. new invocations due to primitive restart)
that data needs to be preserved. Add a function through which
we can reset instance dependent data.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Same approach as in the llvmpipe, if the geometry shader is
null and we have stream output then attach it to the vertex
shader right before executing the draw pipeline.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The code doesn't set brw->query.obj to NULL, it sets query->bo to NULL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
TEMP is not the only register file that accept unsigned. OUT too.
Actually, what determines the appropriate type of the destination value is
not the opcode, but rather the register.
Also cleanup/simplify code. Add a few more asserts, but also make
code more robust by handling graceful if assert fails.
This fixes segfault / assertion in the included vert-uadd.sh graw shader.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This should be reusable for other non-gallium drivers, so we can make the
extension always be available.
v2: Add a more detailed comment than the old function had (recommended
by Brian).
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
I noticed a fallback in regnum through sysprof, and wanted a nicer way to
get information about it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2.7 was a particularly trouble ridden release.
Furthermore, the bug no longer can be reproduced ever since the
first_level state was taken in account.
Reviewed-by: Brian Paul <brianp@vmware.com>
They are supported on LLVM 3.1, at least on x86. (I haven't tested on PPC
though.)
Actually lp_build_linear_mip_levels() already has been emitting them for
some time.
This avoids intrinsics, which tend to be an obstacle for certain
optimization passes.
Reviewed-by: Brian Paul <brianp@vmware.com>
There will be a new IR for a3xx, which has a very different shader ISA
(more scalar oriented). So rename to avoid conflicts later when I start
adding a3xx support to the gallium driver.
Signed-off-by: Rob Clark <Rob Clark robdclark@freedesktop.org>
The standalone shader assembler needed some meta-data to know about
attributes/varyings/etc, to do the shader linkage. We don't need these
parts with gallium/tgsi, so just get rid of it.
Signed-off-by: Rob Clark <Rob Clark robdclark@freedesktop.org>
Should be able to handle all things which make this tricky to implement.
Fallthroughs, including most notably into/out of default, should be handled
correctly but are quite a mess.
If we see largely unoptimized switches in the wild should probably think
about some "real" switch optimization pass, e.g. things like this:
switch
case1
someinst
brk
case2
default
case3
someinst
brk
case4
someinst
endswitch
are legal, but the pointless case2/case3 statements not only cause condition
evaluation but will turn this into a "fake" fallthrough case (because
mask and defaultmask are already updated for case2 when default is
encountered) requiring executing code twice.
If default is at the end though, there's never any code re-execution, and
if that's not the case if there's no fallthrough in (not even a fake one)
and out of default there's no code re-execution neither.
v2: add comments, and use enum for break type instead of magic boolean.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It seems there was a typo in gallivm breakc handling (I am actually still
not sure it is really needed but otherwise that statement really should go
away). Also fix the wrong src argument type, even though they weren't really
used.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
While initially that opcode probably was meant for something along the
lines of sm3 break_comp it has never worked that way (not even the
argument count was right) and now the opcode has quite different
semantics so just remove it. (Discovered by Jose Fonseca)
This is still not really correct, since at least for sm 4.0
the nesting limit is 64 per subroutine, and subroutine nesting itself
has a limit of 32, so since we have a flat stack we'd need 32*64.
But this should probably be better fixed with per-subroutine stacks,
since otherwise these structures get really big (like 100kB for the
lp_exec_mask).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Input assembler needs to be able to decompose adjacency primitives
into something that can be understood by the rest of the pipeline.
The specs say that the adjacency primitives are *only* visible
in the geometry shader, for everything else they need to be
decomposed. Which in most of the cases is not an issue, because
the geometry shader always decomposes them for us, but without
geometry shader we were passing unchanged adjacency primitives
to the rest of the pipeline and causing crashes everywhere. This
commit introduces a primitive assembler which, if geometry
shader is missing and the input primitive is one of the
adjacency primitives, decomposes them into something
that the rest of the pipeline can understand.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
pre_clip_pos is a float[4] we just used (*float)[4] to be able to
jump within the array of vertex_headers with it. So if the idx
happened to be anything but 0, we'd actually read from some garbage
in memory. Change it to just be a simple pointer instead of casting
it to something that it's not. As suggested by Jose.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is the kind of information that would have been present for GLX, if
GLX supported modern GL. This allows these entrypoints to get automatic
asynchronous marshalling code generated for glthread.
This bug is currently benign, since get_called_parameter_string() is
currently only used for functions that return true for
glx_function.has_different_protocol(), and none of those functions
include padding. However, in order to implement marshalling of GL API
functions, we'll need to use get_called_parameter_string() far more
often.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this patch, $$.negate, $$.rgba_valid, and $$.xyzw_valid take
on garbage values. At the moment this problem is benign (the garbage
values happen to be zero), but in my experiments executing GL
operations on a background thread, the garbage values change, leading
to piglit failures.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since stdbool.h's "true" and "false" are #defines, they got expanded when
used as macro arguments, and that expanded value was stored in the
XML string, producing XML that driconf would then fail to parse.
Currently no drivers included stdbool along with driconf, but I keep
accidentally doing so on intel as we move towards using normal C.
v2: rebase on master.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
It's next to useless, since it just allows you to turn off VDPAU and
XvMC with a single switch. Just check whether Gallium drivers are
enabled instead.
Reviewed-by: Christian König <christian.koenig@amd.com>
Most test pass, issue are with border color and swizzle.
Based on ircnick<maelcum> patch.
v2: Restaged commit hunk
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
v2: Remove left over code
v3: Restage properly the commit so hunk of first one are not in
second one.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Not all are supported as render targets.
The state tracker fallback of using RGBA instead of RGBX currently
fails for blending, we could work around this by clearing their alpha
to 1 and modifying the color mask to disable writing alpha.
This is the only sane solution for nv50 and nvc0 (really, trust me),
but since on other hardware the border colour is tightly coupled with
texture state they'd have to undo the swizzle, so I've added a cap.
The dependency of update_sampler on the texture updates was
introduced to avoid doing the apply_depthmode to the swizzle twice.
v2: Moved swizzling helper to u_format.c, extended the CAP to
provide more accurate information.
Per message on mesa-users list, this wasn't working before.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Turns out the previous "fix" for handling per-pixel face selection and
derivatives didn't work out that well - the derivatives were wrong by
quite a bit, in theory transformation of the derivatives into cube space
should work, but would be _a lot_ more work than the "simplified" transform
used.
So, for explicit derivatives, I'm just giving up and go back to not honoring
them.
For implicit derivatives (and the fake explicit ones) however we try
something a little different, we just calculate rho as we would for a 3d
texture, that is after scaling the coords by the inverse major axis.
This gives the same results as calculating the derivs after projection of
the coords to the same face as long as all pixels hit the same face (and
only without rho_no_opt, otherwise it should be a bit worse). And when
not all pixels are hitting the same face, the results aren't so hot but
not catastrophically bad (I believe not off by more than a factor of 2 without
no_rho_approx and not more than sqrt(2) with no_rho_approx). I think this is
better than just picking the wrong face but who knows...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This will calculate rho correctly as
sqrt(max((ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2), (ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2))
instead of max(|ds/dx|,|dt/dx|,|dr/dx|,|ds/dy|,|dt/dy,|dr/dy|)
(for 3 coords - 2 coords work analogous, for 1 coord there's no point doing
the exact version), for both implicit and explicit derivatives.
While such approximation seems to be allowed in OpenGL some APIs may be less
forgiving, and the error can be quite large (sqrt(2) for 2 coords, sqrt(3) for
3 coords so wrong by nearly one mip level in the latter case).
This also helps to single out "real" bugs from "expected" ones, so it is debug
only (though at least combined with no_brilinear I didn't really see much of a
performance difference but only tested with a debug build - at least with
implicit mipmaps the instruction count is almost exactly the same though the
instructions are more complex (1 sqrt and mul/adds instead of and/max mostly).
The code when the option isn't set stays exactly the same.
v2: rename no_rho_opt to no_rho_approx.
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently the vdpau and xvmc detection code, is enabled for all builds. The
state trackers exist only within gallium. Enable whenever at least one gallium
driver is selected
v2: removed stray '-a'
[mattst88 v3]: Removed stray $.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63645
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We were trying to use a destroy method from a deleted context.
This fix is based on what's in the svga driver.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Fixes issue identified by Klocwork analysis:
'attribute_map' array elements might be used uninitialized in this
function (vec4_visitor::lower_attributes_to_hw_regs).
The attribute_map array contains the mapping from shader input
attributes to the hardware registers they are stored in.
vec4_vs_visitor::setup_attributes() only populates elements of this
array which, according to core Mesa, are actually used by the shader.
Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses
the array to lower a register access in the shader, it should in
principle only access elements of attribute_map that contain valid
data. However, if a bug ever caused the driver back-end to access an
input that was not flagged as used by core Mesa, then
lower_attributes_to_hw_regs() would access uninitialized memory, which
could cause illegal instructions to get generated, resulting in a
possible GPU hang.
This patch makes the situation more robust by using memset() to
pre-initialize the attribute_map array to zero, so that if such a bug
ever occurred, lower_attributes_to_hw_regs() would generate a (mostly)
harmless access to r0. In addition, it adds assertions to
lower_attributes_to_hw_regs() so that if we do have such a bug, we're
likely to discover it quickly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For some reason I made this happen under indirect rendering,
I think we might have a leak, valgrind gave out, so I said I'd
fix the basic problem.
NOTE: This is a candidate for stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No longer pass -a flag to the get_hash_generate.py script to specify
OpenGL, ES1, ES2, etc. This updates the autoconf, scons and android
build files too (so we can bisect).
This is the last of the API-dependent conditional compilation in
core Mesa.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
we were ignoring leading/provoking vertex settings which was
breaking decomposition of some strips.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
we were using the wrong vars, reporting incorrect stream output
statistics.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We were always treating the vertex index as a scalar but when the
shader is using indirect addressing it will be a vector of indices
for each channel. This was causing some nasty crashes insides
LLVM.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If a window's minimized we get a zero-size window. Skip the SwapBuffers
in that case to avoid some warning messages with the VMware svga driver.
Internal bug #996695
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The caller of NewTextureObject does the right thing if NULL is returned,
so this function should do the right thing too.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The specification says that the geometry shader should exit if the
number of emitted vertices is bigger or equal to max_output_vertices and
we can't do that because we're running in the SoA mode, which means that
our storing routines will keep getting called on channels that have
overflown (even though they will be masked out, but we just can't skip
them).
So we need some scratch area where we can keep writing the overflown
vertices without overwriting anything important or crashing.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Can happen if we were using stream output without geometry
shader, by returning early we avoid a crash.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a basic implementation of the pipeline statistics in the
draw module. The interface is similar to the stream output statistics
and also requires that the callers explicitly enable it.
Included is the implementation of the interface in llvmpipe and
softpipe. Only softpipe enables the pipeline statistics capability
though because llvmpipe is lacking gathering of the fragment shading
and rasterization statistics.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The issue with SOA execution and end_primitive opcode is that it
can be executed both when we haven't emitted any vertices, in
which case we don't want to emit an empty primitive, and when
the execution mask is zero and the execution should be skipped. We
handled only the latter of those conditions. Now we're combining the
execution mask with a mask created from emitted vertices to handle
both cases. As a result we don't need the pending_end_primitive
flag which was broken because it was static and could be affected
by both above mentioned conditions at run-time.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Same as with llvmpipe: we can't be divind/moding by zero and we
need to make sure that dividing/moding by zero produces 0xffffffff.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add a note to update PACKAGE_VERSION for Android and scons builds
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Handle legacy/obsolete specs as well
List all specs in extensions.html
Mark 'OLD' extensions as obsolete in extensions.html
Update the spec location in old relnotes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
relnotes-*html > relnotes/*html
RELNOTES-* > relnotes/*
fix links, css and frames
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
TGSI_OPCODE_IF condition had two possible interpretations:
- src.x != 0.0f
- Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was false either for
vertex and fragment shaders
- gallivm/llvmpipe
- postprocess
- vl state tracker
- vega state tracker
- most old drivers
- old internal state trackers
- many graw examples
- src.x != 0U
- Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was true for both
vertex and fragment shaders
- tgsi_exec/softpipe
- r600
- radeonsi
- nv50
And drivers that use draw module also were a mess (because Mesa would
emit float IFs, but draw module supports native integers so it would
interpret IF arg as integers...)
This sort of works if the source argument is limited to float +0.0f or
+1.0f, integer 0, but would fail if source is float -0.0f, or integer in
the float NaN range. It could also fail if source is integer 1, and
hardware flushes denormalized numbers to zero.
But with this change there are now two opcodes, IF and UIF, with clear
meaning.
Drivers that do not support native integers do not need to worry about
UIF. However, for backwards compatibility with old state trackers and
examples, it is advisable that native integer capable drivers also
support the float IF opcode.
I tried to implement this for r600 and radeonsi based on the surrounding
code. I couldn't do this for nouveau, so I just shunted IF/UIF
together, which matches the current behavior.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
v2:
- Incorporate Roland's feedback.
- Fix r600_shader.c merge conflict.
- Fix typo in radeon, spotted by Michel Dänzer.
- Incorporte Christoph Bumiller's patch to handle TGSI_OPCODE_IF(float)
properly in nv50/ir.
This patch adds PCI IDs for Bay Trail (sometimes called Valley View).
As far as the 3D driver is concerned, it's very similar to Ivybridge,
so the existing code should work just fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Assume the maximum pixel size (16 bytes per pixel). In addition to
moving redundant malloc and free calls outside the loop, this fixes a
potential resource leak when a surface is mapped and the malloc fails.
This also makes blit_nearest look a bit more like blit_linear.
v2: Use MAX_PIXEL_BYTES instead of 16. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was originally discovered by Klocwork analysis:
Possible memory leak. Dynamic memory stored in 'srcBuffer0'
allocated through function 'malloc' at line 566 can be lost at line
746
However, I think the problem is actually much worse. Since the memory
is freed after the first pass through the loop, the released buffer may
be used on the next iteration!
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
LOOP_STARTxxx for nested loops may put the branch stack into a state
such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
by replacing the ALU_PUSH_BEFORE with a PUSH + ALU
Fixes piglit tests EXT_transform_feedback/order*
v2: Use existing loop count and improve comment
v3: [Vadim Girlin] Set jump address for PUSH instructions
NOTE: This is a candidate for the 9.1 branch
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Instead of emitting configuration values (e.g. number of gprs used) in a
predefined order, the LLVM backend now emits these values in
register/value pairs. The first dword contains the register address and
the second dword contians the value to write.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Inserting the value for the second quad in the wrong place for the
following shuffle. This meant the row or image stride was undefined which is
quite catastrophic, can lead to bogus texels fetched or just segfault.
This code is only hit for SoA path currently, still surprising it
didn't crash more or caused more visible issues (I think llvm used a
broadcast shuffle for the undefined parts of the vector, hence the undefined
value for the second quad was just the same as that from the first quad,
so as long as both quads hit the same mip level everything was fine, and since
lower mips always have the same large stride it made it less likely to
hit out-of-bound memory in case of differing lods).
Note: this is a candidate for stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Null platform IDs are OK according to the spec, but some applications have
been reported to get paranoid and assume that our NULL platform is unusable.
As it doesn't hurt to have device enumeration separate from the rest of the
device code (quite the opposite, it makes the code cleaner), make the API use
an actual platform object that keeps track of the available devices instead of
the former NULL pointer.
Reported-and-reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
I don't see a sensible value to use in this path, but we shouldn't ever
hit this outside of developer new-texture-target enabling.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't want to store this thing in the class, and we do need the
definition to be at the top of the function and held onto until the end
here, so there's not much to do besides (void) reference it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was copy and pasted from can_reswizzle_dst(), and we can just fold it
in instead to avoid the warning.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I think this actually clarifies what's going on in the asserts a bit,
given how many regions we've got floating around.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We assert that failure doesn't happen, but it fixes a warning in the
release build and it would at least give working behavior for a user by
falling back to the normal texsubimage path.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was silly -- checking that we didn't overflow the array by dividing
the array size by 2 and then multiplying it back up by 2.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Asserts don't stop execution in release builds, so we would continue on to
use an uninitialized format value. Just take the failure path, which
appears to continue up the call stack for a while.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The call has no side effects, and moving it into the assert cleans up a
compile warning in the release build.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have this support for firsthalf/sechalf instructions, which would be
called in the !has_compr4 (aka original gen4) 16-wide case. We currently
only support 16-wide for gen5+, so we weren't tripping over this, but it
would have been a problem if we ever try to enable it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is a poor substitute for proper global dead code elimination that
could replace both our current paths, but it was very easy to write. It
particularly helps with Valve's shaders that are translated out of DX
assembly, which has been register allocated and thus have a bunch of
unrelated uses of the same variable (some of which get copy-propagated
from and then left for dead).
shader-db results:
total instructions in shared programs: 1735753 -> 1731698 (-0.23%)
instructions in affected programs: 492620 -> 488565 (-0.82%)
v2: Fix comment typo
Reviewed-by: Matt Turner <mattst88@gmail.com>
This instruction doesn't update its IR destination, it just moves from
payload to f0. This caused the dead code elimination pass I'm adding to
dead-code-eliminate the first step of interpolation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
These checks were all over, and every time I wrote one I had to try to
decide again what the cases were for partial updates.
v2: Fix inadvertent reladdr check removal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The other dispatch tables (Exec and Save) are freed, but BeginEnd is
never freed. This was found by inspection why investigating the leak of
shared state in _mesa_initialize_context.
NOTE: This is a candidate for stable branches
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Back up at line 1017 (not shown in patch), we add a reference to the
shared state. Several places after that may divert to the error
handler, but, as far as I can tell, nothing ever unreferences the shared
state.
Fixes issue identified by Klocwork analysis:
Resource acquired to 'shared->TexMutex' at line 1012 may be lost
here. Also there is one similar error on line 1087.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
dri2_create_surface can fail for a variety of reasons, including bad
input data. Dereferencing the NULL pointer and crashing is not okay.
Fixes issue identified by Klocwork analysis:
Pointer 'surf' returned from call to function 'dri2_create_surface'
at line 285 may be NULL and will be dereferenced at line 291.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Duh.
Fixes issues identified by Klocwork analysis:
Pointer 'table' returned from call to function 'calloc' at line 115
may be NULL and will be dereferenced at line 117.
and
Suspicious dereference of pointer 'table' before NULL check at line
119.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ensure that process_array_type never returns NULL, and let
process_array_type handle the case where the supplied base type is NULL.
Fixes issues identified by Klocwork analysis:
Pointer 'type' returned from call to function 'get_type' at line
1907 may be NULL and may be dereferenced at line 1912.
and
Pointer 'field_type' checked for NULL at line 4160 will be
dereferenced at line 4165. Also there is one similar error on line
4174.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes issue identified by Klocwork analysis:
Pointer 'field_type' returned from call to function 'glsl_type' at
line 4126 may be NULL and may be dereferenced at line 4139. Also
there are 2 similar errors on line(s) 4165, 4174.
In practice, it should be impossible to actually get NULL in here
because a syntax error would have already caused compilation to halt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pointed out by gcc
nve4_compute.c: In function 'nve4_launch_grid':
nve4_compute.c:511:7: warning: 'ret' may be used uninitialized in
this function [-Wmaybe-uninitialized]
if (ret)
^
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Edit by Christoph Bumiller:
Set it to -1 to indicate failure and only when it's actually required.
As otherwise it is unused - pointed out by gcc
nve4_compute.c:586:20: warning: 'nve4_cache_split_name' defined but not used [-Wunused-function]
static const char *nve4_cache_split_name(unsigned value)
^
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
For debug build we'll hit the assert, for release we are going to emit random data
as subOp is used uninitilised. Spotted by gcc
codegen/nv50_ir_emit_nv50.cpp: In member function 'void nv50_ir::CodeEmitterNV50::emitATOM(const nv50_ir::Instruction*)':
codegen/nv50_ir_emit_nv50.cpp:1554:12: warning: 'subOp' may be used uninitialized in this function [-Wmaybe-uninitialized]
uint8_t subOp;
^
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: I rewrote this to use the sample positions properly.
v3: rewrite properly to use bitfield to cast back to signed ints
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the mesa state tracker for ARB_texture_multisample.
hardware doesn't seem to use a different texture instructions, so
I don't think we need to create one for TGSI at this time.
Thanks to Marek for fixes to sample number picking.
v2: idr pointed out a bug in how we picked the max sample counts,
use new internal format chooser interface to pick proper answers.
v3: use st_choose_format directly, it was okay, fix anding of masks.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've no idea when sample_chan would ever be 4 here, but 4 is most
definitely wrong, array textures have it as 3 as well.
Also the cayman code though unused is obviously wrong.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since the vec4_visitor and vec4_generator classes are going to be
re-used for geometry shaders, we can't enable their debug
functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add
a debug_flag boolean to these two classes, so that when they're
instantiated the caller can specify whether debug dumps are needed.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Geometry shader inputs are arrays, but they use an unusual array
layout: instead of all array elements for a given geometry shader
input being stored consecutively, all geometry shader inputs are
interleaved into one giant array. As a result, the array stride we
use to access geometry shader inputs must be equal to the size of the
input VUE, rather than the size of the array element.
This patch introduces a new virtual function,
vec4_visitor::compute_array_stride(), which will allow geometry shader
compilation to specialize the computation of array stride to account
for the unusual layout of geometry shader input arrays. It also
renames the local variable that the ir_dereference_array visitor uses
to store the stride, to avoid confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch introduces a new function,
vec4_visitor::lower_attributes_to_hw_regs(), which replaces registers
of type ATTR in the instruction stream with the hardware registers
that store those attributes. This logic will need to be common
between the vertex and geometry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch introduces a new function, vec4_visitor::emit_vertex(),
which contains the code for emitting vertices that will need to be
common between the vertex and geometry shaders.
Geometry shaders will need to use a different message header, and a
different opcode, for their URB writes, so we introduce virtual
functions emit_urb_write_header() and emit_urb_write_opcode() to take
care of the GS-specific behaviours.
Also, since vertex emission happens at the end of the VS, but in the
middle of the GS, we need to be sure to only call
emit_shader_time_end() during VS vertex emission. We accomplish this
by moving the call to emit_shader_time_end() into the VS
implementation of emit_urb_write_opcode().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this function is going to get used for geometry shaders too, it
deserves a more generic name: generate_vec4_instruction.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes the following field from vec4_generator, since it
is not used:
- struct brw_vs_compile *c
And changes the following field:
- struct gl_vertex_program *vp => struct gl_program *prog
With these changes, vec4_generator no longer refers to any VS-specific
data structures. This will pave the way for re-using it for geometry
shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Use the name "prog" rather than "p".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch is going to change the type of vec4_generator::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_generator::prog. However, prog
is already used. Since the existing vec4_generator::prog is of type
struct gl_shader_program, it makes sense to rename it to shader_prog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves the following data structures from vec4_visitor to
vec4_vs_visitor, since they contain VS-specific data:
- struct brw_vs_compile *c (renamed to vs_compile)
- struct brw_vs_prog_data *prog_data (renamed to vs_prog_data)
- src_reg *vp_temp_regs
- src_reg vp_addr_reg
Since brw_vs_compile and brw_vs_prog_data also contain vec4-generic
data, the following pointers are added to the base class, to allow it
to access the vec4-generic portions of these data structures:
- struct brw_vec4_compile *c
- struct brw_vec4_prog_key *key
- struct brw_vec4_prog_data *prog_data
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Use shorter names in the base class and longer names in the
derived class.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves functions from vec4_visitor to vec4_vs_visitor that
deal with ARB (assembly) vertex programs. There's no point in having
these functions in the base class since we don't intend to support
assembly programs for the GS stage. The following functions are
moved:
- setup_vp_regs
- get_vp_dst_reg
- get_vp_src_reg
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The system values handled by vec4_visitor::visit(ir_variable *) are
VS-specific (vertex ID and instance ID). This patch moves the
handling of those values into a new virtual function,
make_reg_for_system_value(), so that this VS-specific code won't be
inherited by geomtry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch makes the following vec4_visitor functions virtual, since
they will need to be implemented differently for vertex and geometry
shaders. Some of the functions are renamed to reflect their generic
purpose, rather than their VS-specific behaviour:
- setup_attributes
- emit_attribute_fixups (renamed to emit_prolog)
- emit_vertex_program_code (renamed to emit_program_code)
- emit_urb_writes (renamed to emit_thread_end)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch just creates the derived class; later patches will migrate
VS-specific functions and data structures from the base class into the
derived class.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow the generic parts to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Put urb_read_length and urb_entry_size in the generic struct.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In patches that follow, we'll be splitting structs brw_vs_prog_data
and brw_vs_compile into a vec4-generic base struct and a VS-specific
derived struct (this will allow the vec4-generic code to be re-used
for geometry shaders). Having brw_vs_compile point to
brw_vs_prog_data makes it difficult to do this cleanly.
Fortunately most of the functions that use brw_vs_compile (those in
the vec4_visitor class) already have access to brw_vs_prog_data
through a separate pointer (vec4_visitor::prog_data). So all we have
to do is use that pointer consistently, and plumb prog_data through
the few remaining functions that need access to it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies the arguments to brw_compute_vue_map() so that
they no longer bake in the assumption that we are generating a VUE map
for vertex shader outputs. It also makes the function non-static so
that we can re-use it for geometry shader outputs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The vec4_visitor functions don't use any VS specific data from
vec4_visitor::vp. So rename it to "prog" and change its type from
struct gl_vertex_program * to struct gl_program *. This will allow
the code to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Use the name "prog" rather than "p".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch is going to change the type of vec4_visitor::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_visitor::prog. However, prog is
already used in backend_visitor (which vec4_visitor derives from).
Since backend_visitor::prog is of type struct gl_shader_program *, it
makes sense to rename it to shader_prog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The comment above glsl_type::name claimed that it could sometimes be
NULL. This was wrong--it is never NULL. Many error handling paths
would segfault if it were. (Anonymous structs are assigned names like
"#anon_struct_0001"--see the ast_struct_specifier constructor in
glsl_parser_extras.cpp.)
Fix the comment and add assertions to validate that it really is never
NULL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Just everything you need for UVD with r600g and radeonsi.
v2: move UVD code to radeon subdir, clean up build system additions,
remove an unused SI function, disable tiling on SI for now.
v3: some minor indentation fix and rebased
v4: dpb size calculation fixed
v5: implement proper fall-back in case the kernel doesn't support UVD,
based on patches from Andreas Boll but cleaned up a bit more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The EGLConfig attributes EGL_MIN/MAX_SWAP_INTERVAL were incorrectly set to
0 and 0. This prevented clients from setting the swap interval to a
reasonable value, like 1 or 2.
Swap interval worked correctly in Mesa 9.0. The commit below introduced
the bug.
commit 7e9bd2b2ed
Author: Eric Anholt <eric@anholt.net>
Date: Tue Sep 25 14:05:30 2012 -0700
egl: Add support for driconf control of swapinterval.
Note: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63078
[chadv: Wrote commit message]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If a region is larger than the estimated aperture size, we map/unmap it
by copying with the BLT engine. Which means we can't use Y-tiling.
Fixes Piglit max-texture-size and tex3d-maxsize, which regressed in my
recent change to use Y-tiling by default on Gen6+. This was due to a
botched merge conflict resolution.
v2: Return a mask of valid tilings from intel_miptree_select_tiling.
This allows us to avoid the X-tiling fallback if Y-tiling is actually
mandatory.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
gen7_blorp_emit_depth_stencil_config() is only called when
params->depth.mt is non-null. Therefore, it's not necessary to do an
"if (params->depth.mt)" test inside it. The presence of this if test
was misleading static analysis tools (and briefly, me) into thinking
that gen7_blorp_emit_depth_stencil_config() might sometimes access
uninitialized data and dereference a null pointer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
both mov and ucmp can be used to move variables of any type.
correctly note that about ucmp in the tgsi_info and make
sure gallivm can handle that by correctly casting the untyped
moves.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were using simple temporaries, without using alloca or phi
nodes which meant that on every iteration of the loop our
temporaries, which were holding the number of vertices and
primitives which were emitted, were being reset to zero. Now
we're using alloca to allocate those variables to preserve
them across conditionals.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were missing the implementation of PIPE_QUERY_SO_STATISTICS
query, this change implements it on top of the existing
facilities.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We want to both make sure we never divide by zero to not generate
sigfpe and that divide by zero is guaranteed to return 0xffffffff.
Based on José idea.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
we break when the mask values are 0 not, 1, plus it's bit comparison
not a floating point comparison. This fixes both.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Enable hiz by setting intel_context::has_hiz. However, to work around
a hardware bug, we selectively enable hiz for only nicely aligned miptree
slices.
No Piglit regressions on Haswell 0x0d26 rev07 when based atop
mesa-master-4ad3601.
Improves the performance of GLB27_TRex_C24Z16_FixedTimeStep by 18.52%
(hsw-0x0d26-rev07; kernel-3.9.0-rc1; GLBenchmark 2.7.0 Release a68901;
samples=3).
v2: Replace the check for IS_HASWELL(devid) in intel_miptree_slice_has_hiz()
with a conditional set of has_hiz. [for anholt]
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When appropriate, replace each check `hiz_mt != NULL` with either a call
to intel_miptree_slice_has_hiz() or intel_renderbuffer_has_hiz(). No
behavioral change.
This prepares for selectively enabling hiz on individual miptree slices
for Haswell.
This refactoring had several side effects.
1. To prevent new warnings about discarding the const qualifier,
I removed 'const' from some variable declarations in
intel_validate_framebuffer(). The alternative was to add const
qualifiers to multiple function signatures in the
intel_renderbuffer_has_hiz call graph. Since the dominant convention
in the Intel code is to not qualify function parameters as const,
I chose to remove rather than add const qualifiers.
2. I changed the signature of brw_emit_depth_stencil_hiz() by replacing
`struct intel_mipmap_tree *hiz_mt` with `bool hiz`. The function used
hiz_mt mostly as a boolean indicator of the presence of hiz, so the
signature change is consistent with the patch's goal.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add new parameters `depth_level` and `depth_layer`, which specify depth
miptree's slice of interest. A following patch will pass the new
parameters through to intel_miptree_slice_has_hiz().
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The new fields define the 2D miptree slice to be used. A following patch
will pass the new fields through to intel_miptree_slice_has_hiz().
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
On Haswell, HiZ will selectively be enabled on individual miptree slices
to workaround a hardware bug. The new field 'has_hiz' indicates if HiZ is
enabled for a given slice.
Also add two new accessor functions for this field.
intel_miptree_slice_has_hiz
intel_renderbuffer_has_hiz
The new field and accessor functions are not yet used. Also, this patch
introduces no behavioral change because, in this patch,
intel_miptree_alloc_hiz() sets has_hiz for all slices.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The hardware docs and the simulator require that the rectangle primitive
emitted during fast depth clears and hiz resolves must be aligned to 8x4
pixels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows the computation of the offset to get written directly into the
message source.
shader-db results:
total instructions in shared programs: 3308390 -> 3283025 (-0.77%)
instructions in affected programs: 442998 -> 417633 (-5.73%)
No difference in GLB2.7 low res (n=9).
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have several places in our pull constant handling where we make a
temporary src_reg for an int, and then turn it into a dst. In doing so,
we were writing to the dst.xyzw, so we never register coalesced it with a
later mov from dst.x to real_dst.x.
These extra channels written would be removed if we had channel-wise DCE
in the backend, but we don't. Fix it for now by just not writing these
extra channels that won't get used.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The software-tracked transform feedback offsets (svbi_0_starting_index)
are incorrect in the presence of primitive restart, so we were actually
updating it with a bogus value if the batch wrapped and we emitted the
packet again during a single transform feedback. By reducing state
emission, we avoid the bug.
Fixes piglit OpenGL 3.1/primitive-restart-xfb flush
Reviewed-by: Paul Berry <stereotype441@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
The software-tracked transform feedback offsets (svbi_0_starting_index)
are incorrect in the presence of primitive restart, so we can't reliably
compute offsets for our buffer pointers after a batch flush. Thanks to HW
contexts, our transform feedback offsets are now saved, so we can just
keep using the ones from before the batch wrap.
Fixes piglit OpenGL 3.1/primitive-restart-xfb flush
Reviewed-by: Paul Berry <stereotype441@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
"ctx->DrawBuffer->Visual" might be invalid if (NewState &_NEW_BUFFERS) != 0.
v2: also fix:
- RGBA_INTEGER_MODE_EXT
- RGBA_FLOAT_MODE_ARB (also check API support)
- FRAMEBUFFER_SRGB_CAPABLE_EXT
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen7.5 (Haswell) hardware supports primitive restart for all primitive
types. It also handles all possible primitive restart indices.
Rather than specialize both can_cut_index_handle_restart_index() and
the switch statement in can_cut_index_handle_prims() for Haswell, just
return early if the hardware is Haswell because we know it can handle
everything.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_draw.c contains a trim() function which modifies the vertex count
for quads and quad strips in order to discard dangling vertices. In
principle this shouldn't be necessary, since hardware since Gen4 is
capable of discarding dangling vertices by itself. However, it's
necessary because as a hack to speed up rendering on Gen 4-5, we
sometimes convert quads to trifans and quad strips to tristrips. The
trim() function isn't necessary on Gen6 and up.
This patch documents why and when the trim() function is necessary,
and avoids calling it when it's not needed.
This will avoid creating problems when we enable hardware support for
primitive restart of quads and quad strips on Haswell.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The call to emit_shader_time_end() before the second URB write was
conditioned with "if (eot)", but eot is always false in this code
path, so emit_shader_time_end() was never being called for vertex
shaders that performed 2 URB writes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drawing subtitles didn't increased the dirty area of the surface.
Reported and tested by freeedrich on irc.
v2: don't clear the surface
Signed-off-by: Christian König <christian.koenig@amd.com>
In the mailing list discussion of "glsl/linker: fix varying packing
for non-flat integer varyings." (commit 7862bde), we concluded that
since the bug only applies to integral variables, it is safer to just
apply the bug fix to integer varyings. I forgot to make the change
before pushing the patch upstream. (Note: we aren't aware of any bugs
in commit 7862bde; it just seems wise to be on the safe side).
This patch makes the change. Assuming commit 7862bde gets
cherry-picked back to 9.1, this commit should be cherry-picked too.
NOTE: This is a candidate for the 9.1 release branch.
When a varying is consumed by transform feedback, but is not used by
the fragment shader, assign_varying_locations() sets its interpolation
type to "flat" in order to ensure that lower_packed_varyings never has
to deal with non-flat integral varyings (the GLSL spec doesn't require
integral vertex outputs to be flat if they aren't consumed by the
fragment shader).
A similar situation will arise when geometry shader support is added,
since the GLSL spec only requires integral vertex shader outputs to be
flat when they are consumed by the fragment shader. This patch
modifies the linker to handle this situation too.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
To minimize the variety of type conversions that lower_packed_varyings
needs to perform, it assumes that integral varyings are always
qualified as "flat". link_varyings.cpp takes care of ensuring that
this is the case (even in the circumstances where GLSL doesn't require
it).
This patch documents the assumption with an assertion, for ease in
future debugging.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Commit dfb57e7 (glsl: Fix error checking on "flat" keyword to match
GLSL ES 3.00, GLSL 1.50) relaxed the rules for integral varyings: they
only need to be declared as "flat" if they are a fragment shader
inputs. This allowed for the possibility of a vertex shader output
being a non-flat integer, provided that it was not matched to a
fragment shader input. A non-contrived situation where this might
arise is if a vertex shader generates some integral outputs which are
consumed by tranform feedback, but not by the fragment shader.
Unfortunately, lower_packed_varyings assumes that *all* integral
varyings are flat, regardless of whether they are consumed by the
fragment shader. As a result, attempting to create a non-flat
integral vertex output of a size that required packing (i.e. a size
other than ivec4 or uvec4) would cause an assertion failure in
lower_packed_varyings.
This patch prevents the assertion failure by forcing vertex shader
outputs to be "flat" whenever they are not consumed by the fragment
shader. This should have no effect on rendering since the "flat"
keyword only affects the behaviour of fragment shader inputs.
Fixes piglit test "spec/EXT_transform_feedback/nonflat-integral".
NOTE: This is a candidate for the 9.1 release branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
ir_print_visitor::visit(ir_variable *)'s mode[] array needs to match
the declaration of the enum ir_variable_mode. It's hard to verify
that at compile time, but at least we can use a STATIC_ASSERT to make
sure it's the right size.
This required adding ir_var_mode_count to the enum.
This patch updates the interp[] array to match the enum
glsl_interp_qualifier.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add a STATIC_ASSERT to make sure the array is the correct size.
This required adding INTERP_QUALIFIER_COUNT to the enum.
Fixes uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The multiplication part of tgsi_umad did not work on Cayman, because it did
not populate the correct vector slots.
This fixed hardlocks in the EXT_transform_feedback/order tests.
NOTE: This is a candidate for the stable branches.
(might not be easy to cherry-pick though)
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This option can force textures to be untiled. However, on Gen6+, depth
buffers must be Y-tiled. MSAA buffers also must be Y-tiled. So setting
this option on even a trivial application like glxgears causes assertion
failures in a debug build, and likely GPU hangs in a release build.
It's just giving users a license to shoot themselves in the foot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the past, we preferred X-tiling for color buffers because our BLT
code couldn't handle Y-tiling. However, the BLT paths have been largely
replaced by BLORP on Gen6+, which can handle any kind of tiling.
We hadn't measured any performance improvement in the past, but that's
probably because compressed textures were all untiled anyway.
Improves performance in GLB27_TRex_C24Z16_FixedTime by 7.69231%.
v2: Rebase on top of Eric's untiled-for-larger-than-aperture changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code has no rationale for why we would force compressed textures to
be untiled, and it appears to work fine. Git archeology indicates that
it's been that way dating back to when we first started tiling.
Improves performance in GLB27_TRex_C24Z16_FixedTimeStep at 1280x720 by
10.0529% +/- 0.573075% (n=12). Improves performance in Xonotic by
4.56409% +/- 0.27965% (n=3).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch (1) extracts from intel_miptree_create() the spaghetti logic
that selects the tiling format, (2) rewrites that spaghetti into a lucid
form, and (3) moves it to a new function, intel_miptree_choose_tiling().
No behavioral change.
As a bonus, it is now evident that the force_y_tiling parameter to
intel_miptree_create() does not really force Y tiling.
v2 (Ken): Rebase on top of Eric's untiled-for-larger-than-aperture
changes. This required passing in the miptree.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When moving the renderbuffer to a new miptree, we neglected to allocate
the hiz buffer for the new miptree. Oops.
Fixes all Piglit depthstencil-render-miplevels tests from crash to pass on
Sandybridge.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
calim pointed out we were getting mipmap levels for array multisamples,
this didn't make sense. So then I noticed this function takes last_level
so we are passing in a too high value here.
I think this should fix the case he was seeing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Check the type of the array operand and the index operand before doing
other checks. This simplifies the code a bit now (eliminating the
error_emitted parameter), and enables some later functional changes.
The shader
uniform float x[6];
uniform sampler2D s;
void main() { gl_Position.x = xx[s + 1]; }
still generates (only) the two expected errors:
0:3(33): error: `xx' undeclared
0:3(39): error: Operands to arithmetic operators must be numeric
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the shader
uniform float x[6];
void main() { gl_Position.x = x[1.0]; }
would have generated the errors
0:2(33): error: array index must be integer type
0:2(36): error: array index must be < 6
Now only
0:2(33): error: array index must be integer type
will be generated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This puts all of the checks togeher for easier reading. It also means
that all the checks are blocked on array->type->is_array. Shortly this
will allow elimination of some is_error check work-arounds in this
function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also, document the reason for not checking for type->is_array in some of
the bound-checking cases.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That last consumer of the return value was changed to not use it by the
previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The error_emitted flag is used in semantic checking to prevent spurious
cascading errors. For example,
void foo(sampler2D s, float a)
{
float x = a + (1.2 + s);
...
}
should only generate a single error. Without the error_emitted flag for
the first error, "a + ..." would also generate an error.
However, a bunch of cases in _mesa_ast_array_index_to_hir that were
setting error_emitted would mask legitimate errors. For example,
vec4 a[7];
float b = a[3.14];
should generate two error (float index and type mismatch in assignment).
The uses of error_emitted would cause only the first to be emitted.
This patch removes most of the places in _mesa_ast_array_index_to_hir
that would set the error_emitted flag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I love 800+ line switch-statements as much as the next guy... Future
commits will make changes to this part of the AST-to-HIR conversion, and
extracting this code will make that a bit easier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This still fails, since 8192*4bpp == 32768, which is too big to use the
blitter on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we require 2.6.39, there's no need to also check for 2.6.29.
Calling drm_intel_bufmgr_gem_enable_fenced_relocs() without checking
should be safe, as it simply sets a flag.
This does remove the check for zero fences available, but that doesn't
seem worth checking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Chris Wilson's relaxed relocation patch landed in March 2011. Anyone
running pre-3.0 kernels probably isn't going to get the latest Mesa
anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were likely used for BRW_NEW_... dirty bit flags at one point, but
they're unused now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nobody uses this value, so there's no need to set it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When I removed the proj_attrib_mask optimization, I also removed the
last consumer of this bit without realizing it.
Since nobody uses it, there's no point in flagging it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Clover needs the irreader component of llvm
v2: Check for irreader component
irreader is only available with LLVM 3.3 >= 177971
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
It has 2 dependencies: glClampColor and the framebuffer, we might just as well
do the update where those two are changed.
v2: cosmetic changes from Brian's email
Reviewed-by: Brian Paul <brianp@vmware.com>
This should reduce shader recompilations with drivers that emulate fragment
color clamping, because we want the clamping to be enabled only if there is
a signed normalized or floating-point colorbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reported-by: `per` in #intel-gfx
The size of the cache key varies, so store the actual size as well as
the key blob itself, rather than just assuming it's the same as the size
passed in.
NOTE: This is a candidate for stable branches.
V2: Don't leave silly holes in structure; use unsigned instead of GLuint.
V3: Fix missing case for `last` match.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This function is a holdover from r600g and is identical to
si_pm4_inval_texture_cache(), so it is not needed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
This target string now contains four values instead of three. The old
processor field (which was really being interpreted as arch) has been split
into two fields: processor and arch. This allows drivers to pass a
more a more detailed description of the hardware to compiler frontends.
v2:
- Adapt to libclc changes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Add UTIL_FORMAT_LAYOUT_ETC to util_format_is_compressed. It was missing.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't check if there's sampler support for stencil if we're not
going to actually blit/copy stencil values. Fixes the case where
we mistakenly said we can't support a blit of depth values from
S8Z24 to X8Z24.
Also, rename the is_stencil variable to dst_has_stencil to improve
readability.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Switch to use the envytools generated headers for register/bitfield
definitions. This is the first step in preparing to add a3xx support,
since it avoids having conflicting names for a3xx and a2xx registers.
And since I'm using envytools for a3xx it is simpler to just use it for
everything.
This shouldn't cause any functional change, it is really just a lot of
renaming.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Otherwise we will not receive destroy windows events, causing framebuffers
to leak.
This happens particularly with java and jogl.
Tested with java + jogl, MATLAB.
VMware Internal Bug Number: 1013086.
Reviewed-by: Brian Paul <brianp@vmware.com>
At least on llvm 3.2 this appears to work fine. Tested on an Athlon XP
2600+, which has sse and 3dnow but not sse2.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.
When enabled after each cs submission the code will try to detect lockup by
waiting on one of the buffer of the cs to become idle, after a timeout it
will consider that the cs triggered a lockup and will write a radeon_lockup.c
file in current directory that have all information for replaying the cs.
To build this file :
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with
unnormalized texcoords (at least for softpipe).
v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE
Reviewed-by: Marek Olšák <maraeo@gmail.com>
GLBenchmark 2.7's shaders contain conditional blocks like:
if (x) {
if (y) {
...
}
}
where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses. This can easily be
optimized into:
if (x && y) {
...
}
This saves a few instructions in GLBenchmark 2.7:
total instructions in shared programs: 11833 -> 11649 (-1.55%)
instructions in affected programs: 8234 -> 8050 (-2.23%)
It also helps CS:GO slightly (-0.05%/-0.22%). More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This clarifies that the offset of 2 is actually 16 kB / 8kB units.
It also keys both computations off of a single variable, which should
make it easier to change in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These variables are only used within a single function, so we may as
well make them local variables.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This was only produced by the brw_wm_input_dimensions atom, which was
removed in the previous commit. So there's no need for the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was only used to compute proj_attrib_mask, which was removed by the
previous commit. That makes this dead code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous commit removed the last user of this field, so there's no
longer any point in setting it. Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.
Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask. Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles. We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.
Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all. Generally, older
fixed-function programs run fine on modern hardware anyway.
The optimization has existed in some form since the initial commit. When
we rewrote the fragment shader backend, we dropped it for a while. Eric
readded it in commit eb30820f26 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.
Time has passed, so I decided to re-measure this. Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was
no statistically significant difference (n = 37).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is the same computation as the _WriteEnabled flag, so we may as
well use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
ctx->Stencil.WriteMask is a statically sized array of 3 elements.
Checking it against 0 actually is a NULL check, and can never fail,
which meant that we always said stencil writes were enabled.
Use the new core Mesa derived state flag to fix this.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes. While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state. Also, all the plumbing is already in place.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some prefered
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling. Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No
difference on nexuiz (n=15).
v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope. This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).
This patch switches to a simpler technique that avoids the warning.
v2: Avoid GCC-specific syntax. Also update p_compiler.h.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The former just checks that the given block is valid by checking
the header and footer.
The later sets the memory block's tag. With extra debug code, we
can use that for monitoring/checking particular allocations.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.
v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarly the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't neither but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).
As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). Which I'd have expected with nearest/nearest filtering
where this will be less instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear where it is slightly more
instructions, probably the code appears less serialized allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though), for now
that old path is still there but unused.
No piglit regressions.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
Reviewed-by: Brian Paul <brianp@vmware.com>
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Lets add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
I think this was there before and got accidently
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.
The next patch will re-use the u_resource.c file for new code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.
This still isn't optimal, since the fence will signal a bit late,
but better than checking on the bo, which may never be ready if it
is shared (which is likely).
Also, renamed "pixels-rendered" to "samples-passed" because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that
passed the depth test when PS early_fragment_tests is set.
This patch consolidates duplicate code in the brw_depthbuffer and
gen7_depthbuffer state atoms. Previously, these state atoms contained
5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
Gen4-6 and 2 for Gen7). Also a lot of logic for determining the
appropriate buffer setup was duplicated between the Gen4-6 and Gen7
functions.
This refactor splits the code into three separate functions:
brw_emit_depthbuffer(), which determines the appropriate buffer setup
in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
which emits the appropriate state packets for Gen4-6, and
gen7_emit_depth_stencil_hiz(), which emits the appropriate state
packets for Gen7.
Tested using Piglit on Gen5-7 (no regressions).
v2: Re-word some comments. Fix an assertion that incorrectly
prohibited packed depth/stencil formats on Gen6 (these are allowed
provided that HiZ is disabled).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit dbf94d105a, which
was working around a bug in the handling of array indexing when
constant folding built-in functions. Now that the constant folding
bug has been fixed, the workaround is no longer needed.
Mesa constant-folds built-in functions by using a miniature GLSL
interpreter (see
ir_function_signature::constant_expression_evaluate_expression_list()).
This interpreter had a bug in its handling of array indexing, which
caused expressions like "m[i][j]" (where m is a matrix) to be handled
incorrectly. Specifically, it incorrectly treated j as indexing into
the whole matrix (rather than indexing just into the vector m[i]); as
a result the offset computed for m[i] was lost and m[i][j] was treated
as m[j][0].
Fixes piglit tests inverse-mat[234].{vert,frag}.
NOTE: This is a candidate for the 9.1 and 9.0 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436
Conceptually the same as previously done in float_to_half.
Should cut down number of instructions from 14 to 10 or so, but
will promote some NaNs to Infs, so it's disabled.
It gets a bit tricky though handling all the cases correctly...
Passes basic tests either way (though there are no tests testing special
cases, but some manual tests injecting them seemed promising).
v2: style and comment fixes suggested by Jose
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This replaces the existing float-to-half implementation.
There are definitely a couple of differences - the old implementation
had unspecified(?) rounding behavior, and could at least in theory
construct Inf values out of NaNs. NaNs and Infs should now always be
properly propagated, and rounding behavior is now towards zero
(note this means too large but non-Infinity values get propagated to max
representable value, not Infinity).
The implementation will definitely not match util code, however (which
does nearest rounding, which also means too large values will get
propagated to Infinity).
Also fix a bogus round mask probably leading to rounding bugs...
v2: fix a logic bug in handling infs/nans.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reduced stack size allows to run more threads in some cases,
improving performance for the shaders that use stack (that is, for the
shaders with control flow instructions). E.g. with unigine-based apps.
v4: implement exact computation taking into account wavefront size
v5: add cases for RV620, RS880
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
v2: reduce key size, don't copy key around to much.
v3: remove key size reduction
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
A fix for lower_jumps progress reporting, very much like similar in
c1e591eed.
NOTE: This is a candidate for stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.
Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6. (Due to a deficiency in constant propagation, I
can't use the hack I did in the previous commit to test the performance
change)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.
v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too. We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
I think this makes it much more obvious what's going on here.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.
v2: Fix compiler warning (false positive on possibly-uninitialized variable)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 9.1 branch.
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug. This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).
With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.
v2: Move old offset computation into the pre-gen7 path.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency. But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.
Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.
No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it. Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We weren't inserting it into the list, so it did nothing. This line was
replaced by the MOV/MUL block above.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This happens quite a bit with varying-index uniform loads. We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().
This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".
NOTE: This is a candidate for the 9.1 branch.
Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a more aggressive version of the old brw_optimize() path. Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source. This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code. That
leads to various rendering glitches.
Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.
Some parameters may also be set, with the following exceptions:
- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
INVALID_OPERATION
- any state which appears in the `per-sampler` state table may not
be set; generates INVALID_OPERATION
V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:
- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
INVALID_OPERATION.
Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.
V2: - Cover missing error cases (unsized formats; texture object zero)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is about to be used in teximagemultisample() when immutable=true.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
We are intentionally not allocating a slot for gl_ClipVertex. But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Alessandro Pignotti noted when I added this code in commit
0e723b135b that it's in the else block for
"if (busy)", so this debug print couldn't happen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index. This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.
v2: Remove unused variable 'chan'. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Search and replace:
][0] -> ].x
][1] -> ].y
][2] -> ].z
][3] -> ].w
Fixes piglit tests inverse-mat[234].{vert,frag}. These tests call the
inverse function with constant parameters and expect proper constant
folding to happen. My suspicion is that this patch papers over some bug
in constant propagation involving array accesses.
Either way, all of these accesses eventually get lowered to swizzles.
This cuts out the middle man (saving a trivial amount of CPU).
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
Since the case was missing bec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader. Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue. This is normally good.
However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp. The frontend emits IR for this just before
FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering:
1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue
This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written. This obviously
led to inaccurate results.
This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections. This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:
1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue
For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 [mattst88]:
- Rebase.
- #define GL_ARB_texture_query_lod to 1.
- Remove comma after ir_lod in ir.h for MSVC.
- Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
opt_tree_grafting.cpp.
- Rename textureQueryLOD to textureQueryLod, see
https://www.khronos.org/bugzilla/show_bug.cgi?id=821
- Fix ir_reader of (lod ...).
v3 [mattst88]:
- Rename textureQueryLod to textureQueryLOD, pending resolution of
Khronos 821.
- Add ir_lod case to ir_to_mesa.cpp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
x before
+ after
+------------------------------------------------------------------------------+
| x x + |
| xx ++ x + |
| xx ++ + xx ++ |
|x xxx x+++++ + xxx x*x+*+++ + x +|
| |_____|____________A______A____M____M_|_______| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 23 8083.78 8287.83 8205.55 8162.7461 68.307951
+ 23 8107.56 8358.74 8224.33 8186.1765 71.506301
No difference proven at 95.0% confidence
Reviewed-by: Eric Anholt <eric@anholt.net>
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
Reviewed-by: Eric Anholt <eric@anholt.net>
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.
Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
V2: - Move from intel to core mesa.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's more, but this only adds (most) of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.
This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace. Splitting it into "total" and "fs16" produces the
same output as before.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear. I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were allocating the output buffer but using the input
primitives. We need to allocate that buffer using the
maximum number of output, not input, primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This commits implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
To be able to add llvm paths later on we need to have some common
interface for them.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Same as on r600, trace cs execution by writting cs offset after each
states, this allow to pin point lockup inside command stream and
narrow down the scope of lockup investigation.
v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.
That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The pipe query interface is reused. The list of available queries can be
obtained using pipe_screen::get_driver_query_info.
Reviewed-by: Brian Paul <brianp@vmware.com>
Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.
This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.
Note: this is a candidate for the 9.1 branch
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
target-specific variables are undefined when used as pre-requisites.
instead, use secondary-expansion.
I noticed this when building the patch:
i965: Add a driconf option to disable flush throttling
Signed-off-by: Adrian Marius Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Since half of ir_validate uses asserts() (the other using printf() then
abort()), there's not much use to calling it in a release build. Cuts
6.3% of the startup time of TF2.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes assign instead of compare defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch changes the arrays in brw_vue_map (which only ever contain
values from -1 to 58) from ints to signed chars. This reduces the
size of the struct from 488 bytes to 136 bytes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: fix STATIC_ASSERT to use 127 instead of 128.
Reviewed-by: Eric Anholt <eric@anholt.net>
With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies post-GS pipeline stages (transform feedback, clip,
sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather
than brw->vs.prog_data->vue_map. This ensures that when geometry
shader support is added, these pipeline stages will consult the
geometry shader output VUE map when appropriate, rather than the
vertex shader output VUE map.
v2: Fixed some stale "CACHE_NEW_VS_PROG" comments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, the GPU pipeline has one active VUE map in effect at any
given time--the one representing the layout of vertex data coming from
the vertex shader. However, when geometry shaders are added, they
will have their own independent VUE map. Later pipeline stages (clip,
sf, fs) will need to consult the geometry shader VUE map if a geometry
shader is in use, and the vertex shader VUE map otherwise.
This patch adds a new field to brw_context, vue_map_geom_out, which
contains the VUE map that should be used by later pipeline stages. It
also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is
signalled whenever the contents of the VUE map changes.
Since we don't support geometry shaders yet, vue_map_geom_out is
currently set only by the brw_vs_prog state atom.
v2: Don't set vue_map_geom_out in do_vs_prog--that's redundant and
possibly problematic for precompiles. Only set it in
brw_upload_vs_prog. Also, make a copy instead of using a
pointer--this makes it possible to detect when the VUE map hasn't
changed, so we can avoid redundant state uploads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Future patches will allow for there to be separate VUE maps when both
a geometry shader and a vertex shader are in use. When this happens,
we will want to have correspondingly separate outputs_written
bitfields. Moving outputs_written into the VUE map will make this
easy.
For consistency with the terminology used in the VUE map, the bitfield
is renamed to "slots_valid" in the process.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 adds mask bits to the message header for a URB write which allow
the write to apply only to certain channels. We don't use this
functionality, so to ensure that the entire write always occurs, we
emit an OR instruction to set the mask bits.
With the advent of geometry shaders, URB writes won't just happen at
the end of a thread; they will happen in mid-thread too. Thus, we can
no longer rely on channel 0 being enabled, so we need to emit the OR
instruction in WE_all mode to ensure that it is executed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new name clarifies that it represents *one more* than the maximum
possible brw_varying_slot value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes the terminology "vert_result" from the i965 driver,
replacing it with "varying". The old terminology, "vert_result", was
confusing because (a) it referred to the enum gl_vert_result, which no
longer exists (it was replaced with gl_varying_slot), and (b) it
implied a vertex output, but with the advent of geometry shaders, it
could be either a vertex or a geometry output, depending what shaders
are in use. The generic term "varying" is less confusing.
No functional change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Whitespace fixes.
Bump MAX_DEPTH_TEXTURE_SAMPLES to match what GetInternalformativ is
claiming. Since that limit is what is actually enforced now, this
doesn't actually change anything except the queried value.
There's still no piglits verifying that multisample depth textures work,
but this works in the Unigine demos.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Extends _mesa_check_sample_count() to properly support the
TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets, which
have subtly different limits than renderbuffers.
This resolves the remaining TODO in the implementation of
TexImage*DMultisample.
V2: - Don't introduce spurious block.
- Do this in multisample.c instead.
- Fix typo in error message.
- Inline spec quotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pulls the checking of the sample count into a helper function, and
extends the existing logic to include the interactions with both
ARB_texture_multisample and ARB_internalformat_query.
_mesa_check_sample_count() checks a desired sample count against a
a combination of target/internalformat, and returns the error enum
to be produced, if any. Unfortunately the conditions are messy and the
errors vary.
V2: - Tidy up spurious block.
- Move _mesa_check_sample_count() to multisample.c instead; It
doesn't really belong in fbobject.c or teximage.c.
- Inlined spec quotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we support ARB_texture_multisample, there are multiple targets
accepted for this query, and they may have target-dependent limits, so
pass the target to the driverfunc.
For example, the sampling hardware may not be able to do general
texelFetch() for some format/sample count combination, but the driver
may still be able to implement a reasonable resolve operation, so it can
be supported for renderbuffers.
V2: - Don't break Gallium compile.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And use this (and the code for r11g11b10 packed float to float conversion)
in the soa texturing code (the generated code looks quite good).
Should be an order of magnitude faster probably than using the fallback
(not measured).
Tested with piglit texwrap GL_EXT_packed_float and
GL_EXT_texture_shared_exponent respectively (didn't find much else using
it).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The blit-based paths for TexImage, GetTexImage, and ReadPixels aren't very
fast with software rasterizer. Now Gallium drivers have the ability to turn
them off.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Initial version contributed by: Martin Andersson <g02maran@gmail.com>
This is only used if the memcpy path cannot be used and if no transfer ops
are needed. It's pretty similar to our TexImage and GetTexImage
implementations.
The motivation behind this is to be able to use ReadPixels every frame and
still have at least 20 fps (or 60 fps with a powerful GPU and CPU)
instead of 0.5 fps.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
I'll need the _mesa_readpixels_needs_slow_path function for the blit-based
version, but it's also useful to have this memcpy-based path in one place
and not scattered across several functions.
v2: add "const" to function parameters
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
I'll need both new functions for later. For now, it consolidates the code
for determining what the transfer ops should be and makes it a little bit
smarter.
v2: added "const"
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
New conversion code to handle conversion from/to r11g11b10 AoS to/from
SoA floats, and also add code for conversion from rgb9e5 AoS to float SoA
(which works pretty much the same as r11g11b10 except for the packing).
(This code should also be used for texture sampling instead of
relying on u_format conversion but it's not yet, so rgb9e5 is unused.)
Unfortunately a crazy amount of hacks is necessary to get the conversion
code running in llvmpipe's generate_unswizzled_blend, which isn't well
suited for formats where the storage representation has nothing to do
with what's needed for blending (moreover, the conversion will convert
from packed AoS values, which is the storage format, to float SoA values,
because this is much more natural for the conversion, and likewise from
SoA values to packed AoS values - but the "blend" (which includes
trivial things like partial mask) works on AoS values, so incoming fs
values will go SoA->AoS, values from destination will go packed
AoS->SoA->AoS, then do blend, then AoS->SoA->packed AoS which probably
isn't the most efficient way though the shuffles are probably bearable).
Passes piglit fbo-blending-formats (with GL_EXT_packed_float parameter),
still need to verify Inf/NaNs (where most of the complexity in the
conversion comes from actually).
v2: drop the (very bogus) rgb9e5 part, and do component extraction
in the helper code for r11g11b10 to float conversion, making the code
slightly more compact (suggested by Jose), now that there are no other
callers left this works quite well. (Could do the same for the
opposite way but it's less than ideal there, final part of packing
needs to be done in caller anyway and there'd be another conditional.)
v3: minor style and comment fixes. Also fix a potential issue with
negative zero being potentially returned by max(src, zero) as we
don't have well-defined min/max behavior (fortunately no additonal cost).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This helps minimize confusion / effort when moving between branches or
helping others.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Normally when submitting the first batch buffer after a flush, we
check whether the GPU has completed processing of the first batch
buffer of the previous frame. If it hasn't, we wait for it to finish
before submitting any more batches. This prevents GPU-heavy and
CPU-light applications from racing too far ahead of the current frame,
but at the expense of possibly lower frame rates. Sometimes when
benchmarking we want to disable this mechanism.
This patch adds the driconf option "disable_throttling" to disable the
throttling mechanism.
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the 9.1 branch.
Fixes piglit's texture-immutable-levels test.
Reported-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The arithmetic to convert a 3D texture slice to an R coordinate was
incorrect. Found when MSVC warned of a divide by zero.
Note that we don't actually ever hit this path. We don't decompress
slices of 3D textures and we don't support 3D mipmap generation yet.
Fixes random failures with piglit glsl-max-varyings.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
This makes basic built-in functions work in GLSL 1.50. It supports
everything except the new Geometry Shader functions.
The new 150.glsl file is 140.glsl plus ARB_texture_multisample.glsl;
150.frag is identical to 140.frag except for the #version bump.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
GLSL 1.50 includes support for the new sampler types introduced by
the ARB_texture_multisample extension.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 33599433c7 began setting the texture swizzle mode to XYZ1 for
RED, RG, and RGB textures in order to force alpha to 1.0 in case we
actually stored the texture as RGBA.
This had a unforseen performance implication: the shader precompile
assumes that the texture swizzle mode will be XYZW for non-shadow
sampler types. By setting it to XYZ1, this means every shader used with
a RED, RG, or RGB texture has to be recompiled. This is a very common
case.
Unfortunately, there's no way to improve the precompile, since RGBA
textures still need XYZW, and there's no way to know by looking at
the shader source what texture formats might be used.
However, we only need to smash alpha to 1.0 if the texture's memory
format actually has alpha bits. If not, the sampler already returns 1.0
for us without any special swizzling. XRGB8888, for example, is a very
common case where this occurs.
This partially fixes a performance regression since commit 33599433c7.
More work is required to fully fix it in all cases. This at least helps
Warsow.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
With the old context creation mechanism, an application asked the GL to
give it a context. Failing to produce a context was a fatal error.
Now, with GLX_ARB_create_context, the application can request a specific
version. If it's higher than the maximum version we support, context
creation will fail. But this is a normal error that applications
recover from.
In particular, the new glxinfo tries to create OpenGL 4.3, 4.2, 4.1,
4.0, 3.3, and 3.2 contexts before finally succeeding at creating a 3.1
context. This led to it printing the following message 6 times:
"brwCreateContext: failed to init intel context"
There's no need to alarm users (and developers) with such a message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On an INTEL_DEBUG=perf piglit run on IVB, reduces the instances of "HW
workaround: blit" (the printouts from the misaligned-depth workaround
blits) from 725 to 675.
It doesn't totally eliminate the workaround blit, because we still have
problems with Y offsets that we can't fix (since texturing can only align
miplevels up to 2 or 4, not 8).
No regressions on piglit/es3conform on IVB.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it possible to identify gl_TexCoord and gl_PointCoord
for drivers where sprite coordinate replacement is restricted.
The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings
should be hidden behind the GENERIC semantic or not.
With this patch only nvc0 and nv30 will request that they be used.
v2: introduce a CAP so other drivers don't have to bother with
the new semantic
v3: adapt to introduction gl_varying_slot enum
The commit changed API in a helper library shared by both egl_dri2 and
the gallium egl state tracker, but only egl_dri2 was updated to use the
new interface.
Tested-by: Giulio Camuffo <giuliocamuffo@gmail.com>
Previous to this patch, when using fixed function fragment shading,
bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask was being set
differently during precompiles and normal usage. During precompiles
it was being set only if the fragment shader reads from window
position (which it never does), so it was always being set to 0.
During normal usage it was being set if the vertex shader writes to
all 4 components of gl_Position (which it usually does), so it was
usually being set to 1. As a result, we were almost always doing an
extra recompile for the fixed function fragment shader.
The recompile was totally unnecessary, though, because
brw_wm_prog_key::proj_attrib_mask is only consulted for
fs_visitor::emit_general_interpolation(), which isn't used for
VARYING_SLOT_POS.
This patch avoids the unnecessary recompile by always setting bit
VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask to 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, right after calling _mesa_glsl_link_shader(), the fixed
function fragment shader code made several calls with the ostensible
purpose of setting up uniforms for the fragment shader it just
created.
These calls are unnecessary, since _mesa_glsl_link_shader() calls
driver->LinkShader(), which takes care of calling these functions (or
their equivalent). Also, they are dangerous to call after
_mesa_glsl_link_shader() has returned, because on back-ends such as
i965 which do precompilation, _mesa_glsl_link_shader() may have
already cached pointers to the existing uniform structures; attempting
to set up the uniforms again invalidates those cached pointers.
It was only by sheer coincidence that this wasn't manifesting itself
as a bug. It turns out that i965's precompile mechanism was always
setting bit 0 of brw_wm_prog_key::proj_attrib_mask to 0 for fixed
function fragment shaders, but during normal usage this bit usually
gets set to 1. As a result, the precompiled shader (with its invalid
uniform pointers) was not being used.
I'm about to introduce some changes that cause bit 0 of
proj_attrib_mask to be set consistently between precompilation and
normal usage, so to avoid regressions I need to get rid of the
dangerous duplicate uniform setup code first.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since apps typically begin rendering with a call to glClear(), it is
likely that when brw_workaround_depthstencil_alignment() moves a
miplevel to a temporary buffer, it can avoid doing a blit, since the
contents of the miplevel are about to be erased.
This patch adds the necessary plumbing to determine when
brw_workaround_depthstencil_alignment() is being called as a
consequence of glClear(), and avoids the unnecessary blit when it is
safe to do so.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Eliminate unnecessary call to _mesa_is_depthstencil_format(). Fix
handling of depth buffer in depth/stencil format.
v3: Use correct bitfields for clear_mask. Fix handling of depth
buffer in depth/stencil format when hardware uses separate stencil.
When invalidating, make sure we still reassociate the image to the new
miptree.
Reviewed-by: Eric Anholt <eric@anholt.net>
Doesn't exist on the asic and will cause a CS rejection
if VM is disabled.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use vmw_printf() just for extra debugging info (off by default).
Use vmw_error() for real errors/failures/etc that we definitely
want to report.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Checks that no functions are exported that are not part of the ABI.
Note that currently we are exporting functions that are aliased to
functions that are part of the ABI. They shouldn't be exported, but the
XML descriptions don't adequately describe this case.
Checks that no functions are exported that are not part of the ABI.
Note that currently we are exporting functions that are aliased to
functions that are part of the ABI. They shouldn't be exported, but the
XML descriptions don't adequately describe this case.
If we're in some conditional or loop we must not return, or the code
after the condition is never executed.
(v2): And, we also can't just continue as nothing happened, since the
mask update code would later check if we actually have a mask, so we
need to remember that there was a return in main where we didn't exit
(to illustrate this, a ret in a if clause would cause a mask update
which is still ok as we're in a conditional, but after the endif the
mask update code would drop the mask hence bringing execution back to
pixels which should have their execution mask set to zero by the ret).
Thanks to Christoph Bumiller for figuring this out.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=62357.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To further improve the optimization of source and destination
indirect addressing we need the ability to store a reference
to the declaration of the addressed operands.
Since most of the fields in tgsi_src_register doesn't apply for
an indirect addressing operand replace it with a separate
tgsi_ind_register structure and so make room for extra information.
v2: rename Declaration to ArrayID, put the ArrayID into () instead of []
Signed-off-by: Christian König <christian.koenig@amd.com>
Remember which declarations are declared as "arrays" and so
can be indirectly addressed. ArrayIDs start at 1, cause for
compatibility reasons zero is treaded as no array present.
Signed-off-by: Christian König <christian.koenig@amd.com>
Instead of allocating everything as temporaries, use the
new array allocation functions.
v2: fix bug in simplify_cmp, declare arrays on demand
Signed-off-by: Christian König <christian.koenig@amd.com>
Don't bother with free temporaries, just allocate them at
the end and also emit them in their own declaration.
Signed-off-by: Christian König <christian.koenig@amd.com>
Instead of emitting each temporary separately, emit them in a chunk.
v2: keep separate function for emitting temps
Signed-off-by: Christian König <christian.koenig@amd.com>
No functional change here, but this will let us query the image
for an fd handle later.
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
If ddx does not support swap, don't advertise it. This is a hack to
work around current xservers which advertise this extension even when it
is clearly not supported. When:
http://lists.x.org/archives/xorg-devel/2013-February/035449.html
is merged in upstream xserver and makes it's way into most distros then
this hack can be removed. In the mean time, it is required to allow
gnome-shell/clutter/etc to work properly with a DDX driver which does
not support ScheduleSwap.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This debug flag prints out the native GEN assembly for a blitting
shader produced using BLORP. Hopefully this should be useful in
developing additional BLORP features.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Needs to be set for depth, stencil, and fmask just
like other blocks.
v2: drop additional cayman bits for now
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On cayman, 128bpp surfaces require non_disp ordering for hw
access to both linear and tiled surfaces. When we use the 3D
engine we can set the non_disp ordering on both the tiled and
linear sides (via CB or texture), but when we use the DMA
engine, we can only set the non_disp ordering on the tiled
side, so after a L2T operation with the DMA engine, the data
ends up in the wrong order on the tiled side.
v2: cayman/TN only
v3: fix comments
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=60802
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The only format returned by _mesa_get_format_base_format() that
satisfies _mesa_is_depthstencil_format() is GL_DEPTH_STENCIL, so we
can simplify the check.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fast depth clears have the same depth/stencil alignment requirements
as other drawing operations. Therefore, we need to call
brw_workaround_depthstencil_alignment() from both the clear and
drawing paths.
Without this fix, we get image corruption if the following conditions
hold: (a) the first ever drawing operation to a depth miplevel (or the
first drawing operation after having used the texture for sampling) is
a clear, (b) the depth miplevel has a size that is eligible for fast
depth clears, and (c) the depth miplevel has an offset within the
miptree that isn't 8x8 aligned.
Fixes piglit "depthstencil-render-miplevels" tests with size 273.
NOTE: This is a candidate for stable branches
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch makes the following search-and-replace changes:
gl_frag_attrib -> gl_varying_slot
FRAG_ATTRIB_* -> VARYING_SLOT_*
FRAG_BIT_* -> VARYING_BIT_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
Now that there is no difference between the enums that represent
vertex outputs and fragment inputs, there's no need for a conversion
function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
Now that there is no difference between the enums that represent
vertex outputs and fragment inputs, there's no need for a conversion
function. But we still need to be able to detect when a given vertex
output has no corresponding fragment input. So it is replaced by a
new function, _mesa_varying_slot_in_fs(), which tells whether the
given varying slot exists as an FS input or not.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch makes the following search-and-replace changes:
gl_geom_result -> gl_varying_slot
GEOM_RESULT_* -> VARYING_SLOT_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch makes the following search-and-replace changes:
gl_geom_attrib -> gl_varying_slot
GEOM_ATTRIB_* -> VARYING_SLOT_*
GEOM_BIT_* -> VARYING_BIT_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch makes the following search-and-replace changes:
gl_vert_result -> gl_varying_slot
VERT_RESULT_* -> VARYING_SLOT_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
Future patches will make use of the enum. It will eventually take the
place of the existing enums gl_vert_result, gl_geom_attrib,
gl_geom_result, and gl_frag_attrib, all of which represent essentially
the same information but using inconsistent values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch updates the bitfields brw_context::wm.input_size_masks,
tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of
which are indexed by gl_frag_attrib, from 32-bit to 64-bit.
This paves the way for supporting geometry shaders, and for merging
the gl_frag_attrib and gl_vert_result enums. The combination of these
two will require at least 55 bits in the bitfields.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This involved adding another driOptionCache to dri_screen. The
existing one just held the default values. But now we also need
to have the values from the DRI config file so that we can get at
the always_have_depth_buffer config option, which is per-screen.
This option is needed for some applications that neglect to request
a depth buffer when choosing a visual/fbconfig.
The Linux app Topogun is an example of this problem.
There were two different NUM_ENTRIES #defines for the framebuffer
tile cache and the texture tile cache. Rename the later to fix
the warnings:
In file included from sp_flush.c:40:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
In file included from sp_context.c:50:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
Also, replace occurances of NUM_ENTRIES with Element() macro to
be safer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Untyped Atomic Operation messages are illegal for non-RAW formats. The
IVB hardware proceeds happily (after all, who cares what the format of the
surface is if you're doing untyped ops on it?), but later hardware
apparently doesn't. The simulator for gen7 does complain, though.
v2: Rebase against updates to previous patches. (by anholt)
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is basically a copy and paste of gen7_create_constant_surface, but
with the parameters filled in to offer a simpler interface.
It will diverge shortly.
I didn't bother adding it to the vtable for now since shader time is only
exposed on Gen7+.
v2: Replace tabs in the new code (by anholt)
Add back dropped memset() and add a comment about HSW channel selects.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Haswell's "Data Cache" data port is a single unit, but split into two
SFIDs to allow for more message types without adding more bits in the
message descriptor.
Untyped Atomic Operations are now message 0010 in the second data cache
data port, rather than 6 in the first.
v2: Use the #defines from the previous commit. (by anholt)
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
We were sparsely using some of these message types, but I'll just fill
them all in now. It will be used for fixing shader_time on HSW.
v2: Add missing MEDIA_BLOCK_READ.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids some snooping overhead between EUs processing separate shaders
(so VS versus FS).
Improves performance of a minecraft trace with shader_time by 28.9% +/-
18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).
v2: Add a define for the stride with a comment explaining its units and
why.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
scons/llvm.py defines inline globally to workaround issues with LLVM C
binding headers, so the only way to is to avoid
aggravating xkeycheck.h errors is to set _ALLOW_KEYWORD_MACROS.
This fixes MSVC 2012 build with LLVM.
Reviewed-by: Brian Paul <brianp@vmware.com>
- each softpipe_tex_tile_cache 50*64*64*4*4 = 3,276,800 bytes
- each softpipe_context has 3*32 softpipe_tex_tile_cache, i.e, each softpipe
context is 314,572,800 bytes, i.e, 300MB
That is, in a 32bits process (around 3GB virtual memory max), we can
only fit 10 contexts.
This change is a short-term hack to shrink the context size. Longer
term we'll need to change how the texture cache works.
Reviewed-by: Brian Paul <brianp@vmware.com>
Framebuffer blitting operation should be skipped if any of the
dimensions (width/height) of src/dst rect is zero.
V2: Move the dimension check after error checking in _mesa_BlitFramebuffer.
Fixes: fbblit(negative.nullblit.zeroSize) in Intel oglconform
https://bugs.freedesktop.org/show_bug.cgi?id=59495
Note: Candidate for all the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We can't handle them yet, however we can safely just warn (we will
just render to first layer, which is fine since we can't handle
rendertarget system value neither).
Also make behavior more predictable with buffer surfaces
(it would sometimes hit bogus asserts because of the union in the surface,
instead create the surface but assert when trying to set a buffer
in the framebuffer).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just delete unused kernels rather than marking them as internal and
running the GlobalDCE pass.
Also implement this function in C and inline it into
radeon_llvm_get_kernel_module()
Actually use $DEFINES, so we can see if GLX_INDIRECT_RENDERING is defined
If GLX_INDIRECT_RENDERING is defined, _GLAPI_SKIP_PROTO_ENTRY_POINTS will
be defined, and libglapi won't contain the 'protocol entry points', so we
should provide stubs in check_table.cpp
It looks like this has been broken since commit
1a1db1746d "Standardize names of OpenGL
functions."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Fixes this build error with make check.
CC collision.o
In file included from ../../../../../src/mesa/main/hash_table.h:34:0,
from collision.c:31:
../../../../../src/mesa/main/compiler.h:51:53: fatal error: c99_compat.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
If geometry shader is present its stream output info should
be used instead of the vs and we shouldn't use the pre-clipped
corrdinates.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In cases where the vertex element size is smaller than the vertex buffer
stride, the previous calculation could end up 1 too low. This would result
in the GPU using index 0 instead of the maximum index for those elements,
which would be visible as intermittent distorted triangles.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
And remove non-working code for indirect sampler/resource selection.
Will be added back later.
Includes code from "nv50/ir/tgsi: Resource indirect indexing" by
Francisco Jerez (when mixing the R and S handles we can only specify
them via a register, i.e. indirectly, unless we upload all the used
handle combinations to c[] space, which we don't for now).
Squashed and (heavily) modified original patches by Francisco Jerez:
nv50/ir/tgsi: Implement resource LOAD/STORE (wip).
nv50/ir/tgsi: Emit SUST/SULD for surface access, and add CB LOAD/STORE support
nv50/ir/tgsi: Fix/clean up the LOAD/STORE handling code.
Left out for now:
nv50/ir/tgsi: Resource indirect indexing
Treating raw, read-only surfaces as constant buffers (CBs) was removed
because CBs are limited to a size of 64 KiB which isn't desireable, and
because this decision should probably be made by the state tracker.
If we used a number of CB slots for surfaces, it might find that we
cannot accomodate the advertised limit.
OpenGL is nice and makes the user specify a format with an image unit.
OpenCL is evil and doesn't, and what's better than adding a huge load
of functions that we call indirectly to handle the conversion ?
A live range [0, 0) counts as empty. For function inputs this can
be a problem, so insert a nop at the beginning to make it [0, 1).
This is a bit of a hack but also the most simple solution.
Currently works on a220. Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.
The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around. I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.
v1: original
v2: build file updates from review comments, and remove GPL licensed
header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
format issues, resource_transfer fixes, scissor fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously, the derivatives were calculated and passed in a packed form
to the sample code (for implicit derivatives, explicit derivatives were
packed to the same format).
There's several reasons why this wasn't such a good idea:
1) the derivatives may not even be needed (not as bad as it sounds since
llvm will just throw the calculations needed for them away but still)
2) the special packing format really shouldn't be part of the sampler
interface
3) depending what the sample code actually does the derivatives will
be processed differently, hence there is no "ideal" packing. For cube
maps with explicit derivatives (which we don't do yet) for instance the
packing looked downright useless, and for non-isotropic filtering we'd
need different calculations too.
So, instead just pass the derivatives as is (for explicit derivatives),
or let the rho calculating sample code calculate them itself. This still
does exactly the same packing stuff for implicit derivatives for now,
though explicit ones are handled in a more straightforward manner (quick
estimates show performance should be quite similar, though it is much
easier to follow and also does the rho calculation per-pixel until the
end, which we eventually need for spec compliance anyway).
No piglit changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
After the previous fix that almost removes an allocation of 4*n^2
bytes, we can use a bitset to reduce another allocation from n^2 bytes
to n^2/8 bytes.
Between the previous commit and this one, the peak heap size for an
oglconform ARB_fragment_program max instructions test on i965 goes from
4GB to 255MB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were allocating an adjacency_list entry for every possible
interference that could get created, but that usually doesn't happen.
We can save a lot of memory by resizing the array on demand.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're already walking the list, and we can easily know when something
has no reason to be in the list any longer, so take a brief extra step
to reduce our worst-case runtime (an oglconform test that emits the
maximum instructions in a fragment program). I don't actually know what
the worst-case runtime was, because it was too long and I got bored.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can execute way fewer instructions by doing our boolean manipulation
on an "int" of bits at a time, while also reducing our working set size.
Reduces compile time of L4D2's slowest shader from 4s to 1.1s
(-72.4% +/- 0.2%, n=10)
v2: Remove redundant masking (noted by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were handling the the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs. Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, ovewrite the other. This was occurring in
XBMC's GLSL shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When forcing the compiler to always generate pull constants instead of
push constants (in order to have an easy to use testcase), improves
performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The lowering process creates a new vgrf on gen7 that should be represented
in live interval analysis. As-is, it was getting a conflicting allocation
with gl_FragDepth in the dolphin emulator, producing broken rendering.
NOTE: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I was going to fix the code above like the previous commit, but we already
had that covered (otherwise all our uniform access would have been broken,
unlike just pull constants).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were allowing a compressed instruction to write a register that
contained the last use of a uniform pull constant (either UBO load or push
constant spillover), so it would get half its values smashed.
Since we need to see the actual instruction to decide this, move the
pre-gen6 pixel_x/y logic here, which should improve the performance of
register allocation since virtual_grf_interferes() is called more than
once per instruction.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I was looking at the list to see what might be interesting to document for
application developers, and it turns out some are completely dead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that each emitted primitive had the same
number of vertices. That is incorrect. Emitted primitives
can have arbirtrary number of vertices. Simply increment
index on iteration to fix it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Whenever we're binding the shaders we're incrementing NumOutputs,
assuming the parser spots an output decleration, but we were never
reseting the variable. That means that each subsequent bind of
a geometry shader would add its number of output to the number
of output bound by all previously ran shaders and our indexes
would get completely messed up.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is needed for handling the dx10-style sample opcodes.
This also simplifies the logic by getting rid of sampler variants
completely (sampler_views though OTOH have sort of variants because
some of their state is different depending on the shader stage they
are bound to).
No significant performance difference (openarena run:
840 frames in 459.8 seconds vs. 840 frames in 460.5 seconds).
v2: fix reference counting bug spotted by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Can handle them since the single sampler interface was introduced.
v2: simplify txf/sample_i handling a bit according to Brian's feedback.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Only the disassembler is used to dump shaders. Here's a few examples
how to use R600_DEBUG.
Log compute info:
R600_DEBUG=compute
Dump all shaders:
R600_DEBUG=fs,vs,gs,ps,cs
Dump pixel shaders only:
R600_DEBUG=ps
Disable Hyper-Z:
R600_DEBUG=nohyperz
Disable the LLVM backend:
R600_DEBUG=nollvm
Or use any combination of the above, or print all options:
R600_DEBUG=help
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Commit 67ef7559 added an || test "x$enable_dri" check in an attempt to
get the DRI common bits built in some necessary cases. That change was
inappropriate as it made these common DRI pieces be built
unconditionally, so some builds were broken.
Subsequently, commit 998d975e3 change the "|| test" to a "-a"
conjunction within the existing test invocation. This made the '-a
"x$enable_dri" = xyes' clause have no effect, (as it was inside an
enclosing test for the same condition). So the new breakage from
commit 67ef7559 was addressed, but the original problems were
regressed.
The immediately preceding commit removed the redundant condition.
Now, finally this commit fixes the original problem as described in
the commit message of 67ef7559: this code should be compiled when
using the DRI state tracker. In order to do so, the HAVE_*_DRI
conditionals must be moved after the last assignment of HAVE_COMMON_DRI.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61821
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
It fails on 32-bit systems (I only tested on 64-bit). Power of two
size isn't required, so just remove the assertion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This fixes a crash when a display list is created in one context
but executed from a second one. The vbo_save_context::vertex_store
memeber will be NULL if we never created a display list with the
context. Just check for that before dereferencing the pointer.
Fixes http://bugzilla.redhat.com/show_bug.cgi?id=918661
Note: This is a candidate for the stable branches.
If the sampler object has been deleted on another context, an
alternative context may reference the old sampler. So ensure the sampler
object still exists.
Note: this is a candidate for the stable branch.
Signed-off-by: Alan Hourihane <alanh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
LICM stands for Loop Invariant Code Motion. Instructions that
does not depend of loop index are moved outside of loop body.
DCE is DeadCodeElimination.
v2: updated commit msg, thx to Vincent.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
We can't clip and viewport transform the vertices before we let
the geometry shader process them. Lets make sure the generated
vertex shader has both disabled if geometry shader is present.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The geometry shader code seems to have been originally written with the
assumptions that there are the same number of VS outputs as GS outputs and
that VS outputs are in the same order as their corresponding GS inputs. Since
TGSI uses separate shader objects, these are both wrong assumptions. This
was causing several valid vertex/geometry shader combinations to either render
incorrectly or trigger an assertion.
Conflicts:
src/gallium/auxiliary/draw/draw_gs.c
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This change specifically unbinds a sampler object from the texture unit
if it's bound to a unit. The spec calls for default object when deleting
sampler objects which are currently bound.
Note: this is a candidate for the stable branches
Signed-off-by: Alan Hourihane <alanh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We advertise a max texture/surfaces size of 8K x 8K but the old values
for these limits didn't actually allow us to handle that surface size.
For 8K x 8K we'll have 16384 bins. Each bin needs at least one cmd_block
object which was 2192 bytes in size. Since 16384 * 2192 exceeded
LP_SCENE_MAX_SIZE we'd silently fail in lp_scene_new_data_block() and not
draw the complete scene.
By reducing CMD_BLOCK_MAX to 29 we get nice 512-byte cmd_blocks. And
by increasing LP_SCENE_MAX_SIZE to 9 MB we can allocate enough command
blocks for 8K x 8K, plus a few regular data blocks.
Fixes the (improved) piglit fbo-maxsize test.
Note: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This was only necessary because our bounds checking was off by one, and
thus we read an extra pair of values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If we've written N pairs of values to the buffer, then last_index = N,
but the values are 0 .. N-1. Thus, we need to use <, not <=.
This worked anyway because we fill the buffer with zeroes, so we just
added an extra (0 - 0) to our results.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We were allowing things like copying RG1616 to a user's ARGB8888
format, while we were denying anything that wasn't ARGB8888 or
RGB565.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is similar code to intel_miptree_copy_slice, but the knobs
are all set differently.
v2: fix whitespace
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I'm trying to move us away from the region structure, and all the
callers are currently dereferencing a miptree to get the region.
In this change, the map_refcount is dropped. However, the bo->virtual is
itself map refcounted, so that's already dealt with.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The point of tracking the value was removed in February 2012
(65b096aedd), and this should have
been removed at the same time.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I don't see any reason for it -- it was introduced with the DRI2
invalidate work by krh in 2010 with no explanation. I suspect it was
something about wanting the same drm_intel_bo struct underneath multiple
openings of the BO within one process, but that's covered by libdrm at
this point. As far as the struct region goes, it is not threadsafe, so
multiple contexts sharing a region could have mixed up the map_count and
assertion failed or worse.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Several commits on master for the 9.1 branch had "NOTE" messages in a
slightly different format.
NOTE: This is a candidate for stable branches
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Now all the per-message enums from mtypes are gone. Now we can extend
unique message IDs into all generators of debug output without having to
update mtypes.h for each one.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This ends up reusing the dynamic ID support, so a silly enum gets to go
away. We don't assign good IDs to different messages yet, but at least
that's tractable now.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I was testing the ARB_debug_output code and wrote an obvious sample that
should have hit this, and got confused that my ARB_debug_output was
broken.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I tried to ensure that performance in the non-debug case doesn't change
(we still just check one condition up front), and I think the impact is
small enough in the debug context case to warrant including all of it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This doesn't provide detailed error type information, but it's important
to get these relatively severe but rare error messages out to the
developer through whatever mechanism they are using.
v2: Rebase on new WARN_ONCE additions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
We can emit messages now without always having to use the same ID for
each, or having a giant table of all possible errors in mtypes.h.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I want to have dynamic IDs so that we don't need to add to mtypes.h for
every error we might want to add. To do so, I need to get rid of the
static arrays and actually support all the crazy filtering of dynamic IDs
that we already support for application-provided error sources.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was apparently not noticed because we don't have any testing of
application-generated debug output. However, as I'm changing the
GL-generated debug output to use the same path as
application/middleware-generated debug output, this obviously became an
issue.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These will get reused by new ARB_debug_output messages in drivers/core,
instead of having the caller pass GL enums and have us immediately
switch-statement those into enums.
Add source enums will be handled in the next commit, because the way
different sources are handled at the moment is pretty strange.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The new one doesn't have the same behavior for GL_NO_ERROR, but we don't
produce errors with GL_NO_ERROR as the error type.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This partly reverts 6ace2e41da.
Apparently with GL_MESA_texture_array fixed-function texturing
with texture arrays is possible, and hence we have to handle TXP.
(Though noone seems to know the semantics, softpipe now does what
it did before, which is to NOT project the array coord, llvmpipe
for instance however indeed does project the array coord. Unlike
before it will project the comparison coord for shadow1d array, as
that clearly was an error.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61828.
Reviewed-by: Brian Paul <brianp@vmware.com>
X11 is already checked conditionally below.
Fixes OSMesa-only configurations to not require X11.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This was hit on the glTexStorage2D() path.
Note: this is a candidate for the stable branches
Signed-off-by: Alan Hourihane <alanh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The second digit was off by one, which meant we accidentally treated
GTn as GT(n-1). This also meant no support for GT1 at all.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We'll want to reuse this for non-occlusion queries in the future.
Plus, it's a single logical task, so having it as a helper function
clarifies the code somewhat.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Again, eliminating a global variable in favor of a per-query object
variable will help in a future where we have more queries in hardware.
Personally, I find this clearer: there's just the query object's BO,
rather than two variables that usually shadow each other.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The code a few lines above calls brw_emit_query_begin() if !query->bo,
and that creates query->bo. So it should always be non-NULL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If we haven't allocated a BO yet, we need to do that. Or, if there
isn't enough room to write another pair of values, we need to gather up
the existing results and start a new one. This is simple enough.
However, the old code was awkwardly split into two blocks, with a
write_depth_count() placed in the middle. The new depth count isn't
relevant to gathering the old BO's data, so that can go after the
reallocation is done. With the two blocks adjacent, we can merge them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since we already have an index in the brw_query_object, there's no need
to also keep a global variable that shadows it.
Plus, if we ever add support for more types of queries that still need
the per-batch before/after treatment we do for occlusion queries, we
won't be able to use a single global variable. In contrast, per-query
object variables will work fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
brw->query.index is initialized to 0 just a few lines before it's
copied to first_index.
Presumably the idea here was to reuse the query BO for subsequent
queries of the same type, but since that doesn't happen, there's no need
to have the extra code complexity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This code was really difficult to follow, for a number of reasons:
- Queries were handled in four different ways (TIMESTAMP writes a single
value, TIME_ELAPSED writes a single pair of values, occlusion queries
write pairs of values for the start and end of each batch, and other
queries are done entirely in software. It turns out that there are
very good reasons each query is handled the way it is, but
insufficient comments explaining the rationale.
- It wasn't immediately obvious which functions were driver hooks
and which were helper functions. For example, brw_query_begin() is
a driver hook that implements glBeginQuery() for all query types, but
the similarly named brw_emit_query_begin() is a helper function that's
only relevant for occlusion queries.
Extra explanatory comments should save me and others from constantly
having to ask how this code works and why various query types are
handled differently.
v2: Incorporate Eric's feedback: change "as soon as possible" to "the
results will be present when mapped."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
For timestamp queries, we just write a single value to a BO. The
natural place to write that is element 0, so we should do that.
Previously, we wrote it into element 1 (the second slot) leaving
element 0 filled with garbage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In OpenGL, most queries record statistics about operations performed
between a defined beginning and ending point. However, TIMESTAMP
queries are different: they immediately return a single value, and there
is no start/stop mechanism.
Previously, Mesa implemented TIMESTAMP queries by calling EndQuery
without first calling BeginQuery. Apparently this is DirectX
convention, and Gallium followed suit. I personally find the asymmetry
jarring, however---having BeginQuery and EndQuery handle a different set
of enum values looks like a bug. It's also a bit confusing to mix the
one-shot query with the start/stop model.
So, add a new QueryCounter driver hook for implementing TIMESTAMP. For
now, fall back to EndQuery to support drivers that don't do the new
mechanism.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Something I never got around to implement, but this is the tgsi execution
side for implementing texel offsets (for ordinary texturing) and explicit
derivatives for sampling (though I guess the ordering of the components
for the derivs parameters is debatable).
There is certainly a runtime cost associated with this.
Unless there are different interfaces used depending on the "complexity"
of the texture instructions, this is impossible to avoid.
Offsets are always active (I think checking if they are active or not is
probably not worth it since it should mostly be an add), whereas the
sampler_control is extended for explicit derivatives.
For now softpipe (the only user of this) just drops all those new values
on the floor (which is the part I never implemented...).
Additionally this also fixes (discovered by accident) inconsistent
projective divide for the comparison coord - the code did do the
projection for shadow2d targets, but not shadow1d ones. This also
drops checking for projection modifier on array targets, since they
aren't possible in any extension I know of (hence we don't actually
know if the array layer should also be divided or not).
Reviewed-by: Brian Paul <brianp@vmware.com>
Similar fix to what is done for the non-llvm case, we could otherwise still
hit the stages (near certainly with gs) which crash. It is probably a much
better idea to skip trying to draw at that point anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
It seems easiest (and best) if we simply skip all the later stages
(after stream output).
(This is different to the llvm case at least for now where we will
simply try to render garbage, though both behaviors should be correct.)
Fixes piglit glsl-1.40-tf-no-position with softpipe.
Reviewed-by: Brian Paul <brianp@vmware.com>
With glsl 1.40 writing position is not required (useful for transform
feedback, though in fact it's still possible to rasterize such geometry
even if the results aren't too well defined).
Prevents crashes in that case. Fixes piglit glsl-1.40-tf-no-position.
Not quite sure this is 100% correct as it also skips clipdistance
clipping which could still work (but not sure if the result would
really be needed?)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com
Reviewed-by: Brian Paul <brianp@vmware.com>
Since c8eb2d0e82 llvmpipe checks if it's
actually legal to create a surface. The opengl state tracker doesn't quite
obey this so for now just warn instead of assert.
Also warn instead of disabled assert when creating sampler views
(same reasoning).
Addresses https://bugs.freedesktop.org/show_bug.cgi?id=61647.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
texel offsets should have been the last missing feature for 130, and in
fact 140 as well (last there were texture buffers). In any case we still
don't do OpenGL 3.0 (missing MSAA which will be difficult,
plus EXT_packed_float, ARB_depth_buffer_float and EXT_framebuffer_sRGB).
v2: bump to 140 instead - we have everything except we crash when not writing
to gl_Position (but softpipe crashes as well) so let's just say this is a bug
instead. Also (by Dave Airlie's suggestion) update llvm-todo.txt.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was previously only handled for texelFetch (much easier).
Depending on the wrap mode this works slightly differently (for somewhat
efficient implementation), hence have to do that separately in all roughly
137 places - it is easy if we use fixed point coords for wrapping, however
some wrapping modes are near impossible with fixed point (the repeat stuff)
hence we have to normalize the offsets if we can't do the wrapping in
unnormalized space (which is a division which is slow but should still be
much better than the alternative, which would be integer modulo for wrapping
which is just unusable). This should still give accurate results in all
cases that really matter, though it might be not quite conformant behavior
for some apis (but we have much worse problems there anyway even without
using offsets).
(Untested, no piglit test.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Even when we don't have LLVM since there's other C++ code
in the resulting DRI driver object.
Note: This is a candidate for the stable branches.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The problem is that we mix bo handles and flinked names in the hash
table. Because kms type handles are not flinked they should not be
added to the hash table. If we do that we will sooner or later
get a situation where we will overwrite a correct entry because
the bo handle was the same as a flinked name.
Note: this is a candidate for the stable branches.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Gen6, lower this to `ld` with lod=0 and an extra sample_index
parameter.
On Gen7, use `ld2dms`. We don't support CMS yet for multisample
textures, so we just hardcode MCS=0. This is ignored for IMS and UMS
surfaces.
Note: If we do end up emitting specialized shaders based on the MSAA
layout, we can emit a slightly shorter message here in the UMS case.
Note: According to the PRM, `ld2dms` takes one more parameter, lod.
However, it's always zero, and including it would make the message too
long for SIMD16, so we just omit it.
V2: Reworked completely, added support for Gen7.
V3: - Introduce sample_index parameter rather than reusing lod
- Removed spurious whitespace change
- Clarify commit message
V4: - Fix comment style
- Emit SHADER_OPCODE_TXF_MS on Gen6. This was benignly wrong since
it lowers to `ld` anyway on this gen, but still wrong.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen6, lower this to `ld` with lod=0 and an extra sample_index
parameter.
On Gen7, use `ld2dms`. This takes an additional MCS parameter to support
compressed multisample surfaces, but we're not enabling them for
multisample textures for now, so it's always ignored and can be safely
omitted.
V2: Reworked completely, added support for Gen7.
V3: - Use new sample_index, sample_index_type rather than reusing lod
- Clarify commit message.
V4: - Fix comment style
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is very similar to the TXF opcode, but lowers to `ld2dms` rather
than `ld` on Gen7.
V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks
it actually writes the correct number of registers. Otherwise in
nontrivial shaders some of the registers tend to get clobbered,
producing bad results.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Gen7 has an erratum affecting the ld_mcs message, making it unsafe to
use when the surface doesn't have an associated MCS.
From the Ivy Bridge PRM, Vol4 Part1 p77 ("MCS Enable"):
"If this field is disabled and the sampling engine <ld_mcs>
message is issued on this surface, the MCS surface may be
accessed. Software must ensure that the surface is defined
to avoid GTT errors."
To allow the shader to treat all surfaces uniformly, force UMS if the
surface is to be used as a multisample texture, even if CMS would have
been possible.
V3: - Quoted erratum text
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
The surface_state setup for renderbuffers already worked; only the
texturing side needed work. BLORP does something similar, but does its
own surface_state setup.
On Gen6, we just need to set the correct sample count.
On Gen7: - set the correct sample count
- set the correct layout mode
- set GEN7_SURFACE_ARYSPC_LOD0 if it's set in the miptree.
V2: - Clarify commit message
- Rebased onto Paul's physical/logical dims cleanup
- Added Gen7 support
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2: - Fix for state moving from texobj to image
- Rebased onto Paul's logical/physical cleanup
- Fixed missing quantization of sample count
- Fold in IMS renderbuffer wrapper fixes from later in the series
- Use correct physical slice offset for UMS/CMS surfaces on Gen7
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
V2: - fix formatting issues
- generate GL_OUT_OF_MEMORY if teximage cannot be allocated
- fix for state moving from texobj to image
V3: - remove ridiculous stencil hack
- alter format check to not allow a base format of STENCIL_INDEX
- allow width/height/depth to be zero, to deallocate the texture
- dont forget to call _mesa_update_fbo_texture
V4: - fix indentation
- don't throw errors on proxy texture targets
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
- sample count must be the same on all attachments
- fixedsamplepositions must be the same on all attachments
(renderbuffers have fixedsamplepositions=true implicitly; only
multisample textures can choose to have it false)
V2: - fix wrapping to 80 columns, debug message, fix for state moving
from texobj to image.
- stencil texturing tweaks tidied up and folded in here.
V3: - Removed silly stencil hacks entirely; the extension doesn't
actually make stencil-only textures legal at all.
- Moved sample count / fixed sample locations checks into
existing attachment-type-specific blocks, as suggested by Eric
V4: - Removed stencil hacks which were missed in V3 (thanks Eric)
- Don't move the declaration of texImg; only required pre-V3.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V2] Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Moves the definition of the sample positions out of
gen6_emit_3dstate_multisample, and unpacks them in
gen6_get_sample_position.
V2: Be consistent about `sample position` rather than `location`.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
V2: - fix multiline comment style
- stop using ASSERT_OUTSIDE_BEGIN_END_AND_FLUSH since that
doesn't exist anymore.
V3: - check for the extension being enabled
- tidier flagging of _NEW_MULTISAMPLE
- fix weird indentation in get.c
V4: - move flush later in SampleMaski()
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Actual sample locations deferred to a driverfunc since only the driver
really knows where they will be.
V2: - pass the draw buffer to the driverfunc; don't fallback to pixel
center if driverfunc is missing.
- rename GetSampleLocation to GetSamplePosition
- invert y sample position for winsys FBOs, at Paul's suggestion
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- GL_MAX_COLOR_TEXTURE_SAMPLES
- GL_MAX_DEPTH_TEXTURE_SAMPLES
- GL_MAX_INTEGER_SAMPLES
V2: initialize limits to 1 in _mesa_init_constants as suggested by Brian
and Paul
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V2: - emit `sample` parameter properly for multisample texelFetch()
- fix spurious whitespace change
- introduce a new opcode ir_txf_ms rather than overloading the
existing ir_txf further. This makes doing the right thing in
the driver somewhat simpler.
V3: - fix weird whitespace
V4: - don't forget to include the new opcode in tex_opcode_strs[]
(thanks Kenneth for spotting this)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V2] Reviewed-by: Eric Anholt <eric@anholt.net>
[V2] Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds the new texture targets, and per-image state for GL_TEXTURE_SAMPLES
and GL_TEXTURE_FIXED_SAMPLE_LOCATIONS.
V2: - Allow multisample texture targets in glInvalidateTexSubImage too.
This was already partly there, but I missed it the first time around
since the interaction is defined in a newer extension. Fixed weird
indentation.
- Allow multisample array textures in glFramebufferTextureLayer.
This was overlooked as the tests originally only used 2d
multisample textures.
V3: - Set min/mag filters sensibly for multisample textures. This
can't actually be changed by the user, so it's more sensible to
initialize it correctly than to hack around it being bogus later.
V4: - Tidy up initial min/mag filter setup. Setup in
_mesa_initialize_texture_object was bogus, but benign since
finish_texture_init() clobbered everything with correct values. For V4,
just do the setup in finish_texture_init().
V5: - Don't break glPopAttrib(GL_TEXTURE_BIT)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V2] Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds new enums, dispatch machinery, and stubs for the 4 new entrypoints.
V2: - Drop placeholder
- Align enum values
- Remove explicit exec=mesa; it *is* the dispatch flavor we want,
but it's also the default. I misunderstood how this worked before;
after actually reading the generator it makes good sense.
V3: - Squash in stubs for new entrypoints, and dispatch_sanity tweaks,
so we don't get build breakage between those patches.
V4: - Fix various remaining whitespace issues
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[1/3 V2] Reviewed-by: Matt Turner <mattst88@gmail.com>
[V3] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLX extension lets you expose visuals that explicitly guarantee you
that the GL_FRAMEBUFFER_SRGB_CAPABLE flag will be set, but we can set
the flag even while the visual doesn't provide the guarantee. This
appears to be consistent with other implementations, as we've seen
several apps now that don't require an srgb visual and assume sRGB will
work without checking the GL_FRAMEBUFFER_SRGB_CAPABLE flag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55783
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60633
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Now that we have W-tiled S8, we can't just region_map and poke at bits --
there has to be some swizzling. Rely on intel_miptree_map to get that job
done. This should also get the highest performance path we know of for the
mapping (interesting if I get around to finishing movntdqa some day).
v2: Fix stale name of the bit in a comment.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I want to reuse intel_miptree_map() to replace some region mapping that's
broken for separate stencil, but doing so would result in new demands on
ETC transcode that we actually don't want to happen.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Any driver can implement this simple and efficient optimization.
Team Fortress 2 hits it always. The DISCARD_RANGE codepath is not even used
with TF2 anymore, so we avoid a ton of useless buffer copies.
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: cosmetic changes based on Brian's review
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
NOTE: This is a candidate for the 9.1 branch. (the next patch depends on it)
The states were split because we thought it caused a hardlock. Now we know
the hardlock was caused by something else and has since been fixed.
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
This doesn't fix any issue we know of, but there indeed is a week spot
in draw_vbo where streamout can fail. After streamout is enabled,
the need_cs_space call can flush the context, which causes the streamout
to be disabled right after it was enabled and bad things happen.
One way to fix it is to atomize the beginning part, so that no context flush
can happen between streamout enabling and the first drawing.
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Without this set, dri_util.c:dri2CreateContextAttribs
will reject requests to create a context with
__DRI_API_OPENGL_CORE.
This prevents a 3.2 core profile context from being created
even when MESA_GL_OVERRIDE_VERSION=3.2 is used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the override is version is >= 3.1, then update the
max_gl_core_version. Otherwise, update max_gl_compat_version.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow other code to get access to the override
version before a context is available.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although GLSL 1.50 compiler support is not available,
this change will allow MESA_GLSL_VERSION_OVERRIDE=150 to be
used while 1.50 support is being developed.
Since no drivers claim 1.50 GLSL support, this change should
only impact Mesa when MESA_GLSL_VERSION_OVERRIDE=150 is set.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Immediate operands can only be src2 in 2-source instructions. Fixes
piglit failures since 0a1d145e (oops!).
Spotted-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The field was equivalent to (etc_format != MESA_FORMAT_NONE), and
therefore duplicate information.
This patch removes field and replaces all references to it with
`etc_format != MESA_FORMAT_NONE`.
No Piglit ETC test regresses on Intel Sandybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
v2 [mattst88]:
- Add BRW_OPCODE_LRP to list of CSE-able expressions.
- Fix op_var[] array size.
- Rename arguments to emit_lrp to (x, y, a) to clear confusion.
- Add LRP function to brw_fs.cpp/.h.
- Corrected comment about LRP instruction arguments in emit_lrp.
v3 [mattst88]:
- Duplicate MAD code for LRP instead of using a function pointer.
- Check for != GRF instead of == IMM in emit_lrp.
- Lower LRP on gen < 6.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
1
Many GPUs have an instruction to do linear interpolation which is more
efficient than simply performing the algebra necessary (two multiplies,
an add, and a subtract).
Pattern matching or peepholing this is more desirable, but can be
tricky. By using an opcode, we can at least make shaders which use the
mix() built-in get the more efficient behavior.
Currently, all consumers lower ir_triop_lrp. Subsequent patches will
actually generate different code.
v2 [mattst88]:
- Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
subsequent patch and ir_triop_lrp translated directly.
v3 [mattst88]:
- Move changes from the next patch to opt_algebraic.cpp to accept
3-src operations.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we had separate constructors for one, two, and four operand
expressions. This patch consolidates them into a single constructor
which uses NULL default parameters.
The unary and binary operator constructors had assertions to verify that
the caller supplied the correct number of operands for the expression,
but the four-operand version did not. Since get_num_operands for
ir_quadop_vector returns the number of vector_elements, we can safely
add that without breaking the semantics of ir_quadop_vector.
This also paves the way for expressions with three operands. Currently,
none can be constructed since get_num_operands() never returns 3.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
total instructions in shared programs: 346873 -> 346847 (-0.01%)
instructions in affected programs: 364 -> 338 (-7.14%)
(All affected shaders are from Lightsmark)
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen6 has write-only MRF registers, and for ease of implementation we
paritition off 16 general purposes registers to act as MRFs on Gen7.
Knowing that our Gen7 MRFs are actually GRFs, we can do things we can't
do with real MRFs:
- read from them;
- return values directly to them from a send instruction; and
- compute directly to them with math instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
This requirement was added by ARB_fragment_program
When the Steam overlay is enabled, this fixes:
* Menu corruption with the Puddle game
* The screen going black on Rochard when
the Steam overlay is accessed
NOTE: This is a candidate for the 9.0 and 9.1 branches.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Motivated by wanting to see if GenTextures was called by an
application while debugging another Steam overlay issue.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that buffers can be used as textures or render targets
make sure they aren't skipped.
Fix suggested by Jose Fonseca.
v2: added a couple of assertions so we can actually guarantee
we check the resources and don't skip them. Also added some comments
that this is actually a lie due to the way the opengl buffer api works.
Unfortunately not usable from OpenGL, and no cap bit.
Pretty similar to a 1d texture, though allows specifying a start element.
v2: also fix up renderbuffer width (which will get promoted to fb width)
to be the number of elements
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For PIPE_BUFFER we need coord adjustments for the transfer.
And for pure integer formats util_pack_color just crashes,
need to handle that differently due to clear colors being ints/uints.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Use a single sampler adapter instead of per-sampler-unit samplers,
and just pass along texture unit and sampler unit in the calls.
The reason is that for dx10-style sample opcodes pre-wired
samplers including all the texture state aren't really feasible (and for
sample_i/sviewinfo we don't even have samplers).
Of course right now softpipe doesn't actually do anything more than
just look up all its pre-wired per-texunit/per-samplerunit sampler as
it did before so this doesn't really achieve much except one more
function call, however this is now all softpipe's fault (fixing that in
a way which doesn't suck is still an unsolved problem).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
glXCreateWindow() and glXCreatePbuffer() always fail when built without
GLX_DIRECT_RENDERING defined since commit 48331047.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Normally the application will own the main event queue and be responsible
for moving events. In case of EGL_DEFAULT_DISPLAY, EGL opens the display
and has to own the main queue so it can move the events itself.
Call wl_display_dispatch_pending() to take ownership.
Previously only the 32-bit X visual would match the 32-bit RGBA8888
configs. This resulted in every config with alpha getting the "magic"
visual whose alpha is used by the compositor. This also resulted in no
multisample visuals being advertised. How many ways could we lose?
This patch inverts the problem... now you can't get the visual with
alpha used by the compositor even if you want it. I think we need to
invent a new value for EGL_TRANSPARENT_TYPE that apps can use to get
this. I'm surprised that there isn't already a choice for
EGL_TRANSPARENT_ALPHA.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tian Ye <yex.tian@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59783
Streamout buffers need to be synchronized on r6xx as
well.
v2: Add DEST flush as well.
v3: drop DEST flush
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Since with llvm execution parts of sampler view and sampler state is baked into
the shader, we need to revalidate otherwise the wrong shader might get used.
(Not completely sure but I think this would not be required for non-llvm case,
along with everything else in these functions.)
This caused bugs in piglit arb_texture_buffer_object-formats, because we never
noticed that the view format changed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This also fixes not honoring first/last_layer view parameters for array
textures, plus not honoring last_level view parameter for all textures
(neither is really used by OpenGL).
This mostly passes piglit arb_texture_buffer_object tests (it needs, however,
glsl 140 version override, plus GL 3.1 override, the latter only because
mesa does not allow ARB_tbo in non-core contexts).
Most arb_texture_buffer_object tests pass, with the exception of
arb_texture_buffer_object-formats. With "arb" parameter it passes most weirdo
formats before it segfaults in the state tracker, this looks to be some issue
with using legacy formats in core context (fails the same in softpipe).
With "core" parameter it passes with "fs", however fails with "vs" (for most
formats). This will be fixed later (debugging shows we're completely missing
the shader recompile depending on format).
v2: based on Jose's feedback, fix comments, variable/function names.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When you didn't have a texcoord array bound (or a non-1 current w
attrib), we were telling the fragment shader that it could just use "1"
instead of doing expensive pre-gen6 math to invert it. If you drew the
point with a non-1 W value, then you'd get the right size (since all the
vertex computations worked), but we'd mis-interpolate the coordinate
across the face.
Fixes the mesa pointsprite demo on GM45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30232
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Note: This is a candidate for the stable branches.
missing case GL_REQUIRED_TEXTURE_IMAGE_UNITS_OES is required
by OES_EGL_image_external extension.
Note: This is a candidate for the stable branches.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously when an input varying was optimized out of the
FS we would still retain it as an output of the VS.
We now build a hash of live FS input varyings rather
than looking in the FS symbol table. (The FS symbol table
will still contain the optimized out varyings.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We might want to revisit the normalized_coords semantics, but this is
the current expected behavior.
Fixes fdo bug 61091.
Reviewed-by: Brian Paul <brianp@vmware.com>
The llvm pipeline handles regular filled triangle offsets, but it
doesn't handle offsets for triangles drawn in point or line mode.
Fixes failures found with new piglit polygon-mode-offset test.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There were several issues. We weren't handling different front/back
polygon fill modes. We weren't checking whether the offset applied to
fill mode vs. line mode vs. point mode.
Fixes problems found with the Visualization Toolkit (VTK) test suite.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The old logic was kind of twisted, but seemed to work in practice.
Note: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When we destroy an ARB vp/fp whose ID was gen'd but not otherwise used we
get a pointer to the dummy/placeholder program. We can't destroy that one
so just skip it. This only failed during context tear-down because
glDeleteProgramsARB() was already aware of dummy programs.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38086
Note: This is a candidate for the stable branches.
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
We sometimes convert GL_QUAD_STRIP prims into GL_TRIANGLE_STRIP, but
that changes the results of the u_trim_pipe_prim() call. We need to
pass the original primitive type to the trim function.
Note that OpenGL's GL_x prim type values match Gallium's PIPE_PRIM_x values.
Fixes a failure in the new piglit degenerate-prims test.
Note: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
So we don't emit it twice if we ever use the flag on
cayman.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PS_PARTIAL flushes seems to be required in certain
cases to prevent hangs, especially on r6xx.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Regardless of what we put in the screen structure, all of the extensions
that compute_version_es2 checks are present and 3.0 will be exposed
anyway.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 86d30dea3c broke building with older
automake versions with this error:
Makefile:769: *** Recursive variable am__v_YACC_ references itself (eventually). Stop.
This patch fixes it. Fix stolen from xorg-macros.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
The recent change for GL core broke the older setup, which broke
gl_PointCoord on pre-gen6 (where gl_PointCoord is undefined if point
sprites are disabled). Fixes the new piglit GLES-2.0/glsl-fs-pointcoord
test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32429
Note: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a debug build this led to assertion failures, but on a non-debug
build the hardware would just reference the whole vec8 instead of the
same channel 8 times.
Fixes the new piglit glsl-1.40/uniform-buffer/fs-exp2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57121
Note: This is a candidate for the stable branches
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- zero temps/outputs instead of copying (otherwise we won't be able to see
the temps/outputs assignments for small shaders where nothing changes
across big areas
- also show the inputs (as it's often impossible to infer from the rest)
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa state tracker recently started using PIPE_FORMAT_X8B8G8R8_UNORM,
causing segfaults in texture-packed-formats, because swizze[chan] was
0xff for padding channel (X).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Now with buffer formats clarification don't need all that logic any longer.
(Note that it never would have worked in any case, because blockwidth and
blockheight were swapped any allocation with multi-byte format would have
had zero size.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This clarifies some things and gets rid of some old stuff.
The most significant one is probably that buffers cannot have formats
(nearly all drivers completely ignored format and used width0 as byte size
already in any case). There seems to be no use case for "structured" buffers.
(Note while d3d11 has new Structured Buffers, these still aren't associated
with a format, rather a byte stride, which we can't do yet either way.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some parts calculated key size by using shader information, others by using
the pipe_vertex_element information. Since it is perfectly valid to have more
vertex_elements set than the vertex shader is using those may not be the same,
so we weren't copying over all vertex_element state - this caused the tgsi dump
to assert (iterates over all vertex elements). More importantly in this
situation it would also break vertex texturing completely (since the sampler
state derived from the key is at a different position than expected).
Fix thix by deriving key->nr_vertex_elements from the shader information
instead of the pipe_vertex_element state (unlike dx10, we can't have "holes"
in pipe_vertex_element state, so this should be safe).
(Note that actual llvm shader generation does not use the pipe_vertex_element
state from the key itself in any case (althogh I guess it could) but uses
the one from draw.pt (which should be the same though contains all elements)
instead.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The OMOD value was only being folded to one instruction in cases where
the MUL instruction was reading a value written by more than one
instruction.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Now you can convert assembly strings into a full struct radeon_compiler
object and use it to test individual compiler pases.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This way make check can report whether or not the tests pass.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
These are all files that I authored, but forgot to add the license
headers.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This fixes a bug introduced in commit 258453716f and
triggered whenever "rb" is NULL.
Fixes at least one cause bug #59445:
[SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) segfault
https://bugs.freedesktop.org/show_bug.cgi?id=59445
(Though segfaults are still possible in that test case, but they have been
present since before commit 258453716f which is what's being fixed here.)
Reviewed-by: Eric Anholt <eric@anholt.net>
It's the reciprocal of the register value.
Fixes piglit fragcoord_w and glsl-fs-fragcoord-zw-perspective.
NOTE: This is a candidate for the 9.1 branch.
Requires corresponding LLVM R600 backend fix to work correctly, but even
without that it doesn't hang anymore.
13 more little piglits.
Depends on LLVM: r175193, r175733
NOTE: This is a candidate for the 9.1 branch.
Pre-Gen6, the SF thread requires exact matching between VS output
slots (aka VUE slots) and FS input slots, even when the corresponding
VS output slot is unused due to being overwritten by point coordinate
replacement (glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)).
As a result, we have a special hack in the VS to ensure when any
texture coordinate is subject to point coordinate replacement, it is
always allocated space in the VUE, even if it isn't written to by the
VS.
This hack isn't needed from Gen6 onwards, since SF (Gen7: SBE)
swizzling has the ability to insert the point coordinate into
gl_TexCoord[] without needing a corresponding unused VUE slot.
Note that no modification of SF setup code is required for this
patch--get_attr_override() already does the right thing. However, we
make a slight comment change to clarify why this works.
In addition to eliminating unnecessary VS recompiles and saving
precious URB space on Gen6+, this will save us the trouble of having
to adjust this hack when we implement geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For constant and temporary register fetches, the bitcasts weren't done
correctly for the indirect case, leading to crashes due to type mismatches.
Simply do the bitcasts after fetching (much simpler than fixing up the load
pointer for the various cases).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61036
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't need to flush resources for each layer, and since we don't actually
care about layer at all in the flush function just drop the parameter.
Also we can use util_copy_box instead of repeated util_copy_rect.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These used to be illegal a very long time ago, then for some more time
nothing really emitted these so this code path wasn't hit.
Just trivially iterate over box->depth.
(Might be worth refactoring at some point since nowadays all the code
doesn't really do much except for depth textures.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61093
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch implements a stub for GL_EXT_discard_framebuffer with
required checks listed by the extension specification. This extension
is required by GLBenchmark 2.5 when compiled with OpenGL ES 2.0
as the rendering backend.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
Only compile tested, but should fix at least some piglit fbo-blending tests.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
That means we can map and read multiple slices with one transfer_map call.
[ Cherry-picked from r600g commit 1aebb6911e ]
11 more little piglits on master, 1 more on the 9.1 branch (Marek's
glTex(Sub)Image improvements on master broke the other 10).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
GLX_INTEL_swap_event is broken on the server side, where it's
currently unconditionally enabled. This completely breaks
systems running on drivers which don't support that extension.
There's no way to test for its presence on this side, so instead
of disabling it uncondtionally, just disable it for drivers
which are known to not support it. It makes sense because
most drivers do support it right now.
We'll be able to remove this once Xserver properly advertises
GLX_INTEL_swap_event.
Note: This is a candidate for stable branch branches.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60052
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Improves on a major performance regression for the dolphin wii emulator
from its move to using UBOs. Performance in the UBO codepath (as
replayed through apitrace) is up 21.1% +/- 2.3% (n=26/29).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We could potentially do some CSE even when the dst types aren't the same
on gen6 where there is no implicit dst type conversion iirc, or in the
case of uniform pull constant loads where the dst type doesn't impact
what's stored. But it's not worth worrying about.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
This should fix the register allocation explosion on the GLES 3.0 test
on gen6. It also gives us an instruction that will fit our CSE handling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
We were correctly relaying the smear from MOV's src, but if the MOV
didn't do a smear, we don't want to smash the smear value from the
instruction being propagated into. Prevents a regression in the
upcoming UBO change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
brw_vs_prog_data::userclip hasn't been used since commit f0cecd4
(i965: Move VUE map computation to once at VS compile time).
brw_gs_prog_key::userclip_active hasn't been used since commit 9f3d321
(i965: Make the userclip flag for the VUE map come from VS prog data).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, the PBuffer's size was never set. This also initializes
the buffer size for windows, pixmaps, etc.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=61012
Note: This is a candidate for the stable branches.
A single element in a GLX reply is contained in the header itself.
The number of elements is denoted in the "n" field of the reply.
If "n" is 1, the length of additional data is 0.
The XXX_data_length() function of xcb does not return the length of
the (optional, n>1) data but the number of elements.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=59876
Note: This is a candidate for the stable branches.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We weren't mapping the PBO when using the bitmap cache (but we had
the PBO code for the non-cache path.)
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=61026
Note: This is a candidate for the stable branches.
This fixes a regression from ab74fee5e1.
When we use the clip coordinate to compute the screen-space interpolation
factor, we need to first apply the divide-by-W step to the clip
coordinate.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=60938
Note: This is a candidate for the 9.1 branch.
It has become a bit messy.
Changes:
- finally correct checking for transfer ops depending on the base format
- making sure the base internal format and the texture format match
(we were ignoring it, but it's important for correctness)
- the way-too-strict rule that both src and dst base formats must be the same
was dropped; ensuring the simpler and more permissive rule mentioned above
is enough
- stop using util_blit_pixels; pipe->blit is flexible enough, and now that we
have RGBX and red-alpha formats, pipe->blit can be used for more cases
Reviewed-by: Brian Paul <brianp@vmware.com>
A temporary texture is created such that it matches the format and type
combination and pixels are copied to it using memcpy. Then the blit is used to
copy the temporary texture to the texture image being modified by TexImage or
TexSubImage. The blit takes care of the format and type conversion and
swizzling. The result is a very fast texture upload involving as little CPU
as possible.
This improves performance in apps which upload textures during rendering.
An example is the Wine OpenGL backend for DirectDraw, which I used to test
the game StarCraft. Profiling had shown that TexSubImage was taking 50% of
CPU time without this patch, which was the main motivation for this work, and
now TexSubImage only takes 14% of CPU time. I had to underclock my CPU to see
any difference in the game and this patch does make the game a lot faster
if the CPU is slow (or using the powersave cpufreq profile).
Reviewed-by: Brian Paul <brianp@vmware.com>
This is not easy to hit, because we have 3 code paths now
(tried in this order):
- memcpy-based (skips the blit) -> _mesa_tex_getimage
- blit-based
- slow pixel packing -> _mesa_tex_getimage
The main difference later in the code is the parameters of
_mesa_image_address3d.
Reviewed-by: Brian Paul <brianp@vmware.com>
Based on r600g commit 2b9659c9e6 .
Fixes crashes with 4 piglit tests which are now hitting these formats.
NOTE: This is a candidate for the 9.1 branch.
_mesa_delete_renderbuffer does not call the driver-specific
renderbuffer delete function, so the blorp code was leaking the
Intel-specific bits, including some GEM objects.
Call the renderbuffer's ->Delete() method instead, which does the
right thing.
Fixes Unity rapidly sending the machine into the arms of the OOM-killer
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to encode them as Texture instructions since the NumOffsets field
is encoded there. However, we don't encode the actual target in there, this
is derived from the sampler view src later.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Need to take the type into account. Also, if we want to allow
mov's with modifiers we need to pick a type (assume float).
v2: don't allow all modifiers on all type, in particular don't allow
absolute on non-float types and don't allow negate on unsigned.
Also treat UADD as signed (despite the name) since it is used
for handling both signed and unsigned integer arguments and otherwise
modifiers don't work.
Also add tgsi docs clarifying this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The emulation of these if there's no rounding instruction available
is a bit more complicated than what the code did.
In particular, doing fp-to-int/int-to-fp will not work if the exponent
is large enough (and with NaNs, Infs). Hence such values need to be filtered
out and the original value returned in this case (which fortunately should
always be exact). This comes at the expense of performance (if your cpu
doesn't support rounding instructions).
Furthermore, floor/ifloor/ceil/iceil were affected by precision issues for
values near negative (for floor) or positive (for ceil) zero, fix that as well
(fixing this issue might not actually be slower except for ceil/iceil if the
type is not signed which is probably rare - note iceil has no callers left
in any case).
Also add some new rounding test values in lp_test_arit to actually test
for that stuff (which previously would have failed without sse41).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=59701.
To help catch mixed up context pointer bugs in the future, add a
trace_context_check() function and some new assertions.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a trace_surface object is created in trace_surf_create() we
weren't correctly setting the surface's context pointer. Instead of
it being the trace context, it was the wrapped driver's context.
This caused things to blow up sometimes during surface deallocation.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The GL_ARB_texture_rg spec says that we need to support both texturing
and rendering for the GL_RED and GL_RG formats. So move the format
check up into the rendertarget_mapping[] list. Also, add
PIPE_FORMAT_R8_UNORM to the list of formats required.
Note: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
We'd been ad-hoc inserting instructions in some SEND messages with no
knowledge of when it was required (so extra instructions), but not all SENDs
(so not often enough). This should do much better than that, though it's
still flow-control-ignorant.
v2: Use BRW_MAX_MRF instead of magic numbers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58960
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: Candidate for the stable branches.
In GLSL, sampler indices are allocated contiguously from 0. But in the
case of ARB_fragment_program (and possibly fixed function), an app that
uses texture 0 and 2 will use sampler indices 0 and 2, so we were only
allocating space for samplers 0 and 1 and setting up sampler 0. We
would read garbage for sampler 2, resulting in flickering textures and
an angry simulator.
Fixes bad rendering in 0 A.D. and ETQW. This was fixed for pre-gen7 by
28f4be9eb9
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=25201
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58680
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable branches.
I should say "fix", but it has never been used until now.
S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
so we'll start to see it more often with st/mesa now making smart decisions
about formats.
The DB<->CB copy can change the channel ordering for transfers, other than
that, the internal DB format doesn't really matter.
R600-R700 support is possible except shadow mapping.
FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).
Also the sampler swizzling was broken in theory and the fact it worked was
a lucky coincidence.
radeonsi might need to port this.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The code was rather broken for non-XYZW on 8-wide, but all of our
callers were using XYZW anyway. For my experiments with using writemask
on texturing, I've been using manual header setup in the compiler
backends, since we want to actually know what registers are written for
optimization and register allocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Detect a duplicate Shader type as and error instead of silently allowing
it, restrict to ES2 API.
v2: Tapani Pälli <tapani.palli@intel.com>
- make the check run time instead of compile time
v3: chadv
- Quote spec on which error to generate.
Signed-off-by: bma <Bo.Ma@windriver.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
The hardware just doesn't support it. I suspect this was a regression from
the move to fixed MESA_FORMATs for compressed textures and that previously we
were storing uncompressed for this or something.
Fixes GPU hangs in piglit "texwrap GL_EXT_texture_sRGB-s3tc bordercolor
swizzled" on my GM965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All of the GLSL specs from GLSL 1.30 (and GLSL ES 3.00) onward contain
language requiring certain integer variables to be declared with the
"flat" keyword, but they differ in exactly *when* the rule is
enforced:
(a) GLSL 1.30 and 1.40 say that vertex shader outputs having integral
type must be declared as "flat". There is no restriction on fragment
shader inputs.
(b) GLSL 1.50 through 4.30 say that fragment shader inputs having
integral type must be declared as "flat". There is no restriction on
vertex shader outputs.
(c) GLSL ES 3.00 says that both vertex shader outputs and fragment
shader inputs having integral type must be declared as "flat".
Previously, Mesa's behaviour was consistent with (a). This patch
makes it consistent with (b) when compiling desktop shaders, and (c)
when compiling ES shaders.
Rationale for desktop shaders: once we add geometry shaders, (b) really
seems like the right choice, because it requires "flat" in just the
situations where it matters. Since we may want to extend geometry
shader support back before GLSL 1.50 (via ARB_geometry_shader4), it
seems sensible to apply this rule to all GLSL versions. Also, this
matches the behaviour of the nVidia proprietary driver for Linux, and
the expectations of Intel's oglconform test suite.
Rationale for ES shaders: since the behaviour specified in GLSL ES
3.00 matches neither pre-GLSL-1.50 nor post-GLSL-1.50 behaviour, it
seems likely that this was a deliberate choice on the part of the GLES
folks to be more restrictive. Also, the argument in favor of (b)
doesn't apply to GLES, since it doesn't support geometry shaders at
all.
Some discussion about this has already happened on the Mesa-dev list.
See:
http://lists.freedesktop.org/archives/mesa-dev/2013-February/034199.html
Fixes piglit tests:
- glsl-1.30/compiler/interpolation-qualifiers/nonflat-*.frag
- glsl-1.30/compiler/interpolation-qualifiers/vs-flat-int-0{2,3,4,5}.vert
- glsl-es-3.00/compiler/interpolation-qualifiers/varying-struct-nonflat-{int,uint}.frag
Fixes oglconform tests:
- glsl-q-inperpol negative.fragin.{int,uint,ivec,uvec}
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In the GLSL 1.30 spec, section 4.3.6 ("Outputs") says:
"If a vertex output is a signed or unsigned integer or integer
vector, then it must be qualified with the interpolation qualifier
flat."
The GLSL ES 3.00 spec further clarifies, in section 4.3.6 ("Output
Variables"):
"Vertex shader outputs that are, *or contain*, signed or unsigned
integers or integer vectors must be qualified with the
interpolation qualifier flat."
(Emphasis mine.)
The language in the GLSL ES 3.00 spec is clearly correct and should be
applied to all shading language versions, since varyings that contain
ints can't be interpolated, regardless of which shading language
version is in use.
(Note that in GLSL 1.50 the restriction is changed to apply to
fragment shader inputs rather than vertex shader outputs, to
accommodate the fact that in the presence of geometry shaders, vertex
shader outputs are not necessarily interpolated. That will be
addressed by a future patch).
NOTE: This is a candidate for stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From GLSL ES 3.00 section 4.5.4 ("Default Precision Qualifiers"):
"The precision statement
precision precision-qualifier type;
can be used to establish a default precision qualifier. The type
field can be either int or float or any of the sampler types, and
the precision-qualifier can be lowp, mediump, or highp."
GLSL ES 1.00 has similar language. GLSL 1.30 doesn't allow precision
qualifiers on sampler types, but this seems like an oversight (since
the intention of including these in GLSL 1.30 is to allow
compatibility with ES shaders).
Previously, Mesa followed GLSL 1.30 and only allowed default precision
qualifiers to be set for float and int. This patch makes it follow
GLSL ES rules in all cases.
Fixes Piglit tests default-precision-sampler.{vert,frag}.
Partially addresses https://bugs.freedesktop.org/show_bug.cgi?id=60737.
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, we fail to correctly handle GL_PRIMITIVE_RESTART_FIXED_INDEX.
Fixes gles3conform's primitive_restart_mode test.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This commit allows using glGetTexImage during rendering and still
maintain interactive framerates.
This improves performance of WarCraft 3 under Wine. The framerate is improved
from 25 fps to 39 fps in the main menu, and from 0.5 fps to 32 fps in the game.
v2: fix choosing the format for decompression
This is for glGetTexImage and it will be used for samplers only (which some
drivers already implement by reading util_format_description).
v2: incorporate Brian's suggestion
Reviewed-by: Brian Paul <brianp@vmware.com>
Seems that alpha test being enabled confuse the GPU on the order in
which it should perform the Z testing. So force the order programmed
throught db shader control.
v2: Only force z order when alpha test is enabled
v3: Update db shader when binding new dsa + spelling fix
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
In OpenGL 4.3, new language was added that would require
this check. But, if this check results in broken applications
then perhaps it will be reversed.
For now, remove this check and re-evaluate when
desktop GL 4.3 is closer.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
With the llvm patches, fixing 14 piglit tests in total.
v2: increase the const limit
v3: document the const limit
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When the user specifies an unsupported GLSL version,
_mesa_glsl_parse_state::process_version_directive() nicely gives them
an error message telling them which GLSL versions are supported.
Previous to this patch, the logic for determining whether a given
language version was supported was independent from the logic to
generate this error message string; as a result, we had a bug where
GLSL 3.00 would never be listed in the error message as an available
language version, even if it was really available.
To make matters worse, the code for generating the error message
string assumed that desktop GL versions were always separated by 0.10,
an assumption that will be wrong as soon as we support GLSL 3.30.
This patch fixes both problems by adding a table of supported GLSL
versions to _mesa_glsl_parse_state; this table is used both to
generate the error message and to check whether a given version is
supported.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Discovered accidentally when changing SAMPLE_L definition.
Turns out the lod arguments were already correct for the new definition
but the compare and derivs were not.
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
It looks like using coord.w as explicit lod value is a mistake, most likely
because some dx10 docs had it specified that way. Seems this was changed though:
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447229%28v=vs.85%29.aspx
- let's just hope it doesn't depend on runtime build version or something.
Not only would this need translation (so go against the stated goal these
opcodes should be close to dx10 semantics) but it would prevent usage of this
opcode with cube arrays, which is apparently possible:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb509699%28v=vs.85%29.aspx
(Note not only does this show cube arrays using explicit lod, but also the
confusion with this opcode: it lists an explicit lod parameter value, but then
states last component of location is used as lod).
(For "true" hw drivers, only nv50 had code to handle it, and it appears the
code was already right for the new semantics, though fix up the seemingly
wrong c/d arguments while there.)
v2: fix comment, separate out other changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These were, at some point in the past, used to request that Xorg's
compiler.h export a static inline xf86ReadMmio32 instead of a function
pointer. compiler.h only has this option for DEC Alpha.
But Xorg's compiler.h isn't being included by either of these two files
and the radeon driver still works on Alpha, so the definitions are dead
and not needed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
link up the fs outputs and blend inputs, and make sure the second blend source
is correctly loaded and converted (which is quite complex).
There's a slight refactoring of the monster generate_unswizzled_blend()
function where it makes sense to factor out alpha conversion (which needs
to run twice for dual source blend).
This passes piglit arb_blend_func_extended tests.
v2: remove new but ultimately not used function...
Reviewed-by: Brian Paul <brianp@vmware.com>
These are more recent additions, and no one remembered to update the
INTEL_DEBUG=state code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reorders the "brw_bits" array in brw_state_upload.c to match the
order of the #defines in brw_context.h.
Otherwise, it's really hard to see if any are missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These don't need to be re-disabled on every batch if we're using
hardware contexts. (If we're not, this is equivalent.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If the same context try to flink and open the object, use the
same bo struct instead of opening a new gem handle for the object.
This way we avoid avoid having 2 different handle pointing to the
same kernel object which can latter lead to trouble with virtual
address.
Fix:
https://bugs.freedesktop.org/show_bug.cgi?id=60200
Signed-off-by: Martin Andersson <g02maran@gmail.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
minecraft apparently has its piles of display lists each contain 6
instances of glBegin(GL_QUADS)/verts/glEnd(), which appear in the
compiled list as 6 prims of 4 verts each in one draw call. We can
reduce driver overhead even more by making that one prim of 24 verts.
Improves minecraft performance by 1.6% +/- .25% (n=446)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We used to have clip planes optionally included in the push constants,
resulting in a variable amount of data uploaded, but no more. This also
means less wasted space in the batch for our push constants.
v2: Update _NEW_TRANSFORM state bit information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
It doesn't matter with our current implementation of MapBufferRange,
but it was wrong -- the result pointer is read by intel_upload_data().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Mesa format can be RGBA8888_REV, the format/type can be
GL_RGBA/GL_UNSIGNED_BYTE, but the actual texture internal format can be
LUMINANCE_ALPHA, INTENSITY, etc. Therefore we should look at the base
internal format as well.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_base_tex_format doesn't accept GL_BGR and GL_ABGR_EXT, etc.
v2: add a (now hopefully complete) helper function to deal with this
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
- swapBytes has no effect on 8-bit single-component formats
- GL_SHORT is in host byte order, so checking for littleEndian is unnecessary,
I decided to make the change for single-component formats only
Based on suggestions from Michel Dänzer.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This patch fixes the vertex_header mask bitfield store in big-endian
architectures by bit-swap the fields accordingly.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Older hardware cannot do ARB_texture_rgb10_a2ui, and the translation
code for OES_compressed_ETC1_RGB8_texture was never implemented in the
i915 driver.
NOTE: This is a candidate for all stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should handle the new lod_zero modifier more correctly.
The runtime-conditional is a bit more complex however we now also do
scalar lod computation when appropriate which should more than make up for it.
The refactoring should also fix an issue with explicit lods
(lod clamp wasn't applied to them).
Also, always pass lod as the 5th element from tgsi executor, which simplifies
things (get rid of annoying conditionals later).
v2: based on Brian's feedback, use switch in a couple of places, fix up
some function parameter names, fix up comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
There were several bugs how this was handled, most opcodes wouldn't even
have fetched the right arguments.
Also, the tex "target" is coming from the sampler view, hence it cannot
have information about shadow comparisons - fortunately this is not only
sampler state but also needs to have matching instruction, so just use this
instead to identify shadow comparisons.
Still untested (compiles...).
Note that sample_i and sviewinfo are still busted (just assert).
(The problem is that the interface for doing the opengl-equivalent functions
txf and txq is tied to the specific the sampler itself but these opcodes
have no sampler associated with them. Oops...)
Also, even the other sample instructions will not work correctly since
they always operate on samplers which include the texture state. Fixing
this wouldn't be that difficult but most likely make softpipe quite a bit
slower when using the OpenGL tex opcodes (as the samplers have pre-baked
function calls in the sampler state depending on texture state and that stuff
would need to be evaluated at runtime), so leave it for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Need to calculate the number of mip levels (if it would be worthwile could
store it in dynamic state).
While here, the query code also used chan 2 for the lod value.
This worked with mesa state tracker but it seems safer to use chan 0.
Still passes piglit textureSize (with some handwaving), though the non-GL
parts are (largely) untested.
v2: clarify and expect the sviewinfo opcode to return ints, not floats,
just like the OpenGL textureSize (dx10 supports dst modifiers with resinfo).
Also simplify some code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
They are similar to old-style tex opcodes but with separate sampler and
texture units (and other arguments in different places).
Also adjust the debug tgsi dump code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes uninitialized scalar field defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
None of the filters used it (why would they). Maybe that param
was just there because some of the lines were considered to be
too short...
Reviewed-by: Dave Airlie <airlied@redhat.com>
This optimized filter (when using repeat wrap modes,
linear min/mag/mip filters, pot textures) only applies to 2d textures,
but nothing prevented it from being used for other textures (likely
leading to very bogus sample results).
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The signed case didn't do what the comment indicated. Should increase rounding
precision (at the expense of performance since the former code was effectively
a no-op).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support of the additional blending factors to the blend function
itself, and also enables testing of it in lp_test_blend (which passes).
Still need to add the glue code of linking fs shader outputs to blend inputs
in llvmpipe, and probably need to add special handling if destination doesn't
include alpha (which lp_test_blend doesn't test).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There can be other per-thread data than just vis_counter, so pass a struct
around instead (some of our non-public code uses this already and this
difference is a major cause of merge pain).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The exa core will already set the pointer to NULL prior calling
the callback function. So don't bail out in the callback if it's
already NULL.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
At eglSwapBuffer time, we blindly assume we have a back buffer, but the
back buffer only gets allocated when somebody tries to render something.
NOTE: This is a candidate for the 9.0 and 9.1 branches.
https://bugs.freedesktop.org/show_bug.cgi?id=60086
Previous to this patch, there were 13 identical definitions of this
macro in Mesa source. That's ridiculous. This patch consolidates 6
of them to a single definition in src/mesa/main/macros.h.
Unfortunately, I wasn't able to eliminate the remaining definitions,
since they occur in places that don't include src/mesa/main/macros.h:
- include/pci_ids/pci_id_driver_map.h
- src/egl/drivers/dri2/egl_dri2.h
- src/egl/main/egldefines.h
- src/gbm/main/backend.c
- src/gbm/main/gbm.c
- src/glx/glxclient.h
- src/mapi/mapi/stub.c
I'm open to suggestions as to how to deal with the remaining redundancy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, the i965 driver enabled EXT_framebuffer_multisample even
on pre-gen6 chipsets. However, since we don't support multisampling
on these chips, we set GL_MAX_SAMPLES=1 (the minimum allowed by
EXT_framebuffer_multisample), and if the client ever requested a
multisample buffer, we quietly supplied them with a single-sampled
buffer instead.
After some discussion on the mailing list (see thread
"ext_framebuffer_multisample: check for num_samples<=1"), it's clear
that this was the wrong approach. The correct approach is to only
expose EXT_framebuffer_multisample when we truly support
multisampling; that frees us to set a sensible value of
GL_MAX_SAMPLES=0 on other chipsets, so that we never have to deal with
a client requesting a multisample buffer when multisampling isn't
supported.
This change causes the following piglit tests to be skipped on
chipsets prior to Gen6:
- "ARB_framebuffer_sRGB/blit {renderbuffer,texture}
{linear,linear_to_srgb,srgb,srgb_to_linear}
{downsample,msaa,upsample} {disabled,enabled}"
- EXT_framebuffer_multisample/blit-mismatched-formats
- EXT_framebuffer_multisample/blit-mismatched-sizes
- EXT_framebuffer_multisample/dlist
- EXT_framebuffer_multisample/interpolation 0 *
- EXT_framebuffer_multisample/minmax
- EXT_framebuffer_multisample/negative-copypixels
- EXT_framebuffer_multisample/negative-copyteximage
- EXT_framebuffer_multisample/negative-max-samples
- EXT_framebuffer_multisample/negative-mismatched-samples
- EXT_framebuffer_multisample/negative-readpixels
- EXT_framebuffer_multisample/renderbuffer-samples
- EXT_framebuffer_multisample/renderbufferstorage-samples
- EXT_framebuffer_multisample/samples
This is expected, since the above tests exercise MSAA functionality,
and shouldn't be run on systems prior to Gen6.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the loop_state was allocated in the loop_analysis
constructor, but not freed in the (nonexistent) destructor. Moving
the allocation of the loop_state makes this code appear less sketchy.
Either way, there is no actual leak. The loop_state is freed by the
single caller of analyze_loop_variables.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Dave Airlie <airlied@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57753
In the documentation for BindBufferRange, OpenGL specs from 3.0
through 4.1 contain this language:
"The error INVALID_VALUE is generated if size is less than or
equal to zero or if offset + size is greater than the value of
BUFFER_SIZE."
This text was dropped from OpenGL 4.2, and it does not appear in the
GLES 3.0 spec.
Presumably the reason for the change is because come clients change
the size of the buffer after calling BindBufferRange. We don't want
to generate an error at the time of the BindBufferRange call just
because the old size of the buffer was too small, when the buffer is
about to be resized.
Since this is a deliberate relaxation of error conditions in order to
allow clients to work, it seems sensible to apply it to all versions
of GL, not just GL 4.2 and above.
(Note that there is no danger of this change allowing a client to
access data beyond the end of a buffer. We already have code to
ensure that that doesn't happen in the case where the client shrinks
the buffer after calling BindBufferRange).
Eliminates a spurious error message in the gles3 conformance test
"transform_feedback_offset_size".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the behavior of the Windows driver, but a bspec reference
should would be nice.
NOTE: This is a candidate for the 9.0 and 9.1 branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Should have been done in d9948e49 but I missed it because
MAX_VARYING_FLOATS doesn't appear in the ES 3 spec, but is the same
value as MAX_VARYING_COMPONENTS.
NOTE: Candidate for the 9.1 branch
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Append the overloaded vector type used for passing in the addressing
parameters.
Without this, LLVM uses the same function signature for all those types,
which cannot work.
Fixes problems e.g. with FlightGear and Red Eclipse.
Was using the pixel size instead of the number of block for the slice
tile max computation which resulted in dma writing at wrong address.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Now that we have support for overriding alpha to 1.0, we can handle
blitting between these formats in either direction.
For now, we only support two XRGB formats: MESA_FORMAT_XRGB8888 and
MESA_FORMAT_RGBX8888_REV. Most places only appear to worry about the
former, so ignore the latter for now. We can always add it later.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Currently, Blorp requires the source and destination formats to be
equal. However, we'd really like to be able to blit between XRGB and
ARGB formats; our BLT engine paths have supported this for a long time.
For ARGB -> XRGB, nothing needs to occur: the missing alpha is already
interpreted as 1.0. For XRGB -> ARGB, we need to smash the alpha
channel to 1.0 when writing the destination colors. This is fairly
straightforward with blending.
For now, this code is never used, as the source and destination formats
still must be equal. The next patch will relax that restriction.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
The BLT engine has many limitations. Currently, it can only blit
X-tiled buffers (since we don't have a kernel API to whack the BLT
tiling mode register), which means all depth/stencil operations get
punted to meta code, which can be very CPU-intensive.
Even if we used the BLT engine, it can't blit between buffers with
different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture
and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental
limitation, and the only way around that is to use BLORP.
Previously, BLORP only handled BlitFramebuffer. This patch adds an
additional frontend for doing CopyTexSubImage. It also makes it the
default. This is partly to increase testing and avoid hiding bugs,
and partly because the BLORP path can already handle more cases. With
trivial extensions, it should be able to handle everything the BLT can.
This helps PlaneShift massively, which tries to CopyTexSubImage2D
between depth buffers whenever a player casts a spell. Since these
are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU
while delivering ~1 FPS. This is particularly bad in an MMO setting
because people cast spells all the time.
It also helps Xonotic in 4X MSAA mode. At default power management
settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5).
(This data is from v1 of the patch.)
No Piglit regressions on Ivybridge (v3) or Sandybridge (v2).
v2: Create a fake intel_renderbuffer to wrap the destination texture
image and then reuse do_blorp_blit rather than reimplementing most
of it. Remove unnecessary clipping code and conditional rendering
check.
v3: Reuse formats_match() to centralize checks; delete temporary
renderbuffers. Reorganize the code.
v4: Actually copy stencil when dealing with separate stencil buffers but
packed depth/stencil formats. Tested by a new Piglit test.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com> [v4]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v3]
Reviewed-and-tested-by: Carl Worth <cworth@cworth.org> [v2]
Tested-by: Martin Steigerwald <martin@lichtvoll.de> [v3]
I need to use this from C++ code.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These formats were added a few months after these tables were committed.
No idea why we have the table though. AFAIK, texstore always takes the slow path
for GL_RGBn.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GLSL compiler can simplify clamp(v,0,1) to saturate. The state tracker
doesn't use it yet, but it will.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
I'd like to test Mesa OpenGL ES along side with NVIDIA libGL drivers. But
without this change, I get a NULL pointer dereference.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes uninitialized pointer field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We weren't emitting the SVGA_RS_OUTPUTGAMMA state so sRGB rendering
didn't work properly.
Fixes piglit's framebuffer-srgb test.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Strangely, the DRIimage interface we have passes the pitch in pixels
instead of bytes, which anholt missed in the change to using bytes for
region pitch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since transform feedback needs to be able to access individual fields
of varying structs, we can no longer match up the arguments to
glTransformFeedbackVaryings() with variables in the vertex shader.
Instead, we build up a hashtable which records information about each
possible name that is a candidate for transform feedback, and then
match up the arguments to glTransformFeedbackVaryings() with the
contents of that hashtable.
Populating the hashtable uses the program_resource_visitor
infrastructure, so the logic is shared with how we handle uniforms.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, transform feedback varyings were parsed in an ad-hoc
fashion that wasn't compatible with structs (or array of structs).
This patch makes it use parse_program_resource_name(), which correctly
handles both.
Note that parse_program_resource_name()'s technique for handling
mal-formed input strings is to simply let them through and rely on the
fact that a future name lookup will fail. Because of this,
tfeedback_decl::init() no longer needs to return a boolean error
code--it always succeeds, and if the input was mal-formed the error
will be detected later.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's actually nothing uniform-specific in uniform_field_visitor.
It is potentially useful for all kinds of program resources (in
particular, future patches will use it for transform feedback
varyings).
This patch renames it to program_resource_visitor, and clarifies
several comments, to reflect the fact that it is useful for more than
just uniforms.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The parsing logic is moved to a new function in the GLSL module,
parse_program_resource_name(). This name was chosen because it should
eventually be useful for handling everything that OpenGL 4.3 calls
"program resources" (e.g. uniforms, vertex inputs, fragment outputs,
and transform feedback varyings).
Future patches will make use of this function for linking transform
feedback varyings.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we were relying on CFLAGS_FOR_BUILD to be the same as CFLAGS
when not cross compiling, but this assumption didn't take into
consideration 32-bit builds on 64-bit systems. More generally, not
honoring CFLAGS is bad.
Automake is evidently too stupid to accept
if CROSS_COMPILING
CC = @CC_FOR_BUILD@
...
else
CC = @CC@
endif
without warning that CC has been already defined. The warnings are
harmless, but I'd prefer to avoid future reports about them, so define
proxy variables, which are assigned inside the conditional and then
unconditionally assigned to CC et al.
NOTE: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59737
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60038
The glsl-to-tgsi translater will emit SQRT to implement GLSL's sqrt()
and distance() functions if the PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED
query says it's supported by the driver.
Otherwise, sqrt(x) is implemented with x*rsq(x). The problem with
this is sqrt(0) must be handled specially because rsq(0) might be
Inf/NaN/undefined (and then 0*rsq(0) is Inf/Nan/undefined). In the
glsl-to-tgsi code we use an extra CMP to check if x is zero and then
replace the result of x*rsq(x) with zero.
In the end, this makes sqrt() generate much more reasonable code for
drivers that can do square roots.
Note that many of piglit's generated shader tests use the GLSL
distance() function.
In particular, the LOD bias and depth comparison values are packed before the
'normal' texture coordinates, and the array slice and LOD values are appended.
NOTE: This is a candidate for the 9.1 branch.
In particular, rework the sRGB/linear format selection code.
There's no reason to mess with the Mesa format.
Just do everything in terms of the gallium pipe_format.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The code before was getting a pipe format, then calling
st_pipe_format_to_mesa_format() and then converting back again with
st_mesa_format_to_pipe_format(). This removes one conversion step.
If we call gl[Copy]TexImage2D() with a generic compression format
(e.g. intFormat=GL_COMPRESSED_RGBA) we can't choose a DXT format if
we don't have the external DXT compression library.
We weren't actually enforcing this before since the
pipe_screen::is_format_supported(DXT) query has no dependency on
the DXT compression library.
Now if we're given a generic compressed format and we can't do DXT
compression we'll fall back to a non-compressed format.
v2: use util_format_is_s3tc() function and add more comments about
the allow_dxt parameter.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When glCompressedTexImage is called the internalFormat is a specific
format for the incoming image and the the hardware format should be
the same (since we never do format transcoding). So use the simpler
_mesa_glenum_to_compressed_format() function. This change is also
needed for the next patch.
Note: This is a candidate for the stable branches.
Ivybridge doesn't appear to have the same errata as Sandybridge; no
corruption was observed by setting it to more than the minimal correct
value. It's possible that we were simply lucky, since the URB entries
are 1024-bit on Ivybridge vs. 512-bit Sandybridge. Or perhaps the
underlying hardware issue is fixed.
Either way, we may as well program the minimum value since it's now
readily available, likely to be more efficient, and possibly more
correct.
v2: Use GEN7_SBE_* defines rather than GEN6_SF_*. (A copy and paste
mistake.) They're the same, but using the right names is better.
NOTE: This is a candidate for all stable branches.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(This commit message was primarily written by Paul Berry, who explained
what's going on far better than I would have.)
Previous to this patch, we thought that the only restrictions on
3DSTATE_SF's URB read length were (a) it needs to be large enough to
read all the VUE data that the SF needs, and (b) it can't be so large
that it tries to read VUE data that doesn't exist. Since the VUE map
already tells us how much VUE data exists, we didn't bother worrying
about restriction (a); we just did the easy thing and programmed the
read length to satisfy restriction (b).
However, we didn't notice this erratum in the hardware docs: "[errata]
Corruption/Hang possible if length programmed larger than recommended".
Judging by the context surrounding this erratum, it's pretty clear that
it means "URB read length must be exactly the size necessary to read all
the VUE data that the SF needs, and no larger". Which means that we
can't program the read length based on restriction (b)--we have to
program it based on restriction (a).
The URB read size needs to precisely match the amount of data that the
SF consumes; it doesn't work to simply base it on the size of the VUE.
Thankfully, the PRM contains the precise formula the hardware expects.
Fixes random UI corruption in Steam's "Big Picture Mode", random terrain
corruption in PlaneShift, and Piglit's fbo-5-varyings test.
NOTE: This is a candidate for all stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56920
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60172
Tested-by: Jordan Justen <jordan.l.justen@intel.com> (v1/Piglit)
Tested-by: Martin Steigerwald <martin@lichtvoll.de> (PlaneShift)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The maximum SF source attribute is necessary to compute the Vertex URB
read length properly, which will be done in the next commit.
NOTE: This is a candidate for all stable branches.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will benefit from easy access to the source attribute
number and whether or not we're swizzling. It doesn't want the final
attr_override DWord form, however.
NOTE: This is a candidate for all stable branches.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Back when ir_var_in and ir_var_out signified both function parameters
and shader input/outputs, we had trouble distinguishing the two when
looking at a dereference. Now that we have separate ir_var_shader_in
and ir_var_shader_out modes, we can determine this easily.
Removing the hash table saves memory and CPU overhead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The third argument of AC_ARG_WITH is evaluated for any provided value,
not only on --with-, so it must not force-enable the feature
Also, setting $with_llvm_shared_libs in the opencl check was overriding
the user switch
https://bugs.freedesktop.org/show_bug.cgi?id=59851
Signed-off-by: Quentin Glidic <sardemff7+git@sardemff7.net>
Save miptree level info to DRIImage:
- Appropriately-aligned base offset pointing to the image
- Additional x/y adjustment offsets from above.
v8: -Bump intelImageExtension version
v9: -Don't use internal _eglError but implement error reporting in new DRI inteface
instead. This fixes Android build problems based on feedback from
Adrian M Negreanu and Chad Versace.
-Move the non-tile-aligned check and error-reporting to intel_set_texture_image_region
v10: -Don't #include "egl/main/eglcurrent.h". [chadv]
Reviewed-by: Eric Anholt <eric@anholt.net> (v6)
Acked-by: Chad Versace <chad.versace@linux.intel.com> (v10)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
We need to take account the offset from original bo when using glTexSubImage()
and other functions that manipulate the subregion of an exported texture.
Offsets are appended to mapped region address and when blitting from a source
region.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When binding a region to a texture image, re-create the miptree base-level
considering the offset and dimension information exported by DRIImage.
v8: - Move the alignment surface address checks from the image-from-texture
code to the texture-from-image side. This allows the error reporting to conform to
OES_EGL_Image and to prevent mixing up EGL and GL errors. Reported by Chad Versace.
- Addressed an existing issue in renderbuffer case where there is a
a possibility of creating EGL images out of depthstencil textures which isn't
really possible. This was spotted by Eric earlier.
Reviewed-by: Eric Anholt <eric@anholt.net> (v6)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v8)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
If the offsets are present, this lets us specify a particular level and slice
in a shared region using the base level of an exported mip-map tree.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Add create image from texture extension and bump version.
v8: - Add appropriate image errors codes in DRI interface so we don't
have to use internal EGL functions in driver. Suggested by Chad Versace.
Reviewed-by: Eric Anholt <eric@anholt.net> (v6)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v8)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Was broken since commit bf469f4edc
('gallium: add void *user_buffer in pipe_index_buffer').
Fixes 11 piglit tests and lots of missing geometry e.g. in TORCS.
NOTE: This is a candidate for the 9.1 branch.
The svga device doesn't handle them. Replace with zeros.
Fixes several piglit tests, such as "glsl-const-builtin-inversesqrt".
Reviewed-by: Reviewed-by: José Fonseca <jfonseca@vmware.com>
glRasterPos doesn't exist in the core profile.
NOTE: This is a candidate for the stable branches (9.0 and 9.1).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This along with the latest drm-fixes branch should help with bad performance
of MSAA. Remember: Nx MSAA can't be more than N times slower (where N=2,4,6).
Anyway, I recommend at least 512 MB of VRAM for Full HD 6x MSAA.
NOTE: This is a candidate for the 9.1 branch.
GLX uses mapi/glapi/libglapi.la, which is only built for OpenGL.
If the user specified --enable-xlib-glx --disable-opengl, error out, as these
cannot be both observed at the same time. If the user just specified
--disable-opengl but not --disable-glx, print a warning and disable GLX as
well.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59364
Tested-by: Tom Stellard <thomas.stellard@amd.com>
R600_DUMP_SHADERS environment var now allows to choose dump method:
0 (default) - no dump
1 - full dump (old dump)
2 - disassemble
3 - both
v2: fix output for burst_count > 1
v3: use more human-readable output for kcache data in CF_ALU_xxx clauses,
improve output for ALU_EXTENDED, other minor fixes
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
v3: added some flags including condition codes for ALU,
fixed issue with CF reverse lookup (overlapping ranges of CF_ALU_xxx
and other CF instructions)
rebased on current master
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We are now seing cs that can go over the vram+gtt size to avoid
failing flush early cs that goes over 70% (gtt+vram) usage. 70%
is use to allow some fragmentation.
The idea is to compute a gross estimate of memory requirement of
each draw call. After each draw call, memory will be precisely
accounted. So the uncertainty is only on the current draw call.
In practice this gave very good estimate (+/- 10% of the target
memory limit).
v2: Remove left over from testing version, remove useless NULL
checking. Improve commit message.
v3: Add comment to code on memory accounting precision
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This reverts commit 2906e2034c.
Fixes a regression in the glean depthStencil test.
Reverting this does not affect any tests in es3conform, so a more recent
patch must have also fixed the failure this one was intended to fix.
Reported-by: lu hua <huax.lu@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59494
v2: Update to handle BufferSize being -1 and return a NULL sampler
view if the specified range would cause out of bounds access.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Record texObj.BufferSize as -1 in TexBuffer(non-Range) instead
of the buffer's current size so we know we always have to use the
full size of the buffer object (i.e. even if it changes without the
user calling TexBuffer again) for the texture.
Clarify invalid offset alignment error message.
v3: Use extra GL_CORE-only section in get_hash_params.py for
TEXTURE_BUFFER_OFFSET_ALIGNMENT.
v4: Remove unnecessary check for profile in _mesa_TexBufferRange.
Add check for extension enable in get_tex_level_parameter_buffer.
v5: Fix position in gl_API.xml.
Add comment about meaning of BufferSize == -1.
v6: Add back checks for core profile and add a note about it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Check that the return value from xcb_dri2_swap_buffers_reply is
non-NULL before accessing the struct members.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
S8_UNORM was inadvertedly supported together with Z16_UNORM.
I tried to update the code to accomodate stencil-only -- it seemed a simple
thing to do -- but "fbo-stencil clear GL_STENCIL_INDEX8" still fails,
and it's not worth debugging.
Therefore although this change tries to update for S8_UNORM, it also
disables it completely.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This special path hadn't been exercised by my earlier testing, and mask
values weren't being properly truncated to match the values.
This change fixes that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The struct padding got broken by c789b981b2.
This caused serious performance regression because part of the key was
uninitialized and hence the shader always recompiled (at least on release
builds...).
While here also fix key size calculation when the number of samplers
and the number of sampler views are different.
v2: add comment
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The swrast fragment program interpreter has trouble computing the
right texture LOD because it doesn't have easy access to input
derivatives. This causes the GLSL-based meta generate mipmap code
to fetch texels from the wrong mipmap level.
One possible fix would be to set the GL_TEXTURE_MIN/MAX_LOD parameters
to limit sampling from the right level. But let's just use the
_mesa_generate_mipmap() fallback since it's a lot faster than using
the fragment shader interpreter.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=54240
Note: This is a candidate for the 9.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The gallium docs for pipe_screen::is_format_supported() says that
samples==0 or samples==1 both mean that multisampling is not supported.
Return GL_MAX_SAMPLES==0 instead of 1 for consistency with other drivers.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Simply by adjusting the vector element width after/before
reading/writing the depth-stencil values.
Ran several GL_DEPTH_COMPONENT16 piglit tests without regressions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The maximum number of URB entries come from the 3DSTATE_URB_VS and
3DSTATE_URB_GS state packet documentation; the thread count information
comes from the 3DSTATE_VS and 3DSTATE_PS state packet documentation.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The packet length may change at some point in the future. Specifying it
explicitly (rather than hardcoding it in the command #define) allows us
to change it much more easily in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I did not list the *_get_program_binary extensions since they're not
useful to anyone with their current implementation (that supports 0
binary formats).
Before, we were keeping a CPU-only buffer to accumulate the batchbuffer in,
which was an improvement over mapping the batch through the GTT directly
(since any readback or other failure to stream through write combining
correctly would hurt). However, on LLC-sharing architectures we can do better
by mapping the batch directly, which reduces the cache footprint of the
application since we no longer have this extra copy of a batchbuffer around.
Improves performance of GLBenchmark 2.1 offscreen on IVB by 3.5% +/- 0.4%
(n=21). Improves Lightsmark performance by 1.1 +/- 0.1% (n=76). Improves
cairo-gl performance by 1.9% +/- 1.4% (n=57).
No statistically significant difference in GLB2.1 on SNB (n=37). Improves
cairo-gl performance by 2.1% +/- 0.1% (n=278).
Fixes side effect in assertion defects reported by Coverity.
Note: This is a candidate for the 9.1 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Currently a gralloc internal structure is exposed to Mesa,
Use a query function instead to maintain ABI compatibility.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
<li><ahref="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><ahref="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<ahref="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><ahref="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64323">Bug 64323</a> - Severe misrendering in Left 4 Dead 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68838">Bug 68838</a> - GLSL: struct declarations produce a "empty declaration warning" in 9.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=58660">Bug 58660</a> - CAYMAN broken with HyperZ on</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64471">Bug 64471</a> - Radeon HD6570 lockup in Brütal Legend with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66352">Bug 66352</a> - GPU lockup in L4D2 on TURKS with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68799">Bug 68799</a> - [APITRACE] Hyper-Z lockup with Falcon BMS 4.32u6 on CAYMAN</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71547">Bug 71547</a> - compilation failure :#error "SSE4.1 instruction set not enabled"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72685">Bug 72685</a> - [radeonsi hyperz] Artifacts in Unigine Sanctuary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73088">Bug 73088</a> - [HyperZ] Juniper (6770): Gone Home / Unigine Heaven 4.0 lock up system after several minutes of use</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74428">Bug 74428</a> - hyperz causes gpu hang in Counter-strike: Source</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74803">Bug 74803</a> - [r600g] HyperZ broken on RV630 (Cogs shadows are broken)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74892">Bug 74892</a> - HyperZ GPU lockup with radeonsi 7970M PITCAIRN and Distance Alpha game</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_AMD_seamless_cubemap_per_texture on i965.</li>
<li>GL_ARB_conservative_depth on i965.</li>
<li>GL_ARB_texture_gather on i965.</li>
<li>GL_ARB_texture_query_levels on i965.</li>
<li>GL_ARB_texture_mirror_clamp_to_edge.</li>
<li>GL_ARB_transform_feedback2, GL_ARB_transform_feedback3, and GL_ARB_transform_feedback_instanced on i965/Gen7 (with appropriate kernel support).</li>
<li>GL_ARB_sample_shading on i965.</li>
<li>GL_ARB_shader_atomic_counters on i965.</li>
<li>GL_ARB_vertex_attrib_binding</li>
<li>GL_ARB_vertex_type_10f_11f_11f_rev on i965 and r600g</li>
<li>GL_KHR_debug</li>
<li>GLX_MESA_query_renderer</li>
</ul>
<h2>Bug fixes</h2>
<p>Attempts have been made to <b>not</b> include bugs fixed in previous 9.2
releases or bugs that were regressions during 10.0 development. This list is
likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=47755">Bug 47755</a> - [glsl-compiler] no error checking when Interpolation qualifier for built-in variable is different in vertex and fragment shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=52171">Bug 52171</a> - [gallium/r600/clover] Simple benchmarks failed to run</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=53077">Bug 53077</a> - [IVB] Output error with msaa when both of framebuffer and source color's alpha are not 1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=54867">Bug 54867</a> - bug in r300 compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60929">Bug 60929</a> - [r600-llvm] mono games with opengl are blocking on start</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=62142">Bug 62142</a> - Mesa/demo mipmap_limits upside down with running by SOFTWARE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66213">Bug 66213</a> - Certain Mesa Demos Rendering Inverted (vertically)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66806">Bug 66806</a> - [softpipe] glxgears floating point exception</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67921">Bug 67921</a> - [bisected commit 883987] crosscompiling fails with util/u_cpu_detect.c:247:4: error: 'asm' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68162">Bug 68162</a> - [radeonsi] texture rendering is broken in Source-Engine games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68451">Bug 68451</a> - Texture flicker in native Dota2 in mesa 9.2.0rc1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68503">Bug 68503</a> - Graphical glitches in Serious Sam 3 when SB is enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68792">Bug 68792</a> - Problems during playback of h264 files using UVD and VLC on AMD E-350 CPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70042">Bug 70042</a> - Major texture flickering in Dota 2 (r600g on HD 6950)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70088">Bug 70088</a> - Glamor on r600g crashes Xserver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70123">Bug 70123</a> - Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70327">Bug 70327</a> - Casting floating point variable to integer not working properly while constant gets converted properly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70891">Bug 70891</a> - CL_INVALID_BUILD_OPTIONS results in CL_INVALID_DEVICE when asking for build log</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71022">Bug 71022</a> - configure: error: Expat required for DRI.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71110">Bug 71110</a> - xorg_driver.c:1030:2: error: too many arguments to function ‘DamageUnregister’</li>
@@ -25,7 +25,7 @@ glGetString(GL_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 2.1.
</p>
<p>
See the <ahref="install.html">Compiling/Installing page</a> for prerequisites
See the <ahref="../install.html">Compiling/Installing page</a> for prerequisites
for DRI hardware acceleration.
</p>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.