Rename _mesa_pack_rgba_span_int to _mesa_pack_rgba_span_from_uints.
Add _mesa_pack_rgba_span_from_ints.
These separate routines allow the integer clamping to be handled
properly for signed versus unsigned integers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the pack type is not supported, use _mesa_problem
rather than asserting.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_is_integer_format is moved to formats.c and renamed
as _mesa_is_enum_format_integer.
_mesa_is_format_unsigned, _mesa_is_type_integer,
_mesa_is_type_unsigned, and _mesa_is_enum_format_or_type_integer
are added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.2svn r160587 moved createBoundsCheckingPass from
lib/Transforms/Scalar to lib/Transforms/Instrumentation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Except for a couple of explicit uses, _mesa_inv_sqrtf was disabled since
its addition in 2003 (see f9b1e524).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Temporarily disabled since 2003 (see 386578c5b).
This saves us from calling sqrt() 128 times to generate the sqrttab in
one_time_init().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by compiler warning:
i830_texstate.c:131:28: warning: argument to 'sizeof' in 'memset' call
is the same expression as the destination; did you mean to
dereference it? [-Wsizeof-pointer-memaccess]
memset(state, 0, sizeof(state));
~~~~~ ^~~~~
On 64-bit systems, memset here would write an extra 4 bytes.
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This can potentially cut shader program size by a factor of 4 for 4-wide
execution respectively 2 for 8-wide execution and while this ratios aren't
quite reached for more complex shaders it can be close.
Could not really measure a performance difference so far except for trivial
shaders (glxgears).
There seems to be a fair amount of unnecessary move's generated especially
at the beginning it might be possible to optimize those away somehow.
Things aren't quite as clean, some additional stuff needs to be done for
keeping both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance, looking at the generated
shaders this is actually less than I'd think it would be - both 4 and 8-wide
shaders, despite containing a loop actually have about 10% more instructions
in total, and will have roughly 50% more executed instructions (though mostly
cheap ones). Need to figure out how to reduce overhead...
v2: keep complex interpolation for 4-wide mode, adapt to interface changes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This thread count is only supposed to be enabled when "WIZ Hashing Disable in
GT_MODE register enabled." I've always been confused whether that means the
bit in the register should be 1 or 0. For my IVB GT2's register 0x7008 value
of 0x0, this appears to work fine.
Improves l4d2 performance at 640x480 by 0.88 +/- 0.11% (n=88). Improves
performance with rasterization at 1280x1024 by 1.45% +/- 0.36% (n=6).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we finally have a list of uniform blocks in the linked shader
program, we can tell what their indices are.
Fixes piglit GL_ARB_uniform_buffer_object/getuniformblockindex.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At this point in the linking, we've totally lost track of the struct
gl_uniform_buffer that this pointed to in the original unlinked
shader, so we do a nasty n^2 walk to find it the new one based on the
variable name.
Note that these point into the shader's list of gl_uniform_buffers,
not the linked program's.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to propagate the UBO fields to the uniform storage records
before we can handle the other pnames.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a single entrypoint that maps from a series of names to the
indices of those names within the active uniforms list. Each index is
like glGetUniformLocation()'s return value, except that it doesn't
encode an array offset.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the upcoming GL_ARB_uniform_buffer_object changes, the only
other caller that will want the cooked value is state_tracker.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We're going to need this structure to cross-validate the uniform
blocks between shader stages, since unused ir_variables might get
dropped. It's also the place we store the RowMajor qualifier, which
is not part of the GLSL type (since that would cause a bunch of type
equality checks to fail).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Someone tried to be clever and "optimized" add_vertex_data2() to just use
two points for the texture coordinates and then reuse individual
components. Sadly this is not how matrix multiplication works.
Fixes rendercheck -t tmcoords
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Previously, on Gen7, when texturing from a depth or stencil surface,
the blorp engine would configure the 3D pipeline as though the input
surface was non-multisampled, and perform the necessary coordinate
transformations in the fragment shader to account for the IMS layout.
This meant outputting a lot of extra fragment shader code, and it
raised some uncertainty about how to deal with very large surfaces.
This patch modifies blorp to configure the 3D pipeline properly for
IMS layout when reading from depth and stencil surfaces.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, on Gen7, compute_msaa_layout_for_pipeline() would verify
that IMS layout is not used. However, now that we configure
SURFACE_STATE correctly for IMS surfaces, IMS layout is available.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch modifies gen7_set_surface_num_multisamples() to set up the
SURFACE_STATE appropriately for texturing from IMS format MSAA
surfaces (which are only used on Gen7 for depth and stencil buffers).
Since the function now sets more than just the number of multisamples,
it's been renamed to gen7_set_surface_msaa().
This will make it possible to remove some kludginess from the blorp
engine.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling a compressed multisampled surface, we can take a
shortcut to downsample any pixels that were completely covered by a
single primitive. In this case, the first color value we fetch is the
correct final color for the downsampled pixel, so we can skip the rest
of the blending operation.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling an integer-format buffer on Gen7, we need to use the
"avg" instruction rather than the "add" instruction, to ensure that we
don't overflow the range of 32-bit integers. Also, we need to use the
proper register type (BRW_REGISTER_TYPE_D or BRW_REGISTER_TYPE_UD) for
intermediate color data and for writing to the render target.
Note: this patch causes blorp to use the proper register type for all
operations (downsampling, upsampling, and ordinary blits). Strictly
speaking, this is only necessary for downsampling, because the other
operations exclusively use MOV instructions on the color data. But
it's simpler to use the proper register type in all cases.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling from an MSAA image to a single-sampled image, it is
inevitable that some loss of numerical precision will occur, since we
have to use 32-bit floating point registers to hold the intermediate
results while blending. However, it seems reasonable to expect that
when all samples corresponding to a given pixel have the exact same
color value, there will be no loss of precision.
Previously, we averaged samples as follows:
blend = (((sample[0] + sample[1]) + sample[2]) + sample[3]) / 4
This had the potential to lose numerical precision when all samples
have the same color value, since ((sample[0] + sample[1]) + sample[2])
may not be precisely representable as a 32-bit float, even if the
individual samples are.
This patch changes the formula to:
blend = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
This avoids any loss of precision in the event that all samples are
the same, by ensuring that each addition operation adds two equal
values.
As a side benefit, this puts the formula in the form we will need in
order to implement correct blending of integer formats.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From the Ivy Bridge PRM, Vol4 Part3 p152:
"The avg instruction performs component-wise integer average of
src0 and src1 and stores the results in dst. An integer average
uses integer upward rounding. It is equivalent to increment one to
the addition of src0 and src1 and then apply an arithmetic right
shift to this intermediate value."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kill_emitted variable was duplicating the functionality of
gl_fragment_program::UsesKill. There's no need for both.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the code for setting this flag for GLSL programs was
duplicated in three places: brw_link_shader(), glsl_to_tgsi_visitor,
and ir_to_mesa_visitor. In addition to the unnecessary duplication,
there was a performance problem on i965: brw_link_shader() set the
flag before doing its final round of optimizations, which meant that
if the optimizations managed to eliminate all the discard operations,
the flag would still be set, resulting (at least in theory) in slower
performance.
This patch consolidates all of the code that sets UsesKill for GLSL
programs into do_set_program_inouts(), which already is doing a
similar job for UsesDFdy, and which occurs after i965's final round of
optimizations.
Non-GLSL programs (ARB programs and the state tracker's glBitmap
program) are unaffected.
Reviewed-by: Eric Anholt <eric@anholt.net>
Move it to native_wayland_drm_bufmgr_helper.c which only gets compiled when
wayland is enabled and which already includes the right headers.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The cube sampler generates two-dimensional texture coordinates and
hence passes NULL for the array for the third one. The actual 2D
sampler, lower in the pipe, knew not to used that array since it
didn't need it. But the samplers have become single-texel and the
coordinate array dereference has been moved up one step, to a level
where the code does not know only two coordinates are used. Hence the
segfault.
The simplest fix by far is to add a third dummy coordinate array in
the call to the next pipe step, which will be dereferenced to an
harmless 0 which then will be happily ignored by the sampler.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=52250
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We also reuse EGL_TEXTURE_RGBA and EGL_TEXTURE_RGB, adding only the new
planar YUV texture formats: EGL_TEXTURE_Y_U_V_WL, EGL_TEXTURE_Y_UV_WL and
EGL_TEXTURE_Y_XUXV_WL.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch updates the ir_set_program_inouts_visitor so that it also
sets gl_fragment_program::UsesDFdy.
This is a bit of a hack (since dFdy() isn't an input or an output),
but there's no other obvious visitor to squeeze this functionality
into, and it would be silly to create a brand new visitor just for
this purpose.
v2: use local 'fprog' var to avoid repeated casting.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This boolean will allow it to avoid unnecessarily recompiling shaders
that don't use dFdy().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unigine Heaven (at least) has a bug where it incorrectly uses the
GL_ARB_blend_func_extended extension.
Dual source blending allows two color outputs per render target;
individual shader outputs can be assigned to be either the first or
second blending input by setting the 'index' via one of two methods:
- An API call: glBindFragDataLocationIndexed()
- The GLSL 'layout' qualifier provided by GL_ARB_explicit_attrib_location
Both of these only work on user defined fragment shader outputs; it's an
error to use either on built-in outputs like gl_FragData.
Unigine uses gl_FragData and gl_FragColor exclusively, and doesn't even
attempt to use either method to set index == 1. However, it does set
the blending function to SRC1 enums, which requires a fragment shader
output with index == 1 or else rendering is undefined.
In other words, enabling ARB_blend_func_extended causes Unigine to
render incorrectly, resulting in an apparent regression, even though our
driver code (as far as I can tell) is perfectly fine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if we were spilling the result of a texture call, we would store
all 4 regs, then for each use of one of those regs as the source of an
instruction, we would unspill all 4 regs even though only one was needed.
In both lightsmark and l4d2 with my current graphics config, the shaders that
produce spilling do so on split GRFs, so this doesn't help them out. However,
in a capture of the l4d2 shaders with a different snapshot and playing the
game instead of using a demo, it reduced one shader from 2817 instructions to
2179, due to choosing a now-cheaper texture result to spill instead of piles
of texcoords.
v2: Fix comment noted by Ken, and fix the if condition associated with it for
the current state of what constitutes a partial write of the destination.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There's one instance of a potential behavior change: propagate_constants may
now propagate into a part of a vgrf after a different part of it was
overwritten by a send that returns multiple registers. I don't think we ever
generate IR that meets that condition, but it's something to note if we bisect
behavior change to this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In these places, we care about any sort of send that hits more than one reg,
not just textures. We don't yet have anything else returning more than one
reg, so there's no change.
v2: Use mlen instead of is_tex() for the is-it-a-send check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"count" is a more useful name, since most of the time we're using it for
looping over the variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
OpenGL specification 3.3 (page 196), section 4.1.3 says:
If drawbuffer zero is not NONE and the buffer it references has an
integer format, the SAMPLE_ALPHA_TO_COVERAGE and SAMPLE_ALPHA_TO_ONE
operations are skipped."
This should work properly even if there are other draw buffers that
are not in integer format.
This patch makes following piglit tests pass on mesa:
int-draw-buffers-alpha-to-coverage
int-draw-buffers-alpha-to-one
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch churns a lot because it needs to change 4-wide filters into
single pixel filters, since each fragment may use a different filter.
The only case not entirely supported is the anisotropic filtering.
Not sure what we want to do there, since a full quad is required by
that filter.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 3.0 spec, section 4.3.3, in the documentation for
CopyPixels():
"An INVALID_OPERATION error will be generated if the object bound
to READ_FRAMEBUFFER_BINDING is framebuffer complete and the value
of SAMPLE_BUFFERS is greater than zero."
The same applies to CopyTexImage...() and CopyTexSubImage...()
functions, since they are defined in terms of CopyPixels().
Previously we were generating an INVALID_FRAMEBUFFER_OPERATION error
in these cases.
Fixes piglit tests
"EXT_framebuffer_multisample/negative-{copypixels,copyteximage}".
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Issues fixed:
- set_vs_sampler_views for evergreen is now properly implemented.
- Added the missing inval_texture_cache call for evergreen.
- have_depth_texture was sometimes incorrectly set to false on evergreen even
if there were depth textures in other shader stages. To fix this, set it
to true once and never set it to false again. It's stupid, but it matches
the r600 code. The proper fix is left to another patch.
- Optimizaton: The sampler views which aren't changed aren't updated.
This is a leftover from:
commit fe1fd67556
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Jul 8 03:10:37 2012 +0200
r600g: don't flush depth textures set as colorbuffers
If only some buffers are changed, the other ones don't have to re-emitted.
This uses bitmasks of enabled and dirty buffers just like
emit_constant_buffers does.
* Also add mcjit in the non-OpenCL case.
* Replace hardcoded llvm-config with $LLVM_CONFIG everywhere.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellad <thomas.stellard@amd.com>
Helps spotting and removing the obsolete generated files, which otherwise break
the build.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is neccessary for linking the llvmpipe tests. It appears this
dependency was introduced by the "wider native register" changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It's been broken (using NULL getBuffersWithFormat() instead of
getBuffers()) due to a copy and paste error for a year now.
GetBuffersWithFormat has been around since 2009, so I don't feel any
guilt in not supporting it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This means that GLX buffer sharing of these no longer works. On the
other hand, just *look* at this code reduction.
v2:
- [chad] Fix intelCreateBuffer for gen < 6. When the branch for
!screen->hw_has_separate_stencil was taken,
intel_create_private_renderbuffer was incorrectly not used.
- [chad] Remove all code in intel_process_dri2_buffer for processing
depth, stencil, and hiz buffers. That code is now dead.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
commit '7250cd506baa0bd4649b30d87509cdd0cbc06a57'
changes struct gbm_bo, renaming it's 'pitch' to 'stride'.
This applies to Gallium.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Previously, if you ran make followed by make check it would work, but
if you just ran make check the test program would fail to compile.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Squashed commit of the following:
commit 7acb7b4f60dc505af3dd00dcff744f80315d5b0e
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:46:31 2012 +0100
draw: Don't use dynamically sized arrays.
Not supported by MSVC.
commit 5810c28c83647612cb372d1e763fd9d7780df3cb
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:44:16 2012 +0100
gallivm,llvmpipe: Don't use expressions with PIPE_ALIGN_VAR().
MSVC doesn't accept exceptions in _declspec(align(...)). Use a
define instead.
commit 8aafd1457ba572a02b289b3f3411e99a3c056072
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:41:56 2012 +0100
gallium/util: Make u_cpu_detect.h header C++ safe.
commit 5795248350771f899cfbfc1a3a58f1835eb2671d
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 2 12:08:01 2012 +0100
gallium/util: Add ULL suffix to large constants.
As suggested by Andy Furniss: it looks like some old gcc versions
require it.
commit 4c66c22727eff92226544c7d43c4eb94de359e10
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Truly disable INF/NAN tests on MSVC.
Thanks to Brian for spotting this.
commit 8bce274c7fad578d7eb656d9a1413f5c0844c94e
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Disable INF/NAN tests on MSVC.
Somehow they are not recognized as constants.
commit 6868649cff8d7fd2e2579c28d0b74ef6dd4f9716
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 15:05:24 2012 +0200
gallivm: Cleanup the 2 x 8 float -> 16 ub special path in lp_build_conv.
No behaviour change intended, like 7b98455fb40c2df84cfd3cdb1eb7650f67c8a751.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 5147a0949c4407e8bce9e41d9859314b4a9ccf77
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 14:28:19 2012 +0200
gallivm: (trivial) fix issues with multiple-of-4 texture fetch
Some formats can't handle non-multiple of 4 fetches I believe, but
everything must support length 1 and multiples of 4.
So avoid going to scalar fetch (which is very costly) just because length
isn't 4.
Also extend the hack to not use shift with variable count for yuv formats to
arbitrary length (larger than 1) - doesn't matter how many elements we
have we always want to avoid it unless we have variable shift count
instruction (which we should get with avx2).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 87ebcb1bd71fa4c739451ec8ca89a7f29b168c08
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jul 4 02:09:55 2012 +0200
gallivm: (trivial) fix typo for wrap repeat mode in linear filtering aos code
This would lead to bogus coordinates at the edges.
(undetected by piglit because this path is only taken for block-based
formats).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 3a42717101b1619874c8932a580c0b9e6896b557
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue Jul 3 19:42:49 2012 +0100
gallivm: Fix TGSI integer translation with AVX.
commit d71ff104085c196b16426081098fb0bde128ce4f
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 15:17:41 2012 +0100
llvmpipe: Fix LLVM JIT linear path.
It was not working properly because it was looking at the JIT function
before it was actually compiled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit a94df0386213e1f5f9a6ed470c535f9688ec0a1b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Jun 28 18:07:10 2012 +0100
gallivm: Refactor lp_build_broadcast(_scalar) to share code.
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 66712ba2731fc029fa246d4fc477d61ab785edb5
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 17:30:13 2012 +0100
gallivm: Make LLVMContextRef a singleton.
There are any places inside LLVM that depend on it. Too many to attempt
to fix.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit ff5fb7897495ac263f0b069370fab701b70dccef
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 28 18:15:27 2012 +0200
gallivm: don't use 8-wide texture fetch in aos path
This appears to be a slight loss usually.
There are probably several reasons for that:
- fetching itself is scalar
- filtering is pure int code hence needs splitting anyway, same
for the final texel offset calculations
- texture wrap related code, which can be done 8-wide, is slightly more
complex with floats (with clamp_to_edge) and float operations generally
more costly hence probably not much faster overall
- the code needed to split when encountering different mip levels for the
quads, adding complexity
So, just split always for aos path (but leave it 8-wide for soa, since we
do 8-wide filtering there when possible).
This should certainly be revisited if we'd have avx2 support.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ce8032b43dcd8e8d816cbab6428f54b0798f945d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:41:19 2012 +0200
gallivm: (trivial) don't extract fparts variable if not needed
Did not have any consequences but unnecessary.
commit aaa9aaed8f80dc282492f62aa583a7ee23a4c6d5
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:09:06 2012 +0200
gallivm: fix precision issue in aos linear int wrap code
now not just passes at a quick glance but also with piglit...
If we do the wrapping with floats, we also need to set the
weights accordingly. We can potentially end up with different
(integer) coordinates than what the integer calculations would
have chosen, which means the integer weights calculated previously
in this case are completely wrong. Well at least that's what I think
happens, at least recalculating the weights helps.
(Some day really should refactor all the wrapping, so we do whatever is
fastest independent of 16bit int aos or 32bit float soa filtering.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit fd6f18588ced7ac8e081892f3bab2916623ad7a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 11:15:53 2012 +0100
gallium/util: Fix parsing of options with underscore.
For example
GALLIVM_DEBUG=no_brilinear
which was being parsed as two options, "no" and "brilinear".
commit 09a8f809088178a03e49e409fa18f1ac89561837
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 26 15:00:14 2012 +0100
gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit e59bdcc2c075931bfba2a84967a5ecd1dedd6eb0
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed May 16 15:00:23 2012 +0100
draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
Cherry-picked from 9af1ba565d
commit df6070f618a203c7a876d984c847cde4cbc26bdb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 14:42:53 2012 +0200
gallivm: (trivial) fix typo in faster aos linear int wrap code
no longer crashes, now REALLY tested.
commit d8f98dce452c867214e6782e86dc08562643c862
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 18:20:58 2012 +0200
llvmpipe: (trivial) remove bogus optimization for float aos repeat wrap
This optimization for nearest filtering on the linear path generated
likely bogus results, and the int path didn't have any optimizations
there since the only shader using force_nearest apparently uses
clamp_to_edge not repeat wrap anyway.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit c4e271a0631087c795e756a5bb6b046043b5099d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 23:01:52 2012 +0200
gallivm: faster repeat wrap for linear aos path too
Even if we already have scaled integer coords, it's way faster to use
the original float coord (plus some conversions) rather than use URem.
The choice of what to do for texture wrapping is not really tied to int
aos or float soa filtering though for some modes there can be some gains
(because of easier weight calculations).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 1174a75b1806e92aee4264ffe0ffe7e70abbbfa3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 14:39:22 2012 +0200
gallivm: improve npot tex wrap repeat in linear soa path
URem gets translated into series of scalar divisions so
just about anything else is faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit f849ffaa499ed96fa0efd3594fce255c7f22891b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 00:40:35 2012 +0100
gallivm: (trivial) fix near-invisible shift-space typo
I blame the keyboard.
commit 5298a0b19fe672aebeb70964c0797d5921b51cf0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:24:28 2012 +0200
gallivm: add new intrinsic helper to deal with arbitrary vector length
This helper will split vectors which are too large for the hw, or expand
them if they are too small, so a caller of a function using intrinsics which
uses such sizes need not split (or expand) the vectors manually and the
function will still use the intrinsic instead of dropping back to generic
llvm code. It can also accept scalars for use with pseudo-vector intrinsics
(only useful for float arguments, all x86 scalar simd float intrinsics use
4vf32).
Only used for lp_build_min/max() for now (also added the scalar float case
for these while there). (Other basic binary functions could use it easily,
whereas functions with a different interface would need different helpers.)
Expanding vectors isn't widely used, because we always try to use
build contexts with native hw vector sizes. But it might (or not) be nicer
if this wouldn't need to be done, the generated code should in theory stay
the same (it does get hit by lp_build_rho though already since we
didn't have a intrinsic for the scalar lp_build_max case before).
v2: incorporated Brian's feedback, and also made the scalar min/max case work
instead of crash (all scalar simd float intrinsics take 4vf32 as argument,
probably the reason why it wasn't used before).
Moved to lp_bld_intr based on José's request, and passing intrinsic size
instead of length.
Ideally we'd derive the source type info from the passed in llvm value refs
and process some llvmtype return type so we could handle intrinsics where
the source and destination type isn't the same (like float/int conversions,
packing instructions) but that's a bit too complicated for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 01aa760b99ec0b2dc8ce57a43650e83f8c1becdf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:19:18 2012 +0200
gallivm: (trivial) increase max code size for shader disassembly
64kB was just short of what I needed (which caused a crash) hence
increase to 96kB (should probably be smarter about that).
commit 74aa739138d981311ce13076388382b5e89c6562
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:53:29 2012 +0100
gallivm: simplify aos float tex wrap repeat nearest
just handle pot and npot the same. The previous pot handling
ended up with exactly the same instructions plus 2 more (leave it
in the soa path though since it is probably still cheaper there).
While here also fix a issue which would cause a crash after an assert.
commit 0e1e755645e9e49cfaa2025191e3245ccd723564
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:29:24 2012 +0100
gallivm: (trivial) skip floor rounding in ifloor when not signed
This was only done for the non-sse41 case before, but even with
sse41 this is obviously unnecessary (some callers already call
itrunc in this case anyway but some might not).
commit 7f01a62f27dcb1d52597b24825931e88bae76f33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:23:12 2012 +0100
gallivm: (trivial) fix bogus comments
commit 5c85be25fd82e28490274c468ce7f3e6e8c1d416
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 20 11:51:57 2012 +0100
translate: Free elt8_func/elt16_func too.
These were leaking.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 0ad498f36fb6f7458c7cffa73b6598adceee0a6c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 15:55:34 2012 +0200
gallivm: fix bug for tex wrap repeat with linear sampling in aos float path
The comparison needs to be against length not length_minus_one, otherwise
the max texel is never chosen (for the second coordinate).
Fixes piglit texwrap-1D-npot-proj (and 2D/3D versions).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit d1ad65937c5b76407dc2499b7b774ab59341209e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 16:13:43 2012 +0200
gallivm: simplify soa tex wrap repeat with npot textures and no mip filtering
Similar to what is already done in aos sampling for the float path (but not
the int path since we don't get normalized float coordinates there).
URem is expensive and the calculation is done trivially with
normalized floats instead (at least with sse41-capable cpus).
(Some day should probably do the same for the mip filter path but it's much
more complicated there hence the gain is smaller.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit e1e23f57ba9b910295c306d148f15643acc3fc83
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:38:56 2012 +0200
llvmpipe: (trivial) remove duplicated function declaration
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 07ca57eb09e04c48a157733255427ef5de620861
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:37:34 2012 +0200
llvmpipe: destroy setup variants on context destruction
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ed0003c633859a45f9963a479f4c15ae0ef1dca3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 16:25:29 2012 +0100
gallivm: handle different ilod parts for multiple quad sampling
This fixes filtering when the integer part of the lod is not the same
for all quads. I'm not fully convinced of that solution yet as it just
splits the vector if the levels to be sampled from are different.
But otherwise we'd need to do things like some minify steps, and getting
mip level base address separately anyway hence it wouldn't really look
like much of a win (and making the code even more complex).
This should now give identical results to single quad sampling.
commit 8580ac4cfc43a64df55e84ac71ce1a774d33c0d2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:14:47 2012 +0200
gallivm: de-duplicate sample code common to soa and aos sampling
There doesn't seem to be any reason why this code dealing with cube face
selection, lod and mip level calculation is separate in aos and
soa sampling, and I am sick of having it to change in both places.
commit fb541e5f957408ce305b272100196f1e12e5b1e8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:15:41 2012 +0200
gallivm: do mip filtering with per quad lod_fpart
This gives better results for mip filtering, though the generated code might
not be optimal. For now it also creates some artifacts if the lod_ipart isn't
the same for all quads, since instead of using the same mip weight for all
quads as previously (which just caused non-smooth gradients) this now will
use the right weights but with the wrong mip level in this case (can easily
be seen with things like texfilt, mipmap_tunnel).
v2: use logic helper suggested by José, and fix issue with negative lod_fpart
values
commit f1cc84eef7d826a20fab6cd8ccef9a275ff78967
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 13 18:35:25 2012 +0200
gallivm: (trivial) fix bogus assert in lp_build_unpack_broadcast_aos_scalars
commit 7c17dbae8ae290df9ce0f50781a09e8ed640c044
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:11:14 2012 +0100
util: Reimplement half <-> float conversions.
Removed u_half.py used to generate the table for previous method.
Previous implementation of float to half conversion was faulty for
denormalised and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
commit 7762f59274070e1dd4b546f5cb431c2eb71ae5c3
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:12:16 2012 +0100
tests: Updated tests to properly handle NaN for half floats.
commit fa94c135aea5911fd93d5dfb6e6f157fb40dce5e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:33:10 2012 +0200
gallivm: do mip level calculations per quad
This is the final piece which shouldn't change the rendering output yet.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 23cbeaddfe03c09ca18c45d28955515317ffcf4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 9 00:54:21 2012 +0200
gallivm: do per-quad cube face selection
Doesn't quite fix the piglit cubemap test (not sure why actually)
but doing per-quad face selection is doing the right thing and
definitely an improvement.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit abfb372b3702ac97ac8b5aa80ad1b94a2cc39d33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:22:59 2012 +0200
gallivm: do all lod calculations per quad
Still no functional change but lod is now converted to scalar after
lod calculations.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 519368632747ae03feb5bca9c655eccbc5b751b4
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:46:10 2012 +0100
gallivm: Added support for half-float to float conversion in lp_build_conv.
Updated various utility functions to support this change.
commit 135b4d683a4c95f7577ba27b9bffa4a6fbd2c2e7
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:02:46 2012 +0100
gallivm: Added function for half-float to float conversion.
Updated lp_build_format_aos_array to support half-float source.
commit 37d648827406a20c5007abeb177698723ed86673
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:55:18 2012 +0100
util: Updated u_format_tests to rigidly test half-float boundary values.
commit 2ad18165d96e578aa9046df7c93cb1c3284d8c6b
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:54:16 2012 +0100
llvmpipe: Updated lp_test_format to properly handle Inf/NaN results.
commit 78740acf25aeba8a7d146493dd5c966e22c27b73
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:53:30 2012 +0100
util: Added functions for checking NaN / Inf for double and half-floats.
commit 35e9f640ae01241f9e0d67fe893bbbf564c05809
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:05:13 2012 +0200
gallivm: Fix calculating rho for 3d textures for the single-quad case
Discovered by accident, this looks like a very old typo bug.
commit fc1220c636326536fd0541913154e62afa7cd1d8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:04:59 2012 +0200
gallivm: do calcs per-quad in lp_build_rho
Still convert to scalar at the end of the function.
commit 50a887ffc550bf310a6988fa2cea5c24d38c1a41
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 21 23:21:50 2012 +0200
gallivm: (trivial) return scalar in lp_build_extract_range for length 1 vectors
Our type system on top of llvm's one doesn't generally support vectors of
length 1, instead using scalars. So we should return a scalar from this
function instead of having to bitcast the vector with length 1 later elsewhere.
commit 80c71c621f9391f0f9230460198d861643324876
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 17:49:15 2012 +0100
draw: Fixed bad merge error
commit c47401cfad0c9167de20ff560654f533579f452c
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:29:30 2012 +0100
draw: Updated store_clip to store whole vectors instead of individual elements.
commit 2d9c1ad74b0b0b41861fffcecde39f09cc27f1cf
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:28:32 2012 +0100
gallivm: Added lp_build_fetch_rgba_aos_array.
A version of lp_build_fetch_rgba_aos which is targeted at simple array formats.
Reads the whole vector from memory in one, instead of reading each element
individually.
Tested with mesa tests and demos.
commit ff7805dc2b6ef6d8b11ec4e54aab1633aef29ac8
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:27:40 2012 +0100
gallivm: Added lp_build_pad_vector.
This function pads a vector with undef to a desired length.
commit 701f50acef24a2791dabf4730e5b5687d6eb875d
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 17:27:19 2012 +0100
util: Added util_format_is_array.
This function checks whether a format description is in a simple array format.
commit 5e0a7fa543dcd009de26f34a7926674190fa6246
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:13:47 2012 +0100
draw: Removed draw_llvm_translate_from and draw/draw_llvm_translate.c.
This is "replaced" by adding an optimised path in lp_build_fetch_rgba_aos
in an upcoming patch.
commit 8c886d6a7dd3fb464ecf031de6f747cb33e5361d
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 15:02:31 2012 +0100
draw: Modified store_aos to write the vector as one, not individual elements.
commit 37337f3d657e21dfd662c7b26d61cb0f8cfa6f17
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 14:16:23 2012 +0100
draw: Changed aos_to_soa to use lp_build_transpose_aos.
commit bd2b69ce5d5c94b067944d1dcd5df9f8e84548f1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:14:27 2012 +0100
draw: Changed soa_to_aos to use lp_build_transpose_aos.
commit 0b98a950d29a116e82ce31dfe7b82cdadb632f2b
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:45 2012 +0100
gallivm: Added lp_build_transpose_aos which converts between aos and soa.
commit 69ea84531ad46fd145eb619ed1cedbe97dde7cb5
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:01 2012 +0100
gallivm: Added lp_build_interleave2_half aimed at AVX unpack instructions.
commit 7a4cb1349dd35c18144ad5934525cfb9436792f9
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 22 11:54:14 2012 +0100
gallivm: Fix build on Windows.
MC-JIT not yet supported there.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit afd105fc16bb75d874e418046b80d9cc578818a1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:17:26 2012 +0100
llvmpipe: Added a error counter to lp_test_conv.
Useful for keeping track of progress when fixing errors!
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit b644907d08c10a805657841330fc23db3963d59c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:16:46 2012 +0100
llvmpipe: Changed known failures in lp_test_conv.
To comply with the recent fixes to lp_bld_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit d7061507bd94f6468581e218e61261b79c760d4f
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:14:38 2012 +0100
llvmpipe: Added fixed point types tests to lp_test_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 146b3ea39b4726dbe125ac666bd8902ea3d6ca8c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:26:35 2012 +0100
llvmpipe: Changed lp_test_conv src/dst alignment to be correct.
Now based on the define rather than a fixed number.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit f3b57441f834833a4b142a951eb98df0aa874536
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:06:44 2012 +0100
gallivm: Fixed erroneous optimisation in lp_build_min/max.
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit a0613382e5a215cd146bb277646a6b394d376ae4
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:04:49 2012 +0100
gallivm: Compensate for lp_const_offset in lp_build_conv.
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit a3d2bf15ea345bc8a0664f8f441276fd566566f3
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:01:25 2012 +0100
gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit e7b1e76fe237613731fa6003b5e1601a2e506207
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 21 20:07:51 2012 +0100
gallivm: Fix build with LLVM 2.6
Trivial, and useful.
commit d3c6bbe5c7f5ba1976710831281ab1b6a631082d
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 15 17:15:59 2012 +0100
gallivm: Enable MCJIT/AVX with vanilla LLVM 3.1.
Add the necessary C++ glue, so that we don't need any modifications
to the soon to be released LLVM 3.1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 724a019a14d40fdbed21759a204a2bec8a315636
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 22:04:06 2012 +0100
gallivm: Use HAVE_LLVM 0x0301 consistently.
commit af6991e2a3868e40ad599b46278551b794839748
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 21:49:06 2012 +0100
gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.
Trivial.
commit 6f8a1d75458daae2503a86c6b030ecc4bb494e23
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Apr 2 22:14:15 2012 -0700
gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 62555b6ed8760545794f83064e27cddcb3ce5284
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 27 21:51:17 2012 -0700
gallivm: Fix method overriding in raw_debug_ostream.
Use matching type qualifers to avoid method hiding.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 6a9bd784f4ac68ad0a731dcd39e5a3c39989f2be
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 13 22:40:52 2012 -0700
gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit b674955d39adae272a779be85aa1bd665de24e3e
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Mar 5 22:00:40 2012 -0800
gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 11ab69971a8a31c62f6de74905dbf8c02884599f
Author: Vinson Lee <vlee@freedesktop.org>
Date: Wed Feb 29 21:20:53 2012 -0800
Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 339960c82d2a9f5c928ee9035ed31dadb7f45537
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 14 16:19:56 2012 +0200
gallivm: (trivial) fix assertion failure for mipmapped 1d textures
In lp_build_rho, we may end up with a 1-element vector (for mipmapped 1d
textures), but in this case we require the type to be a non-vector type,
so need a cast.
commit 9d73edb727bd6d196030dc3026b7bf0c574b3e19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:12:07 2012 +0200
gallivm: prepare for per-quad lod calculations for large vectors
to be able to handle multiple quads at once in texture sampling and still
do lod calculations per quad, it is necessary to get the per-quad derivatives
into the lp_build_rho function.
Until now these derivative values were just scalars, which isn't going to work.
So we now use vectors, and since the interface needs to change we also do some
different (slightly more efficient) packing of the values.
For 8-wide vectors the packed derivative values for 3 coords would look like
this, this scales to a arbitrary (multiple of 4) vector size:
ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____
The second vector will be unused for 1d and 2d textures.
To facilitate future changes the derivative values are put into a struct, since
quite some functions just pass these values through.
The generated code seems to be very slightly better for 2d textures (with
4-wide vectors) than before with sse2 (if you have a cpu with physical 128bit
simd units - otherwise it's probably not a win).
v2: suggestions from José, rename variables, add comments, use swizzle helper
commit 0aa21de0d31466dac77b05c97005722e902517b8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:10:31 2012 +0200
gallivm: add undefined swizzle handling to lp_build_swizzle_aos
This is useful for vectors with "holes", it lets llvm choose the most
efficient shuffle instructions if some elements aren't needed without having to
worry what elements to manually pick otherwise.
commit 00faf3f370e7ce92f5ef51002b0ea42ef856e181
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri May 4 17:25:16 2012 +0100
gallivm: Get the LLVM IR optimization passes before JIT compilation.
MC-JIT engine compiles the module immediately on creation, so the optimization
passes were being run too late.
So now we create a target data layout from a string, that matches the
ABI parameters reported by the compiler.
The backend optimization passes were always been run, so the performance
improvement is modest (3% on multiarb mesa demo).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 40a43f4e2ce3074b5ce9027179d657ebba68800a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed May 2 16:03:54 2012 +0200
gallivm: (trivial) fix wrong define used in lp_build_pack2
should fix stack-smashing crashes.
commit e6371d0f4dffad4eb3b7a9d906c23f1c88a2ab9e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 30 21:25:29 2012 +0200
gallivm: add perf warnings when not using intrinsics with 256bit vectors
Helper functions using integer sse2 intrinsics could split the vectors with AVX
instead of using generic fallback (which should be faster).
We don't actually expect to hit these paths (hence don't fix them up to actually
do the vector splitting) so just emit warnings (for those functions where it's
obvious doing split/intrinsic is faster than using generic path).
Only emit warnings for 256bit vectors since we _really_ don't expect to hit
arbitrary large vectors which would affect a lot more functions.
The warnings do not actually depend on avx since the same logic applies to
plain sse2 too (but of course again there's _really_ no reason we should hit
these functions with 256bit vectors without avx).
commit 8a9ea701ea7295181e846c6383bf66a5f5e47637
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:37:07 2012 +0200
gallivm: split vectors manually for avx in lp_build_pack2 (v2)
There's 2 reasons for this:
First, there's a llvm bug (fixed in 3.1) which generates tons of byte
inserts/extracts otherwise, and second, more importantly, we want to use
pack intrinsics instead of shuffles.
We do this in lp_build_pack2 and not the calling code (aos sample path)
because potentially other callers might find that useful too, even if
for larger sequences of code using non-native vector sizes it might be
better to manually split vectors.
This should boost texture performance in the aos path considerably.
v2: fix issues with intrinsics types with old llvm
commit 27ac5b48fa1f2ea3efeb5248e2ce32264aba466e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:26:22 2012 +0200
llvmpipe: refactor lp_build_pack2 (v2)
prettify, and it's unnecessary to assert when there's no intrinsic due to
unsupported bit width - the shuffle path will work regardless.
In contrast lp_build_packs2, should only rely on lp_build_pack2 doing the
clamping for element sizes for which there is a sse2 intrinsic.
v2: fix bug spotted by Jose regarding the intrinsic type for packusdw
on old llvm versions.
commit ddf279031f0111de4b18eaf783bdc0a1e47813c8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:13:59 2012 +0200
gallivm: add src width check in lp_build_packs2()
not doing so would skip clamping even if no sse2 pack instruction is
available, which is incorrect (in theory only, such widths would also always
hit a (unnecessary) assertion in lp_build_pack2().
commit e7f0ad7fe079975eae7712a6e0c54be4fae0114b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 27 15:57:00 2012 +0200
gallivm: (trivial) fix crash-causing typo for npot textures with avx
commit 28a9d7f6f655b6ec508c8a3aa6ffefc1e79793a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Apr 25 19:38:45 2012 +0200
gallivm: (trivial) remove code mistakenly added twice.
commit d5926537316f8ff67ad0a52e7242f7c5478d919b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Apr 24 21:16:15 2012 +0200
gallivm: add a new avx aos sample path (v2)
Try to avoid mixing float and int address calculations. This does texture wrap
modes with floats, and then the offset calculations still with ints (because
of lack of precision with floats, though we could do some effort to make it work
with not too large (16MB) textures).
This also handles wrap repeat mode with npot-sized textures differently than
either the old soa or aos int path (likely way faster but untested).
Otherwise the actual address wrap code is largely similar to the soa path (not
quite the same as this one also has some int code), it should get used by avx
soa sampling later as well but doesn't handle more complex address modes yet
(this will also have the benefit that we can use aos sampling path for all
texture address modes).
Generated code for that looks reasonable, but still does not split vectors
explicitly for fetch/filter which means still get hit by llvm (fixed upstream)
which generates hundreds of pinsrb/pextrb instead of two shuffles.
It is not obvious though if it's much of a win over just doing address calcs
4-wide but with ints, even if it is definitely much less instructions on avx.
piglit's texwrap seems to look exactly the same but doesn't test
neither the non-normalized nor the npot cases.
v2: fix comments, prettify based on Brian's and Jose's feedback.
commit bffecd22dea66fb416ecff8cffd10dd4bdb73fce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 19 01:58:29 2012 +0200
gallivm: refactor aos lp_build_sample_image_nearest/linear
split them up to separate address calculations and fetching/filtering.
Need this for being able to do 8-wide float address calcs and 4-wide
fetch/filter later (for avx). Plus the functions were very big scary monsters
anyway (in particular lp_build_sample_image_linear).
commit a80b325c57529adddcfa367f96f03557725c4773
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 16 17:17:18 2012 +0200
gallivm: fix lp_build_resize when truncating width but expanding vector size
Missed this case which I thought was impossible - the assertion for it was
right after the division by zero...
(AoS) texture sampling may ask us to do this, for things like 8 4x32int
vectors to 1 32x8int vector conversion (eventually, we probably don't want
this to happen).
commit f9c8337caa3eb185830d18bce8b95676a065b1d7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Apr 14 18:00:59 2012 +0200
gallivm: fix cube maps with larger vectors
This makes the branchless cube face selection code work with larger vectors.
Because the complexity is quite high (cannot really be improved it seems,
per-face selection would reduce complexity a lot but this leads to errors
unless the derivatives are calculated all from the same face which almost
doubles the work to be done) it is still slower than the branching version,
hence only enable this with large vectors.
It doesn't actually do per-quad face selection yet (only makes sense with
matching lod selection, in fact it will select the same face for all pixels
based on the average of the first four pixels for now) but only different
shuffles are required to make it work (the branching version actually should
work with larger vectors too now thanks to the improved horizontal add but of
course it cannot be extended to really select the face per-quad unless doing
branching per quad).
commit 7780c58869fc9a00af4f23209902db7e058e8a66
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 21:11:12 2012 +0100
llvmpipe: (trivial) fix compiler warning
and also clarify comment regarding availability of popcnt instruction.
commit a266dccf477df6d29a611154e988e8895892277e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 14:21:07 2012 +0100
gallivm: remove unneeded members in lp_build_sample_context
Minor cleanup, the texture width, height, depth aren't accessed in their
scalar form anywhere. Makes it more obvious those values should probably be
fetched already vectorized (but this requires more invasive changes)...
commit b678c57fb474e14f05e25658c829fc04d2792fff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 29 15:53:55 2012 +0100
gallivm: add a helper for concatenating vectors
Similar to the extract_range helper intended to get around slow code generated
by llvm for 128bit insertelements.
Concatenating two 128bit vectors this way will result in a single vinsertf128
operation rather than two 64bit stores plus one 128bit load, though it might be
mildly useful for other purposes as well.
commit 415ff228bcd0cf5e44a4c15350a661f0f5520029
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 19:41:15 2012 +0100
gallivm: add a custom 2x8f->1x16ub avx conversion path
Similar to the existing 4x4f->1x16ub sse2 path, shaves off a couple
instructions (min/max mostly) because it relies on pack intrinsics clamping.
commit 78c08fc89f8fbcc6dba09779981b1e873e2a0299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 18:44:07 2012 +0100
gallivm: add avx arithmetic intrinsics
Add all avx intrinsics for arithmetic functions (with the exception
of the horizontal add function which needs another look).
Seems to pass basic tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit a586caa2800aa5ce54c173f7c0d4fc48153dbc4e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 15:31:35 2012 +0100
gallivm: add avx logic intrinsics
Add the blend intrinsics for 8-wide float and 4-wide double vectors.
Since we lack 256bit int instructions these are used for int vectors as well,
though obviously not for byte or word element values.
The comparison intrinsics aren't extended for avx since these are only used
for pre-2.7 llvm versions.
commit 70275e4c13c89315fc2560a4c488c0e6935d5caf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 00:40:53 2012 +0100
gallivm: new helper function for extract shuffles.
Based on José's idea as we can need that in a couple places.
Note that such shuffles should not be used lightly, since data layout
of <4 x i8> is different to <16 x i8> for instance, hence might cause
data rearrangement.
commit 4d586dbae1b0c55915dda1759d2faea631c0a1c2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 18:27:25 2012 +0100
gallivm: (trivial) don't overallocate shuffle variable
using wrong define meant huge array...
commit 06b0ec1f6d665d98c135f9573ddf4ba04b2121ad
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 17:54:20 2012 +0100
gallivm: don't do per-element extract/insert for vector element resize
Instead of doing per-element extract/insert if the src vectors
and dst vector differ in total size (which generates atrocious code)
first change the src vectors size by using shuffles to destination
vector size.
We can still do better than that on AVX for packing to color buffer
(by exploiting pack intrinsics characteristics hence eleminating the
need for some clamps) but this already generates much better code.
v2: incorporate feedback from José, Keith and use shuffle instead of
bitcasts/extracts. Due to llvm deficiencies the latter cause all data
to get moved to GPRs and back in pieces (even though the data in the
regs actually stays the same...).
commit c9970d70e05f95d3f52fe7d2cd794176a52693aa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 19:33:19 2012 +0000
gallivm: fix bug in simple position interpolation
Accidental use of position attribute instead of just pixel coordinates.
Caused failures in piglit glsl-fs-ceil and glsl-fs-floor.
commit d0b6fcdb008d04d7f73d3d725615321544da5a7e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 15:31:14 2012 +0000
gallivm: fix emission of ceil opcode
lp_build_ceil seems more appropriate than lp_build_trunc.
This seems to be never hit though someone performs some ceil
to floor magic.
commit d97fafed7e62ffa6bf76560a92ea246a1a26d256
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 22 11:46:52 2012 +0000
gallivm: new vectorized path for cubemap calculations
should be faster when adapted to multiple quads as only selection masks need to be different.
The code is more or less a per-pixel version adapted to only do it per quad.
A per pixel version would be much simpler (could drop 2 selects, 6 broadcasts and the messy
horizontal add of 3 vectors at the expense of only 2 more absolute value instructions -
would also just work for arbitary large vectors).
This version doesn't yet work with larger vectors because the horizontal add isn't adjusted
to be able to work with 2x4 vectors (and also because face selection wouldn't be done per
quad just per block though that would be only a correctness issue just as with lod selection).
The downside is this code is quite a bit slower. On a Core2 it can be sped up by disabling the
hw blend instructions for selection and using logicop fallbacks instead, but it is still slower
than the old code, hence leave that in for now. Probably will chose one or the other version
based on vector length in the end.
commit b375fbb18a3fd46859b7fdd42f3e9908ea4ff9a3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 21 14:42:29 2012 +0000
gallivm: fix optimized occlusion query intrinsic name
commit a9ba0a3b611e48efbb0e79eb09caa85033dbe9a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Mar 21 16:19:43 2012 +0000
draw,gallivm,llvmpipe: Call gallivm_verify_function everywhere.
commit f94c2238d2bc7383e088b8845b7410439a602071
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 18:54:10 2012 +0000
gallivm: optimize calculations for cube maps a bit
this does some more vectorized calculations and uses horizontal adds if possible.
A definite win with sse3 otherwise it doesn't seem to make much of a difference.
In any case this is arithmetically identical, cannot handle larger vectors.
Should be useful as a reference point against larger vector version later...
commit 21a2c1cf3c8e1ac648ff49e59fdc0e3be77e2ebb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 15:16:27 2012 +0000
llvmpipe: slight optimization of occlusion queries
using movmskps when available.
While this is slightly better for cpus without popcnt we should
really sum the vectors ourselves (it is also possible to cast to i4 before
doing the popcnt but that doesn't help that much neither since llvm
is using some optimized popcnt version for i32)
commit 5ab5a35f216619bcdf55eed52b0db275c4a06c1b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 13:32:11 2012 +0000
llvmpipe: fix occlusion queries with larger vectors
need to adjust casts etc.
commit ff95e6fdf5f16d4ef999ffcf05ea6e8c7160b0d5
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Mar 19 20:15:25 2012 +0000
gallivm: Restore optimization passes.
commit 57b05b4b36451e351659e98946dae27be0959832
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:34:22 2012 +0000
llvmpipe: use existing min2 macro
commit bc9a20e19b4f600a439f45679451f2e87cd4b299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:07:27 2012 +0000
llvmpipe: add some safeguards against really large vectors
As per José's suggestion, prevent things from blowing up if some cpu
would have 1024bit or larger vectors.
commit 0e2b525e5ca1c5bbaa63158bde52ad1c1564a3a9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:31:08 2012 +0000
llvmpipe: fix mask generation for uberwide vectors
this was the only piece preventing 16-wide vectors from working
(apart from the LP_MAX_VECTOR_WIDTH define that is), which is the maximum
as we don't get more pixels in the fragment shader at once.
Hence adjust that so things could be tested properly with that size
even though there seems to be no practical value.
commit 3c8334162211c97f3a11c7f64e9e5a2a91ad9656
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:19:41 2012 +0000
llvmpipe: fix the simple interpolation method with larger vectors
so both methods actually _really_ work now. Makes textures look
nice with larger vectors...
commit 1cb0464ef8871be1778d43b0c56adf9c06843e2d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 17:26:35 2012 +0000
llvmpipe: fix mask generation and position interpolation with 8-wide vectors
trivial bugs, with these things start to look somewhat reasonable.
Textures though have some swizzling issues it seems.
commit 168277a63ef5b72542cf063c337f2d701053ff4b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 16:04:03 2012 +0000
llvmpipe: don't overallocate variables
we never have more than 16 (stamp size) / 4 (minimum possible vector size).
(With larger vectors those variables are still overallocated a bit.)
commit 409b54b30f81ed0aa9ed0b01affe15c72de9abd2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:56:48 2012 +0000
llvmpipe: add some 32f8 formats to lp_test_conv
Also add the ability to handle different sized vectors.
commit 55dcd3af8366ebdac0af3cdb22c2588f24aa18ce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:47:27 2012 +0000
gallivm: handle different sized vectors in conversion / pack
only fully generic path for now (extract/insert per element).
commit 9c040f78c54575fcd94a8808216cf415fe8868f6
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sun Mar 18 00:58:28 2012 +0100
llvmpipe: fix harmless use of unitialized values
commit 551e9d5468b92fc7d5aa2265db9a52bb1e368a36
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:31:21 2012 +0100
gallivm: drop special path in extract_broadcast with different sized vectors
Not needed, llvm can handle shuffles with different sized result vector just
fine. Should hopefully generate the same code in the end, but simpler IR.
commit 44da531119ffa07a421eaa041f63607cec88f6f8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:28:49 2012 +0100
llvmpipe: adapt interpolation for handling multiple quads at once
this is still WIP there are actually two methods possible not quite
sure what makes the most sense, so there's code for both for now:
1) the iterative method as used before (compute attrib values at upper left
corner of stamp and upper left corner of each quad initially).
It is improved to handle more than one quad at once, and also do some more vectorized
calculations initially for slightly better code - newer cpus have full throughput with
4 wide float vectors, hence don't try to code up a path which might be faster if there's
just one channel active per attribute.
2) just do straight interpolation for each pixel.
Method 2) is more work per quad, but less initially - if all quads are executed
significantly more overall though. But this might change with larger vector lengths.
This method would also be needed if we'd do some kind of active quad merging when
operating on multiple quads at once.
This path contains some hack to force llvm to generate better code, it is still far
from ideal though, still generates far too many unnecessary register spills/reloads.
Both methods should work with different sized vectors.
Not very well tested yet, still seems to work with four-wide vectors, need changes
elsewhere to be able to test with wider vectors.
commit be5d3e82e2fe14ad0a46529ab79f65bf2276cd28
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:59:37 2012 +0000
draw: Cleanup.
commit f85bc12c7fbacb3de2a94e88c6cd2d5ee0ec0e8d
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:43:30 2012 +0000
gallivm: More module compilation refactoring.
commit d76f093198f2a06a93b2204857e6fea5fd0b3ece
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Mar 15 21:29:11 2012 +0000
llvmpipe: Use gallivm_compile/free_function() in linear code.
Should had been done before.
commit 122e1adb613ce083ad739b153ced1cde61dfc8c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 13 14:47:10 2012 +0100
llvmpipe: generate partial pixel mask for multiple quads
still works with one quad, cannot be tested yet with more
At least for now always fixed order with multiple quads.
commit 4c4f15081d75ed585a01392cd2dcce0ad10e0ea8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 22:09:24 2012 +0100
llvmpipe: refactor state setup a bit
Refactor to make it easier to emit (and potentially later fetch in fs)
coefficients for multiple attributes at once.
Need to think more about how to make this actually happen however, the
problem is different attributes can have different interpolation modes,
requiring different handling in both setup and fs (though linear and
perspective handling is close).
commit 9363e49722ff47094d688a4be6f015a03fba9c79
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 19:23:23 2012 +0100
llvmpipe: vectorize tri offset calc
cuts number of instructions in quad-offset-factor from 107 to 75.
This code actually duplicated the (scalar) code calculating the determinant
except it used different vertex order (leading to different sign but it doesn't
matter) hence llvm could not have figured out it's the same (of course with
determinant vectorized in the other place that wouldn't have worked any longer
neither).
Note this particular piece doesn't actually vectorize well, not many arithmetic
instructions left but tons of shuffle instructions...
Probably would need to work on n tris at a time for better vectorization.
commit 63169dcb9dd445c94605625bf86d85306e2b4297
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 03:11:37 2012 +0100
llvmpipe: vectorize some scalar code in setup
reduces number of arithmetic instructions, and avoids loading
vector x,y values twice (once as scalars once as vectors).
Results in a reduction of instructions from 76 to 64 in fs setup for glxgears
(16%) on a cpu with sse41.
Since this code uses vec2 disguised as vec4, on old cpus which had physical
64bit sse units (pre-Core2) it probably is less of a win in practice (and if
you have no vectors you can only hope llvm eliminates the arithmetic for
unneeded elements).
commit 732ecb877f951ab89bf503ac5e35ab8d838b58a1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 7 00:32:24 2012 +0100
draw: fix clipping
bug introduced by 4822fea3f0440b5205e957cd303838c3b128419c broke
clipping pretty badly (verified with lineclip test)
commit ef5d90b86d624c152d200c7c4056f47c3c6d2688
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 23:38:59 2012 +0100
draw: don't store vertex header per attribute
storing the vertex header once per attribute is totally unnecessary.
Some quick look at the generated assembly says llvm in fact cannot optimize
away the additional stores (maybe due to potentially aliasing pointers
somewhere).
Plus, this makes the code cleaner and also allows using a vector "or"
instead of scalar ones.
commit 6b3a5a57b0b9850854cfbd7b586e4e50102dda71
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 19:11:01 2012 +0100
draw: do the per-vertex "boolean" clipmask "or" with vectors
no point extracting the values and doing it per component.
Doesn't help that much since we still extract the values elsewhere anyway.
commit 36519caf1af40e4480251cc79a2d527350b7c61f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 2 22:27:01 2012 +0100
gallivm: fix lp_build_extract_broadcast with different sized vectors
Fix the obviously wrong argument, so it doesn't blow up.
commit 76d0ac3ad85066d6058486638013afd02b069c58
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 2 12:16:23 2012 +0000
draw: Compile per module and not per function (WIP).
Enough to get gears w/ LLVM draw + softpipe to work on AVX doing:
GALLIUM_DRIVER=softpipe SOFTPIPE_USE_LLVM=yes glxgears
But still hackish -- will need to rethink and refactor this.
commit 78e32b247d2a7a771be9a1a07eb000d1e54ea8bd
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 12:01:05 2012 +0000
llvmpipe: Remove lp_state_setup_fallback.
Never used.
commit 6895d5e40d19b4972c361e8b83fdb7eecda3c225
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Feb 27 19:14:27 2012 +0000
llvmpipe: Don't emit EMMS on x86
We already take precautions to ensure that LLVM never emits MMX code.
commit 4822fea3f0440b5205e957cd303838c3b128419c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 29 15:58:19 2012 +0100
draw: modifications for larger vector sizes
We want to be able to use larger vectors especially for running the vertex
shader. With this patch we build soa vectors which might have a different
length than 4.
Note that aos structures really remain the same, only when aos structures
are converted to soa potentially different sized vectors are used.
Samplers probably don't work yet, didn't look at them.
Testing done:
glxgears works with both 128bit and 256bit vectors.
commit f4950fc1ea784680ab767d3dd0dce589f4e70603
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:57 2012 +0100
gallivm: override native vector width with LP_NATIVE_VECTOR_WIDTH env var for debug
commit 6ad6dbf0c92f3bf68ae54e5f2aca035d19b76e53
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:24 2012 +0100
draw: allocate storage with alignment according to native vector width
commit 7bf0e3e7c9bd2469ae7279cabf4c5229ae9880c1
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Feb 24 19:06:08 2012 +0000
gallivm: Fix comment grammar.
Was missing several words. Spotted by Roland.
commit b20f1b28eb890b2fa2de44a0399b9b6a0d453c52
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 19:22:09 2012 +0000
gallivm: Use MC-JIT on LLVM 3.1 + (i.e, SVN)
MC-JIT
Note: MC-JIT is still WIP. For this to work correctly it requires
LLVM changes which are not yet upstream.
commit b1af4dfcadfc241fd4023f4c3f823a1286d452c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 20:03:15 2012 +0100
llvmpipe: use new lp_type_width() helper in lp_test_blend
commit 04e0a37e888237d4db2298f31973af459ef9c95f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 19:50:34 2012 +0100
llvmpipe: clean up lp_test_blend a little
Using variables just sized and aligned right makes it a bit more obvious
what's going on.
The test still only tests vector length 4.
For AoS anything else probably isn't going to work.
For SoA other lengths should work (at least with floats).
commit e61c393d3ec392ddee0a3da170e985fda885a823
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:48:30 2012 +0000
gallivm: Ensure vector width consistency.
Instead of assuming that everything is the max native size.
commit 330081ac7bc41c5754a92825e51456d231bf84dd
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:44:14 2012 +0000
draw: More simd vector width consistency fixes.
commit d90ca002753596269e37297e2e6c139b19f29f03
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:43:00 2012 +0000
gallivm: Remove unused lp_build_int32_vec4_type() helper.
commit cae23417824d75869c202aaf897808d73a2c1db0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 17:32:16 2012 +0100
gallivm: use global variable for native vector width instead of define
We do not know the simd extensions (and hence the simd width we should use)
available at compile time.
At least for now keep a define for maximum vector width, since a global
variable obviously can't be used to adjust alignment of automatic stack
variables.
Leave the runtime-determined value at 128 for now in all cases.
commit 51270ace6349acc2c294fc6f34c025c707be538a
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 15:41:02 2012 +0000
gallivm: Add a hunk inadvertedly lost when rebasing.
commit bf256df9cfdd0236637a455cbaece949b1253e98
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:24:23 2012 +0000
llvmpipe: Use consistent vector width in depth/stencil test.
commit 5543b0901677146662c44be2cfba655fd55da94b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:19:59 2012 +0000
draw: Use a consistent the vector register width.
Instead of 4x32 sometimes, LP_NATIVE_VECTOR_WIDTH other times.
commit eada8bbd22a3a61f549f32fe2a7e408222e5c824
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 12:08:04 2012 +0000
gallivm: Remove garbagge collection.
MC-JIT will require one compilation per module (as opposed to one
compilation per function), therefore no state will be shared,
eliminating the need to do garbagge collection.
commit 556697ea0ed72e0641851e4fbbbb862c470fd7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 10:33:41 2012 +0000
gallivm: Move all native target initialization to lp_set_target_options().
commit c518e8f3f2649d5dc265403511fab4bcbe2cc5c8
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:52:32 2012 +0000
llvmpipe: Create one gallivm instance for each test.
commit 90f10af8920ec6be6f2b1e7365cfc477a0cb111d
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:48:08 2012 +0000
gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().
Brittle, complex, and unecesary. Just use function pointer constant.
commit 98fde550b33401e3fe006af59db4db628bcbf476
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:21:26 2012 +0000
gallivm: Add a lp_build_const_func_pointer() helper.
To be reused in all places where we want to call C code.
commit 6cfedadb62c2ce5af8d75969bc95a607f3ece118
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:44:41 2012 +0000
gallivm: Cleanup/simplify lp_build_const_string_variable.
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
commit db1d4018c0f1fa682a9da93c032977659adfb68c
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 11:52:17 2012 +0000
gallivm: Set NoFramePointerElimNonLeaf to true where supported.
commit 088614164aa915baaa5044fede728aa898483183
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 22 19:38:47 2012 +0100
llvmpipe: pass in/out pointers rather scalar floats in lp_bld_arit
we don't want llvm to potentially optimize away the vectors (though it doesn't
seem to currently), plus we want to be able to handle in/out vectors of arbitrary
length.
commit 3f5c4e04af8a7592fdffa54938a277c34ae76b51
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:22:55 2012 +0100
gallivm: fix lp_build_sqrt() for vector length 1
since we optimize away vectors with length 1 need to emit intrinsic
without vector type.
commit 79d94e5f93ed8ba6757b97e2026722ea31d32c06
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 17:00:46 2012 +0000
llvmpipe: Remove lp_test_round.
commit 81f41b5aeb3f4126e06453cfc78990086b85b78d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:56:24 2012 +0100
llvmpipe: subsume lp_test_round into lp_test_arit
Much simpler, and since the arguments aren't passed as 128bit values can run
on any arch.
This also uses the float instead of the double versions of the c functions
(which probably was the intention anyway).
In contrast to lp_test_round the output is much less verbose however.
Tested vector width of 32 to 512 bits - all pass except 32 (length 1) which
crashes in lp_build_sqrt() due to wrong type.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 945b338b421defbd274481d8c4f7e0910fd0e7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 09:55:03 2012 +0000
gallivm: Centralize the function compilation logic.
This simplifies a lot of code.
Also doing this in a central place will make it easier to carry out the
changes necessary to use MC-JIT in the future.
gallivm: Fix typo in explicit derivative shuffle.
Trivial.
draw: make DEBUG_STORE work again
adapt to lp_build_printf() interface changes
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: get rid of vecnf_from_scalar()
just use lp_build_broadcast directly (cannot assign a name but don't really
need it, vecnf_from_scalar() was producing much uglier IR due to using
repeated insertelement instead of insertelement+shuffle).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
llvmpipe: fix typo in complex interpolation code
Fixes position interpolation when using complex mode
(piglit fp-fragment-position and similar)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: fix clipvertex/position storing again
This appears to be the result of a bad merge.
Fixes piglit tests relying on clipping, like a lot of the interpolation tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Fix explicit derivative manipulation.
Same counter variable was being used in two nested loops. Use more
meanigful variable names for the counter to fix and avoid this.
gallivm: Prevent buffer overflow in repeat wrap mode for NPOT.
Based on Roland's patch, discussion, and review .
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: Fix dims for TGSI_TEXTURE_1D in emit_tex.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: Fix explicit volume texture derivatives.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: fix 1d shadow texture sampling
Always r coordinate is used, hence need 3 coords not two
(the second one is unused).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Enable AVX support without MCJIT, where available.
For now, this just enables AVX on Windows for testing. If the code is
stable then we might consider prefering the old JIT wherever possible.
No change elsewhere.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The vertex element state isn't in registers any more, so
remove that old code. That fixes a memory corruption with
the blend state and gets eglgears partially working.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Some application calls eglCreateWindowSurface with
EGLNativeWindowType parameter having zero value. It causes SEGV
and disturbs error handling like EGL_NO_SURFACE.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
a112ca5d rather crassly smashed all the compiler flags together into AM_CFLAGS.
Separate them out the way they were before, putting pre-processor flags into
AM_CPPFLAGS, so assembly source gets preprocessed with the correct pre-processor
flags as well.
Also, remove unneeded CFLAGS from AM_CFLAGS, and CXXFLAGS from AM_CXXFLAGS
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Brian Paul <brianp@vmware.com>
I suck at resolving merge conflicts and broke the build in a5a34b1.
This patch adds the missing field intel_mipmap_tree::wraps_etc1.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Enable it for all hardware.
No current hardware supports ETC1, so this patch implements it by
translating the ETC1 data to RGBX data during the call to
glCompressedTexImage2D(). For details, see the doxygen for
intel_mipmap_tree::wraps_etc1.
Passes the Piglit test spec/OES_compressed_ETC1_RGB8_texture/miptree and
the ETC1 test in the GLES2 conformance suite.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function _mesa_etc1_unpack_rgba8888. It is intended to be used by
glCompressedTexSubImage2D to decode ETC1 textures into RGBA.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of util_etc1_rgb8_unpack_rgba_unorm8 into a new function
that can be shared between gallium and dri drivers,
texcompress_etc_tmp.h:etc1_unpack_rgba8888.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When we don't intend to texture from or render to a __DRIimage we
use __DRI_IMAGE_FORMAT_NONE. In that case, we just create the __DRIimage
to reference the underlying buffer, and will create usable __DRIimages
from it using createSubImage later.
If we try to use _mesa_get_format_bytes() on MESA_FORMAT_NONE in
a debug build, we hit an assertion, so let's not do that.
Commit 68e04cc6 was tested using automake-1.11. Unfortunately, automake-1.12
made a "slightly backward-incompatible change" in the use of yacc with C++, and
for a .yy file, the generated header file is now named .hh, not .h
To work with both, write our own rule for running yacc, which generates a
header file named .h, rather than using automake's rule.
Also, remove things from BUILD_SOURCES which don't need to be there
Also, update EXCLUDE rules in doxygen/glsl.doxy, for change of generated files
from .cpp -> .cc, and glsl_lexer.h has never existed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Commit defadf2b1 erroneously tries to make gallium drivers link with libdricore
as a static library, not a shared library
Also, change uses of DRI_LIB_DEPS in gallium driver Makefiles to
GALLIUM_DRI_LIB_DEPS, so the libraries added are used in the linking the gallium
driver
Also, fix the path to the libdricore.so symlink, it's made in LIB_DIR, not in
the libdricore directory
Also repair quoting of dricore settings of DRI_LIB_DEPS and GALLIUM_DRI_LIB_DEPS
variables so VERSION is interpolated in configure but TOP and LIB_DIR are
interpolated later (where they are known, but VERSION isn't)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
- Use LLVM limits when LLVM is being used, instead of TGSI limits
- Provide draw_get_shader_param_no_llvm for when llvm is never used (softpipe)
- Eliminate several of the hacks around draw shader caps in several drivers
Unfortunately the hack for PIPE_MAX_VERTEX_SAMPLERS is still necessary.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The libmesa convenience library is linked with the libglsl convenience
library. libOsmesa is linked with libmesa, and also directly with libglsl.
When using libtool, this gives rise to duplicate symbol errors.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake,
so remove the AC_SUBST'ed GLAPI_ASM_SOURCES and instead use some AM_CONDITIONALS
to choose which asm sources are used
* Change GLAPI_LIB to point to the .la file in other Makefile.am files, and make a link
to the .a file for the convenience of other Makefiles which have not yet been converted
to automake
v2:
- Use AM_CPPFLAGS for cleaner build output
- EXTRA_SOURCES is not needed
- Remove libglapi.a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Now mesa/drivers/dri is converted to automake, we want to update DRI_LIB_DEPS
so that we link with the libmesa or libdricore libtool library, as appropriate.
However, this is complicated by the fact that gallium/targets is not (yet)
converted, so we can't share the DRI_LIB_DEPS autoconf variable with that anymore.
Add an additional autoconf variable GALLIUM_DRI_LIB_DEPS, which is now used in
gallium/targets/Makefile.dri, to link with the libdircore or libmesa native library.
v2: libdricore$VERSION.a needs to be libdricore$(VERSION).a
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake, so instead of
MESA_ASM_FILES, use some AM_CONDITIONALS to choose which architecture's asm sources are used
in libmesa_la_SOURCES. (Can't remove MESA_ASM_FILES autoconf variable as it's still used in
sources.mak)
* Update to link with the .la file in other Makefile.am files, and make a link to the
.a file for the convenience of other Makefiles which have not yet been converted to automake
v2: Remove stray -static from LDFLAGS
v3: Remove .a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Automake can't handle having both clip.S and clip.c, even though they have different paths
"src/mesa/Makefile.am: object `clip.lo' created by `$(SRCDIR)/sparc/clip.S' and `$(SRCDIR)/main/clip.c'"
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
v2: Use AM_V_GEN to silence generated code rules. Add BUILT_SOURCES to CLEANFILES
v3:
- Fix an accidental // in a path
- Use automake make rules for lex/yacc rather than writing our own
- Update .gitignore appropriately
- Build a libglcpp convenience library rather than awkwardly including
the files in libglsl and delegating the generation
- Remove libglsl.a compatibility link on clean
v4:
- Automake's rules for lex/yacc make .cc if source is .ll or .yy, and apparently we
must use those extensions "because of scons", so update everywhere glsl_parser.cpp
-> glsl_parser.cc and glsl_lexer.cpp -> glsl_lexer.cc. This fixes 'make tarballs'
and building with dricore enabled.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This also currently fix the installation of libOSmesa.
v2: Remove old Makefile, libOSmesa is now versioned, fix typos
v3: Keep config substitution alphabetized
v4: Update .gitignore
v5: Libraries will be in the builddir, not the srcdir.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This was not implemented, because the spec was changed just recently.
Everything has been in place already.
Gallium has PIPE_FORMAT_B5G6R5_UNORM, while Mesa has MESA_FORMAT_RGB565.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The whole reason I avoided this was because it might operate on a
brw_vertex_program or a brw_fragment_program. However, that isn't a
problem: all we need is the gl_program base type.
This avoids awkwardly passing the loop counter 'i' as a parameter,
simplifies both callers, and also plumbs prog in place for future use.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If alpha-testing is enabled, we need to send alpha down the pipeline
even if nr_color_buffers == 0. However, tracking whether alpha-testing
is enabled in the WM program key is expensive: it causes us to compile
multiple specializations of the same shader, using program cache space.
This patch removes the check for alpha-testing, and simply emits alpha
whenever nr_color_buffers == 0. We believe this will also be necessary
for alpha-to-coverage, and it should add minimal overhead to an uncommon
case. Saving the recompiles should more than make up the difference.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only did this pre-Gen6, and used pwrite on Gen6+.
In one workload, this cuts significant amount of overhead.
v2: Simplify the function based on Eric's suggestions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We rely on proper IEEE 754 behavior in too many places for this.
See also commit 2fdbbeca43 with equivalent
change for autoconf.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without that, people with buggy apps that looked at just the server
string for GLX_ARB_create_context would call this function that just
threw an error when you tried to make a context. Google shows plenty
of complaints about this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function assumes that lp_build_context::type is a vector type,
which is not true for r600 or radeonsi.
This fixes an assertion failure using glamor 2D accel.
It had many problems:
- The shadow comparison was done post-filtering.
- It required state-dependent recompiles whenever the comparison
function changed.
- It didn't even work: many cases hit assertion failures.
- I never implemented it for the VS.
The new lowering pass which converts textureGrad to textureLod by
computing the LOD value works much better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Intel hardware doesn't natively support textureGrad with shadow
comparisons. So we need to generate code to handle it somehow.
Based on the equations of page 205 of the OpenGL 3.0 specification,
it's possible to compute the LOD value that would be selected given the
gradient values. Then, we can simply convert the TXD to a TXL.
Currently, this passes 34/46 of oglconform's shadow-grad subtests;
four cubemap tests are regressed. We should investigate this in the
future.
v2: Apply abs() to the scalar case (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This swizzles away unwanted components, while preserving the order of
the ones that remain.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I needed to compute logs and square roots in a patch I was working on,
and wanted to use the convenient interface. We already have a similar
constructor for binops; adding one for unops seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I ran into this while trying to create a TXS query, which doesn't have a
coordinate. Since it didn't get initialized to NULL, a bunch of
visitors tried to access it and crashed.
Most of the time, this won't be a problem, but it's just a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only case a depth buffer can be set as a color buffer is when flushing.
That wasn't always the case, but now this code isn't required anymore.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
- maintain a mask of which mipmap levels are dirty (instead of one big flag)
- only flush what was requested at a given point and not the whole resource
(most often only one level and one layer has to be flushed)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
we can just update the state when decompressing, there's no need to add
additional info into the DSA state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
to remove some overhead from draw_vbo. This is a derived state.
BTW, I've got no idea how compute interacts with 3D here, but it should
use cb_misc_state, so that 3D and compute don't conflict.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Because u_blit couldn't sample a 1D, 3D, CUBE and ARRAY texture, we created
a 2D texture holding a copy of one slice of the source texture (even for 1D).
Let's just do it right.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch updates the blorp engine to properly handle the case where
the surface being textured from uses Gen7's CMS MSAA layout. The
following changes were necessary:
- Before reading color values from the surface, we need to read from
the MCS buffer using the ld_mcs sampler message. This is done by
the mcs_fetch() function, and the result is stored in the mcs_data
register. This only needs to be done once per pixel, since the MCS
value is shared between all samples belonging to a pixel.
- When reading color values from the surface, we need to use the
ld2dms sampler message instead of the ld2dss message, and we need to
provide the value read from the MCS buffer as an argument.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When a buffer using Gen7's CMS MSAA layout is bound to a texture or a
render target, the SURFACE_STATE structure needs to point to the MCS
buffer and to indicate its pitch. This patch updates the functions
that emit SURFACE_STATE to handle CMS layout properly.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously the DWORD used to control the CMS MSAA layout was just a
pad value, because we didn't use it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
To implement Gen7's CMS MSAA layout, we need an extra buffer, the MCS
(Multisample Control Surface) buffer. This patch introduces code for
allocating and deallocating the buffer, and storing a pointer to it in
the intel_mipmap_tree struct.
No functional change, since the CMS layout is not enabled yet.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 1 Part 1, p112:
There are three types of multisampled surface layouts designated
as follows:
- IMS Interleaved Multisampled Surface
- CMS Compressed Mulitsampled Surface
- UMS Uncompressed Multisampled Surface
Previously, the i965 driver only used IMS and UMS formats, and
distinguished beetween them using the boolean
intel_mipmap_tree::msaa_is_interleaved. To facilitate adding support
for the CMS format, this patch replaces that boolean (and other
booleans derived from it) with an enum
INTEL_MSAA_LAYOUT_{IMS,CMS,UMS}. It also updates the terminology used
in comments throughout the driver to match the IMS/CMS/UMS terminology
used in the PRM. CMS layout is not yet used.
The enum has a fourth possible value, INTEL_MSAA_LAYOUT_NONE, which is
used for non-multisampled surfaces.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
On Gen6, MSAA buffers always use an interleaved layout and non-MSAA
buffers always use a non-interleaved layout, so it is not strictly
necessary to keep track of the layout of the texture and render target
surfaces in the blorp program key. However, it is cleaner to do so,
since (a) it makes the blorp compiler less dependent on implicit
knowledge about how the GPU pipeline is configured, and (b) it paves
the way for implementing compressed multisampled surfaces in Gen7.
This patch won't cause any redundant compiles, because the layout of
the texture and render target surfaces depends on other parameters
that are already in the blorp program key.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't generate public entrypoints for GLES extensions, so move the
GL_NV_draw_buffers definition from ARB_draw_buffers.xml to es_EXT.xml.
When the extension is defined in ARB_draw_buffers.xml, we end up with a
public entry point for it, but no prototype, which gives an error when
compiled with --disable-asm and --disable-shared-glapi.
Instead, just move the GLES extension to es_EXT.xml so this doesn't happen.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This lets us specify an offset into the bo where the miptree starts,
which will let us set up a texture for a single plane in a planar buffer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The additions in version 5 enables creating EGLImages for different planes
of a YUV buffer. createImageFromName is still used to create the containing
__DRIimage, and createSubImage can then be used no that __DRIimage to create
__DRIimages that correspond to the y, u, and v planes (__DRI_IMAGE_FORMAT_R8)
or the uv planes (__DRI_IMAGE_FORMAT_RG88) for formats such as NV12 where
the u and v components are interleaved. Packed formats such as YUYV etc
doesn't require any special treatment, we just sample those as a regular
ARGB texture.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The code for growing the memory pool (which is used for storing all of
the global buffers) wasn't working. There seem to be two separate issues
with the memory pool code. The first was the way it was growing the pool.
When the memory pool needed more space, it would:
1. Copy the data from the memory pool's backing texture to system memory.
2. Delete the memory pool's texture
3. Create a bigger backing texture for the memory pool.
4. Copy the data from system memory into the bigger texture.
The copy operations didn't seem to be working, and I suspect that since
they were using fragment shaders to do the copy, that there might have
been a problem with the mixing of compute and 3D state.
The other issue is that the size of 1D textures is limited, and I was
having trouble getting 2D textures to work.
I think these problems will be easier to solve once more code is shared
between 3D and compute, which is why I decided to disable it for now
rather than continue searching for a fix.
The original strategy for handling floating point loads, which was to
lower (f32 load) to (f32 bitcast (i32 load)) wasn't really working. The
main problem was that the DAG legalizer couldn't handle replacing a node
with two results (load) with a node with only one result (bitcast).
It didn't change performance on Lightsmark or Nexuiz, which both used
DYNAMIC_DRAW buffers, but it was killing performance (40% CPU wasted pwriting
buffers) on a closed-source app we're looking at.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the infrastructure required for this extension. There is no
xserver support and no driver support yet. Drivers can enable this be
advertising DRI2 version 4 and accepting the
__DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag and the
__DRI_CTX_ATTRIB_RESET_STRATEGY attribute in create context.
Some additional Mesa infrastructure is needed before drivers can do
this. The GL_ARB_robustness spec, which all Mesa drivers already
advertise, requires:
"If the behavior is LOSE_CONTEXT_ON_RESET_ARB, a graphics reset
will result in the loss of all context state, requiring the
recreation of all associated objects."
It is necessary to land this infrastructure now so that the related
infrastructure can land in the xserver. The xserver has very long
release schedules, and the remaining Mesa parts should land long, long
before the next xserver merge window opens.
v2: Expose robustness as a DRI2 extension rather than bumping
__DRI_DRI2_VERSION.
v3: Add a comment explaining why dri2->base.version >= 3 is also
required for GLX_ARB_create_context_robustness.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows revising the dri_interface.h separately from adding driver
support.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We neglected to list the deprecation model/forward compatible context
support.
inverse() has been done for a while.
None of us know what "highp change" means; GLSL 1.30 already added the
ability to recognize precision keywords, and it doesn't look like 1.40
has any new requirements there (precision keywords still have no meaning).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Use r600_resource_texture::flished_depth_texture for GPU access, and
allocate it in the VRAM. For transfers we'll allocate texture in the GTT
and store it in the r600_transfer::staging.
Improves performance when flushed depth texture is frequently used by the
GPU, e.g. in Lightsmark (~30%)
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
With fixes and updates from Ben Widawsky and comments from Paul Berry.
v2: Use drm_intel_gem_context_destroy to destroy hardware context;
remove useless initialization of hw_ctx, both suggested by Eric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Paul Berry <stereotype441@gmail.com>
This doesn't do anything with the uniform block declarations yet, so
usage of those uniforms finds them to be undeclared.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've been trying to derive from this for UBO support, and the slightly
obfuscated types were putting me over the edge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The got_one variable was set iff one of the bits in flags.i was set.
v2: Fix incorrect dropping of the ARB_conservative_depth warning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function is used when dispatching compute shader in order to avoid
mixing compute and 3D registers in the context's dirty list. This
allows the compute code to resuse 3D functions like evergreen_cb, which
return a struct r600_pipe_state and still have control over when and how
the register writes are emitted.
The start_compute_cs atom initializes some config and context registers
to the values needed for running compute shaders. When a compute shader
is dispatched, this atom is emitted after the start_cs_cmd atom, which
initializes registers that are common to both 3D and compute.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Some packets require the shader type bit (bit 1) to be set when
used for compute shaders. The pkt_flag will be initialized to
RADEON_CP_PACKET3_COMPUTE_MODE for any struct r600_command_buffer used
for dispatching compute shaders and it will be or'd against the result of
the PKT3 macro when adding a new packet to a struct r600_command buffer.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
For copy propgation, we've dropped the use of a GRF in favor of a
(probably later) use of a different GRF. This definitely requires
invalidating intervals.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since live intervals are based on ip, removing an instruction trashes
the intervals unless we were to go do some surgery. These happen to
usually remove a use of a grf, so it's time to recalculate, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 release branch.
This has less impact than for the FS (4k savings), because it was partially
done already, but makes things more consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We factor out all the EGL book-keeping into dri2_create_image() and
simplify the wayland case by using dupImage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We have the same switch and allocation code in two places.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit cbffaf20e9.
Use the PRIx64 macro in the fprintf() call instead, as suggested
by Dylan Noblesmith.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
ROUND and TRUNC are implemented with one function to reduce code duplication.
Note: ROUND isn't actually used yet, but probably will be soon.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Converting CMP to SLT+LRP didn't work when src2 or src3 was Inf/NaN.
That's the case for GLSL sqrt(0). sqrt(0) actually happens in many
piglit auto-generated tests that use the distance() function.
v2: remove debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Was previously implemented with FLOOR.
Fixes quite a few piglit tests of float->int conversion, integer
division, etc.
v2: clean up left over debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If the 'dst' register is the same as the 'pass' register we'll generate
invalid code. Use a temporary register in that case.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Redo this commit, and remove the inclusion of gl2ext.h
from src/mapi/glapi/glapi_priv.h. The include was added in
8f3be33985 to fix a missing prototype for
glDrawBuffersNV and others, but it's not possible to include both
glext.h and gl2ext.h from the same file.
I don't see the missing prototype here (with or without shared glapi)
so I'm just removing the offending #include.
Also, since we're redoing this, update to the most recent gl2ext.2.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
That old bug was hidden but the clipper always interpolating in 3d space
no matter what it should have been doing. Now that the interpolation
has been fixed, the bug shows up.
Fixes fdo 51364.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Calling glGenerateMipmap could overwrite vertex buffer state, leading
to incorrect rendering or crashes depending on the Gallium driver.
This was happening on WebGL Conformance test texture-size.
Before 784dd51198 this was covered up
by redundant vertex buffer validation.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This reverts commit 8818b88748.
I get a lot of errors like this one:
In file included from ../../../src/mapi/glapi/glapi_priv.h:49:0,
from glapi_dispatch.c:40:
../../../include/GLES2/gl2ext.h:1074:28: error: redefinition of typedef ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’
../../../include/GL/glext.h:10237:25: note: previous declaration of ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’ was here
This with a clean build (with git clean -fdX).
I don't get the errors on my other machine. I didn't investigate why,
a wild guess is that this depends on the version of gcc.
This is a big win for savage2, hon and yofrankie. 62 new programs for
savage2/hon get 16-wide mode, along with one for humus demos and two
for tropics. Even a few shaders from tropics see reductions of 15% or
more.
total instructions in shared programs: 216536 -> 207353 (-4.24%)
instructions in affected programs: 123941 -> 114758 (-7.41%)
In benchmarking Tropics, only a .040% +/- 034% performance improvement
was observed (n=90). Rather disappointing, but I was primarily
motivated to do this patch by a regression in the number of 16-wide
shaders compiled after a GRF texturing on IVB patch I'm working on.
Hopefully this helps avoid that regression.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shaves a few instructions off of a ton of programs. For 12
shaders from tropics and sanctuary, it's enough reduction in register
pressure to get 16-wide mode. 7 shaders from heroes of newerth and
savage2 are hurt by about 1.1%, where copy propagation of negates ends
up preventing coalescing, but we could regain that by doing dataflow
analysis in our copy propagation.
No significant performance difference in tropics (n=11)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The meta-ops _mesa_meta_Clear() and _mesa_meta_glsl_Clear() need to
ignore the state of GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
and GL_SAMPLE_COVERAGE_INVERT when clearing multisampled buffers. The
easiest way to accomplish this is to disable GL_MULTISAMPLE during the
clear meta-ops.
Note: this patch also causes GL_MULTISAMPLE to be disabled during
_mesa_meta_GenerateMipmap() and _mesa_meta_GetTexImage() (since those
two meta-ops use MESA_META_ALL). Arguably this isn't strictly
necessary, since those meta-ops use their own non-MSAA fbo's, but it
shouldn't do any harm.
Fixes Piglit tests "EXT_framebuffer_multisample/clear {2,4}
{color,stencil}" on i965.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):
"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to compute centroid varyings correctly, the fragment shader
needs to be able to load the current pixel/sample mask into a flag
register. This patch adds an opcode to the fragment shader back-end
to do this; the opcode gets translated into the instruction
mov(1) f0<1>UW g1.14<0,1,0>UW { align1 WE_all }
Since this instruction clobbers f0, instruction scheduling has to
treat it the same as instructions that have a conditional modifier.
Reviewed-by: Eric Anholt <eric@anholt.net>
When querying GL_PRIMITIVES_GENERATED, if primitive restart
is also used, then take the software primitive restart
path so GL_PRIMITIVES_GENERATED is returned correctly.
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is also updated
since it will also affected by the same issue.
As noted in brw_primitive_restart.c, with further work we
should be able to move this situation back to a hardware
handled path.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit d73f6375f5 fixed the cause of the Piglit failure with
ARB_color_buffer_float fragment clamp modes. Now that it's fixed,
there's no reason to leave snorm format rendering disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0c005bd7 intended to make ir_loop_jump::mode public, but also
accidentally added a new pointer to the enclosing loop. Furthermore, it
tried to initialize the new field by adding "this->loop = loop;" to the
constructor, but since there is no loop parameter, this only initialized
the field to itself---so it will likely be a garbage pointer.
A lot of code, such as lower_jumps, allocates new loop jumps without
setting this field appropriately, so any uses would probably just crash.
Thankfully, there were none, so we can just delete the field.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51574
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
DrawPixels uses the MESA_META_CLAMP_FRAGMENT_COLOR flag to save/restore
the fragment color clamp mode. This is unnecessary since it never
alters it. It's also harmful: when the clamp mode is GL_FIXED_ONLY,
setting this flag causes _mesa_meta_begin to force it to GL_FALSE,
breaking clamping on SNORM formats.
DrawPixels should use the user-specified clamp mode and not change it.
Fixes Piglit's spec/ARB_color_buffer_float/GL_RGBA8_SNORM-drawpixels
test on i965/Sandybridge (with SNORM render targets re-enabled).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add "-f $(srcdir)/gl_API.xml" to the arguments of all
the scripts that by default look for gl_API.xml in the
working directory when run with no arguments, and prepend
$(srcdir) to those scripts that are already using an
explicit -f argument.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
tgsi_ureg was recently enhanced to support local temporaries, and as result
temps are declared individually.
This change avoids many TEMP register declarations on common shaders.
(And fixes performance regression due to mismatches against performance
sensitive shaders.)
Reviewed-by: Brian Paul <brianp@vmware.com>
The templated copy constructor doesn't prevent the compiler from
emitting a default copy constructor, which leads to inconsistent
memory handling and was reported to cause segfaults when doing event
manipulation.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
The function internalizer pass marks non-kernel functions as internal,
which enables optimizations like function inlining and global dead-code
elimination.
v2:
- Pass vector arguments by const reference
Removed u_half.py used to generate the table for previous method.
Previous implementation of float to half conversion was faulty for
denormalised and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some parameters need to be checked only once.
check_valid_to_render needs to be called only once.
The validate function is based on the one for DrawElements.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a cleanup for ARB_transform_feedback3, where
GL_MAX_TRANSFORM_FEEDBACK_BUFFERS is introduced for interleaved attribs and
has the same meaning as GL_MAX_.._SEPARATE_ATTRIBS for separate attribs.
Also, the maximum number of TFB buffers is reduced from 32 to 4, which makes
this patch useful even without the extension.
I don't know of any hardware which can do more than 4.
Reviewed-by: Brian Paul <brianp@vmware.com>
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
For some reason regular gcc on Linux didn't catch these but the mingw
compiler did (generated errors, not warnings).
v2: include the changes in src/mapi/ too
Fixes the es2 build with gcc.
Note: in glext.h the prototypes for glShaderSource() and glShaderSourceARB()
disagree: only the former has the extra const qualifier.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Set the step_rate value when drawing to implement
ARB_instanced_arrays for gen >= 4.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we were counting gl_FrontFacing, gl_FragCoord and gl_PointCoord
against the limit of varying variables. This prevented some valid shaders
from linking.
The other potential solution to this is to have the driver advertise
more varying vars or set the GLSLSkipStrictMaxVaryingLimitCheck flag.
But the above-mentioned variables aren't conventional varying attributes
so it doesn't seem right to count them.
Reviewed-by: Eric Anholt <eric@anholt.net>
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we don't have them in hw we emulate them in the shader. Although not
recommended by the spec it is legit.
As a side effect we also get GL 2.1. I think this is as far as we can take
the i915.
The most recent commit adds support for comments and macro expansion
on #line directives. Add testing to verify the new features.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL specification requires that #line directives be interpreted
after macro expansion. Our existing implementation of #line macros in
the lexer prevents conformance on this point.
Moving the handling of #line from the lexer to the parser gives us the
macro expansion we need. An additional benefit is that the
preprocessor also now supports comments on the same line as #line
directives.
Finally, the preprocessor now emits the (fully-macro-expanded) #line
directives into the output. This allows the full GLSL compiler to also
see and interpret these directives so it can also generate correct
line numbers in error messages.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is currently used only in the expansion of #if lines,
but we will soon be using it more generally (for the expansion of
(_glcpp_parser_expand_and_lex_from) and some more documentation.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b823b99ec0 switched from using
functions such as ralloc_asprintf and ralloc_strcat to
ralloc_asprintf_rewrite_tail. This change maintains the string's
length as a aparamter that is updated by the ralloc functions (rather
than recomputing it with strlen over and over).
However, the change failed to updated two locations (glcpp_error and
glcpp_warning), with the result that the string's length wasn't
updated by these calls. Then, subsequent calls to other
ralloc_asprintf_rewrite_tail would overwrite the text appended by
glcpp_error.
This commit fixes the two missing updates, and restores line numbers
to the output of glcpp error messages, (as noticed by a glcpp unit
test case that has been failing since the above-mentioned commit).
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A strict reading of the GLSL specification would have this be an
error, but we've received reports from users who expect the
preprocessor to interepret undefined macros as 0. This is the standard
behavior of the rpeprocessor for C, and according to these user
reports is also the behavior of other OpenGL implementations.
So here's one of those cases where we can make our users happier by
ignoring the specification. And it's hard to imagine users who really,
really want to see an error for this case.
The two affected tests cases are updated to reflect the new behavior.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
DUAL_EXPORT can be enabled on r6xx/r7xx when all CBs use 16-bit export
and there is no depth/stencil export.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit export
mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS
shouldn't export depth/stencil.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In some cases TGSI shader has more color outputs than the number of CBs,
so it seems we need to limit the number of color exports. This requires
different shader variants depending on the nr_cbufs, but on the other hand
we are doing less exports, which are very costly.
v2: fix various piglit regressions
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Shader variants are stored in the list, the key for lookup is based on the
states that require different hw shaders - currently it's rctx->two_side (all
gpus) and rctx->nr_cbufs (evergreen/cayman, when writes_all property is set).
v2:
- use simple list instead of keymap as suggested by Marek on irc
- call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
(r600_shader_select isn't used for vertex shaders currently)
v3:
- fix call to r600_adjust_gprs - do it after updating current shader
Improves performance for some apps, e.g. FlightGear -
see https://bugs.freedesktop.org/show_bug.cgi?id=50360
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
As with the previous commit for softpipe.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
These all return zero. Add a debug_printf() to catch the default case so
we don't accidently mishandle something important in the future.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is actually required for GL_ARB_framebuffer_object, but the state
tracker doesn't currently check it.
Direct3D 9 allows mixed format color buffers with some restrictions.
Setting this allows Unigine Heaven 2.5 and 3.0 to run. Tested both on
GL and D3D hosts.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The type is the destination type (i.e. float vector) and not the
source type. Fixes piglit fs-{in,de}crement-uint.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
i965 hardware needs to be informed of situations in which it's
possible for pixels (or samples) to be discarded for reasons other
than depth/stencil testing (e.g. due to an explicit "discard" in the
fragment shader). One of these situations is when
GL_ALPHA_TO_COVERAGE is enabled, since that can cause samples to be
discarded by the color calculator when the pixel's alpha value is less
than 1.0.
Without this patch, GL_ALPHA_TO_COVERAGE does not take effect on depth
buffers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables the multisampling parameters
GL_SAMPLE_ALPHA_TO_COVERAGE and GL_SAMPLE_ALPHA_TO_ONE, which allow
the fragment shader's alpha output to be converted into a sample
coverage mask and ignored for blending. i965 supports these
parameters through the BLEND_STATE structure.
The GL spec allows, but does not require, the implementation to dither
the conversion from alpha to a sample coverage mask, so that alpha
values that aren't a multiple of 1/num_samples result in the correct
proportion of samples being lit. A bit exists in the BLEND_STATE
structure to enable this functionality, but according to the hardware
docs it must be disabled on Sandy Bridge (see the Sandy Bridge PRM,
Vol2, Part1, p379: AlphaToCoverage Dither Enable). So it is enabled
for Gen7 only.
Fixes piglit tests
"EXT_framebuffer_multisample/sample-alpha-to-{coverage,one} {2,4}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables glSampleCoverage() functionality, which allows the
client program to specify that only a portion of the samples be lit up
when performing multisampled rendering. i965 supports
glSampleCoverage() through the 3DSTATE_SAMPLE_MASK command packet,
which allows the driver to specify a bitfield indicating which samples
to light up.
Fixes piglit tests "EXT_framebuffer_multisample/sample-coverage {2,4}
{inverted,non-inverted}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes gles2conform GL.equal.equal_bvec2_frag.
This fixes brw_fs_visitor's translation of ir_unop_f2b. It used CMP to
convert the float to one of 0 or ~0. However, the convention in the
compiler is that true is represented by 1, not ~0. This patch adds an AND
to convert ~0 to 1.
By inspection, a similar problem existed with ir_unop_i2b, with a similar
fix.
[v2 kayden]: eliminate extra temporary register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49621
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.
Reviewed-by: Eric Anholt <eric@anholt.net>
To save time, we only instruct the clip stage of the pipeline to
compute noperspective barycentric coordinates if those coordinates are
needed by the fragment shader. Previously, we would determine whether
the coordinates were needed by seeing whether the fragment shader used
the BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode.
However, with MSAA, it's possible that the fragment shader might use
BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future,
when we support ARB_sample_shading, it might use
BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC.
This patch modifies the upload_clip_state() functions to check for all
three possible noperspective interpolation modes.
Reviewed-by: Eric Anholt <eric@anholt.net>
This bitfield tells the back-ends which of a fragment shader's inputs
require centroid interpolation. It is only set for GLSL fragment
shaders, since assembly fragment shaders don't support centroid
interpolation.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was only no-oping the clear() function, not actual triangle
rasterization. Move the no_rast field from lp_context down into
lp_rasterizer so it's accessible where it's needed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes this build failure on Solaris.
Compiling build/sunos-debug/glsl/glcpp/glcpp-lex.c ...
"src/glsl/glcpp/glcpp-lex.l", line 30: cannot find include file: "glcpp-parse.h"
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
$CLANG_RESOURCE_DIR is the directory that contains all resources
needed by clang to compile programs. When clover uses clang to
compile kernels it needs to specify a resource dir, so that clang
can find its internal headers (e.g. stddef.h).
clang defines $CLANG_RESOURCE_DIR as $CLANG_LIBDIR/clang/$CLANG_VERSION
This patch adds the --with-clang-libdir option in order to accommodate
clang intalls to non-standard locations, and it also adds a check
to the configure script to verify that $CLANG_RESOURCE_DIR/include
contains the necessary header files.
On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.
However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.
This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.
Fixes Piglit test "fbo-deriv".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not optimal, but it's better than the register pressure scheduler
that was previously being used. The VLIW scheduler currently ignores
all the complicated instruction groups restrictions and just tries to
fill the instruction groups with as many instructions as possible.
Though, it does know enough not to put two trans only instructions in
the same group.
We are able to ignore the instruction group restrictions in the LLVM
backend, because the finalizer in r600_asm.c will fix any illegal
instruction groups the backend generates.
Enabling the VLIW scheduler improved the run time for a sha1 compute
shader by about 50%. I'm not sure what the impact will be for graphics
shaders. I tested Lightsmark with the VLIW scheduler enabled and the
framerate was about the same, but it might help apps that use really
big shaders.
The rest of the TFB implementation remains in transformfeedback.c, and
this will be shared with UBOs.
v2: Move the size/offset checks shared with UBOs to common code as
well. (Kenneth's review)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix a typo spotted by Eric Anholt.
v3: Fix missing "GL" on types, fix style, fix Studly_Caps extension name,
drop commented code duplicated with GL3x.xml [anholt]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our intention is still that it's not abi stable, so make the package
version number get included in the library name. Now you can parallel
install dricore-using drivers from multiple mesa versions. We can put
it into lib now that we're following library versioning rules
(assuming that ABIs don't change within a single Mesa point release).
LD_LIBRARY_PATH still doesn't work with a non-/, non-/usr prefix
because libtool uses rpath instead of runpath for nonstandard
prefixes.
The weird versioning of the libGL where the package version was sort
of expressed as a big integer is dropped. libtool didn't like the 0
prefix, and it didn't really make sense anyway -- if you interpret it
as an integer version number, old Mesa 071200 was bigger than current
Mesa 08100. Instead, just bump the minor version and drop the
patchlevel.
Except for the deleted linux-cell target, these were just the target
cc/cflags. The only usage was for gen_matypes, which wants the
target's structure packing, not the host, anyway.
Every place that uses ASM_FLAGS already uses DEFINES. Not including
it in DEFINES is just a way to screw up potential users, as I've done
several times while working on the build system.
Even pre-automake, we rely on gmake features for pattern
substitutions, and replacing those with reams more make code is not
interesting. This will let us turn the old Makefiles using pattern
substitutions into automake without spewing warnings.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
1) We need to insert a barrier between consecutive transform feedback calls.
2) VBO cache needs to be flushed when TFB output is used as VBO draw input.
Fixes Piglit test EXT_transform_feedback/immediate-reuse.
Thanks to Christoph Bumiller for pointing out bugs in previous versions
of this patch.
gl_ClipDistance needs special treatment in form of lowering pass
which transforms gl_ClipDistance representation from float[] to
vec4[]. There are 2 implementations - at glsl linker level (enabled
by LowerClipDistance option) and at glsl_to_tgsi level (enabled
unconditionally for gallium drivers). Second implementation is
incomplete - it does not take into account transform feedback (see
commit 642e5b413e "mesa: Fix transform
feedback of unsubscripted gl_ClipDistance array" for details).
There are 2 possible fixes:
- adding transform feedback support into glsl_to_tgsi version
- ripping gl_ClipDistance support from glsl_to_tgsi and enabling
gl_ClipDistance lowering on glsl linker side
This patch implements 2nd option. All it does is:
- reverts most of the commit 59be691638
"st/mesa: add support for gl_ClipDistance"
- changes LowerClipDistance to true
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{2,3,4,5,6,7,8}]-no-subscript" at least on nv50
and evergreen cards.
From the GL 3.0 spec (p.116):
"Multisample rasterization is enabled or disabled by calling
Enable or Disable with the symbolic constant MULTISAMPLE."
Elsewhere in the spec, where multisample rasterization is described
(sections 3.4.3, 3.5.4, and 3.6.6), the following text is consistently
used:
"If MULTISAMPLE is enabled, and the value of SAMPLE_BUFFERS is
one, then..."
So, in other words, disabling GL_MULTISAMPLE should prevent
multisample rasterization from occurring, even if the draw framebuffer
is multisampled. This patch implements that behaviour by setting the
WM and SF stage's "multisample rasterization mode" to
MSRAST_ON_PATTERN only when the draw framebuffer is multisampled *and*
GL_MULTISAMPLE is enabled.
Fixes piglit test spec/EXT_framebuffer_multisample/enable-flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware limitations, MSAA is unsupported on Gen6 for formats
containing >64 bits of data per pixel. From the Sandy Bridge PRM,
vol4 part1, p72 ("Surface Format"):
If Number of Multisamples is set to a value other than
MULTISAMPLECOUNT_1, this field cannot be set to the following
formats:
- any format with greater than 64 bits per element
- any compressed texture format (BC*)
- any YCRCB* format
Gen7 has a similar, but less stringent limitation: formats with >64
bits of data per pixel only support 4x MSAA.
This patch causes the unsupported formats to report
GL_FRAMEBUFFER_UNSUPPORTED.
Fixes piglit "multisample-formats" tests on Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sandy Bridge and later don't use this field, so there's no point in
setting it. It can only cause harmful state-based recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The system array values concept doesn't really because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID. So this patch does two things:
- kill the array, have emit_fetch_system_value directly pick the
values it needs (only gl_InstanceID for now, as the previous code)
- correctly handle the expected type in emit_fetch_system_value
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This includes:
- picking up correctly which attributes are flatshaded and which are
noperspective
- copying the flatshaded attributes when needed, including the
non-built-in ones
- correctly interpolating the noperspective attributes in screen-space
instead than in a 3d-correct fashion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
z or stencil texture should not be created with the z/stencil
flags for surface creation as they are intended to be bound
as texture.
v2: remove broken code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Solaris Studio C compiler does not support anonymous structs and
anonymous unions.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea here is to rewrite comparisons like 2 >= x with x <= 2; we want
to simply exchange arguments, not negate the condition. If equality was
part of the original comparison, it should remain part of the swapped
version.
This is the true cause of bug #50298. It didn't manifest itself on
Sandybridge because we embed the conditional modifier in the IF
instruction rather than emitting a CMP. All other platforms use CMP.
It also didn't manifest itself on the master branch because commit
be5f27a84d ("glsl: Refine the loop instruction counting.") papered over
the problem.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50298
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a bug where a sampler view was using stale texture/resource
data when the texture was modified through a surface (render to texture).
Bumping the texture and layer ages triggers sampler view revalidation.
Fixes piglit fbo-blit failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This lets us select the front buffer for reading under GLES2.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extra condition checks the API not the version of the API, so rename
to reflect that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is failing sometimes, probably because TargetData keeps a structure layout
cache, which can becomes bogus, ever since the InvalidateStructLayoutInfo API
was removed in LLVM r135245.
This change merely makes the problem easier to diagnose (an assertion
failure instead of a random crash).
instead of failing to allocate a renderbuffer.
This also fixes piglit/get-renderbuffer-internalformat with non-renderable
formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows drivers not to do any allocation in AllocStorage if the storage
cannot be allocated because of an unsupported internalformat + samples combo.
The little ugliness is that AllocStorage is expected to return TRUE in this
case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires the latest streamout kernel patches.
Streamout is disabled by default on r7xx, so this patch is safe for regular
users.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Note: for the moment TGSI_OPCODE_F2U is implemented using
lp_build_itrunc() (the same function used to implement
TGSI_OPCODE_F2I). In the long run, we should create an
lp_build_utrunc() function to do the proper conversion. But this
should allow us to limp along with mostly correct behaviour for now.
Previously, we performed conversions from float->uint by a two step
process: float->int->uint. However, on platforms that use saturating
conversions (e.g. i965), this didn't work, because if the source value
was larger than the maximum representable int (0x7fffffff), then
converting it to an int would clamp it to 0x7fffffff.
This patch just adds the new opcode; further patches will adapt
optimization passes and back-ends to use it, and then finally the
ast_to_hir logic will be modified to emit the new opcode.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies blorp blits (which are used for MSAA) to properly
account for clipping of source coordinates. Previously, if we
detected the possibility of source clipping, we would fall back to the
blit meta-op, which doesn't support MSAA and is very slow for depth
and stencil buffers.
Fixes piglit tests
"EXT_framebuffer_multisample/clip-and-scissor-blit" on i965/Gen6+.
Also substantially speeds up the Humble Bundle V game "Psychonauts" on
Gen6+ (without this patch, the game's depth buffer blits use the slow
blit meta-op).
Reviewed-by: Carl Worth <cworth@cworth.org>
This allows to submit things to the compute only
rings on cayman+
v2: rebased on current master and actually make use
of the new flag in evergreen_compute.c
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When drawing a depth image the fragment shader also needs to emit the
current raster color.
The new piglit drawpix-z test exercises this.
NOTE: This is a candiate for the 8.0 branch.
This patch updates .gitignore files to account for the new build
artifacts introduced by the following commits:
ae376f0 glx/tests: Rename test as glx-test
8fecdcc mesa/tests: Add tests for _mesa_lookup_enum_by_{name,nr} functions
a29ad2b mesa/tests: Add tests for the generated dispatch table
Haiku targets the Pentium or higher processor.
To ensure compatibility we can do march 586 and
mtune 686. Mesa will still use sse however if
the cpu supports it (and the stack is properly
aligned). These flags only effect the internal
compiler optimizations.
Previously, rbug_*.c would fail to compile with incomplete prototype
errors when make was run from the command line on my machine. My IDE
always built fine, and still does after this patch (Netbeans 7.1.2).
Most of the includes from files in gallium/auxiliary/rbug/* were
assuming an rbug/ subdirectory, while the headers are actually in the
same directory as the .c files.
The build error was also previously a problem for me on Ubuntu 11.10
and Mint 12.
Fixes build for the following configuration: ./autogen.sh
--enable-debug --enable-texture-float --with-gallium-drivers=r600
--with-dri-drivers=radeon --enable-r600-llvm-compiler
Signed-off-by: Brian Paul <brianp@vmware.com>
In single precision, 1.5707963 becomes 1.5707962513 which is too
small. However, 1.5707964 becomes 1.5707963705 which is just right.
The value 1.5707964 is already used in asin.ir.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is no GLX protocol for these functions. Open-source Linux
driver have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for this function. Open-source Linux driver
have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for this function in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.1 and ARB_uniform_buffer_object. I only added
them to 3.1 because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.3, ARB_texture_swizzle, and
EXT_texture_swizzle (with different names). I only added them to 3.3
because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Determines whether it's a basis vector, i.e., a vector with one element
equal to 1 and all other elements equal to 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a value was replaced, the new key was strdup'd and leaked.
To fix this, we modify the hash table implementation to return
whether the value was replaced and free() the (now useless)
duplicate string.
When we have multiple shared contexts, and one of them is
long-running, this will lead to never freeing those resources
since they are shared. Instead, free them right away on context
destruction since we know the other context isn't using them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 8.0 branch.
From the GL_NV_primitive_restart spec:
"PrimitiveRestartIndexNV is not compiled into display lists, but is
executed immediately."
Prior to this patch, calls to glPrimitiveRestartIndex would hit the noop
dispatch stub.
+2 oglconforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the GL_ARB_copy_buffer spec:
"An INVALID_VALUE error is generated if any of readoffset, writeoffset,
or size are negative [...]"
Fixes oglconform's copybuffer/negative.CNNegativeValues test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings appear to occur with newer automake (probably 1.12).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These allow one to mangle the library names, without also mangling the
symbol names, to make them distinct from other GL libraries on the
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Because these classes are used entirely from their own source files
and not from separate DSOs, the linker gets to produce massively less
code. This cuts about 13k of text in the libdricore case. In the
non-libdricore case, the additional linkage information allows the
compiler to inline some code, so libglsl.a size actually increases by
about 300 bytes.
For a dricore build, improves shader_runner runtime on
glsl-fs-copy-propagation-texcoords-1 by 0.21% +/- 0.03% (n=353574,
outliers removed). No statistically significant difference with n=322
on glslparsertest on a yofrankie shader intended to test compiler
performance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we have just one library of "all of Mesa core" instead of both
libdricore and libglsl that drivers link against.
I did this change in a sort of nonrecursive make fashion: the
generated files are still produced in the non-automake build, like the
rest of dricore, but the GLSL files are stuffed into libdricore
without building a convenience library in src/glsl (even though we
could now). This would make a bit more sense if glsl was just another
dir under src/mesa, because right now I had to contort the prefix
variable name to look another ../ level up.
This is part of a series to fix our build issues in the automake case
by hooking up the automatic Makefile regeneration support. The
extract_git_sha1 is moved into src/mesa/Makefile so that we get
correct dependency generation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tried to update all the old Makefiles that included the default
config to be sure they had a default target if they didn't previously
have one, since this new all target will always point at it. Almost
everything had one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some more of the files are now autogenerated, this caused build breakage,
patch adds generation of these missing files. Patch also changes existing
make so that the files are created to be part of the local source
(not intermediate directory, this causes several problems).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This patch fixes a copy/paste error and masking of depth/stencil (stencil
is in the top 8 bits), and makes glean/readPixSanity happy.
Both the stencil and the depth buffer piglit test also pass if
glClear(DEPTH | STENCIL) is executed instead of
glClear(DEPTH)/glClear(STENCIL).
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Christopher Egert <cme3000@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
remove archaic .cvsignore
*.pyo is already in toplevel .gitignore
*.pyc is already in toplevel .gitignore
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, blits using the "blorp" mechanism only worked for 8-bit
RGBA color buffers, 24-bit depth buffers, and 8 bit stencil buffers.
This was not enough, because the blorp mechanism must be used for
blitting whenever MSAA is in use. This patch allows all formats to be
used, provided the source and destination formats match.
So far I have confirmed that the following formats work properly with
MSAA:
- GL_RGB
- GL_RGBA
- GL_ALPHA
- GL_ALPHA4
- GL_ALPHA8
- GL_R3_G3_B2
- GL_RGB4
- GL_RGB5
- GL_RGB8
- GL_RGB10
- GL_RGB12
- GL_RGB16
- GL_RGBA2
- GL_RGBA4
- GL_RGB5_A1
- GL_RGBA8
- GL_RGB10_A2
- GL_RGBA12
- GL_RGBA16
Fixes piglit tests "EXT_framebuffer_multisample/formats {2,4}" on
Sandy Bridge and Ivy Bridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the blorp engine only supported RGBA8 color buffers and
24-bit depth buffers. This patch adds support for any color buffer
format that is supported as a render target, and for 16-bit and 32-bit
depth buffers.
This required threading the brw_context struct through into
brw_blorp_surface_info::set() so that it can consult the
brw->render_target_format array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though brw_blorp_surface_info is derived from brw_blorp_mip_info,
this function doesn't need to be virtual, because it is never accessed
through a base class pointer. Making the function non-virtual will
allow it to take additional parameters in the brw_blorp_surface_info
case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves the responsibility for deciding on the format of the
source and destination surfaces from the
gen{6,7}_blorp_emit_surface_state() functions to
brw_blorp_surface_info::set(), which is shared between Gen6 and Gen7.
This will make it possible to add support for more surface formats
without code duplication.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGSI doesn't need an opcode, since registers are untyped (but beware
once doubles come into the scene). Mesa IR doesn't handle native
integers, so trying to handle them there is worthless, the case
entries are only added for warning reasons.
It was only tested with softpipe, since llvmpipe doesn't support glsl
1.3 yet.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That adds support for activating the extension. It doesn't actually
*do* anything yet, of course.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the issues section of the GL_ARB_texture_compression_rgtc extension:
15) What should glGetTexLevelParameter return for
GL_TEXTURE_GREEN_SIZE and GL_TEXTURE_BLUE_SIZE for the RGTC1
formats? What should glGetTexLevelParameter return for
GL_TEXTURE_BLUE_SIZE for the RGTC2 formats?
RESOLVED: Zero bits.
These formats always return 0.0 for these respective components
and have no bits devoted to these components.
Returning 8 bits for red size of RGTC1 and the red and green
sizes of RGTC2 makes sense because that's the maximum potential
precision for the uncompressed texels.
Thus, we need to return 8 bits for GL_TEXTURE_RED_SIZE on all RGTC formats
and 8 bits for GL_TEXTURE_GREEN_SIZE on RGTC2 formats. BLUE should be 0.
Fixes oglconform/rgtc/advanced.texture_fetch.tex_param.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
While ~loop_state() is already freeing the loop_variable_state objects
via ralloc_free(this->mem_ctx), the ~loop_variable_state() destructor
was never getting called, so the hash table inside loop_variable_state
was never getting destroyed.
Fixes a memory leak in any shader with loops.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The functions for handling 1D, 2D and 3D texture images were nearly
identical. This folds them all together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't remove this pass yet, because we need it to convert AMDIL
registers in BRANCH* instructions, but we don't need it for
instruction conversion any more.
OpenGL allows you to declare user-defined fragment shader outputs with
less than four components:
out ivec2 color;
This makes sense if you're rendering to an RG format render target.
Previously, we assumed that all color outputs had four components (like
the built-in gl_FragColor/gl_FragData variables). This caused us to
call emit_color_write for invalid indices, incrementing the output
virtual GRF's reg_offset beyond the size of the register.
This caused cascading failures: split_virtual_grfs would allocate new
size-1 registers based on the virtual GRF size, but then proceed to
rewrite the out-of-bounds accesses assuming that it had allocated enough
new (contiguously numbered) registers. This resulted in instructions
that accessed size-1 GRFs which register numbers beyond
virtual_grf_next (i.e. registers that were never allocated).
Finally, this manifested as live variable analysis and instruction
scheduling accessing their temporary array with an out of bounds index
(as they're all sized based on virtual_grf_next), and the program would
segfault.
It looks like the hardware's Render Target Write message requires you to
send four components, even for RT formats such as RG or RGB. This patch
continues to use all four MRFs, but doesn't bother to fill any data for
the last few, which should be unused.
+2 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 4650aea7a5 fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+18 piglits on Sandybridge.
NOTE: This and 4650aea7a5 are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit f41ecade7b fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+15 piglits on Sandybridge.
NOTE: This and f41ecade7b are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't saved/restored by _mesa_meta_begin, so we need to do it
manually (like we do for the read/draw framebuffers). Additionally,
we neglected to re-bind before the glRenderbufferStorage call.
+13 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
DeleteBuffer needs to unbind from these binding points as well, based on
the same rationale as the previous patch.
+51 oglconforms (together with the last patch).
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_lookup_bufferobj returns NULL for 0, which caused us to say
"there's no such buffer object" and raise an error, rather than
correctly binding the shared NullBufferObj.
Now you can unbind your buffers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the GL 3.1 spec, section 2.9 ("Buffer Objects"):
"If a buffer object is deleted while it is bound, all bindings to that
object in the current context (i.e. in the thread that called
DeleteBuffers) are reset to zero."
The code already checked for a number of cases, but neglected these
newer binding points.
+21 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were incorrectly assuming that the coordinate's dimensionality is
equal to the gradient's dimensionality. For array types, the coordinate
has one more component.
Fixes 12 subcases of oglconform's glsl-bif-tex-grad test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, if you pass --with-egl-platforms=x11 but xcb-dri2 isn't available
we just silently fail and disables building the EGL DRI2 driver.
This commit cleans up the EGL platfrom checking and fails if a selected
platform can't find its required dependencies.
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit a07cf3397e added support for TBOs
on Gen7, but missed Gen6.
Passes piglit -t texture_buffer and oglconform's buffermapping
basic.read.texture tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to Table 6.17 in the GL 2.1 specification, DEPTH_TEXTURE_MODE,
TEXTURE_COMPARE_MODE, and TEXTURE_COMPARE_FUNC need to be restored on
glPopAttrib(GL_TEXTURE_BIT).
Makes a number of oglconform tests happier.
v2: Make restoration conditional on the ARB_shadow and ARB_depth_texture
extensions, as suggested by Brian. I'm not sure that any
implementations still remain that don't support those, but why not?
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
this fixes libdricore directory build with --enable-32-bit on a x86_64 system
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The VTX_READ instructions were using the ADDRParam ComplexPattern which
allows a load instruction's offset to be a register, but VTX_READ
instructions can only handle an immediate offset.
Also, the load_param pattern fragment had an erroneous return true;
statement that was causing it to match the wrong load instructions.
Tungsten Graphics has not existed for several years, and the majority of
ongoing development and support is done by Intel. I chose to include
"Open Source Technology Center" to distinguish it from, say, the closed
source Windows OpenGL driver.
The one downside to this patch is that applications that pattern match
against "Intel" may start applying workarounds meant for the Windows
driver. However, it does seem like the right thing to do.
This does change oglconform behavior.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These look like debug messages from the switch-statement development.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tom Stellard:
- Updated for gallium interface changes
- Fixed a few bugs:
+ Set the loop counter
+ Calculate the correct number of pipes
- Added hooks into the LLVM compiler
v2:
-Separate IR type and LLVM triple
-Do the OpenCL C->LLVM IR and linking steps for all PIPE_SHADER_IR
types.
v3:
- Coding style fixes
- Removed compatibility code for LLVM < 3.1
- Split build_module_llvm() into three functions:
compile(), link(), and build_module_llvm()
v4:
- Use struct pipe_compute_program
v5:
- Don't malloc memory for struct pipe_llvm_program
v6:
- Fix serialization of llvm bytecode
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This structure is used as a header that precedes LLVM bytecode programs
that are passed to the drivers.
v2:
- s/pipe_compute_program/pipe_llvm_program/
v3:
- Rename to struct pipe_llvm_program_header
- Drop the char * prog member
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is for the llvm code that can't use extended initializers.
v2:
- Use const references for vector arguments
- Move constructor defs before data members
- Initialize all values in the default constructors
v3:
- Fix typo
A device now has two function for getting information about the IR
it needs to return.
ir_format() => returns the preferred IR
ir_target() => returns the triple for the target that is understood by
clang/llvm.
v2:
- renamed ir_target() to ir_format()
- renamed llvm_triple() to ir_target()
v3:
- Remove unnecessary include
- Do proper conversion from std::vector<char> to std::string
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: Tom Stellard
- Update CAP description
v3: Tom Stellard
- TGSI targets should pass an empty string for this CAP.
v4: Tom Stellard
- TGSI targets can ignore this CAP.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
TEX instructions can't do saturation. Do the TEX into a temp reg w/out
saturation, then do a MOV_SAT.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Some distributions (like Arch Linux) make /usr/bin/python Python 3,
rather than Python 2. Since compare_ir uses /usr/bin/env python,
such systems will fail to run optimization-test, causing 'make check' to
always fail.
Automake's TESTS_ENVIRONMENT variable provides a mechanism to run
programs or set environment variables in the test environment.
Ideally, I think we would want to use AM_TESTS_ENVIRONMENT, since
TESTS_ENVIRONMENT is supposed to be user-overridable. However, it isn't
supported using the default/serial test runner.
Fixes 'make check' on Arch Linux and Gentoo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
I started writing unit tests for a new piece of code, and discovered
they all failed due to a bug in ralloc. Clearly it needs a test suite.
v2: Rename to 'ralloc-test' and fix copyright date. (idr review)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If an object is allocated out of the NULL context, info->parent will be
NULL. Using the PTR_FROM_HEADER macro would be incorrect: it would say
that ralloc_parent(ralloc_context(NULL)) == sizeof(ralloc_header).
Fixes the new "null_parent" unit test.
NOTE: This is a candidate for the 7.9, 7.10, 7.11, and 8.0 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Discovered while running the Khronos conformance test suite and
receiving "implementation error: meta program compile failed."
This bug was recently introduced by the i965 clear patch set and would
only be detected while using the ES2 API and only on gen6+ hardware.
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is performed in a subdirectory to avoid needing to convert all of
src/mesa/Makefile in one go.
I can now cherry-pick a commit containing glapi XML changes, do "(cd
src/mapi/glapi/gen && make) && make", and get a working driver.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to do the minimal change for libdricore conversion to
automake, I need to put its Makefile.am in a subdirectory. Automake
gets whiny/broken if you use GNU make features like "addprefix" or
"$(FILES:%=../%)" to munge your *_SOURCES. So, use a plain old
variable to be able to substitute in that "../"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*_SOURCES is reserved for files lists for particular automake targets.
Also, "-" in the variable names is not allowed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This variable won't be set when called from non-automake makefiles,
but it cleans up shared-glapi's output.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa already always depends on python to build. The checked in
changes are not reviewed (because any trivial change rewrites the
world). We also have been pushing commits between xml change and
regen where at-build-time xml-generated code disagrees with committed
xml-generated code. And worst of all, sometimes we ("I") check in
*stale* xml-generated code.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
commit 87f12bb2d9 tried to fix rb->mt
being NULL, but change this case wrong.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now model loading uses sgpr values with LLVM IR load instructions that
use the USER_SGPR address space.
The definition of the sgpr parameter to the use_sgpr() helper function
in radeonsi_shader.c has changed so that you can pass raw sgpr values
rather than having to divide the sgpr value you want to use by the dword
width of the type you want to load.
This function was causing compile errors in the tablegen'd code for
some intrinsic definitions. I don't think we really need this function,
so I'm removing the function body just as a temporary solution. I'll
look into removing the entire AMDILIntrinsicInfo class later.
v2: use a define for the maximum sample count
v3: also test odd sample counts (r300 supports MS3)
While multisample renderbuffers are supported by mesa, MS visuals
are not, so we need a way to tell dri/st not to advertise them even
if the gallium driver does support multisampled surfaces.
Otherwise applications selecting these non-functional visuals would
run into trouble ...
Reviewed-by: Brian Paul <brianp@vmware.com>
The code which scans the index buffer for restart indexes wasn't adding
the index buffer offset so we were always starting at offset=0. The
offset is usually zero so it wasn't noticed before.
Fixes a failure in the piglit primitive-restart test when testing
vertex data + index data in a single VBO.
NOTE: This is a candidate for the 8.0 branch.
Basic 4x MSAA support now works on Gen7. This patch enables it.
As with Gen6, MSAA support is still fairly preliminary. In
particular, the following are not yet supported:
- 8x oversampling (Gen7 has hardware support for this, but we do not
yet expose it).
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centrold interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, the blending necessary to blit an MSAA surface to a non-MSAA
surface could be accomplished with a single texturing operation. On
Gen7, the WM program must fetch each sample and blend them together
manually. From the Bspec (Shared Functions/Messages/Initiating
Message/Message Types/sample):
[DevIVB+]:Number of Multisamples on the associated surface must be
MULTISAMPLECOUNT_1.
This patch implements the manual blend operation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since blorp uses color textures and render targets to do all its work
(even when blitting stencil and depth data), it always has to
configure the Gen7 GPU to use the new "sliced" MSAA layout. However,
when blitting stencil or depth data, the actual MSAA layout is
interleaved (as in Gen6). Therefore, blorp has to do extra coordinate
transformation work to account for the interleaving manually.
This patch causes blorp to perform the necessary extra coordinate
transformations.
It also modifies the blorp SURFACE_STATE setup code for Gen7, so that
it does not try to correct the surface width and height to account for
MSAA, since "sliced" MSAA layout doesn't affect the surface width or
height.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constaints come in to play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Starting in Gen7, there are two possible layouts for MSAA surfaces:
- Interleaved, in which additional samples are accommodated by scaling
up the width and height of the surface. This is the only layout
available in Gen6. On Gen7 it is used for depth and stencil
surfaces only.
- Sliced, in which the surface is stored as a 2D array, with array
slice n containing all pixel data for sample n. On Gen7 this layout
is used for color surfaces.
The "Sliced" layout has an additional requirement: it must be used in
ARYSPC_LOD0 mode, which means that the surface doesn't leave any extra
room between array slices for miplevels other than 0.
This patch modifies the surface allocation functions to use the
correct layout when allocating MSAA surfaces in Gen7, and to set the
array offsets properly when using ARYSPC_LOD0 mode. It also modifies
the code that populates SURFACE_STATE structures to ensure that
ARYSPC_LOD0 mode is selected in the appropriate circumstances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 support for blorp (blits using the render bath) now works for
non-MSAA purposes. This patch enables it.
Since blorp operations re-use the logic for HiZ ops, this required
adding a case to the switch statement in gen7_blorp_emit_wm_config(),
to allow for the case where no HiZ op is being performed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, texel fetch is always accomplished using the SAMPLE_LD
message, which accepts arguments (u, v, r, lod, si). On Gen7, there
are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
arguments (si, u, v).
*Technically, there are other texel fetch messages, but they are used
for "compressed" MSAA surfaces, which we don't yet support.
This patch adds the proper message types and argument orderings for
Gen7.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 hardware requires us to enable at least one WM dispatch mode,
even if there is no program being dispatched to. When this code was
only used for HiZ operations (which don't use a WM program), we used
32-pixel dispatch, because it didn't matter. But blit programs are
compiled for 16-pixel dispatch. So just enable 16-wide dispatch
unconditionally.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Enable 16-wide dispatch unconditionally rather than add the
unnecessary complication of using 32-wide dispatch when there is no WM
program.
On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.
This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We know from previous bug fixes (commits
c25e5300cb and
b2ace06cbb) that texture border color
doesn't work if the dynamic state upper bound is set to 0. Although
the blorp engine doesn't make use of texture borders, it seems like we
ought to err on the safe side and set this value properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch separates out the portions of gen6_blorp_emit_batch_head()
that emit 3DSTATE_MULTISAMPLE, 3DSTATE_SAMPLE_MASK, and
STATE_BASE_ADDRESS. This paves the way for making the blorp code work
on Gen7, where additional command packets
(3DSTATE_PUSH_CONSTANT_ALLOC_VS and 3DSTATE_PUSH_CONSTANT_ALLOC_PS)
need to be emitted before 3DSTATE_MULTISAMPLE.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies the "blorp" WM program so that it can be run in
MSDISPMODE_PERSAMPLE (which means that every single sample of a
multisampled render target is dispatched to the WM program, not just
every pixel).
Previously we were using the ugly hack of configuring multisampled
destination surfaces as single-sampled, and generating sample indices
other than zero by swizzling the pixel coordinates in the WM program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the function brw_blorp_blit_program::texel_fetch()
to emit the SI (sample index) argument to the SAMPLE_LD message when
reading from a sample index other than zero.
Previously we were using the ugly hack of configuring multisampled
source surfaces as single-sampled, and accessing sample indices other
than zero by swizzling the texture coordinates in the WM program.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).
This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).
It will also will allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 MSAA buffers (and Gen7 MSAA depth/stencil buffers) interleave
MSAA samples in a complex pattern that repeats every 2x2 pixel block.
Therefore, when allocating an MSAA buffer, we need to make sure to
allocate an integer number of 2x2 blocks; if we don't, then some of
the samples in the last row and column will be cut off.
Fixes piglit tests "EXT_framebuffer_multisample/unaligned-blit {2,4}
color msaa" on i965/Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Without passing the -ldflags parameter before $(LDFLAGS) in some cases
flags will be passed to MKLIB which it does not understand.
This might be -m64, -m32 or similar.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Thomas Gstädtner <thomas@gstaedtner.net>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch gets the FreeBSD SCons build working again. The build still
fails though.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to return immediately after inserting instructions that require
S_WAITCNT so that the parent class' custom inserter won't try to insert
them again.
Fix uninitialized scalar variable defects report by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We should just set the bits of functionality that we support; the
GL/ES1/ES2 flags in extensions.c will take care of advertising the
appropriate extensions for the current API.
This enables the GL_EXT_texture_compression_dxt1 extension on ES1/ES2
when libtxc_dxtn is installed or the force_s3tc driconf option is set.
The main extension code set this up properly, but the ES-specific code
failed to do so.
Otherwise, the extension strings reported by es1_info, es2_info, and
glxinfo all remain the same.
This patch manually disables the ARB_framebuffer_object bit on ES
to preserve the behavior of 1c0f5d8324.
v2: Rebase, fix the i915 Makefile, and unconditionally set the
OES_draw_texture bit as core Mesa will only apply it to ES1 now.
Tested-by: Daniel Charles <daniel.charles@intel.com> [v1]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.
The VBO module's software handling of primitive restart is
used as a fall back.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering which components of a variable were killed by an
assignment, constant propagation would previously just use the write
mask of the assignment. This worked if the LHS of the assignment was
simple, e.g.:
v.xy = ...; // (assign (xy) (var_ref v) ...)
But it did the wrong thing if the LHS of the assignment involved an
array indexing operator, since in this case the write mask is always
(x):
v[i] = ...; // (assign (x) (deref_array (var_ref v) (var_ref i)) ...)
In general, we can't predict which vector component will be selected
by array indexing, so the only safe thing to do in this case is to
kill the entire variable.
Fixes piglit tests {fs,vs}-vector-indexing-kills-all-channels.shader_test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the linker handles initializers of samplers just like any
other uniform, a bunch of this annoying code is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The linker may have set initial values for uniforms. Propagate these
values to the driver's backing storage when it is first associated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.
v3: Minor comment change based on feedback from Ken.
Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add support for gen6, and don't turn it on if blending is
disabled. (fixes GPU hang), and note it in docs/GL3.txt
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver needed this as well for hardware setup, so instead of
duplicating the logic, just save it off.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While it doesn't have the same warning in the simulator as in gen7,
let's emit it out of paranoia. We wouldn't want our resolves of some
previous clear to get clamped to some current clamping value.
Suggested-by: pretty much everyone
When doing fast clears, a fulsim warning said that the batch was being
emitted without the viewport set up. While the fast clear pass I was
looking at doesn't use the clear value, the later resolves which also
didn't set up the vieport would trigger the same. It's not obvious
from the error message whether it meant "fast clear value gets clamped
to something you haven't defined" or "fast clear value doesn't get
clamped, and I saw it was out of the current (uninitialized) range,
and you probably wanted it clamped to that (uninitialized) range". Be
paranoid and assume the first case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having this enum separate caused us to need a bunch of helper
functions to translate to the op to be executed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL clear path doesn't need any buffer presence checks, since
those are already handled in the normal drawing path code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Our understanding is that the 3D engine is supposed to be faster
anyway. We used to have more overhead in our tri clear path than we
do today, which would have led to this choice. But given that we
almost always see a depth clear along with a color clear, the path was
hardly exercised anyway.
Also, the color mask logic was broken in the presence of
GL_EXT_draw_buffers2's per-buffer colormask.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generate .aub file would be unannotated, making further data analysis
difficult.
This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.
The new annotation mechanism requires DRM version 2.4.34.
Reviewed-by: Eric Anholt <eric@anholt.net>
When we are generating an AUB dump, we make a final call to
aub_dump_bmp() as the context is being destroyed, to ensure that any
rendering performed before the application exits can be seen during a
simulation run. However, we were doing this before flushing the batch
buffer; as a result simulation runs would not always see the effect of
all rendering commands.
This patch flushes the batch buffer just before making the final call
to aub_dump_bmp(), to ensure that all rendering is properly captured
in the final bitmap.
This is a long standing problem, that recently surfaced with the change
to enable perspective correct color interpolation.
A fix for all possible formats is left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This patch removes two Clang warnings in GLU:
The first one seems to be an actual bug in mapdesc.cc: Clang complains
that sizeof(dest) will return the size of REAL*[MAXCOORDS], instead of
the intended REAL[MAXCOORDS][MAXCOORDS]. The second one is just
cosmetic because Clang doesn't like extra parentheses.
NOTE: This is a candidate for the 8.0 branch
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes another case of sampler views being created by one context,
shared by another, then deleted by the first, leaving a dangling
pipe context pointer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Use it where performance matters more and the exact method of float->int
conversion/rounding isn't terribly important. There should no net change
here since F_TO_I() is the new name of the old IROUND() function.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The different implementations of IROUND() behaved differently and in
the case of fistp, depended on the current x86 FPU rounding mode.
This caused some tests like piglit roundmode-pixelstore and
roundmode-getintegerv to fail on 32-bit x86 but pass on 64-bit x86.
Now IROUND() always rounds to the nearest integer (away from zero).
The new F_TO_I function converts a float to an int by whatever means
is fastest. We'll use this where we're more concerned with performance
and not too worried to how the conversion is done.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The IROUND converted all arguments to 0 or 1. That's not what we wanted.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For zero-stride vertex arrays, the svga driver copies the value into
the constant value and uses that value in the shader. The recent
gallium-userbuf changes caused a regression in this. An example
symptom was per-primitive glColor3f() calls getting ignored.
Where we copied the vertex value from the vertex buffer to the
constant buffer we neglected to take into account the
pipe_vertex_buffer::buffer_offset field. Adding that value to the
source offset fixes the problem. Actually, it looks like we should
have been doing this all along, but it never was an issue before for
some reason.
If the MESA_GLSL env var contains "errors", GLSL compilation and
link errors will be reported to stderr.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Piglits test for fragment shaders pass, vertex shaders fail. The
actual failure seems to be in the interpolators, and not the
textureSize query.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jose.r.fonseca@gmail.com>
Fixes a bunch of piglit tests related to flat interpolation of floats.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
The VBO module now can handle primitive restart in software
if required. Therefore this support is no londer required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the PIPE_CAP_PRIMITIVE_RESTART screen param is not set, then enable
PrimitiveRestartInSoftware to enable software primitive restart
support in the VBO module.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When PrimitiveRestartInSoftware is set, the VBO module will handle
primitive restart scenarios before calling the vbo->draw_prims
drawing function.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If set, then the VBO module will handle all primitive
restart scenarios before calling the driver draw_prims.
Software primitive restart support is disabled by default.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vbo_sw_primitive_restart implements primitive restart in software
by splitting primitive draws apart.
This is based on similar support in mesa/state_tracker/st_draw.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's an implied argument, and I don't think being explicit about it
helps.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment quotes spec saying that only scalar integers are allowed,
but we only checked for integer.
Fixes piglit switch-expression-const-ivec2.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Total instructions: 261582 -> 261316
135/2147 programs affected (6.3%)
36752 -> 36486 instructions in affected programs (0.7% reduction)
This excludes a tropics shader that now gets 16-wide mode and throws
off the numbers. 5 shaders are hurt: two extra MOVs in 4 tropics
shaders it looks like because we don't split register names according
to independent webs, and one gstreamer shader where it looks like
try_rewrite_rhs_to_dst() is falling on its face.
This should also help avoid a regression in VSes from idr's ARB
programs to GLSL work.
By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.
Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)
There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing. This
should be fixed by doing better at dead code elimination.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
No functional change. This patch replaces the
brw_blorp_params::exec() method with a global function
brw_blorp_exec() that performs the operation described by the params
data structure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).
Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.
In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.
v2:
- Add the code that configures the WM program to
gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch splits up the gen6_blorp_exec and gen7_blorp_exec
functions, which were very long, into simple component functions.
With a few exceptions, there is one function per state packet.
This will allow blit functionality to be added without significantly
complicating the code.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Rename the functions gen{6,7}_emit_wm_disable() to
gen{6,7}_emit_wm_config() (since the WM is not actually disabled
during HiZ ops; it simply doesn't have a program). Also, on gen7,
split out the configration of 3DSTATE_PS to a separate function
gen7_emit_ps_config().
This patch groups together the parameters used by the HiZ functions
into a new data structure, brw_hiz_resolve_params, rather than passing
each parameter individually between the HiZ functions. This data
structure is a subclass of brw_blorp_params, which represents the
parameters of a general-purpose blit or resolve operation. A future
patch will add another subclass for blits.
In addition, this patch generalizes the (width, height) parameters to
a full rect (x0, y0, x1, y1), since blitting operations will need to
be able to operate on arbitrary rectangles. Also, it renames several
of the HiZ functions to reflect the expanded role they will serve.
v2: Rename brw_hiz_resolve_params to brw_hiz_op_params. Move
gen{6,7}_blorp_exec() functions back into gen{6,7}_blorp.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When drawing to a FBO, the viewport wasn't always set correctly. It
was fine in the usual case of the viewport dims matching the surface
dims but broken otherwise. In particular, this was happening because
the viewport scale is negative for FBO rendering.
The piglit fbo-viewport test exercises this.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We can save one instruction by lowering it to:
SUB_INT tmp, 0, src
MAX_INT dst, src, tmp
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds .gitignore files to ignore the makefiles generated by
the gallium pipe loader and the clover OpenCL state tracker.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be convenient when I want to comment out optimization code
to see the raw program being optimized, but more importantly will let
the interference check be used during optimization.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).
shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When I had a bug causing the backend to never finish optimizing, it
also sent me deep into swap. This avoids extra memory allocation per
trip through optimization, and thus may reduce the peak memory
allocation of the driver even in the success case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)
This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%. It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.
v2: Add && !inst->conditional_mod to the "skip some instructions" check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets you omit some ampersands and is more idiomatic C++. Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows us to calculate the triangle's area using fixed point,
previously it was cacluated in floating point space. It was possible
that a triangle which had negative area in floating point space had
a positive area in fixed point space.
Fixes fdo 40920.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
As pointed out by Marek, if we have only one cb, we may as well add this
single register write here rather than adding it in the draw loop.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vertex and index buffers are never used by hardware, only by Draw.
SWTCL chipsets usually have very little memory, so this might help
with stability and reliability.
Instead of having to hack the code to enable these debugging options,
set them through the MESA_DEBUG env var.
Reviewed-by: Eric Anholt <eric@anholt.net>
This flag has been around for a while but it wasn't actually used anywhere.
Now, setting this flag causes a glFlush() to be issued after each
drawing call (including glBegin/End, glDrawElements, glDrawArrays,
glDrawPixels, glCopyPixels and glBitmap).
This was being done in the _mesa_Flush/Finish() calls but if there
was an internal call to _mesa_flush/finish() the FLUSH_VERTICES()
wouldn't happen. Looks like only the intel and radeon drivers made
such calls in MakeCurrent().
When glColorMaterial() is used to latch glColor commands to a material
attribute, glMaterial calls to change that material should become no-ops.
This failed to work properly when the glMaterial call was inside a
display list.
This removes the Material function from the vbo_attrib_tmp.h template
file. We have separate/different implementations for the "save" and
"exec" cases now.
NOTE: This is a candidate for the 8.0 branch.
_mesa_material_bitmask() will record a GL error and return 0 if
face or mode are illegal. Return early in that case.
NOTE: This is a candidate for the 8.0 branch.
Some code relies on the existing of an invalid texture target. It seems
safer to bring it back than to deal with unintended consequences.
This partially reverts commit a4ebb04214.
Reviewed-by: Brian Paul <brianp@vmware.com>
Contains the following patches squashed in:
commit 9fff1dc0875f7c9591550fa3ebbe1ba7a18483fa
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:20:03 2012 +0100
configure.ac: Build gallium loader when OpenCL is enabled
commit 542111cb02957418c6a285cb6ef2924e49adc66e
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:30:29 2012 +0100
configure.ac: Add sw/null to GALLIUM_WINSYS_DIRS for gallium loader
commit 876f8de46062dde76b6075be3b6628f969b16648
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Thu Feb 9 11:26:05 2012 -0500
configure.ac: Require gcc > 4.6.0 for clover
commit 99049d50fa3d9a23297ae658189c19c89dca1766
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:32:06 2012 +0100
configure.ac: Require Gallium drm loader when gallium loader is enabled
No longer silently exclude this when building OpenCL drivers
for nouveau and r600.
Add a test program that tries to exercise some of the language
features commonly used by compute programs at the Gallium API level:
- Correctness of the values returned by the grid parameters.
- Proper functioning of resource LOADs and STOREs.
- Subroutine calls.
- Argument passing to the compute parameter through the INPUT
memory space.
- Mapping of buffer objects to the GLOBAL memory space.
- Proper functioning of the PRIVATE and LOCAL memory spaces.
- Texture sampling and constant buffers.
- Support for multiple kernels in the same program.
- Indirect resource indexing.
- Formatted resource loads and stores (i.e. with channel conversion
and scaling) using several different formats.
- Proper functioning of work-group barriers.
- Atomicity and semantics of the atomic opcodes.
As of now all of them seem to pass on my nvA8.
It simplifies things slightly, and besides, it makes possible to
execute the trivial tests on a hardware device instead of being
limited to software rendering.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This target generates pipe driver modules intended to be consumed by
auxiliary/pipe-loader. Most of it was taken from the "gbm" target --
the duplicated code will be replaced with references to this target in
a future commit.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The goal is to have a uniform interface to create winsys and
pipe_screen instances for any driver, exposing the device enumeration
capabilities that might be supported by the operating system (for now
there's a "drm" back-end using udev and a "sw" back-end that always
returns the same built-in devices).
The typical use case of this library will be:
>
> struct pipe_loader_device devs[n];
> struct pipe_screen *screen;
>
> pipe_loader_probe(&devs, n);
>[pick some device from the array...]
>
> screen = pipe_loader_create_screen(dev, library_search_path);
>[do something with screen...]
>
> screen->destroy(screen);
> pipe_loader_release(&devs, N);
>
A part of the code was taken from targets/gbm/pipe_loader.c, which
will be removed and replaced with calls into this library by a future
commit.
Add a shader cap for specifying the preferred shader representation.
Right now the only supported value is TGSI, other enum values will be
added as they are needed.
This is mainly to accommodate AMD's LLVM compiler back-end by letting
it bypass the TGSI representation for compute programs. Other drivers
will keep using the common TGSI instruction set.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This change will be useful to implement function parameter passing on
top of TGSI. As we don't have a proper stack, a register-based
calling convention will be used instead, which isn't necessarily a bad
thing given that GPUs often have plenty of registers to spare.
Using the same register space for local temporaries and
inter-procedural communication caused some inefficiencies, because in
some cases the register allocator would lose the freedom to merge
temporary values together into the same physical register, leading to
suboptimal register (and sometimes, as a side effect, instruction)
usage.
The LOCAL declaration modifier specifies that the value isn't intended
for parameter passing and as a result the compiler doesn't have to
give any guarantees of it being preserved across function boundaries.
Ignoring the LOCAL flag doesn't change the semantics of a valid
program in any way, because local variables are just supposed to get a
more relaxed treatment. IOW, this should be a backwards-compatible
change.
Normal resource access (e.g. the LOAD TGSI opcode) is supposed to
perform a series of conversions to turn the texture data as it's found
in memory into the target data type.
In compute programs it's often the case that we only want to access
the raw bits as they're stored in some buffer object, and any kind of
channel conversion and scaling is harmful or inefficient, especially
in implementations that lack proper hardware support to take care of
it -- in those cases the conversion has to be implemented in software
and it's likely to result in a performance hit even if the pipe_buffer
and declaration data types are set up in a way that would just pass
the data through.
Add a declaration flag that marks a resource as typeless. No channel
conversion will be performed in that case, and the X coordinate of the
address vector will be interpreted in byte units instead of elements
for obvious reasons.
This is similar to D3D11's ByteAddressBuffer, and will be used to
implement OpenCL's constant arguments. The remaining four compute
memory spaces can also be understood as raw resources.
This texture type was already referred to by the documentation but it
was never defined. Define it as 0 to match the pipe_texture_target
enumeration values.
Move Interpolate, Centroid and CylindricalWrap from tgsi_declaration
to a separate token -- they only make sense for FS inputs and we need
room for other flags in the top-level declaration token.
This commit splits the current concept of resource into "sampler
views" and "shader resources":
"Sampler views" are textures or buffers that are bound to a given
shader stage and can be read from in conjunction with a sampler
object. They are analogous to OpenGL texture objects or Direct3D
SRVs.
"Shader resources" are textures or buffers that can be read and
written from a shader. There's no support for floating point
coordinates, address wrap modes or filtering, and, unlike sampler
views, shader resources are global for the whole graphics pipeline.
They are analogous to OpenGL image objects (as in
ARB_shader_image_load_store) or Direct3D UAVs.
Most hardware is likely to implement shader resources and sampler
views as separate objects, so, having the distinction at the API level
simplifies things slightly for the driver.
This patch introduces the SVIEW register file with a declaration token
and syntax analogous to the already existing RES register file. After
this change, the SAMPLE_* opcodes no longer accept a resource as
input, but rather a SVIEW object. To preserve the functionality of
reading from a sampler view with integer coordinates, the
SAMPLE_I(_MS) opcodes are introduced which are similar to LOAD(_MS)
but take a SVIEW register instead of a RES register as argument.
Define an interface that exposes the minimal functionality required to
implement some of the popular compute APIs. This commit adds entry
points to set the grid layout and other state required to keep track
of the usual address spaces employed in compute APIs, to bind a
compute program, and execute it on the device.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch renames the gen6_hiz.h and gen7_hiz.h files to correspond
to the renames of the corresponding .cpp files (see previous commit).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch converts the files gen6_hiz.c and gen7_hiz.c to C++, in
preparation for expanding the HiZ code to support arbitrary blits.
The new files are called gen6_blorp.cpp and gen7_blorp.cpp to reflect
the expanded role that this code will serve--"blorp" stands for "BLit
Or Resolve Pass".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previous to this patch, gen6_hiz.c contained two implicit type casts
from void * to a a non-void pointer type. This is allowed in C but
not in C++. This patch makes the type casts explicit, so that
gen6_hiz.c can be converted into a C++ file.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In C++, if a struct is defined inside another struct, or its name is
first seen inside a struct or function, the struct is nested inside
the namespace of the struct or function it appears in. In C, all
structs are visible from toplevel.
This patch explicitly moves the decalartions of intel_batchbuffer to
toplevel, so that it does not get nested inside a namespace when
header files are included from C++.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Fix wrong cube/3D texture layout for the tailing levels whose width or
height is smaller than the align unit.
From 965 B-spec http://intellinuxgraphics.org/VOL_1_graphics_core.pdf at
page 135:
All of the LOD=0 q-planes are stacked vertically, then below that,
the LOD=1 qplanes are stacked two-wide, then the LOD=2 qplanes are
stacked four-wide below that, and so on.
Thus we should always inrease pack_x_nr, which results to the pitch of LODn
may greater than the pitch of LOD0. So we should refactor mt->total_width
when needed.
This would fix the following webgl test case on all gen4 platforms:
conformance/textures/texture-size-cube-maps.html
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This points to the object with the function body, allowing us to map
from a built-in prototype to the actual body with IR code to execute.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- copy_masked_offset copies part of a constant into another,
assign-like.
- copy_offset copies a constant into (a subset of) another,
funcall-return like.
These methods are to be used to trace through assignments and function
calls when computing a constant expression.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
The method is used to get a reference to an ir_constant * within the
context of evaluating an assignment when calculating a
constant_expression_value.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
We were looping over all the vector components, but only dealing with
the first one. This was masked by the fact that constant expression
handling on built-ins went through custom code for the lessThan()
/function/ rather than the ir_binop_less expression operator.
NOTE: This is a candidate for all release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The vbo module recomputes its states if _NEW_ARRAY is set, so it shouldn't use
the same flag to notify the driver. Since we've run out of bits in NewState
and NewState is for core Mesa anyway, we need to find another way.
This patch is the first to start decoupling the state flags meant only
for core Mesa and those only for drivers.
The idea is to have two flag sets:
- gl_context::NewState - used by core Mesa only
- gl_context::NewDriverState - used by drivers only (the flags are defined
by the driver and opaque to core Mesa)
It makes perfect sense to use NewState|=_NEW_ARRAY to notify the vbo module
that the user changed vertex arrays, and the vbo module in turn sets
a driver-specific flag to notify the driver that it should update its vertex
array bindings.
The driver decides which bits of NewDriverState should be set and stores them
in gl_context::DriverFlags. Then, Core Mesa can do this:
ctx->NewDriverState |= ctx->DriverFlags.NewArray;
This patch implements this behavior and adapts st/mesa.
DriverFlags.NewArray is set to ST_NEW_VERTEX_ARRAYS.
Core Mesa only sets NewDriverState. It's the driver's responsibility to read
it whenever it wants and reset it to 0.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the future we'd like to treat vertex arrays as a state and
not as a parameter to the draw function. This is the first step
towards that goal. Part of the goal is to avoid array re-validation
for every draw call.
This commit adds:
const struct gl_client_array **gl_context::Array::_DrawArrays.
The pointer is changed in:
* vbo_draw_method
* vbo_rebase_prims - unused by gallium
* vbo_split_prims - unused by gallium
* st_RasterPos
Reviewed-by: Brian Paul <brianp@vmware.com>
Replacing "float equal to 1.0f" with "int not equal to 0".
This should help for further optimization of boolean computations.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We're using float as default type, so basically for every instruction that
wants other types for dst/src operands we need to perform the bitcast
to/from default float. Currently bitcast produces no-op MOV instruction,
will be eliminated later.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In i965 Gen7, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
In i965 Gen6, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
(Note that we have only ever observed this GPU hang on Gen6 when HiZ
is enabled, so another possible stopgap would be to disable HiZ).
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
When the user attaches a texture to one of the depth/stencil
attachment points (GL_STENCIL_ATTACHMENT or GL_DEPTH_ATTACHMENT), we
check to see if the same texture is also attached to the other
attachment point, and if so, we re-use the existing texture
attachment. This is necessary to ensure that if the user later
queries what is attached to GL_DEPTH_STENCIL_ATTACHMENT, they will not
receive an error.
If, however, the user attaches buffers to the two different attachment
points using different parameters (e.g. a different miplevel), then we
can't re-use the existing texture attachment, because it is pointing
to the wrong part of the texture. This might occur as a transitory
condition if, for example, if the user attached miplevel zero of a
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT, rendered to
it, and then later attempted to attach miplevel one of the same
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT.
This patch causes Mesa to check that GL_STENCIL_ATTACHMENT and
GL_DEPTH_ATTACHMENT use the same attachment parameters before
attempting to share the texture attachment.
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 depth_stencil_shared"
and "texturing/depthstencil-render-miplevels 1024
stencil_depth_shared".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to a miplevel other than 0 within a color, depth,
stencil, or HiZ buffer, we need to tell the GPU to render to an offset
within the buffer, so that the data is written into the correct
miplevel. We do this using a coarse offset (in pages), and a fine
adjustment (the so-called "tile_x" and "tile_y" values, which are
measured in pixels).
We have always computed the coarse offset and fine adjustment using
intel_renderbuffer_tile_offsets() function. This worked fine for
color and combined depth/stencil buffers, but failed to work properly
when HiZ and separate stencil were in use. It failed to work because
there is only one set of fine adjustment controls shared by the HiZ,
depth, and stencil buffers, so we need to choose tile_x and tile_y
values that are compatible with the tiling of all three buffers, and
then compute separate coarse offsets for each buffer.
This patch fixes the HiZ and separate stencil case by replacing the
call to intel_renderbuffer_tile_offsets() with calls to two functions:
intel_region_get_tile_masks(), which determines how much of the
adjustment can be performed using offsets and how much can be
performed using tile_x and tile_y, and
intel_region_get_aligned_offset(), which computes the coarse offset.
intel_region_get_tile_offsets() is still used for color renderbuffers,
so to avoid code duplication, I've re-worked it to use
intel_region_get_tile_masks() and intel_region_get_aligned_offset().
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 X" where X is one of
(depth, depth_and_stencil, depth_stencil_single_binding, depth_x,
depth_x_and_stencil, stencil, stencil_and_depth, stencil_and_depth_x).
On i965 Gen7, the variants of
"texturing/depthstencil-render-miplevels" that contain a stencil
buffer still fail, due to another problem: Gen7 seems to ignore the 3
LSB's of the tile_y adjustment (and possibly also tile_x).
v2: Removed spurious comments. Added assertions to check
preconditions of intel_region_get_aligned_offset().
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes ARB_framebuffer_object from the GLES1 and GLES2
extension lists in intel_extensions_es.c.
Fixes a crash in the Android browser on Ice Cream Sandwich.
The Android browser crashed because it did the following, which is legal
in GLES2 but not in ARB_framebuffer_object.
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// render render render...
glDeleteFramebuffers(1, &fb);
// go do other stuff...
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// This bind unexpectedly failed, and the app panics.
The semantics of glBindFramebuffer specified by ARB_framebuffer_object (a
desktop GL extension) and GLES2 specs are incompatible. The ideal solution
to fix this is to create separate API entry points for glBindFramebuffer,
one for GL and the other for GLES2. But, until that work is complete,
disabling ARB_framebuffer_object in GLES2 contexts safely fixes the problem.
Likewise, the semantics of glBindFramebuffer in ARB_framebuffer_object and
of glBindFramebufferOES in OES_framebuffer_object (a GLES1 extension) are
incompatible. Even though the functions have different names, the semantic
difference still results in a bug because both API calls are implemented
by a single function, _mesa_BindFramebufferEXT, which handles the semantic
difference incorrectly. Again, disabling ARB_framebuffer_object in GLES1
contexts safely fixes this problem.
According to the ARB_framebuffer_object spec, the extension is an
amalgamation of
EXT_framebuffer_object
EXT_framebuffer_blit
EXT_packed_depth_stencil
EXT_framebuffer_multisample
By disabling this extension, however, no functionality is removed from
GLES1 and GLES2 contexts because 1) the first three extensions are
explicitly enabled in Intel's ES extension lists and 2) no functionality
of the last extension is exposed in an ES context.
Note: This is a candidate for the 8.0 branch.
See-also: http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg21006.html
CC: Charles Johnson <charles.f.johnson@intel.com>
CC: Sean Kelley <sean.v.kelley@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When primitive restart is enabled, and glArrayElement is called
with the restart index value, then call glPrimitiveRestartNV.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul<brianp@vmware.com>
The signal.h include was missed in the commit
bc16c73407 which leads to broken
compilations under Linux.
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
When doing the var->assigned change in
f2475ca424, I overzealously indented the
second block of code into the "if (var)" test. Revert these blocks to
the way they were before, just taking advantage of "var" to avoid
re-calling variable_referenced().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49066
I only considered var->assigned for FragColor and FragData, but
ignored when it was false for out vars. Fixes piglit
write-gl_FragColor-and-not-user-output.frag
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49068
It seems silly that GL lets you allocate these given that they're
framebuffer attachment incomplete, but the webgl conformance tests
actually go looking to see if the getters on 0-width/height
depth/stencil renderbuffers return good values. By failing out here,
they all got smashed to 0, which turned out to be correct for all the
getters they tested except for GL_RENDERBUFFER_INTERNAL_FORMAT. Now,
by succeeding but not making a miptree, that one also returns the
expected value.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The index is also used for GL_ARB_blend_func_extended. Cloning in
i965 was dropping a non-ARB_explicit_attrib_location index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When glTexImage or glCopyTexImage is called with internalFormat being a
generic compressed format (like GL_COMPRESSED_RGB) we need to do the same
error checks as for specific compressed formats. In particular, check if
the texture target is compatible with the format. None of the texture
compression formats we support so far work with GL_TEXTURE_1D, for example.
See also https://bugs.freedesktop.org/show_bug.cgi?id=49124
NOTE: This is a candidate for the 8.0 branch.
Otherwise it fails like so:
CC egl_dri2.lo
In file included from egl_dri2.h:40:0,
from egl_dri2.c:42:
../../../../../../src/egl/wayland/wayland-drm/wayland-drm.h:8:41:
fatal error: wayland-drm-server-protocol.h: No such file or directory
compilation terminated.
This new gbm entry point allows writing data into a gbm bo. The bo has
to be created with the GBM_BO_USE_WRITE flag, and it's only required to
work for GBM_BO_USE_CURSOR_64X64 bos.
The gbm API is designed to be the glue layer between EGL and KMS, but there
was never a mechanism initialize a buffer suitable for use with KMS
hw cursors. The hw cursor bo is typically not compatible with anything EGL
can render to, and thus there's no way to get data into such a bo.
gbm_bo_write() fills that gap while staying out of the efficient
cpu->gpu pixel transfer business.
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
At this point, in order for OpenCL to work correctly with r600g, OpenCL
specific intrinsics need to be defined in the LLVM tree. So, we need
to check for these intrinsics in the LLVM include directory to make sure
not to re-define them.
This is a pseudo instruction that enables the LLVM backend to encode
instructions and pass it through r600_bytecode_build()
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
This moves the alpha test control to derived state and disables alpha
testing for integer fbs.
fbo-blending test in piglit gets further when we do this (not a pass
but less fail).
v2: drop the fb_sx_alpha_test_control
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
Allows the creation of const aos masks which have the mask swizzled
to match the correct format.
Updated existing mask creation code to use the swizzled version where
necessary (tgsi register masks and llvmpipe aos blending).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With this feature enabled, the LLVM backend will dump the MachineIntrs
prior to emitting code. The mesa env variable R600_DUMP_SHADERS will enable
this feature in the backend.
Fix uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't delete MASK_WRITE instructions from the program, because this
will cause instructions being masked by MASK_WRITE to be marked dead and
then deleted in the dce pass.
Enabled MESA_FORMAT_RGBX8888_REV for RGBX. Android software
requires RGBX8888 format to be supported for software rendering.
That requires EGL to be capable of generating images from this
format.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add new format __DRI_IMAGE_FORMAT_XBGR8888 to __DRI_IMAGE.
HAL_PIXEL_FORMAT_RGBX_8888 now maps to __DRI_IMAGE_FORMAT_XBGR8888.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Only images created with intel_create_image() had the field properly
set. Set it also on intel_dup_image(), intel_create_image_from_name()
and intel_create_image_from_renderbuffer().
GL_ARB_texture_storage says:
The commands eglBindTexImage, wglBindTexImageARB, glXBindTexImageEXT or
EGLImageTargetTexture2DOES are not permitted on an immutable-format
texture.
They will generate the following errors:
- EGLImageTargetTexture2DOES: INVALID_OPERATION
- eglBindTexImage: EGL_BAD_MATCH
- wglBindTexImage: ERROR_INVALID_OPERATION
- glXBindTexImageEXT: BadMatch
Fixing the EGL and GLX cases requires extending the DRI interface,
since setTexBuffer2 doesn't currently return any error information.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adapted drivers: i915, llvmpipe, r300, r600, radeonsi, softpipe.
User index buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This reduces CPU overhead in st_draw_vbo and removes a lot of unnecessary code
in that function which was required only to comply with the gallium interface,
but wasn't any useful really.
Adapted drivers: i915, llvmpipe, r300, softpipe.
No changes required in: r600, radeonsi.
User vertex buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This is required for any serious constant buffer support.
Constant buffer offsets on ATI and NVIDIA DX10 and DX11 GPUs must be
a multiple of 256.
In OpenGL, this can be queried via GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.
As noted in commit be4e46b21a,
this was missing before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A little analysis shows that the worst-case value for "nr" is 17:
- base_mrf = 2 ... 2
- header present (say gen == 5) ... 4
- aa_dest_stencil_reg (stencil test) ... 5
- SIMD16 mode: += 4 * reg_width ... 13
- source_depth_to_render_target ... 15
- dest_depth_reg ... 17
This resulted in us setting base_mrf to 2 and mlen to 15. In other
words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also,
the instruction scheduler data structures use arrays of size 16, so this
would cause us to access them out of bounds.
While the debugger system routine may need m0 and m1, we don't use it
today, so the simplest solution is just to move base_mrf back to 1.
That way, our worst case message fits in m1..m15, which is legal.
An alternative would be to fail on SIMD16 in this case, but that seems
a bit unfortunate if there's no real need to reserve m0 and m1.
Fixes new piglit test shaders/depth-test-and-write on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To ensure that the alloca is at the top of the function body, otherwise
LLVM will not eliminate them, causing stack misalignment on 32bits.
Reviewed-by: James Benton <jbenton@vmware.com>
This is taken from the ogl-math project, with Inverse renamed to adj
(since it's not actually the inverse), transposed, and our types
plugged in. There are potential CSE opportunities in this code
(particularly for hardware with RCP but not DIV), but we should be
doing CSE anyway, so don't hand-optimize.
Fixes piglit inverse tests.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This takes advantage of the builtin compiler to generate IR into a
string, the same way we read GLSL for function prototypes for our
profiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I keep getting lost in the Makefile trying to figure out what to edit
to work on builtin_compiler or glsl_compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
This couldn't be split because it would break bisecting.
Summary:
* r300g,r600g: stop using u_vbuf
* r300g,r600g: also report that the FIXED vertex type is unsupported
* u_vbuf: refactor for use in the state tracker
* cso: wire up u_vbuf with cso_context
* st/mesa: conditionally install u_vbuf
This adds the ability to initialize u_vbuf_caps before creating u_vbuf itself.
It will be useful for determining if u_vbuf should be used or not.
Also adapt r300g and r600g.
This fixes an assertion failure since:
commit 81afdd20f3
vbo: don't check twice whether it's valid to render
FLUSH_CURRENT may set _NEW_CURRENT_ATTRIB.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the source region for a glCopyPixels is completely outside the
source buffer bounds, no-op the copy. Fixes a failed assertion.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The LLVM backend can now be enabled for r600g by using the
--enable-r600-llvm-compiler configure flag. If you configure with this
flag, you can still use the default compiler by setting the envrionment
variable R600_USE_LLVM=0
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise HAVE_LLVM won't be included in the $(DEFINES) variable for
Automake generated Makefiles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This shaves 2k off the final dri.so, and removes lots of pointless
NULL, 0 passing.
most like pointless - but it looked nicer to me.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The __glapi_gentable_set_remaining_noop() routine treats the _glapi_struct
as an array of _glapi_get_dispatch_table_size() pointers, so we have to
allocate _glapi_get_dispatch_table_size()*sizeof(void*) bytes rather
than sizeof(struct _glapi_struct) bytes.
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Alexandre Demers sent me some cayman results with no major problems.
I'll rip out the env var in a week or so.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Full piglit run on my rv610 with no regressions.
This only leaves cayman, however my cayman is resisting my attempt
to get through a full piglit run.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on rv740 and confirmed no regressions.
We don't get GL3 on r700 due to transform feedback being busted still.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This cap is used by u_blitter to decide if it can use integers
in vertex data.
fixes some crashes with glsl130 in piglit
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on my SUMO machine and I see no regressions.
Lots of things to fix (skip->fail), but hey maybe we can fix them
if we can see them.
I'll try and work my way across r600,700,cayman sometime if nobody
else gets to them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The field wasn't actually used before and it's not used now either.
But this is a more logical place for it and will hopefully allow
doing smarter draw/array validation (per array object) in the future.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Our previous live interval analysis just said that anything in a loop
was live for the whole loop. If you had to spill a reg in a loop,
then we would consider the unspilled value live across the loop too,
so you never made progress by spilling. Eventually it would consider
everything in the loop unspillable and fail out.
With the new analysis, things completely deffed and used inside the
loop won't be marked live across the loop, so even if you
spill/unspill something that used to be live across the loop, you
reduce register pressure. But you usually don't even have to spill
any more, since our intervals are smaller than before.
This fixes assertion failure trying to compile the shader for the
"glyphy" text rasterier and piglit glsl-fs-unroll-explosion.
Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing
more shaders to be compiled in 16-wide mode.
This takes the fs_inst list generated by the visitor, and generates a
list of basic blocks with edges between them. This is a building
block for data-flow analysis.
We were checking for these at link time previously, which is not as
early as mandated, and would actually fail to detect conflicting
writes if dead code removal removed some writes.
Fixes failures in piglit
glsl-*/compiler/fragment-outputs/write-gl_Frag*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used for some compile-and-link-time error checking, where
currently we've been doing error checking only at link time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This runs optimization-test and produces the usual automake test
output, which may be interesting to automated build systems.
This doesn't convert the tests to be individually exposed to the
automake runner, because automake doesn't like wildcards (due to being
nonportable in make, not that we care).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the reason the declaration member existed in the reference
visitor, but I didn't copy the code from structure splitting that
avoided setting it.
This wasn't currently a problem, because we don't allow splitting of
in/out variables. But that would be nice to change some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was carried over from structure splitting, without thinking about
whether the name still made sense in this context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes encoding of VOP3 shader instructions.
The shift was wrong for source registers 2 and 3, and the resulting value was
only 32 bits, so the shift in SICodeEmitter::VOPPostEncode() didn't work as
intended.
If a non-default array object was bound at context destruction time
we'd try to unreference the array object after it was already deleted
in _mesa_free_varray_data(). Now do the unref first.
Fixes a regression from commit 86f53e6d6b.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's not nice when you have several variables pointing to the same array
and you wanna ask your editor "where is this used" and you only get an answer
for one of the four currval, legacy_currval, generic_currval, mat_currval,
which is quite useless, because you never see the whole picture.
Let's get rid of the additional pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It's already done in _mesa_validate_Draw* and it's not needed to do it again
unless I am missing something.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is a frequently-updated state and _NEW_ARRAY already causes revalidation
of the vbo module. It's kinda counter-productive to recompute arrays
in the vbo module if _NEW_ARRAY is set and then set _NEW_ARRAY again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This moves the RebindArrays flag into the vbo module, consolidates the code,
and adds missing vbo_draw_method calls.
Also with this change, the vertex arrays are not needlessly recalculated twice.
The issue with the old code was:
- If recalculate_input_bindings updates vp_varying_inputs, _NEW_ARRAY is set.
- _mesa_update_state is called and the vp_varying_inputs change causes
regeneration of the fixed-function shaders, which also sets _NEW_PROGRAM.
- The occurence of either _NEW_ARRAY or _NEW_PROGRAM sets
the recalculate_inputs flag to TRUE again.
- The new code sets the flag to FALSE after the second _mesa_update_state,
because there can't possibly be any change which would require recalculating
the arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Vinson reported that we failed to initialize this, which would lead to
all kinds of crashes if we actually used it. Since we don't use it,
we may as well just delete the broken code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change uses the array object factory for gl_array_objects. This
prevents crashes when deriving from gl_array_object.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
We don't normally clear immediately after drawing something. But as it
was, the drawing would incorrectly appear after the clear.
Fixes piglit clear-varray-2.0 failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Deletes a lot of pointless duplication, as well as some run-time effort.
Conveniently, GLSL 1.40 no longer needs a .vert variant, since it
doesn't define any built-ins specific to the vertex shader stage.
ARB_texture_rectangle and OES_EGL_image_external also only need a single
profile, since the .vert and .frag variants were identical.
I didn't bother with EXT_texture_array and OES_texture_3D because
they're so tiny that the savings would be miniscule.
Cuts the generated builtin_function.cpp from 1.7MB to 1.0MB (41%).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The built-in subsystem uses "profiles," or GLSL shaders containing
prototypes for all built-ins supported within a particular language
version (or extension) and shader stage.
Since profiles were stage-specific, we had to cut and paste almost all
the prototypes between (e.g.) 110.vert and 110.frag. Naturally, this
led to sundry cut and paste bugs, where someone fixed an issue in .frag
but neglected to update .vert, or vice-versa. Geometry shaders would
have only made this worse.
This patch introduces support for a new '.glsl' profile suffix which
contains prototypes common to all shader stages. The existing '.frag'
and '.vert' profiles need only contain the few stage-specific built-ins.
Not only does this remove duplication, it makes built-in setup slightly
faster: we don't need to re-read the common prototypes and function
bodies for both the vertex and fragment shader stage.
Internally, this was trivial. We already create a list of gl_shader
objects to search through for built-ins: one for the core language
version/stage, and additional shaders for any extensions in use. This
patch simply adds another shader to the list: core/common, core/stage,
and extensions.
The next patch will update the profiles to remove the duplication.
It's separated out purely to make review easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Accelerates a few glReadPixels cases for WebGL.
See https://bugs.freedesktop.org/show_bug.cgi?id=48545
v2: Per Jose, use bit twiddling for the swizzle case instead of ubyte
arrays (it's about 44% faster).
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The GLSL 1.30 -> 4.10 specs all erroneously say "vec2" for a few
overloads of textureProjGradOffset, while most overloads and all other
texturing functions use ivec types.
The GLSL 4.20 specification corrects these to "ivec2", but doesn't
mention this as being a conscious change in behavior. Nor does the
ARB_shading_language_420pack extension. So presumably it was a typo.
At any rate, our builtin functions all use ivec already, so the fact
that these prototypes use plain vecs will only lead to applications
dying in a fire when trying to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 4ec449a6ed.
I meant to not push this one. Review found that a link error is not
mandated: it should link, but you get undefined rendering if you rely
on a missing stage.
page 42/55 section 2.11 "Vertex Shaders":
"If the program object has no vertex shader, or no program object
is currently in use, the results of vertex shader execution are
undefined."
(and similar for page 160/173 section 3.9 "Fragment Shaders" for FS,
and page 45/58 section 2.11.2 "Program Objects" for program being 0)
It turns out the commit was broken anyway, because it was missing a
"goto done", so linkstatus got smashed back to true later and the
error just showed up as a warning in the infolog.
All I know of that needs finishing in Mesa is to enable the extension
in a GL3.1 core context on i965 -- we're not going to expose it in
non-3.1 core contexts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the new piglit texelFetch() tests on these. Note that the rest
of the new functions are not tested (same as the non-2DRect versions
of most of them).
The non-integer versions were already reserved in 1.30, but apparently
these were forgotten.
Fixes piglit glsl-1.40/compiler/reserved/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents this error with Automake 1.9:
src/gallium/drivers/Makefile.am: C objects in subdir but
`AM_PROG_CC_C_O' not in `configure.ac'
autoreconf: automake failed with exit status: 1
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radeonsi and r600 have duplicate symbols, so it's not possible to
statically link both. Remove the newcomer, radeonsi, until duplicate
symbols are fixed.
Most things that work on Fermi should work on Kepler too.
There are a few performance optimizations left to do, like better
placement of texture barriers and adding scheduling data to the
shader instructions (without them, a thread group will be masked
for 32 cycles after each single instruction issue).
Edit: Don't do it for the main function of (graphics) shaders,
its inputs and outputs always go through TGSI_FILE_INPUT/OUTPUT.
This prevents all TEMPs from counting as live out and reduces
register pressure.
The point is to keep an independent dictionary for each function.
The array that was being used as dictionary has been converted into a
"bimap" for two different reasons: first, because having an almost
empty instance of an array with as many entries as registers there are
in the program, once for every function, would be wasteful, and
second, because we want to be able to map Value pointers back to
locations at some point.
The reason is that several passes (regalloc, function argument
binding, inlining) are going to require the callees of a function to
be processed before the caller.
Instruction attributes like WriteALUResult and ALUResultCompare
were being discarded during the some of the local transformations.
This fixes the following piglit tests:
glsl1-inequality (vec2, pass)
loopfunc
fs-any-bvec2-using-if
fs-op-ne-bvec2-bvec2-using-if
fs-op-ne-ivec2-ivec2-using-if
fs-op-ne-mat2-mat2-using-if
fs-op-ne-vec2-vec2-using-if
fs-op-ne-mat2x3-mat2x3-using-if
fs-op-ne-mat2x4-mat2x4-using-if
https://bugs.freedesktop.org/show_bug.cgi?id=45921
NOTE: This is a candidate for the stable branches.
The loop registers weren't being cleared, so any shader that was
executed after a shader containing loops was at risk of having a loop
randomly inserted into it.
This fixes over one hundred piglit tests, although these test
only failed during full piglit runs and would pass if
run individually. The exact number of piglit tests that this patch
fixes will vary depending on the version of piglit and the order the
tests are run.
NOTE: This is a candidate for the stable branches.
This lets us significantly shorten p->instructions->push_tail(ir), and
will be used in a few more places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we can fold a bunch of our expression setup in ff_fragment_shader
into single-line, parseable commits.
v2: Make it actually work. I wasn't setting num_components in the
mask structure, and not setting up a mask structure is way easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having to explicitly dereference is irritating and bloats the code,
when the compiler can detect and do the right thing.
v2: Use a little shim class to produce the automatic dereference
generation at compile time as opposed to runtime, while also
allowing compile-time type checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C++ constructors with placement new, while functional, are
extremely verbose, leading to generation of simple GLSL IR expressions
like (a * b + c * d) expanding to many lines of code and using lots of
temporary variables. By creating a new ir_builder.h that puts simple
generators in our namespace and taking advantage of ralloc_parent(),
we can generate much more compact code, at a minor runtime cost.
v2: Replace ir_instruction usage with just ir_rvalue.
v3: Drop remaining missed as_rvalue() in v2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The primary motivation for this rewrite was to have a maintainable driver
going forward, as nvfx was quite horrible in a lot of ways.
The driver is heavily based on the design of the nv50/nvc0 3d drivers we
already have, and uses the same common buffer/fence code. It also passes
a HEAP more piglit tests than nvfx did, supports a couple more features,
and a few more to come still probably.
The CPU footprint of this driver is far far less than nvfx, and translates
into far greater framerates in a lot of applications (unless you're using
a CPU that's way way newer than the GPUs of these generations....)
Basically, we once again have a maintained driver for these chipsets \o/
Feel free to report bugs now!
This driver hasn't been maintained properly for a very long time, and for
many very good reasons. It's horrible.
A new driver supporting these chipsets will appear with the commits that
port vieux/nv50/nvc0 to libdrm_nouveau-2.0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if
you use any other graph object in the meantime they'll forget their state
and spew a lovely METHOD_CNT error at you when you try to draw.
The pre-newlib driver has a flush_notify() hook which does this state
re-emit, and a number of random workarounds like extra flushes and state
dirtying after various operations to solve this issue.
I'm taking a slightly different approach to things instead, which has the
nice side-effect of removing the divergent code-paths for ttri/mtri, the
flush/dirty workarounds and the need for flush_notify. Also gives a few
FPS boost in OA, yay.
This is just a function to tell if a certain blend mode requires dual sources.
v2: move to inlines as per Brian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the blend mode mapping, it also uses the var->index in the
glsl to tgsi convertor - this is the other half of my using 4 in the GLSL
compiler.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds index support to the GLSL compiler.
I'm not 100% sure of my approach here, esp without how output ordering
happens wrt location, index pairs, in the "mark" function.
Since current hw doesn't ever have a location > 0 with an index > 0,
we don't have to work out if the output ordering the hw requires is
location, index, location, index or location, location, index, index.
But we have no hw to know, so punt on it for now.
v2: index requires layout - catch and error
setup explicit index properly.
v3: drop idx_offset stuff, assume index follow location
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add implementations of the two API functions,
Add a new strings to uint mapping for index bindings
Add the blending mode validation for SRC1 + SRC_ALPHA_SATURATE
Add get for MAX_DUAL_SOURCE_DRAW_BUFFERS
v2:
Add check in valid_to_render to address case in spec ERRORS.
v3:
Add index to ir.h so this patch compiles on its own
fixup comment
v4: fixup Brian's comments
The GLSL patch will setup the indices.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit adds initial support for acceleration
on SI chips. egltri is starting to work.
The SI/R600 llvm backend is currently included in mesa
but that may change in the future.
The plan is to write a single gallium driver and
use gallium to support X acceleration.
This commit contains patches from:
Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer <michel.daenzer@amd.com>
Alex Deucher <alexander.deucher@amd.com>
Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following commits were squashed in:
======================================================================
radeonsi: Remove unused winsys pointer
This was removed from r600g in commit:
commit 96d882939d
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Feb 17 01:49:49 2012 +0100
gallium: remove unused winsys pointers in pipe_screen and pipe_context
A winsys is already a private object of a driver.
======================================================================
radeonsi: Copy color clamping CAPs from r600
Not sure if the values of these CAPS are correct for radeonsi, but the
same changed were made to r600g in commit:
commit bc1c836938
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Jan 23 03:11:17 2012 +0100
st/mesa: do vertex and fragment color clamping in shaders
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
======================================================================
radeonsi: Remove PIPE_CAP_OUTPUT_READ
This CAP was dropped in commit:
commit 04e3240087
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Feb 23 23:44:36 2012 +0100
gallium: remove PIPE_SHADER_CAP_OUTPUT_READ
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
======================================================================
radeonsi: Add missing parameters to rws->buffer_get_tiling() call
This was changed in commit:
commit c0c979eebc
Author: Jerome Glisse <jglisse@redhat.com>
Date: Mon Jan 30 17:22:13 2012 -0500
r600g: add support for common surface allocator for tiling v13
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
======================================================================
radeonsi: Remove PIPE_TRANSFER_MAP_PERMANENTLY
This was removed in commit:
commit 62f44f670b
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Mar 5 13:45:00 2012 +0100
Revert "gallium: add flag PIPE_TRANSFER_MAP_PERMANENTLY"
This reverts commit 0950086376.
It was decided to refactor the transfer API instead of adding workarounds
to address the performance issues.
======================================================================
radeonsi: Handle PIPE_VIDEO_CAP_PREFERED_FORMAT.
Reintroduced in commit 9d9afcb5ba.
======================================================================
radeonsi: nuke the fallback for vertex and fragment color clamping
Ported from r600g commit c2b800cf38.
======================================================================
radeonsi: don't expose transform_feedback2 without kernel support
Ported from r600g commit 15146fd1bc.
======================================================================
radeonsi: Handle PIPE_CAP_GLSL_FEATURE_LEVEL.
Ported from r600g part of commit 171be75522.
======================================================================
radeonsi: set minimum point size to 1.0 for non-sprite non-aa points.
Ported from r600g commit f183cc9ce3.
======================================================================
radeonsi: rework and consolidate stencilref state setting.
Ported from r600g commit a2361946e7.
======================================================================
radeonsi: cleanup setting DB_SHADER_CONTROL.
Ported from r600g commit 3d061caaed.
======================================================================
radeonsi: Get rid of register masks.
Ported from r600g commits
3d061caaed13b646ff40754f8ebe73f3d4983c5b..9344ab382a1765c1a7c2560e771485edf4954fe2.
======================================================================
radeonsi: get rid of r600_context_reg.
Ported from r600g commits
9344ab382a1765c1a7c2560e771485edf4954fe2..bed20f02a771f43e1c5092254705701c228cfa7f.
======================================================================
radeonsi: Fix regression from 'Get rid of register masks'.
======================================================================
radeonsi: optimize r600_resource_va.
Ported from r600g commit 669d8766ff.
======================================================================
radeonsi: remove u8,u16,u32,u64 types.
Ported from r600g commit 78293b99b2.
======================================================================
radeonsi: merge r600_context with r600_pipe_context.
Ported from r600g commit e4340c1908.
======================================================================
radeonsi: Miscellaneous context cleanups.
Ported from r600g commits
e4340c1908a6a3b09e1a15d5195f6da7d00494d0..621e0db71c5ddcb379171064a4f720c9cf01e888.
======================================================================
radeonsi: add a new simple API for state emission.
Ported from r600g commits
621e0db71c5ddcb379171064a4f720c9cf01e888..f661405637bba32c2cfbeecf6e2e56e414e9521e.
======================================================================
radeonsi: Also remove sbu_flags member of struct r600_reg.
Requires using sid.h instead of r600d.h for the new CP_COHER_CNTL definitions,
so some code needs to be disabled for now.
======================================================================
radeonsi: Miscellaneous simplifications.
Ported from r600g commits 38bf276348 and
b0337b679a.
======================================================================
radeonsi: Handle PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION.
Ported from commit 8b4f7b0672.
======================================================================
radeonsi: Use a fake reloc to sleep for fences.
Ported from r600g commit 8cd03b933c.
======================================================================
radeonsi: adapt to get_query_result interface change.
Ported from r600g commit 4445e170be.
clang warns on these:
stroker.c:626:19: warning: implicit conversion from enumeration
type 'VGPathCommand' to different enumeration type 'VGPathSegment'
[-Wconversion]
No change in the underlying value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
brw_wm_surface_state.c:330:30: warning: initializer overrides prior
initialization of this subobject [-Winitializer-overrides]
[MESA_FORMAT_Z24_S8] = 0,
^
brw_wm_surface_state.c:326:30: note: previous initialization is here
[MESA_FORMAT_Z24_S8] = 0,
^
No functionality change, since the array is declared static so
it was zero-initialized by default.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Silences a clang warning:
format_pack.c:2546:30: warning: implicit conversion from 'int' to
'GLubyte' (aka 'unsigned char') changes value from 65535 to 255
[-Wconstant-conversion]
d[i] = d[i] ? 0xffff : 0x0;
~ ^~~~~~
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
egl_st.c:57:50: warning: field precision should have type 'int',
but argument has type 'size_t' (aka 'unsigned long') [-Wformat]
ret = util_snprintf(path, sizeof(path), "%.*s/%s" UTIL_DL_EXT,
~~^~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
C still treats array arguments exactly like pointer arguments.
By sheer coincidence, this still worked fine on 64-bit
machines where 2 * sizeof(float) == sizeof(void*), but not
on 32-bit.
Noticed by clang:
text.c:76:51: warning: sizeof on array function parameter will
return size of 'const VGfloat *' (aka 'const float *') instead of
'const VGfloat [2]' [-Wsizeof-array-argument]
memcpy(glyph->glyph_origin, glyphOrigin, sizeof(glyphOrigin));
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
eglimage.c:48:28: warning: argument to 'sizeof' in 'memset' call is
the same expression as the destination; did you mean to dereference
it? [-Wsizeof-pointer-memaccess]
memset(attrs, 0, sizeof(attrs));
~~~~~ ^~~~~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Most of the 256 values in the 'generic_to_slot' table were supposed to
be initialized with the default value 0xff, but were left at zero
(from CALLOC_STRUCT()) instead.
Noticed by clang:
u_linkage.h:60:31: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination;
did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess]
memset(table, 0xff, sizeof(table));
~~~~~ ^~~~~
Also fix a signed/unsigned comparison and a comment typo here.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
container_of() can legally return anything, even invalid addresses
that cause segfaults, when 'sample' is an uninitialized pointer.
Bug exposed by clang.
NOTE: This is a candidate for the 8.0 branch.
Fix uninitialized scalar field defect reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 272bc48976 removed the damage implementation for the
wl_buffer_interface because that has been removed from git master of
Wayland. However this breaks building with the 0.85 branch of Wayland
because it would end up initialising the struct incorrectly.
For the time being it's quite convenient for some compositors to track
the 0.85 branch of Wayland because the protocol is stable but they
will also want to track the master branch of Mesa so that they can use
the gbm surface changes.
This patch adds a compile-time check for the version of Wayland so
that it can work with either Wayland master or the 0.85 branch.
krh: Edited to also account for API changes in 6802eaa68, which
removes the timestamp argument from wl_resource_destroy().
The upstream of gtest has decided that the intended usage model is for
projects to import the source and use it, which is reflected in their
recent removal of the gtest-config tool.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
By making a bool fs_reg only have a defined low bit (matching CMP
output), instead of being a full 0 or 1 value, we reduce the ANDs
generated in logic chains like:
if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
discard;
My concern originally when writing this code was that we would end up
generating unnecessary ANDs on bool uniforms, so I put the ANDs right
at the point of doing the CMPs that otherwise set only the low bit.
However, in order to use a bool, we're generating some instruction
anyway (e.g. moving it so as to produce a condition code update), and
those instructions can often be turned into an AND at that point. It
turns out in the shaders I have on hand, none of them regress in
instruction count:
Total instructions: 262649 -> 262545
39/2148 programs affected (1.8%)
14253 -> 14149 instructions in affected programs (0.7% reduction)
This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.
Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)
v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
This should fit in well with our lower_mat_op_to_vec code: now, in
addition to having expressions on each column of a matrix, we also
split the columns to separate variables so they can be tracked
individually by the copy propagation, dead code, and other passes.
This optimizes out some more code generation in unigine and gstreamer
shaders.
Total instructions: 269342 -> 269270
14/2148 programs affected (0.7%)
2226 -> 2154 instructions in affected programs (3.2% reduction)
I've had this code laying around almost done for a long time. The
idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences. While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.
While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
The Android build was broken by
commit ca760181b4
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Fri Mar 16 12:55:40 2012 -0400
shared-glapi: Convert to automake
The offending change was that it redefined the filepaths in sources.mak
like this:
- FOO_FILES := bar.c
+ FOO_FILES := $(TOP)/src/mapi/mapi/bar.c
This broke the build because source filepaths in Android makefiles must be
relative to the makefile.
Ideally, this could be fixed by reverting the change in sources.mak and
making shared-glapi's Makefile.am use $(addprefix $(TOP)/src/mapi/mapi,
$(FOO_FILES)). However, automake doesn't understand builtin GNU make
functions, such as addprefix. So, it seems that automake and Android can
no longer share sources.mak.
Fix the build by duplicating the source lists from sources.mak into
Android.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Keep a reference to any newly allocated aux buffers to avoid
re-allocating for every st_framebuffer_validate() (i.e. leaking).
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.
Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height. However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.
Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values. To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Clarify comments about the DDX.
This is a related fix for the Wayland change:
commit 83685c506e76212ae4e5cb722205d98d3b0603b9
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Mon Mar 26 16:33:24 2012 -0400
Remove wl_buffer.damage and simplify shm implementation
Apparently, this should also fix a memory leak. When wl_buffer.damage
was removed from Wayland and Mesa was not fixed, wl_buffer.destroy ended
up in the (empty) damage function instead of calling
wl_resource_destroy().
Spotted during build as:
CC wayland-drm-protocol.lo
wayland-drm.c:80:2: warning: initialization from incompatible pointer type
wayland-drm.c:82:1: warning: excess elements in struct initializer
wayland-drm.c:82:1: warning: (near initialization for 'drm_buffer_interface')
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Fixes uninitialized member defects reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was hacked in in one place for EGL image stuff, but the right
thing to do was just to provide the mapping from the mesa format to
the native hardware format, which includes render target support.
This turns out to be required for GL_ARB_texture_buffer_object, which
sees data in this layout.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out this field *is* used, and it's the stride between samples
from the buffer. Discovered during TBO debugging.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a function full of unused mappings from the GLenum to
datatype/comps, but that wasn't all the information a driver would
want, which includes the other fields that a gl_format has. Given
that all the texture buffer formats were represented in gl_format,
just use that as our description.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have to skip some work that wants to look at texture images, since
buffer textures don't have any of that complexity.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that should be needed is that it exists. Fixes segfaults on first
_mesa_update_context() with a samplerBuffer-using shader active but
without a particular buffer texture enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix texelFetch(sampler2DRect) and textureSize(samplerBuffer)
generation to not reference a LOD at the same time because it's easier
than not fixing it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplerBuffer type will be undefined in !glsl 1.40, and the
keyword is marked as reserved. The [iu]samplerBuffer types are not
marked as reserved pre-1.40, so they don't have separate tokens and
fall through to normal type handling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're supposed to just immediately call it. Fixes piglit
GL_ARB_texture_buffer_object/dlist
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is set correctly in gl.spec, but was missed in Mesa. As a
result, only one of the two was hooked up in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have lexer recognition of a bunch of our types based on the
handling. This code was mapping those recognized tokens to an enum
and then to a string of their name. Just drop the enums and provide
the string directly in the parser.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nothing actually relied on them being mutable, and there was at least
one cast which discarded const qualifiers. The next patch would have
introduced many more.
Casting away const qualifiers should be avoided if at all possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In "release" builds, Mesa would print this message if the MESA_DEBUG
variable was set. Make it so for debug builds as well.
I build debug builds all the time, but I'm not debugging this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While ir_to_mesa contains code that attempts to support functions, I
honestly doubt it's been tested and have little confidence that it
works.
The comment in visit(ir_function *ir) doesn't inspire confidence:
/* Ignore function bodies other than main() -- we shouldn't see calls to
* them since they should all be inlined before we get to ir_to_mesa.
*/
Furthermore, hardware drivers such as i915, i965, and (AFAICT) r200
don't support the BGNSUB/ENDSUB/CAL opcodes anyway. Only swrast does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When SPRITE_POINT_ENABLE bit is set, the texture coord would be
replaced, and this is only needed when we called something like
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE).
And more, we currently handle varying inputs as texture coord,
we would be careful when setting this bit and set it just when
needed, or you will find the value of varying input is not right
and changed.
Thus we do set SPRITE_POINT_ENABLE bit only when all enabled tex
coord units need do CoordReplace. Or fallback is needed to make
sure the rendering is right.
With handling the bit setup at i915_update_sprite_point_enable(),
we don't need the relative code at i915Enable then.
This patch would _really_ fix the webglc point-size.html test case and
of course, not regress piglit point-sprite and glean-pointSprite
testcase.
NOTE: This is a candidate for stable release branches.
v2: fallback just when all enabled tex coord units need do
CoordReplace (Eric)
v3: move the sprite point validate code at I915InvalidateState (Eric)
v4: sprite point enable bit update based on _NEW_PROGRAM, too
add relative _NEW-state comments to show what state is being used(Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix 'set but not used' warnings; gl_version, gl_versions_profiles and
glx_extensions variables are used just only HAVE_XCB_GLX_CREATE_CONTEXT
is defined. Thus those warnings are shown when that macro isn't defined.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fixes clang error:
tgsi/tgsi_dump.c:72:12: error: no member named '__printf_chk' in 'struct dump_ctx'
ctx->printf( ctx, "%u", e );
~~~ ^
/usr/include/bits/stdio2.h:109:3: note: expanded from macro 'printf'
__printf_chk (__USE_FORTIFY_LEVEL - 1, __VA_ARGS__)
^
Idea stolen from:
http://www.mail-archive.com/pld-cvs-commit@lists.pld-linux.org/msg210998.html
Reviewed-by: Brian Paul <brianp@vmware.com>
Add the maximum base vertex offset to max_index for computing the
buffer size. Fixes a failed assertion in the u_upload_mgr.c code with
the VMware svga driver.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=48141
v2: incorporate Marek's suggestions.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Return 0 for features we don't support. Added debug_printf()
warnings when we fail to handle a new PIPE_CAP_x case. That will
alert us to interfaces changes in the future. We don't want to
just ignore new PIPE_CAPs and possibly miss something important.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we weren't clamping the vertex colors produced by ARB vertex
programs. This could result in some rendering being too bright (in
ETQW, for example).
Also add cases for PIPE_CAP_VERTEX_COLOR_CLAMPED and
PIPE_CAP_FRAGMENT_COLOR_CLAMPED with comments to be complete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We already program all the sampler state correctly, we just didn't give
the GPU a pointer to it for the VS stage. Thus, any texturing other
than texelFetch() wouldn't work.
Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's
glsl-bif-tex subtests.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested with lp_test_arit with 100% passes and piglit tests with 100%
pass for log but some tests still fail for pow.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This reduces a little of CPU overhead.
The idea is to translate pipe vertex buffers directly into the CS
and not using any intermediate representations.
Framerate in Torcs:
before: 32.2
after: 34.6
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
st/mesa doesn't allow src_offset to be greater than stride and the maximum
stride r600 supports is 2047.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Limits maximum loop iterations in a TGSI shader to prevent infinite
loops from occurring, any iteration in any loop counts towards this
limit
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes Coverity resource leak defects.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Variables have types, expression trees have types, but statements don't.
Rather than have a nonsensical field that stays NULL in the base class,
just move it to where it makes sense.
Fix up a few places that lazily used ir_instruction even though they
actually knew the particular subclass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, set_callee() performed some assertions about the type of the
ir_call; protecting the bare pointer ensured these checks would be run.
However, ir_call no longer has a type, so the getter and setter methods
don't actually do anything useful. Remove them in favor of accessing
callee directly, as is done with most other fields in our IR.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aside from ir_call, our IR is cleanly split into two classes:
- Statements (typeless; used for side effects, control flow)
- Values (deeply nestable, pure, typed expression trees)
Unfortunately, ir_call confused all this:
- For void functions, we placed ir_call directly in the instruction
stream, treating it as an untyped statement. Yet, it was a subclass
of ir_rvalue, and no other ir_rvalue could be used in this way.
- For functions with a return value, ir_call could be placed in
arbitrary expression trees. While this fit naturally with the source
language, it meant that expressions might not be pure, making it
difficult to transform and optimize them. To combat this, we always
emitted ir_call directly in the RHS of an ir_assignment, only using
a temporary variable in expression trees. Many passes relied on this
assumption; the acos and atan built-ins violated it.
This patch makes ir_call a statement (ir_instruction) rather than a
value (ir_rvalue). Non-void calls now take a ir_dereference of a
variable, and store the return value there---effectively a call and
assignment rolled into one. They cannot be embedded in expressions.
All expression trees are now pure, without exception.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the time, we just want to read an ir_dereference, so there's no
need to have these in separate functions. However, the next patch will
want to read an ir_dereference_variable directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When translating a call from AST to HIR, we need to decide whether it
can be evaluated to a constant before emitting any code (namely, the
temporary declaration, assignment, and call.)
Soon, ir_call will become a statement taking a dereference of where to
store the return value, rather than an rvalue to be used on the RHS of
an assignment. It will be more convenient to try evaluation before
creating a call. ir_function_signature seems like a reasonable place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, ir_call can be used as either a statement (for void
functions) or a value (for non-void functions). This is rather awkward,
as it's the only class that can be used in both forms.
A number of places use ir_call::get_error_instruction() to construct a
generic value of error_type. If ir_call is to become a statement, it
can no longer serve this purpose.
Unfortunately, none of our classes are particularly well suited for
this, and creating a new one would be rather aggrandizing. So, this
patch introduces ir_rvalue::error_value(), a static method that creates
an instance of the base class, ir_rvalue. This has the nice property
that you can't accidentally try and access uninitialized fields (as it
doesn't have any). The downside is that the base class is no longer
abstract.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
generate_call() and ast_function_expression::hir() both tried to verify
that 'out' and 'inout' parameters used l-values. Irritatingly, it
turned out that this was not redundant; both checks caught -some- cases.
This patch combines the two into a single "complete" function that does
all the parameter mode checking. It also adds a comment clarifying why
AST-level checking is necessary in the first place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We used to have one big function, match_signature_by_name, which found
a matching signature, performed out-parameter conversions, and generated
the ir_call. As the code for matching against built-in functions became
more complicated, I split it internally, creating generate_call().
However, I left the same awkward interface. This patch splits it into
three functions:
1. match_signature_by_name()
This now takes a name, a list of parameters, the symbol table, and
returns an ir_function_signature. Simple and one purpose: matching.
2. no_matching_function_error()
Generate the "no matching function" error and list of prototypes.
This was complex enough that I felt it deserved its own function.
3. generate_call()
Do the out-parameter conversion and generate the ir_call. This
could probably use more splitting.
The caller now has a more natural workflow: find a matching signature,
then either generate an error or a call.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Function calls may have side effects that alter variables used inside
the loop. In the fragment shader, they may even terminate the shader.
This means our analysis about loop-constant or induction variables may
be completely wrong.
In general it's impossible to determine whether they actually do or not
(due to the halting problem), so we'd need to perform conservative
static analysis. For now, it's not worth the complexity: most functions
will be inlined, at which point we can unroll them successfully.
Fixes Piglit tests:
- shaders/glsl-fs-unroll-out-param
- shaders/glsl-fs-unroll-side-effect
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Certain applications don't call SwapBuffers before exiting. Yet, we'd
really like to see a bitmap containing the final rendered image even if
they choose never to present it.
In particular, Piglit tests (at least with -auto -fbo) fall into this
category. Many of them failed to dump any images at all.
Dumping one final image at context destruction time seems to work.
We may wish to pursue a more elegant solution later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a Coverity resource leak defect.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These can be used to implement EXT_texture_swizzle without baking
state-dependent swizzle instructions into the shader and forcing
recompiles.
For now, just set them to pass-through mode, so everything continues to
work as it did on Ivybridge. We can optimize this later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We only need one sample, since we don't support multisampling yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Apparently this needs to be the same as in 3DSTATE_WM.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Getting HiZ working means updating all the state packets for resolves
and clears. It's not worth doing until we get the basics working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, these all return 0, as I don't yet want to enable Haswell
support. Eventually they will be filled in with proper PCI IDs.
Also add an is_haswell field similar to is_g4x to make it easy to
distinguish Gen7 and Gen7.5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec ISA volume's "Accumulator Register" section:
"[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is
explicit source or destination operand."
Fixes piglit tests:
- fs-multiply-const-ivec4
- fs-multiply-const-uvec4
- fs-multiply-ivec4-const
- fs-multiply-uvec4-const
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the cryptic void* parameter with a union.
(based on union r600_query_result)
Users of this can still pass uint64* in it, but that cannot work for every
query type, obviously. Most importantly, the code now documents what should
be expected from get_query_result.
This also adds pipe_query_data_pipeline_statistics as per the D3D11 docs.
v2: fix indentation, add comments and use the doxygen style
Reviewed-by: Brian Paul <brianp@vmware.com>
This option allows targets to link against the LLVM shared library
instead of the static libs. With LLVM 2.9, his saves ~11 MB for each of
the r300 target libraries.
Pass a dri2_loader extension to the dri driver when gbm creates the dri
screen. The implementation jumps through pointers in the gbm device
so that an EGL on GBM implementation can provide the real implementations.
The idea here is to be able to create an egl window surface from a
gbm_surface. This avoids the need for the surfaceless extension and
lets the EGL platform handle buffer allocation, while keeping the user
in charge of somehow presenting the buffers (using kms page flipping,
for example).
gbm_surface_lock_front_buffer() locks a surface's front buffer and
returns a gbm bo representing it. This bo should later be returned
to the gbm surface using gbm_surface_release_buffer().
The function that counts the number of TGSI immediates also needs to
emit the immediates. This fixes assorted failures when using polygon
stipple with fragment shaders that have their own immediates.
NOTE: This is a candidate for the 8.0 branch.
They aren't winsys of their own,
just help dealing with them.
v2: add some more comments in vl_winsys.h
Signed-off-by: Christian König <deathsimple@vodafone.de>
"Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This is a regression introduced by commit cdcfd5, which forget to
increase the map_refcount for successfully-mapped region. Thus caused a
wrong non-blanced map_refcount.
This would fix the regression found in the two following webglc testcase
on Pineview platform:
texture-npot.html
gl-max-texture-dimensions.html
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
createDrawable may return NULL value, we should check it, or it will
make a segment failed.
[minor-indent-issue-fixed-by: Yuanhan Liu]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This extension just permits GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_ROWS
and GL_UNPACK_SKIP_PIXELS to be passed to glPixelStore on GLES2 so it
is trivial to implement.
Also fixes the usage of GL_IMPLEMENTATION_COLOR_READ_FORMAT_OES,
which may be set to a BGRA format e.g. for a MESA_FORMAT_ARGB8888 fb.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extension is already exposed for GLES1, but the APIspec
doesnt allow the usage of GL_BGRA_EXT in glTex(Sub)Image2D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed this was missing when writing the "glapi: sort ARB extensions
by number" commit, which at least shows it was effective.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed it was missing based on the lack of a descriptive enum
name from this bug's error message:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This moves two enums out of GL3x.xml. Though since this and
GL_ARB_texture_compression_rgtc are both strict subsets of GL3,
both extensions should have had all their enums in that file
to begin with, not just two of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
And add comments to fill in for extensions that aren't there.
Noticed the comment about "ARB extensions sorted by extension number"
didn't extend to the <xi:include> directives when it became clear
GL_ARB_texture_rg was missing, going by the error message seen here:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This makes it easier to notice in the future if an extension is missing
when it shouldn't be.
Reviewed-by: Brian Paul <brianp@vmware.com>
A later error prints this properly, fix this case to do the same.
v2: remove attribute as per Ian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the xml file covering ARB_blend_func_extended.
v2: fix SRC1_ALPHA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This also seems like a bad idea. There were too many instances for me
to thoroughly scan the code as I did with the last two patches, but a
quick scan indicated that most callers newly allocate a variable,
dereference it, or NULL-check. In some cases, it wasn't clear that the
value would be non-NULL, but they didn't check for error_type either.
At any rate, not checking for this is a bug, and assertions will trigger
it earlier and more reliably than returning error_type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The constructor currently returns a ir_dereference_variable of error
type when provided NULL, but that's about to change in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_record() constructor
seems like a bad idea. Currently, if provided NULL, it returns a
partially constructed value of error type. However, none of the callers
are prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_array() constructor seems
like a bad idea. Currently, if provided NULL, it returns a partially
constructed value of error type. However, none of the callers are
prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
So if anything goes wrong we won't display a random image.
v2: flush before using the surface with the decoder.
Signed-off-by: Christian König <deathsimple@vodafone.de>
If you ran g-s in 16-bpp we'd do a bunch of memory corruption.
now it just misrenders for some other reasons.
applies to stable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ir_validate.cpp: In member function ‘virtual ir_visitor_status ir_validate::visit_leave(ir_swizzle*)’:
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::x’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::y’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::z’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::w’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
valgrind complained about an uninitialised value being used in
glsl_parser_extras.cpp, and this was the one it was giving out about.
Just initialise the value in the fakectx.
Signed-off-by: Dave Airlie <airlied@redhat.com>
for some reason when I configure --with-dri-drivers="" the src/mesa/drivers/dri
Makefile tries to call the am--refresh target in the toplevel Makefile,
we don't have one, and I'm not sure what it should look like.
This makes things continue on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
piglit glx-tfp segfaults on llvmpipe when run vs a 16-bit radeon screen,
it now fails instead of segfaulting, much prettier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When a GL LD_PRELOAD library like apitrace was used,
glXGetProcAddress() would return the preload's symbols instead of
libGL's symbol, leading to infinite recursion when the returned
function was called. This didn't hit apitrace on most apps because
who calls glXGetProcAddress() on the global functions.
The -Bsymbolic, which was present in mklib before automake conversion,
causes the glxcmds.c:GLX_functions table to be resolved at link time,
so that LD_PRELOADs don't affect it any more.
Fixes crashes when running wine under apitrace.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
The default was 32 for the EmitNoLoops=0 case. This allows the oZone3D
soft shadows test to work properly with the vmware driver. Jose reported
that SM3 supports up to 255 loop iterations.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of the hard-coded value of 32. Note that MaxUnrollIterations
defaults to 32 so there's no net change. But the gallium state tracker
can override this.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves Unigine Tropics performance at 1024x768 by 2.06236% +/-
0.50272% (n=11).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset
the buffer object when its streaming wraps. Don't penalize it by
flushing the batch at the wrap point, just allocate a new BO and get
to using it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library
is linked.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
The force-enable option is dropped, now that the hardware we were
concerned about has HiZ on by default. Now, instead of doing
INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.
v2: Disable separate stencil on gen6 when HIZ is turned off.
(previously, this had to be done manually in addition).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a debug option during gen6 transform feedback bringup (and a
similar one existed during gen4 bringup). However, it looks like
we're done with that, and we don't anticipate it being used again,
either for geometry shaders or transform feedback.
Suggested by: Kenneth Graunke <kenneth@whitecape.org>
This was added in the i915/i965 merge from the i915 driver, but I
don't recall it ever being used since then.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you want to test the graphics driver, you want to test it under the
conditions that users will see, not some set of additional fallbacks.
If you want to test swrast, run the swrast driver (or no_rast=true)
instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To avoid redundancies, this patch also removes .deps, .libs, and *.la
from .gitignore files in subdirectories.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As Eric pointed out, we know the cube faces are square at this point
so we only need to test the texture widths for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The max size was 16Kx16K so a 4 byte/pixel, six-sided cube would require
6 GBytes of memory. If mipmapped, 8 GB. Reduce the max size to 4K to
make the total size more reasonable.
Fixes a crash with the new piglit max-texture-size test.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Per the spec, only nearest filtering is supported for integer textures.
Otherwise, the texture is incomplete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of gl_texture_object::_Complete there are now two fields:
_BaseComplete and _MipmapComplete. The former indicates whether the base
texture level is valid. The later indicates whether the whole mipmap is
valid.
With sampler objects, a single texture can appear to be both complete and
incomplete at the same time. See the GL_ARB_sampler_objects spec for more
details. To implement this we now check if the texture is complete with
respect to a sampler state.
Another benefit of this is we no longer need to invalidate a texture's
completeness state when we change the minification/magnification filters
with glTexParameter().
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Merge the mipmap level checking code that was separate cases for 1D,
2D, 3D and CUBE before.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the simple MaxLevel < BaseLevel test earlier to be closer to where
we error-check BaseLevel. Also, use the local baseLevel var in more places.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make the no-change case faster, as we do for the other object-reference
functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We want to start emitting an INVALID_OPERATION from here for transform
feedback. Note that this forced dlist.c to almost not use this
function, since it wants different behavior during dlist compile.
Just pull the non-TF, non-GS test out for compile, because:
1) TF doesn't matter in that case because there's no drawing.
2) I don't think we're going to see GSes and display lists in the same
context, if we don't do GL_ARB_compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a build problem where EGL links to libgbm.la, which encodes
a relative path to it's libglapi.so dependency. The relative path
breaks when the linker tries to resolve it from src/egl/main instead
of src/gbm. Typically we silently fall back to the system
libglapi.so, which is wrong and breaks when there isn't one.
Morale of the story: don't mix mklib and libtool.
Although some hardware support NPOT cubemap, but it seems we don't know
the right layout for NPOT cubemap. Thus seems we need do fallback for
other platforms as well.
See comments inline the code for more detailed info.
v2: give a more detailed info about why we need fallback for other
platfroms as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46666
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If we failed to allocate a memory resource for the texture we'd crash
when we tried to map it. Now we propogate the NULL back up to the
texstore code and generate GL_OUT_OF_MEMORY.
Fixes a crash with the upcoming piglit max-texture-size test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
From the GLSL 1.30 spec:
The discard keyword is only allowed within fragment shaders. It
can be used within a fragment shader to abandon the operation on
the current fragment. This keyword causes the fragment to be
discarded and no updates to any buffers will occur. Control flow
exits the shader, and subsequent implicit or explicit derivatives
are undefined when this control flow is non-uniform (meaning
different fragments within the primitive take different control
paths).
v2: Don't emit the final HALT if no other HALTs were emitted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
By setting lod to 0 in the builtin function implementation, we avoid
needing to update all the visitors to ignore LOD in this case, when
the hardware drivers actually want to ask for LOD 0 for rectangular
textures.
Fixes piglit spec/GLSL-1.40/textureSize-*Rect.
v2: Change style of looking for substrings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the one builtin function claimed to be dropped due to the
ARB_compatibility split.
Fixes piglit spec/GLSL-1.40/compiler/ftransform.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the process slightly more debuggable, though it would be
nice if the build just failed immediately instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly this is a matter of removing variables that have been moved to
the compatibility profile. There's one addition: gl_InstanceID is
present in the core now.
This fixes the new piglit tests for GLSL 1.40 builtin variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids extra if statements in the common case of just comparing
two expressions that don't involve assignments or function calls,
along with simplifying the handling of constant expressions. Reduces
i965 instructions generated in unigine tropics and sanctuary,
yofrankie, warsow, gstreamer shaders, and the weston compositor.
shader-db results:
Total instructions: 213052 -> 212752
38/1246 programs affected (3.0%)
14309 -> 14009 instructions in affected programs (2.1% reduction)
The error was removed in:
commit 719909698c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Tue Oct 18 16:01:49 2011 -0700
mesa: Rewrite the way uniforms are tracked and handled
The GL_ARB_robustness spec doesn't say the implementation
should truncate the output, so just return after setting
the required error like it did before the above commit.
Also fixup an old comment and add an assert.
NOTE: This is a candidate for the 8.0 branch.
Handle the special case of glFramebufferTextureLayer() for which we pass
teximage = 0 internally in framebuffer_texture(). This patch makes failing
piglit test fbo-array, fbo-depth-array to pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47126
V4: Removed the duplicated code.
Note: This is a candidate for the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the soon-to-be-removed _DD_NEW_SEPARATE_SPECULAR flag.
Note: there's a similar composite _MESA_NEW_NEED_EYE_COORDS flag set already.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just use the corresponding _NEW_x flags intead. The _DD_NEW_x flags
will be removed in a following patch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The computed stencil.clear and depth.clear values aren't used anywhere.
Those fields have been removed too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Set the close on exec flag when opening dri character devices, so they
will be closed and free any resouces allocated in exec.
Signed-off-by: David Fries <David@Fries.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This issue might recur on other OSes. If so then it might be better
to remove the C-preprocessor magic, and use fully qualified defines
instead.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This state is needed for deciding whether or not to log
application messages with IDs that haven't been specifically
passed to glDebugMessageControlARB yet.
State for each individual ID number ever passed to
glDebugMessageControlARB (per-context) still needs to be added.
Unfortunately, Unigine Heaven 3.0 still needs this.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
min_index/max_index are merely conservative guesses, so we can't
make buffer overflow detection based on their values.
Tested-by: Jakob Bornecrantz <jakob@vmware.com>
There are several cases in which we need to explicity "rebase" colors
(ex: set G=B=0) when getting GL_LUMINANCE textures:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=46679
Also fixes the new piglit getteximage-luminance test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Based on a patch submitted by Vic Lee. The other part of his patch
which checked the fs pointer wasn't needed.
This fixes a crash when clear() is called before any VS or FS is set.
But this can only happen when the driver is used without the Mesa
state tracker.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets xine working with VDPAU.
v2: some minor bugfixes.
v3: create the resource with the subsampled
format to avoid tilling problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
These will be used by glReadPixels() and glGetTexImage() to fix issues
with reading GL_LUMINANCE and other formats.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we were only counting top-level instructions. But if we have
an assignment of a giant expression tree (such as the ones eventually
generated by glsl-fs-unroll), we were counting the same as an
assignment of a variable deref.
glsl-fs-unroll-explosion now fails in a reasonable amount of time on
i965 because the unrolling didn't go ridiculously far.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I will use SX_MISC instead.
This reverts commit 734792e83f.
Conflicts:
src/gallium/drivers/r600/evergreen_hw_context.c
src/gallium/drivers/r600/evergreen_state.c
src/gallium/drivers/r600/r600_hw_context.c
src/gallium/drivers/r600/r600_pipe.h
draw module calls back into the driver and sets certain parts
of the state to whatever it needs, unfortunately unless you
get the ordering of calls to draw just right you'll end up
reseting your own driver state. That's what was happening to us
draw module would under certain conditions reset our own driver
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the libGLU.so.* build when a system libGL.so is not present
since it is relying on the lib/ to build against until it gets
converted to automake.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
That is by making the dri extension variables static in gbm_dri.c.
The image_lookup_extension is provided by egl_dri2 when using x11 or wayland
platforms, when using the drm platform, gbm_dri has a wrapper for it.
Both use the same variables name image_lookup_extension.
Since -fvisibility=hidden was (probably by mistake) removed when converting to
automake, the "image_lookup_extension" symbol from egl_dri2.c became exported
in libEGL.so, so "image_lookup_extension" from gbm_dri.c was ignored.
This resulted in calling incorrect callbacks.
We cant make the image_lookup_extension static in egl_dri2.c right now,
since its used across multiple files.
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?id=58099
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the texture is a 1D array, don't remove the border pixel from the
height. Similarly for 2D array textures and the depth direction.
Simplify the function by assuming the border is always one pixel.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch add the support of gl_PointCoord gl builtin variable for
platform gen4 and gen5(ILK).
Unlike gen6+, we don't have a hardware support of gl_PointCoord, means
hardware will not calculate the interpolation coefficient for you.
Instead, you should handle it yourself in sf shader stage.
But badly, gl_PointCoord is a FS instead of VS builtin variable, thus
it's not included in c.vue_map generated in VS stage. Thus the current
code doesn't aware of this attribute. And to handle it correctly, we
need add it to c.vue_map manually to let SF shader generate the needed
interpolation coefficient for FS shader. SF stage has it's own copy of
vue_map, thus I think it's safe to do it manually.
Since handling gl_PointCoord for gen4 and gen5 platforms is somehow a
little special, I added a lot of comments and hope I didn't overdo it ;)
v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This add clipdistance support like the non-llvm draw paths,
if we have a clip distance we compare with it instead of doing
the dot4.
We also have to put the have_clipvertex bit into the emitted
vertex header.
Fixes vs-clip-distance-all-planes-enabled, vs-clip-distance-const-reject,
vs-clip-distance-enables, vs-clip-distance-implicitly-sized,
vs-clip-distance-in-param, vs-clip-distance-uint-index.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the rest of the piglit clipvertex tests.
v2: fixup comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We incorrectly setup clipmask for gl_ClipVertex, this fixes the clipmask
setup.
v2: fix comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
fix comment
This is just a simple text file containing a list of goals for gallivm/llvmpipe
and some info on what is required to get there along with some info on who
is looking at things.
v2: add EXT_texture_array.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_max_texture_levels() is also used to test valid texture target
in _mesa_GetTexLevelParameteriv(). GL_TEXTURE_CUBE_MAP is not allowed
as texture target in glGetTexLevelParameter(). So, this should throw
GL_INVALID_ENUM error.
Few other functions which use _mesa_max_texture_levels() like
getcompressedteximage_error_check() and getteximage_error_check()
also don't accept GL_TEXTURE_CUBE_MAP.
Above fix makes piglit fbo-cubemap test to fail. This is because of
incorrect texture target passed to _mesa_max_texture_levels() in
framebuffer_texture(). Fixing that as well
Note: This is a candidate for the stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
If I had a dollar for every time I wrote this patch, I'd have about
$10 :-)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's even a comment in the code containing the right swizzling
computations!
Previously this has not been noticed because we need to manually
enabled swizzling on snb/ivb (kernel 3.4 will do that) and we
don't use the separate stencil on ilk (where the bios enables
swizzling). This fixes
piglit ./bin/fbo-stencil readpixels GL_DEPTH32F_STENCIL8 -auto
on recent drm-intel-next kernels.
Also remove the comment about ivb, it's stale now.
Swizzling detection is done by allocating a temporary x-tiled
buffer object. Unfortunately kernels before v3.2 lie on snb/ivb
because they claim that swizzling is enable, but it isn't. The
kernel commit that fixes this for backport to pre-v3.2 is
commit acc83eb5a1e0ae7dbbf89ca2a1a943ade224bb84
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 12 20:49:16 2011 +0200
drm/i915: fix swizzling on gen6+
But if the kernel doesn't lie, this now works on swizzling and
not swizzling machines.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the previously used wl_display_destroy.
wl_display_destroy was povided by wayland-client.so and
wayland-server.so, to resolve that conflict its renamed client-side.
Otherwise streamout with rasterizer discard will make the kernel upset
if the state tracker doesn't set a depth-stencil-alpha state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Unused by the current stack and APIs, therefore untestable.
It was used to facilitate the transition to integers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
For polygons, we have been using face culling with success, but that doesn't
work for points and lines.
Setting the point size and line width to 0 fixes it.
Also improve it even more by setting SCREEN_SCISSOR to a zero area.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement it right using STRMOUT_CONFIG.RAST_STREAM. This fixes rasterizer
discard with points and lines.
This also adds another derived state. It's a combination of rasterizer discard
and streamout enable.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We must use VPORT_SCISSOR, because that's the only one we can use for multiple
scissor rectangles in ARB_viewport_array.
R700 can use the VPORT_SCISSOR_ENABLE bit, but R600 doesn't have that and must
emit a 8192x8192 rectangle if scissor is disabled.
This commit also cleanups magic numbers in create_rs_state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
VPORT_SCISSOR is the OpenGL scissor. How do I know? Because there are
16 of them just like GL4.1 has multiple scissor rectangles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Also use XXX in the other ones, because it's the most used word for that
purpose in Mesa.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Timer queries should be able to measure the time spent in u_blitter as well.
Queries are split into two groups: the timer ones and the others (streamout,
occlusion), because we should only suspend non-timer queries for u_blitter,
and later if the non-timer queries are suspended, the context flush should
only suspend and resume the timer queries.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
And rename or inline functions where appropriate.
There is no reason to keep this stuff in r600_hw_context.c.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The current code would ignore the point size specified by gl_PointSize
builtin variable in vertex shader on Pineview. This patch servers as
fixing that.
This patch fixes the following issues on Pineview:
webglc: https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/rendering/point-size.html
piglit: glsl-vs-point-size
NOTE: This is a candidate for stable release branches.
v2: pick Eric's nice tip for fixing this issue in hardware rendering.
v3: the last arg of EMIT_ATTR specify the size in _byte_. (Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the egl_gallium.so driver build when no system libEGL.so is
present, since it's relying on the lib/ to build against until it gets
converted to automake.
We were looking at the size of batch.map for how big the batchbuffer
was, but on 865 we just use a single-page batchbuffer due to hardware
limits.
v2: Removed check for sizeof map < bo->size, since that's always false.
[change by anholt]
NOTE: This is a candidate for release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
In order to prevent an overflow of the batch buffer when emitting
triangles, we need to limit the initial primitive to fit within the
current batch. To do we need to measure the remaining space and thence
compute the maximum number of vertices that fit into that space.
Reported-by: Kurt Roeckx <kurt@roeckx.be>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
The hardware, like i915, uses an inclusive bounds on min and max for
the drawing rectangle, but we were providing a number for exclusive.
The number of bits used by the hardware only covers this value going
up to the maximum size, so when we programmed 2048 as the maximum
inclusive X, it saw a maximum X of 0 and clipped all rendering. This
caused rendering failures in gnome-shell.
Fixes piglit fbo-maxsize.
v2: dropped changes to the blitter, which does use an exclusive x2, y2.
[change by anholt]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45558
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
Michel pointed out that my assumption of a global
index namespace is incorrect and breaks r300g.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should speed things up a bit, but also shows
some bugs with the kernel implementation.
v2: require xcb-dri2 version 1.8
Signed-off-by: Christian König <deathsimple@vodafone.de>
While the ARB_map_buffer_range extension spec says nothing about these
queries -- they were added in GL 3.0 --, it seems like this could be an
error in the extension spec. This is one of the extensions, like
ARB_framebuffer_object, that "back ports" OpenGL 3.0 functionality to
previous versions. These extensions are supposed to provide identical
functionality to OpenGL 3.0. The other cases of mismatches have been
determined to be bugs in the extension specs.
And tools like apitrace rely on such queries to function properly.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We currently don't support gl_PrimitiveID, and I believe asking the
hardware to generate it results in vertex cache invalidations.
This could result in slowdowns for applications that use gl_InstanceID,
which would be counter-productive. Just turn it off for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
visit(ir_variable *) sets dst_reg::writemask to the appropriate channel
for system values. Unfortunately, visit(ir_dereference_variable *) then
calls swizzle_for_size, which for a float, sets the swizzle to .x.
This works for gl_VertexID, since we store it in the .x component (see
brw_draw_upload.c:732 - VID), but fails for gl_InstanceID (IID) since we
store it in the .y channel.
To fix this, avoid calling swizzle_for_size on ir_var_system_values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Originally ARB_draw_instanced only specified that ARB decorated name.
Since no vendor actually implemented that behavior and some apps use
the undecorated name, the extension now specifies that both names are
available.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When you called them in a display list compile before, you would just
end up calling through NULL.
Fixes piglit GL_ARB_draw_instanced/dlist.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kernels prior to 271d81b84171d84723357ae6d172ec16b0d8139c (March 2011)
don't support relocations outside of the target buffer object. Rather
than guarding this with a I915_PARAM_HAS_RELAXED_DELTA check, just
smash the bound to 0xfffff001 like we do on Ironlake.
This effectively gives us no upper bound check, just like we did prior
to commit 271d81b84171d84723357ae6d172ec16b0d8139c.
Daniel Vetter would also like to mention that this relies on the guard
page at the end of the GTT.
NOTE: This is a candidate for release branches.
Fixes a regression since 271d81b84171d84723357ae6d172ec16b0d8139c.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46766
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The drivers/ walk-through-subdirs makefile is converted as well so I
didn't need to keep EGL_DRIVERS_DIRS along with the per-driver
HAVE_EGL_DRIVER_WHATEVER.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The default case code was set up in a separate way, while this makes
it more normal. I wanted to add code to the explicit x11 platform and
default x11 platform cases in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All users of the shine table outside of the tnl module
are gone. Move the implementation into the tnl module and
prefix the public functions with _tnl.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are now only used in the tnl lighting stage, where
they are validated through the tnl driver function NotifyMaterialChange
called in tnl/t_vb_light.c, we can not omit calling
_mesa_validate_all_lighting_tables (which only validates the shine tables)
in main/light.c.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Use direct computation of pow for computing the shininess
in _tnl_RasterPos. Since the _tnl_RasterPos function is still
used by plenty drivers that do only need the shine table for
_tnl_RasterPos but do not make use of swtnl computations, this
enables pushing down the shine table computation and validation
into the tnl module, which will happen in a followup change.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are implicitly invalidated by having
a different shininess value than the current one, we can
omit the explicit invalidation of the shine table.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This lets us use the resource_copy_region() path when blitting from
R8G8B8A8 to R8G8B8x8, for example.
v2: be smarter when src_format==dst_format
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assertions of the form assert(a && b) should be written as separate assertions
so that you can actually tell which part is false when there's a failure.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Move structs, enums, etc so they're in more logical order. In particular,
the shader and transform feedback-related structs/enums were pretty
scattered around.
After biasing we need to clamp to be sure we don't exceed the number of
levels in the mipmap. This fixes an assertion at svga_sampler_view.c:70
v2: simplify the biasing, clamping code per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to allocate new space every time to avoid blocking on the last
HiZ op completing. There are two easy ways to do this:
brw_state_batch() and intel_upload_data(). brw_state_batch() is
simpler and avoids another buffer allocation.
Improves Unigine Tropics performance 0.376416% +/- 0.148722% (n=7).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ralloc string appending functions were originally intended for
simple, non-hot-path uses like printing to an info log.
Cuts Unigine Tropics load time by around 20% (6 seconds).
v2: Avoid strlen() on every newline, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
Both callers of rewrite_tail immediately compute the new total string
length by adding the (known) length of the existing string plus the
length of the newly appended text. Unfortunately, callers generally
won't know the length of the new text, as it's printf-formatted.
Since ralloc already computes this length, it makes sense to add it in
and save the caller the effort. This simplifies both existing callers,
but more importantly, will allow for cheap-appending in the next commit.
v2: The link_uniforms code needs both the old and new length.
Apply the obvious fix (which sadly makes it less of a cleanup).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.
I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.
With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Backends usually advertise a SVGA3D_DEVCAP_MAX_POINT_SIZE between 63 and
256, so an hardcoded max point size of 80 is often incorrect.
This limitation does not apply for anti-aliased points (as they are done
via draw module) but we still advertise the same limit for both, because
all others pipe drivers do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa has a fast path for the generic fallback when using glReadPixels
for RGBA data which uses memcpy. However it was really difficult to
hit this case because it would not be used if any transferOps are
enabled. Any type apart from floating point or non-normalized integer
types (so any of the common types) would force enabling clamping so
the fast path could not be used. This patch makes it ignore clamping
when determining whether to use the fast path if the data type of the
buffer is an unsigned normalized type because in that case clamping
will not have any effect anyway.
https://bugs.freedesktop.org/show_bug.cgi?id=46631
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
postpone unreferences until end of function, as the ones in use will
get naturally dereferenced.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Can't see any reason this wouldn't be better off as an inline.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some backends may advertise more temps than SVGA3D_TEMPREG_MAX, but the
driver is hardwired to only support up to the value defined by
SVGA3D_TEMPREG_MAX, so clamp to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
KILP instruction inside IF blocks were being lowered to an unconditional
KIL. Since r300 doesn't support branching, when the IF's were lowered
to conditional moves, the KIL would always be executed. This is not a
problem with the mesa state tracker, because the GLSL compiler handles
lowering IF's, but this bug was appearing in the VDPAU state tracker,
which does not use the GLSL compiler.
Note: This is a candidate for the stable branches.
The xvmc state tracker is completely seperate and
doesn't shares code or anything else with the
xorg state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This patch allows the Mac OS X SCons build to complete. The assembly
sources contain psuedo-ops that not are supported on Mac OS X.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were inverting the meaning of the stencil op flags: in svga/d3d
the normal incr/decr wraps and the SAT ops clamp.
This fixes piglit failures (at least stencil-twoside and stencil-wrap).
We should backport this everywhere we can.
Reviewed-by: Brian Paul <brianp@vmware.com>
Two of the switch cases used PIPE_FORMAT_ tokens instead of SVGA3D_ tokens.
As it happens, the token values are equal for these formats so there's no
net change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We always mapped the query buffer in begin_query, causing stalls
if the buffer was busy.
This commit reworks it such that the query buffer is only mapped
in get_query_result as it's supposed to be.
The query buffer is no longer treated as a ring buffer. Instead, the results
are just appended and when the buffer is full, we create a new one. One query
can have more than one query buffer, though that's a very rare case.
Begin_query releases all query buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
From http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt:
Accepted by the <cap> parameter of Enable, Disable and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv
and GetDoublev:
TEXTURE_CUBE_MAP_SEAMLESS 0x884F
This caused a change in enums.c, which is manually built from the .xml
files.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The linked list of memory allocations was not protected by a mutex.
This lead to sporadic failures with multi-threaded apps.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, instead of immediately freeing deleted surfaces, hang onto
them in a cache to do quick re-allocation. This helps when surfaces
are frequently destroyed and then reallocated a bit later.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
There was a SVGA_HOST_SURFACE_CACHE_BYTES symbol, but it was never
used.
Now when we go to add a newly deleted surface to the cache we check
if the cache size would be exceeded. If so, try to free the least
recently "unused" surfaces until the cache is smaller. If we can't
do that, simply don't cache the newly deleted surface. The alternative
involves flushing and waiting and we don't want to do that.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, if shader translation failed for any reason we'd keep trying
to translate the shader over and over again during state validation.
The dummy fragment shader emits solid red so that might be visual
clue that translation is failing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The assertion recently added in dst_register() was invalid because that
function is also (suprisingly) used to declare constant registers.
Move the assertion to the callers where we're really creating temp
registers and add some code to prevent emitting invalid temp register
indexes for release builds.
Also, update the comment for get_temp(). It didn't return -1 if it
ran out of registers and none of the callers checked for that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And assert on the register index in dst_register(). The dest can
only be an output or temp reg and there's more of the later.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit 980f6f1 (mesa: move gl_texture_image::Width/Height/DepthScale
fields to swrast) moved the initialization of the Width, Height, and
DepthScale fields to _swrast_alloc_texture_image_buffer(). However,
i915 doesn't call this function because it performs its own buffer
allocation. As a result, the Width, Height, and DepthScale fields
weren't getting initialized properly, and some operations requiring
swrast would fail.
This patch ensures that Width, Height, and DepthScale are properly
initialized by separating the code that sets them into a new function,
_swrast_init_texture_image(), which is called by
intel_alloc_texture_image_buffer() as well as
_swrast_alloc_texture_image_buffer(). It also moves the
initialization of _IsPowerOfTwo into this function.
Fixes piglit test fbo/fbo-cubemap on i915.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=41216
This is a candidate for the 8.0 branch.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GBM needs the buffer format in order to communicate with DRM and clients
for things like scanout.
So track the DRI format requested in the various back ends and use it to
return the DRI format back to GBM when requested. GBM will then map
this into the GBM surface type (which is in turn based on the DRM fb
format list).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These were rotting in an internal branch, but contain nothing confidential,
and would be much more useful if kept up-to-date with latest gallium
interface changes.
Several authors including Keith Whitwell, Zack Rusin, and Brian Paul.
In the gen6 GS case, we were under-counting and so other state would
get smashed. In the VS case, we were over-counting, so everything was
fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This was copy and paste from the VS where I had similar code. We're
only looking at things derived from BRW_NEW_VERTEX_PROGRAM in this
block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This is a state which is derived from other states and is actually the first
state which doesn't correspond to any gallium state.
There are two state flags:
bool occlusion_query_enabled
bool flush_depthstencil_enabled
Additional flags can be added later if needed, e.g. bool hiz_enabled.
The emit function will have to figure out the register values by itself.
It basically just emits the registers when the state changes.
This commit also adds a few helper functions for writing registers directly
into a command stream.
This is the first pure command buffer. It contains CS initialization
packets and emits invariant state (i.e. the registers which never or rarely
change).
The affected registers are removed from *_hw_context.c, so that both ways
of emitting commands can co-exist.
v2: emit context_control in cayman's start_cs too
Suggested by José.
We don't provide shader caching in CSO. Most of the time the api provides
object semantics for shaders anyway, and the cases where it doesn't
(eg mesa's internall-generated texenv programs), it will be up to
the state tracker to implement their own specialized caching.
Improves VS state change microbenchmark performance by 7.08729% +/-
1.22289% (n=10) on gen7, because we don't upload the 64 dwords of
unused binding table any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step toward making the samplers/binding tables reflect
sampler uniform mappings instead of embedding those in the programs.
No significant performance difference on the microbenchmark (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always say no. Improves VS state change microbenchmark performance
7.68747% +/- 1.40826% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, 640x480 nexuiz is running 0.169118%
+/- 0.0863696% faster (n=121). On a VS state change microbenchmark,
performance is increased 8.28645% +/- 0.460478% (n=52).
v2: Fix CACHE_NEW_VS comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces recomputation of state based on non-clipping-related
transform changes, and is a step toward removing VUE map
recomputation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For a 1D texture array, the border only applies to the width. For a 2D
texture array the border applies to the width and height but not the depth.
Sucha cases were not handled correctly in _mesa_init_teximage_fields().
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tracing function entry/exits is a bit pointless
when VDPAU_TRACE=1 does the same thing.
v2: use WARN instead of ERR for application problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
Like TGSI_OPCODE_ARL, destination should be an integer.
This fixes invalid LLVM IR on an internal state tracker (currently Mesa
never emits this opcode).
In the future consider making ADDR register also a integer-as-float array,
like all other register kinds, or simply replace ADDR & ARR/ARL with
integer temp and instructions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes this GCC warning.
native_drm.c:153:1: warning: ‘drm_display_authenticate’ defined but not
used [-Wunused-function]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Avoid setting dirty state flags when enabling or disabling a vertex
attribute arrays when there's no change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If you're resorting to the dummy shader, you've probably already turned
off SIMD16 mode. But if you didn't, it would die in a fire.
We could either fail to compile in SIMD16 mode...or just fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dummy FB write failed to specify EOT and a message length, causing
the GPU to hang. Now we can enjoy "everyone's favorite color" again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes this GCC warning.
mask.c: In function ‘mask_layer_fill’:
mask.c:387:12: warning: variable ‘alpha_color’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes these GCC warnings.
glx_api.c: In function ‘choose_visual’:
glx_api.c:678:8: warning: variable ‘trans_value’ set but not used
[-Wunused-but-set-variable]
glx_api.c:677:8: warning: variable ‘trans_type’ set but not used
[-Wunused-but-set-variable]
glx_api.c:663:8: warning: variable ‘min_ci’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that we have a index_range_invalid flag, we can just use that rather
than calling vbo_validated_drawrangeelements directly and returning.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This failed to take basevertex into account:
If basevertex < 0:
(end + basevertex) might actually be in-bounds while 'end' is not.
We would have clamped in this case when we probably shouldn't.
This could break application drawing.
If basevertex > 0:
'end' might be in-bounds while (end + basevertex) might not.
We would have failed to clamp in this place. There's a comment
indicating the TNL module depends on max_index being in-bounds;
if so, it would likely break horribly.
Rather than trying to clamp correctly in the face of basevertex, simply
delete the clamping code and indicate that we don't have a valid range.
This causes _tnl_vbo_draw_prims to use vbo_get_minmax_indices() to
compute the actual bounds, which is much safer.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The application supplied [start, end] range is merely a conservative
hint of the ranges of index values inside the index buffer. There is no
requirement that all vertices in the range [start, end] be referenced.
Passing an 'end' value larger than the maximum legal index is perfectly
acceptible; applications can legally pass 0xffffffff when they don't
have a tighter bound readily available.
Thus, the warning doesn't indicate a correctness issue; it could only
indicate a performance issue. However, it does not even do that.
glDrawRangeElements is designed to optimize non-VBO vertex data uploads
by providing an upper bound on the size of buffers a driver would need
to allocate. With VBOs, the data is already in an uploaded buffer, so
the range doesn't help.
The clincher is: we only know _MaxElement for VBOs. For user-space
arrays, we just set it to 2,000,000,000 (see mesa/main/varray.h:63.)
So we can only check this in the case where it is not useful.
Many applications, including the Unigine demos, currently trigger this
warning, which suggests the applications are buggy when they're actually
fine. Eliminating the warning should confuse users less while not
actually losing any benefit to application developers.
NOTE: This is a candidate for release branches.
Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There's a serious trap for drivers: RenderTexture() does not indicate
that the texture is currently bound to the draw buffer, despite
FinishRenderTexture() signaling that the texture is just now being
unbound from the draw buffer.
We were acting as if RenderTexture() *was* the start of rendering and
that we could make texturing incoherent with the current contents of
the renderbuffer. This caused intel oglconform sRGB
Mipmap.1D_textures to fail, because we got a call to TexImage() and
thus RenderTexture() on a texture bound to a framebuffer that wasn't
the draw buffer, so we skipped validating the new image into the
texture object used for rendering.
We can't (easily) make RenderTexture() indicate the start of drawing,
because both our driver and gallium are using it as the moment to set
up the renderbuffer wrapper used for things like MapRenderbuffer().
Instead, postpone the setup of the workaround render target miptree
until update_renderbuffer time, so that we no longer need to skip
validation of miptrees used as render targets. As a bonus, this
should make GL_NV_texture_barrier possible.
(This also fixes a regression in the gen4 small-mipmap rendering since
3b38b33c16, which switched
set_draw_offset from image->mt to irb->mt but didn't move the irb->mt
replacement up before set_draw_offset).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44961
NOTE: This is a candidate for the 8.0 branch.
Infer from the operand the type of value to store.
MOV is untyped but we use the float store path.
v2: make MOV use float store path.
I've had to squash merge the ARL fix to be stored
as an integer in here to avoid regressions in a number
of piglit tests.
From now on ARL stores to an integer just like HW does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The infers the type of data required using the opcode,
and casts the input to the appropriate type.
So far this only handles non-indirect constant and temporaries.
v2: as per Jose suggestion, fetch immediates via floats
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are used inside the action handlers for the integer opcodes.
v2: use uint_bld/int_bld, drop higher level uint_bld.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now just pass the current context, but when we want to
store int or unsigned we need to pass those later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These two functions produce the src/dst types for an opcode.
MOV is special since it can be used to mov float->float and int->int,
so just return VOID.
v2: use a new enum for the opcode type as per Jose's suggestion.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the texture format is integer, the incoming user data must also be
integer (and similarly for non-integer textures).
NOTE: This is a candidate for the stable branches.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I recently discovered this text in the BSpec. It seems wise to comply,
though I haven't observed it to fix anything yet.
Fixes a regression in glean/fbo since 28cfa1fa21.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes (with the previous commit) piglit GL_ARB_multisample/pushpop.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the table of of push/pop attributes, this one doesn't fall under
the enable group.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes build errors like
In file included from glapi_dispatch.c:91:
../../../src/mapi/glapi/glapitemp.h:4641: error: no previous prototype for
'glDrawBuffersNV'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lucas Stach <dev@lynxeye.de>
Similar to the previous commit. Also fix incorrect setting of the
sampler view's state after it's created. We need to specify the
first/last_level fields in the template instead.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rather than the one in st_texture_object. This sampler view really has
no connection to the one used for rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And remove needless & 0xff in _mesa_pack_uint_24_8_depth_stencil_row().
As suggested by José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, this function only handled 2D textures.
The fallback texture is used when we try to sample from an incomplete
texture object. GLSL says sampling an incomplete texture should return
(0,0,0,1).
v2: use a 1-texel texture image, per José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Added in _mesa_pack_uint_24_8_depth_stencil_row(). This could be hit
by something like glDrawPixels(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8)
into a MESA_FORMAT_Z32_FLOAT_X24S8 buffer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The st_renderbuffer_alloc_storage() function is used to allocate both
window-system buffers and user-created renderbuffers. The later kind
are never directly displayed so don't set PIPE_BIND_DISPLAY_TARGET for
those surfaces.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit dc7f449d1a introduced a new method
for avoiding MOVs: try to rewrite the destination of the instruction
that produced the RHS so it writes into the LHS.
Unfortunately, this is not safe for swizzled texturing operations, as
they return a set of four contiguous registers. Consider the following:
(assign (x)
(var_ref vec_ctor_x)
(swiz x (tex vec4 (var_ref m_sampY) (var_ref m_cordY) 0 1 ())))
In this case, the source and destination registers are equal, since
reg_offset is 0 for both. Yet, this is only a partial move: the texture
operation generates four registers, and the LHS only covers one.
Fixes color distortion in XBMC when using GLSL shaders.
NOTE: This is a candidate for the 8.0 branch (with the previous commit).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44333
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain instructions write more than one register. Texturing, for
example, returns 4 registers. (We set rlen to 4 even for TXS and float
shadow sampling.) Some math functions return 2. Most return 1.
The next commit introduces a use of this function.
NOTE: This is a candidate for the 8.0 branch (dependency of a fix).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reallocate/resize decompress FBO only if texture image width/height is
greater than existing decompress FBO width/height.
This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A filter strength of zero or one doesn't make any
sense. Thanks to Andy Furniss for pointing this out.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The virtual address but follow the alignment requirement of the
tiled surface. The bo from handle case is not properly fix. Need
bigger change for a proper fix. Work around that by enforcing 1M
alignment for those bo.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 2e5a1a2 (intel: Convert from GLboolean to 'bool' from
stdbool.h.) converted the "specoffset" local variable (in
intel_tris.c) from a GLboolean to a bool. However, GLboolean was the
wrong type for specoffset--it should have been a GLuint (to match the
declaration of specoffset in struct intel_context).
This patch changes specoffset to the proper type.
Fixes piglit test general/two-sided-lighting-separate-specular.
This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45917
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out the same messages work on gen7, we were just being paranoid.
Fixes the penumbra shadows mode of Lightsmark since the register
allocation fix.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We just abort later, but at least this should result in more
informative bug reports.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.
Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.
Note: this is a candidate for the stable branches.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the pixel store operations in decompress_texture_image().
decompress_texture_image() is used in glGetTexImage() for compressed
textures with unsigned, normalized values.
It also fixes the failures in intel oglconform pxstore-gettex due to
following sub test cases:
- Test all mipmaps with byte swapping enabled
- Test all small mipmaps with all allowable alignment values
- Test subimage packing for all mipmap levels
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40864
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
X86Target is a variable, and therefore isn't defined at compile time. So
LLVM_NATIVE_ARCH == X86Target
is translated into
0 == 0
and since X86 is first, we always pick it.
Therefore we replace the logic with PIPE_ARCH_*.
https://bugs.freedesktop.org/show_bug.cgi?id=45420
Fixes a regression from commit 660ed923de.
The basic idea is to look at the format of the dest renderbuffer and
choose either GLubyte or GLfloat for colors. The previous code used
_mesa_format_to_type_and_comps() which could return a bunch types other
than ubyte/float.
Determine the datatype at renderbuffer mapping time to avoid frequent
calls to the format query functions.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45577
Fix build with llvm-3.1svn.
llvm-3.1svn r149918 changed BufferMemoryObject::getExtent and
BufferMemoryObject::readByte from const member functions to non-const
member functions in include/llvm/Support/MemoryObject.h.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ironlake appears to check our pointer against the General State Base
Address upper bound, rather than ignoring the zero bound as it ought.
Unfortunately, since we leave GSBA set to zero, there is no logical
upper bound. Set it to the maximum possible value, which should work
since our virtual addresses only go up to 2GB.
+94 piglits.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves nexuiz performance 0.65% +/- .10% (n=5) on my gen6, and .39%
+/- .11% (n=10) on gen7. No statistically significant performance
difference on warsow (n=5, but only one shader has MADs).
v2: Add support for MADs in 16-wide by using compression control.
v3: Don't generate MADs when it will force an immediate to be moved to a temp.
(it's not clear whether this is a win or not, but it should result in less
questionable change to codegen compared to v2).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
Our only instruction with a 3rd source so far was linterp, and that
value was never register-allocated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WGL_ARB_pixel_format establishes the existence of pixel formats which
are invisible to GDI.
However we still need to pass a valid pixelformat to GDI, so that
context creation/binding works.
The actual WGL_TYPE_RGBA_FLOAT_ARB implementation is from Brian Paul.
The mapping from TEXTURE_x_INDEX to GL_TEXTURE_x was broken in
alloc_proxy_textures() because the elements in the targets[] array
were in the wrong order.
This didn't actually cause any failures since we never really use the
proxy texture's Target field. But let's get it right.
NOTE: This is a candidate for the 8.0 branch.
Use the float tables instead. Pixel maps are seldom used so this
shouldn't be a big deal. Next, we can get rid of the gl_pixelmap::Map8
array.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's a mismatch in row strides for compressed textures between
what Driver.MapTextureImage() returns and what the software fetch-texel
functions use. Move it down a layer. The next step would be to fix
this in the fetch-texel functions.
Just use pow() instead. Spot lights aren't too common and fixed-function
lighting isn't as important as it used to me.
This saves 32KB per context. Each table was 4KB and there's 8 lights.
This is a shader based median filter, generally
used for noise reduction, it could still need some
improvements, but should usually work out of the box.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The wm max threads is in the same dword as the dispatch enable. The
hardware gets super angry if you set max threads to 0, even if you
aren't dispatching threads.
Avoid unrollong loops that are either nested loops or
where the loop body times the unroll count is huge.
The change is far from being perfect but it extends the
loop unrolling decision heuristic by some additional
safeguard. In particular this cuts down compilation of
a shader precomputing atmospheric scattering integral
tables containing two nesting levels in a loop from
something way beyond some minutes (I never waited for
it to finish) to some fractions of a second.
This fixes piglit tests glsl-fs-unroll-explosion and
glsl-vs-unroll-explosion on r600g.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type + 2 * border.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1) + 2 * border
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case:
max_values negative.textureSize.textureCube
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If we have no more enabled samplers and we've reset all the previously
used ones, no need to keep going around this loop.
(just moved some stuff around to clean it up a bit).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From what I can see we were taking the debug path all the time,
when we probably only want it for enable debug path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't want our VBOs mapped when we're drawing. This change checks
if the vertex store VBO is mapped before we execute a list, unmaps it,
then remaps it after drawing. This situation pops up when building a
nested display list in GL_COMPILE_AND_EXECUTE mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
Something has gone wrong if swrast is requested but cannot be
loaded. The user really should be made aware of this, (and instructed
to set LIBGL_DEBUG for more details).
The wording of this error message is updated from "reverting to
indirect rendering" to the more objectively descriptive "failed to
load driver: swrast". The former wording makes assumptions about what
the calling code will decide to do next, rather than simply describing
what went wrong within the current function. The new wording is
consistent with the critical errors recently added for hardware
drivers that fail to load.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Something has gone wrong if we were asked to load a driver of a
specific name, but it failed to load for some reason. The user really
should be made aware of this, (and instructed to set LIBGL_DEBUG for
more details).
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Sometimes an error is so sever that we want to print it even when the
user hasn't specifically requested debugging by setting LIBGL_DEBUG.
Add a CriticalErrorMessageF macro to be used for this case. (The error
message can still be slienced with the existing LIBGL_DEBUG=quiet).
For critical error messages we also direct the user to set the
LIBGL_DEBUG environment variable for more details.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The description of ErrorMessageF was misleading in the case of
LIBGL_DEBUG being unset, (the previous comment could be understood to
mean the error should be printed, but the code does not print in this
case).
InfoMessageF previously had no comment at all.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The build was broken by the line below, added in commit 4f82fed4.
s_expression.cpp:26: #include <limits>
Mesa's half of the fix is to add 'external/astl/include' to the include
path. The other half of the fix requires implementing
numeric_limits<float>::infinity() in astl, for which I have patches
submitted upstream for review.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Outputs should be treated in the same way as
inputs and temporaries here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
If we had no vertex textures or samplers previously and we have none now,
don't bother doing the enables dance.
I was profiling nexuiz on noop and noticed these two functions in the
profile, this drops their usage from 0.86% to 0.03% and 0.23% to 0.03%
for texture and samplers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were doing saturate-based clamping on the [0,width] or [0,height]
coordinate, which meant only the first pixel was addressable.
Fixes piglit ARB_texture_rectangle/texwrap-RECT-bordercolor
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We should be able to merge self-move instruction into the MRF move
anyway, and this simplifies things for the next commit.
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The HiZ op was implemented as a meta-op. This patch reimplements it by
emitting a special HiZ batch. This fixes several known bugs, and likely
a lot of undiscovered ones too.
==== Why the HiZ meta-op needed to die ====
The HiZ op was implemented as a meta-op, which caused lots of trouble. All
other meta-ops occur as a result of some GL call (for example, glClear and
glGenerateMipmap), but the HiZ meta-op was special. It was called in
places that Mesa (in particular, the vbo and swrast modules) did not
expect---and were not prepared for---state changes to occur (for example:
glDraw; glCallList; within glBegin/End blocks; and within
swrast_prepare_render as a result of intel_miptree_map).
In an attempt to work around these unexpected state changes, I added two
hooks in i965:
- A hook for glDraw, located in brw_predraw_resolve_buffers (which is
called in the glDraw path). This hook detected if a predraw resolve
meta-op had occurred, and would hackishly repropagate some GL state
if necessary. This ensured that the meta-op state changes would not
intefere with the vbo module's subsequent execution of glDraw.
- A hook for glBegin, implemented by brwPrepareExecBegin. This hook
resolved all buffers before entering
a glBegin/End block, thus preventing an infinitely recurring call to
vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to
flush its vertex queue in response to GL state changes.
Unfortunately, these hooks were not sufficient. The meta-op state changes
still interacted badly with glPopAttrib (as discovered in bug 44927) and
with swrast rendering (as discovered by debugging gen6's swrast fallback
for glBitmap). I expect there are more undiscovered bugs. Rather than play
whack-a-mole in a minefield, the sane approach is to replace the HiZ
meta-op with something safer.
==== How it was killed ====
This patch consists of several logical components:
1. Rewrite the HiZ op by replacing function gen6_resolve_slice with
gen6_hiz_exec and gen7_hiz_exec. The new functions do not call
a meta-op, but instead manually construct and emit a batch to "draw"
the HiZ op's rectangle primitive. The new functions alter no GL
state.
2. Add fields to brw_context::hiz for the new HiZ op.
3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable.
4. Kill all dead HiZ code:
- the function gen6_resolve_slice
- the dirty flag BRW_NEW_HIZ
- the dead fields in brw_context::hiz
- the state packet manipulation triggered by the now removed
brw_context::hiz::op
- the meta-op workaround in brw_predraw_resolve_buffers (discussed
above)
- the meta-op workaround brwPrepareExecBegin (discussed above)
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reported-by: xunx.fang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927
Reported-by: chao.a.chen@intel.com
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If size is small (such as 1),
pitch = ROUND_DOWN_TO(MIN2(size, (1 << 15) - 1), 4);
makes pitch = 0. Then
height = size / pitch;
causes a division-by-zero exception. If pitch is zero, set height to
1 and avoid the division.
This fixes piglit's bin/getteximage-formats test and glean's
bufferObject test.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44971
There are cases where a buffer can be mapped while another buffer is
flushed. This can happen in the CopyPixels meta-op path for piglit's
fbo-mipmap-copypix. After some discussion with Eric, it seems this
assertion is no longer necessary, and it has always been too strict.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43328
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used by glReadPixels and glDrawPixels. Now those
functions do the corresponding error checks.
Signed-off-by: Brian Paul <brianp@vmware.com>
Basically the same story as the previous commit. But we were
already calling _mesa_source_buffer_exists() in ReadPixels().
Yeah, we were calling it twice.
Signed-off-by: Brian Paul <brianp@vmware.com>
The _mesa_error_check_format_type() function does two things: check
that format/type is legal and check that the destination (or source
buffer for glReadPixels) actually exists. Just move the relevant
parts of that into _mesa_DrawPixels().
We'll do a similar change in glReadPixels then get rid of the function
altogether.
Signed-off-by: Brian Paul <brianp@vmware.com>
This replaces the _mesa_is_legal_format_and_type() function.
According to the spec, some invalid format/type combinations to
glDrawPixels, ReadPixels and glTexImage should generate
GL_INVALID_ENUM but others should generate GL_INVALID_OPERATION.
With the old function we didn't make that distinction and generated
GL_INVALID_ENUM errors instead of GL_INVALID_OPERATION. The new
function returns one of those errors or GL_NO_ERROR.
This will also let us remove some redundant format/type checks in
follow-on commit.
v2: add more checks for ARB_texture_rgb10_a2ui at the top of
_mesa_error_check_format_and_type() per Ian.
Signed-off-by: Brian Paul <brianp@vmware.com>
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Refine 80aa78142d "dri: make sure to build libdricommon.la"
so we don't build libdricommon if we aren't building a dri driver which needs it (i.e.
if we are just building swrast)
In particular, this restores the ability to build the swrast dri driver without having to
have a xf86drm.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
in check_index_bounds the comparison needs to be "greater equal" since
contrary to the name _MaxElement is the count of the array (this matches
similar code in vbo_exec_DrawRangeElementsBaseVertex).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the build of builtin_compiler on my 32-bit build where xcb-dri2
is in a custom prefix but the custom prefix flags weren't available.
It shouldn't have been in LIBS anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This checks for advertised LLC support by the GPU instead of relying on
the GPU generation for detection.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Rely on libdrm HAS_LLC parameter to verify if hardware supports it. In
case the libdrm version does not supports this check, fallback to older
way of detecting it which assumed that GPUs newer than GEN6 have it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Was previously being done in a state-tracker, but in a way which was
difficult for some drivers to optimize. Push down to this level and
make it the individual drivers responsibility.
FBOs differ from textures in a significant way. With textures, we can
strip the border and get correct rendering except when the application
fetches texels outside [0,1].
With an FBO, the pixel at (0,0) is in the border. The
ARB_framebuffer_object spec says:
"If the attached image is a texture image, then the window
coordinates (x[w], y[w]) correspond to the texel (i, j, k), from
figure 3.10 as follows:
i = (x[w] - b)
j = (y[w] - b)
k = (layer - b)
where <b> is the texture image's border width..."
Since the border doesn't exist, we can never render any pixels in the
correct location. Just mark these FBOs FRAMEBUFFER_UNSUPPORTED.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42336
Ever since xserver commit 531869448d07e00ae241120b59f3aaaa5709d59c,
the server no longer sends invalidate events to clients, unless they
have performed a GetBuffers request since the drawable was last
invalidated.
If the drawable gets invalidated immediately after the GetBuffers
request was processed by the X server, it's possible that Xlib
will process the invalidate event while waiting for the GetBuffers
reply. So the server, thinking the client knows that the buffers
are invalid, is waiting for another GetBuffers request before
sending any more invalidate events. The client, on the other hand,
believes the buffers to be valid, and thus is expecting to receive
another invalidate event before it has to send another GetBuffers
request. The end result is that the client never again sends
a GetBuffers request.
To avoid this problem, take a snapshot of the lastStamp before
doing GetBuffers, and retry if the snapshot and the current
lastStamp no longer match after the GetBuffers reply has been
processed.
Signed-off-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The error message I chose matches gcc's error. Fixes piglit
switch-case-duplicated.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise, the upcoming error messages said the location was 0:0(0).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not quite spelled out in the spec text, but the grammar indicates
that only constant values are allowed as switch() case labels (and
only constant values make sense, anyway).
Fixes piglit glsl-1.30/compiler/switch-statement/switch-case-uniform-int.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This stuffs them all in a struct for sanity. Fixes piglit
glsl-1.30/execution/switch/fs-uniform-nested.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For all the extension entrypoints using the get_buffer() helper, they
wanted the same error handling. In some cases, the error was doing
the same error return whether target was a bad enum, or a user buffer
wasn't bound.
(Actually, GL_ARB_map_buffer_range doesn't specify the error for a zero
buffer being bound for MapBufferRange, though it does for
FlushMappedBufferRange. This appears to be an oversight).
Fixes piglit GL_ARB_copy_buffer/negative-bound-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Even though it should be safe to use them for one frame, better be sure.
Suggested by Michael Dänzer.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
This prevents a possible lapse of the depth buffer - the situation where
the app and pp have different depth buffers.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
In commit 6ecee54a9a a call to
talloc_reference was replaced with a call to talloc_steal. This was in
preparation for moving to ralloc which doesn't support reference
counting.
The justification for talloc_steal within token_list_append in that
commit is that the tokens are being copied already. But the copies are
shallow, so this does not work.
Fortunately, the lifetime of these tokens is easy to understand. A
token list for "replacements" is created and stored in a hash table
when a function-like macro is defined. This list will live until the
macro is #undefed (if ever).
Meanwhile, a shallow copy of the list is created when the macro is
used and the list expanded. This copy is short-lived, so is unsuitable
as a new parent.
So we can just let the original, longer-lived owner continue to own
the underlying objects and things will work.
This fixes bug #45082:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
This test cases exposes a bug as described in this bug report:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Clearly, some memory is getting (incorrectly) freed on the first macro
invocation, leading to problems with the second macro invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The trick here is that flex always chooses the rule that matches the most
text. So with a input text of "two:" which we want to be lexed as an
IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
rule would match longer as a single token of "two:" which we don't want.
We prevent this by forcing the OTHER pattern to never match any
characters that appear in other constructs, (no letters, numbers, #,
_, whitespace, nor any punctuation that appear in CPP operators).
Fixes bug #44764:
GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
GL_RG_INTEGER only has two components, not three. I'll be surprised
if anyone ever tries to glReadPixels(..., GL_SHORT, GL_RG_INTEGER,
...). This was found by inspection.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With 0963990 the flag was only set when Bind created the object. In
all cases where ::ARBsemantics could be true, this path never
happened. Instead, add a _Used flag to track whether a VAO has ever
been bound. On the first Bind, set the _Used flag, and set the
ARBsemantics flag to the correct value.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45423
This is a hack, and it will result in incorrect rendering. However,
it does eliminate spurious warnings in several piglit CopyPixels tests
that involve floating-point depth buffers.
The real solution is to add a zf field to SWspan to store float Z
values. When a float depth buffer is involved, swrast should also
populate the zf field. I'll consider this post-8.0 work.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the draw module avoids flushing, it may flush precisely when
binding a NULL shader, so care must be taken when restoring the original
fragment shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
When GLAPIENTRY is __stdcall (ie Windows), the stack is popped by the
callee making the number/type of arguments significant, therefore
using a generic no-op causes stack corruption for many entry-points.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1)
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case: max_values
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In certain situations API's will call pipe->clear which doesn't
require fragment shader, but then we'd try to verify the pipeline
and assume fragment shader was always set. This was leading to
crash when API would just call simple clear's before anything else.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The node_attrsz[] array is initially copied from the node->attrsz[]
array but some values get rewritten. Thereafter, we need to use the
node_attrsz[] values.
Fixes a bug when replaying a display list that uses generic vertex
array[16] (at least).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings were:
nv50_pc_regalloc.c: In function ‘pass_generate_phi_movs’:
nv50_pc_regalloc.c:423:41: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp: In member function ‘bool nv50_ir::MemoryOpt::replaceStFromSt(nv50_ir::Instruction*, nv50_ir::MemoryOpt::Record*)’:
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
And add some assertions to catch this sooner in debug builds.
This fixes a dangling texture object pointer bug hit via wglShareLists().
When we push the GL_TEXTURE_BIT state we may push references to the default
texture objects which are owned by the gl_shared_state object. We don't
want to accidentally delete that shared state while the attribute stack
references shared objects. So keep a reference to it.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This cleans up the reference counting of shared context state.
The next patch will use this to fix an actual bug.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also significantly improves the RV670 flush by using the CB1 flush
*always* and also DEST_BASE_0_ENA, which appears to magically fix some tests.
I am not entirely sure, but it's possible that RV670 flushing is fixed
completely.
v2: fix cayman by flushing texture cache instead of vertex cache
Thanks to Dave Airlie for testing Cayman.
Commit 99476561 (automake: src/glsl and src/glsl/glcpp) changed the
build system so that src/glsl/glsl_test is not built by default. This
inadvertently broke "make check", since the tests in
src/glsl/tests/lower_jumps (which are run by "make check") rely on
glsl_test.
This patch ensures that "make check" builds glsl_test before running
any tests.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes these GCC warnings.
osmesa.c: In function ‘osmesa_renderbuffer_storage’:
osmesa.c:417: warning: comparison is always false due to limited range of data type
osmesa.c:423: warning: comparison is always false due to limited range of data type
osmesa.c:431: warning: comparison is always false due to limited range of data type
osmesa.c:437: warning: comparison is always false due to limited range of data type
osmesa.c:447: warning: comparison is always false due to limited range of data type
osmesa.c:453: warning: comparison is always false due to limited range of data type
osmesa.c:463: warning: comparison is always false due to limited range of data type
osmesa.c:466: warning: comparison is always false due to limited range of data type
osmesa.c:476: warning: comparison is always false due to limited range of data type
osmesa.c:479: warning: comparison is always false due to limited range of data type
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Success was (tests-passed AND valgrind-tests-passed) but this meant that
if the valgrind tests weren't run it would be considered a failure.
The logic is now (tests-passed AND (!valgrind OR valgrind-tests-passed))
which lets us return success if the valgrind tests aren't run.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Needed for automake. Using AC_PROG_PATH(bison/flex) causes automake to
fail to build .y and .l files.
It is up to the builder to use bison/flex instead of yacc/lex.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Exporting a publicly visible class with a generic name like
"variable_entry" via ir_variable_refcount.h is kind of mean.
Many IR transformers would like to define their own "variable_entry"
class. If they accidentally include this header, the compiler/linker
may get confused and try to instantiate the wrong variable_entry class,
leading to bizarre runtime crashes.
The hope is that renaming this one will allow .cpp files to safely
declare and use their own file-scope "variable_entry" classes.
This avoids crashes caused by converting src/glsl to automake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This uses point size clamping to force point size to a particular value,
making the vertex shader output irrelevant.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't set the other bits anywhere else except the other DSA states,
which are mutually-exclusive with this one.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the gl_PointSize transform feedback test.
Point size clamping should happen at the rasterizer stage,
i.e. after the vertex and geometry shaders and transform feedback.
Drivers are expected to do this by themselves.
Simplifies the general case code in the ubyte-valued texture format
functions. More consolidation to come in subsequent commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifially, this being present works around a bug in Unigine
Sanctuary on i965 which previously resulted in bad rendering.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can be used to work around broken application behavior, like in
Unigine where it attempts to use texture arrays without declaring
either "#extension GL_EXT_texture_array : enable" or "#version 130".
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While typing out the new decode, I added a fallback mode for dumping
when we fail to re-map the BO after execution. This should get us a
minimal dump when trying to dump a batch that results in a GPU hang.
We were allocating registers into the MRF hack region, resulting in
sparkly renering in a few of the scenes. We could do better
allocation by making an MRF class, having MRFs conflict with the
corresponding GRFs, and tracking the live intervals of the "MRF"s and
setting up the conflicts. But this is way easier for the moment.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After the removal of the dri driver link test, this should help avoid
the original problem that it was designed to catch: The warning about
a missing prototype due to typoing a function name scrolling by in the
Mesa build spew, and you not noticing until you try to run an
application and it falls back to swrast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The envvar works for R100 and R200 too, and the classic R300 driver
doesn't even exist anymore.
"RADEON_NO_TCL" is already mentioned in the code and is the same envvar
used for the R300g driver.
lp_bld_tgsi_soa.c has been adapted to use this new interface, but
lp_bld_tgsi_aos.c has only been partially adapted, since nothing in
gallium currently uses it.
v2:
- Rename lp_bld_tgsi_action.[ch] => lp_bld_tgsi_action.[ch]
- Initialize tgsi_info in lp_bld_tgsi_aos.c
- Fix copyright dates
Prior commit 576161289d,
the parameter format was bpp, thus both 24bit and 32bit formats were
requested with format set to 32. Handle 24bit seperately now.
Fixes RGBX formats in wayland platform for egl_dri2 (EGL_ALPHA_SIZE=0).
Note: This is a candidate for the 8.0 branch.
This just copies what the LUMINANCE_ALPHA bits do.
Fixes piglit tests on softpipe complaining about missing unpack.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs some of the MUL instructions spread across a full slot
of vectors.
It also no longer has RECIP_UINT, the recommendation is to replace it
with a U2F + RECIP_IEEE + MUL + F2U.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The warning is absolutely useless. It doesn't actually say that there are
uninitialized variables. It points out the fact that there are missing
initializers and that variables are initialized to zero implicitly, which is
exactly what we want and what we commonly make use of.
C90 and C99 require all unspecified variables in the initializer list to be set
to zero.
The check for ctx->API was unnecessary, because OES extensions are not exposed
in desktop GL.
Also require renderbuffer support for ARB_texture_rgb10_a2ui,
as per the spec.
Tested by comparing old and new glxinfo with softpipe and r600g.
v2: fix bugs
v3: rename need_only_one -> need_at_least_one
rename num_elements -> num_mappings
add comments
use const when appropriate
Reviewed-by: Brian Paul <brianp@vmware.com>
This change is not exactly equivalent (sometimes we checked for non-zero,
sometimes if >0 or >1), but the behavior shouldn't change, because all drivers
report 0 for unsupported CAPs.
Exposing CAP_STREAM_OUTPUT_PAUSE_RESUME without CAP_MAX_STREAM_OUTPUT_BUFFERS
is a driver bug and st/mesa does no checking if the latter is supported as
well. Drivers must report CAPs consistently.
v2: make the array const
v2: handle the cap in r300 and r600 as well
Additional info for r600g:
The env var R600_GLSL130=1 enables GLSL 1.3.
Along with R600_STREAMOUT=1, it enables full GL 3.
Fix an access to uninitialized memory pointed out by valgrind in
glsl_to_tgsi_visitor::simplify_cmp(void).
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Fix this GCC warning.
draw_pipe_clip.c: In function ‘interp’:
draw_pipe_clip.c:122:13: warning: variable ‘clip_dist’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When rendering to FBO, rendering is inverted. At the same time, we would
also make sure the point sprite origin is inverted. Or, we will get an
inverted result correspoinding to rendering to the default winsys FBO.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44613
NOTE: This is a candidate for stable release branches.
v2: add the simliar logic to ivb, too (comments from Ian)
simplify the logic operation (comments from Brian)
v3: pick a better comment from Eric
use != for the logic instead of ^ (comments from Ian)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code quite a bit, consolidates some cases and
possibly catches more cases for the memcpy path.
More such changes will follow. Do just a few at a time to help bisect
any possible regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use memcpy in more situations. We can also remove
the checks for byte spapping that happen before the calls to
_mesa_format_matches_format_and_type().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In a recent commit,
commit 1c0f1dd42a
Author: Chad Versace <chad.versace@linux.intel.com>
swrast: Fix fixed-function fragment processing
I defined a new function,_swrast_fragment_program, but neglected
to #include s_fragprog.h for clients of that function.
Note: This is a candidate for the 8.0 branch.
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The evergreen+ CB no longer supports the following formats
compared to 6xx/7xx:
- COLOR_4_4
- COLOR_3_3_2
- COLOR_6_5_5
- COLOR_8_24_FLOAT
- COLOR_24_8_FLOAT
- COLOR_11_11_10
- COLOR_11_11_10_FLOAT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On i965, _mesa_ir_link_shader is never called. As a consequence, the
current fragment program (ctx->FragmentProgram->_Current) exists but is
invalid because it has no instructions. Yet swrast continued to attempt to
use the empty program.
To avoid using the empty program, this patch 1) defines a new function,
_swrast_use_fragment_program, which checks if the current fragment program
exists and differs from the fixed function fragment program, and, when
appropriate, 2) replaces checks of the form
if (ctx->FragmentProgram->_Current == NULL)
with
if (_swrast_use_fragment_program(ctx))
Fixes the following oglconform regressions on i965/gen6:
api-fogcoord(basic.allCases.log)
api-mtexcoord(basic.allCases.log)
api-seccolor(basic.allCases.log)
api-texcoord(basic.allCases.log)
blend-separate(basic.allCases)
colorsum(basic.allCases.log)
The tests were ran with the GLXFBConfig:
visual x bf lv rg d st colorbuffer sr ax dp st accumbuffer ms cav
id dep cl sp sz l ci b ro r g b a F gb bf th cl r g b a ns b eat
----------------------------------------------------------------------------
0x021 24 tc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None
(Note: I originally believed that the hunk in
_swrast_update_fragment_program was unnecessary. But it is required to fix
blend-separate.)
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reveiwed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Color clamping should be enabled in glGetTexImage if texture dataType is
GL_UNSIGNED_NORMALIZED and format is GL_LUMINANCE or GL_LUMINANCE_ALPHA
Fixes 2 Intel oglconform test cases: pxconv-gettex and pxtrans-gettex
https://bugs.freedesktop.org/show_bug.cgi?id=40864
NOTE: This is a candidate for the 8.0 branch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was losing bits of precision. Fixes (with the previous commits):
piglit EXT_texture_integer/getteximage-clamping
piglit EXT_texture_integer/getteximage-clamping GL_ARB_texture_rg
oglc advanced.mipmap.upload
Regresses oglc negative.typeFormatMismatch.teximage from fail to
abort, because it's been hitting texstore for a format/type combo that
shouldn't happen.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the core, we always treat spans of int/uint data as uint, so this
extract function was truncating storage of integer pixel data to a n
int texture to (0, max_int) instead of (min_int, max_int). There is
probably missing code for handling truncation on conversion between
pixel formats, still, but this does improve things.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly fixes piglit EXT_texture_integer/getteximage-clamping. The
remaining failure involves precision loss on storing of int32 texture
data (something I knew was an issue, but wasn't trying to test).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This cut and paste is pretty awful. I'm tempted to do a lot of this
using preprocessor tricks for customizing the parameter type from a
template function, but that's just a different sort of hideous.
Fixes 8 Intel oglconform int-textures cases.
NOTE: This is a candidate for the 8.0 branch.
v2: Add alpha formats, too.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, when you asked for the _BaseFormat of an rb wrapping a
GL_RGB texture, you got GL_RGBA because that's what we were storing
the texture data as.
NOTE: This is a candidate for the 8.0 branch.
Most of this function was just calling
intel_renderbuffer_update_wrapper(), which was called immediately
afterwards in the only caller.
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit ARB_copy_buffer-overlap, on swrast, which previously
assertion failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A pure swrast-allocated buffer gets an irb of NULL, so we segfaulted
in the clear-accum test. Just look at the swrast renderbuffer pointer
for handling swrast rbs.
From the extension spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., RenderbufferStorageMultisampleEXT..."
Fixes piglit EXT_framebuffer_multisample/dlist.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed when handling a similar problem in EXT_framebuffer_multisample.
From the EXT_framebuffer_object spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., GenFramebuffersEXT, BindFramebufferEXT,
DeleteFramebuffersEXT, CheckFramebufferStatusEXT,
GenRenderbuffersEXT, BindRenderbufferEXT, DeleteRenderbuffersEXT,
RenderbufferStorageEXT, FramebufferTexture1DEXT,
FramebufferTexture2DEXT, FramebufferTexture3DEXT,
FramebufferRenderbufferEXT, GenerateMipmapEXT..."
Reviewed-by: Brian Paul <brianp@vmware.com>
Constants array is always assumed to be RGBA, which means we need to
swizzle the constant elements into place to match the AoS ordering
(e.g., BGRA) that was passed to lp_build_tgsi_aos().
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Should avoid dangling pointer derreference with
glean --run results --overwrite --quick --tests texSwizzle
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
We just prefix the $CLANG environment variable in configure.ac with acv_mesa_
Found by: tinderbox
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was horribly broken and has cost everyone more time than we were
ever going to save using it. It might have been fixable, but the
problem it was originally trying to solve can be better solved with
-Werror=missing-prototypes and -Werror=implicit-function-declaration.
Also, it was always producing a big scary warning about how the link
test was non-portable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44928
Substantially increases performance in GLBenchmark PRO:
- 320x240 => 3.28x
- 1920x1080 => 1.47x
- 2560x1440 => 1.27x
The LD message ignores the sampler unit index and SAMPLER_STATE pointer,
instead relying on hard-wired default state. Thus, there's no need to
worry about running out of sampler units or providing SAMPLER_STATE;
this small patch should be all that's required.
NOTE: This is a candidate for release branches.
(It requires the preceding commit to compile.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_SAMPLE is full of complex workarounds for original Broadwater
hardware, and I'd rather avoid all that for my next Ivybridge patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function releases the buffer that contains user-space vertex data.
The buffer_offset field points into that buffer. So reset the
buffer_offset to zero when we release the buffer so that subsequent
draws don't inadvertantly get a bad offset.
Fixes error messages / failed assertions (in the draw module's bounds/size
checking code) when running piglit's polygon-mode test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
-fvisibility=hidden was preventing them from being exported, which
combined with shared-glapi was causing undefined symbol errors at
runtime.
We don't want to make these functions part of the ABI, and given
how simple they are, we simply inline them.
From c998f732d42da5e962fe5da294493132c3e8dc5f Mon Sep 17 00:00:00 2001
From: Lucas Stach <dev@lynxeye.de>
Date: Tue, 24 Jan 2012 09:46:32 +0100
Subject: [PATCH] nvfx: fix nv3x fallout from state validation changes
Apparently nv3x needs some curde hacks to work properly. This
is clearly not the right fix, but it's the behaviour of the old
code and fixes regressions seen by users.
This has the drawback that when creating configure for
distribution, wayland needs to be available for the packager.
Also the the macros has the wayland prefix hardcoded, so
we cant copy it in mesa right now.
In bad applications like ipers which does a lot of draw calls with
no state changes this helps to greatly reduce time spent in prepare.
In ipers around 7% of CPU was spent in various prepare functions,
after this commit no prepare function show on the profile.
This commit also has the added benefit of now grouping all pipelined
drawing into a single draw call if the driver uses vbuf_render.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Previously, max_vs_entries was set to 128 for GT1, and 256 for GT2,
based on the PRM (see Vol2, part1, p28). However, Bspec section 1.6.5
indicates that the maximum number of VS entries is 256 for GT1.
No piglit regressions on GT1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When storing data in a buffer of type DYNAMIC_DRAW, we don't create a
drm_intel_bo for it; instead we store the data in system memory and
defer allocation of the GPU buffer until it is needed. Therefore, in
brw_update_sol_surface(), we can't just consult the "buffer" field of
the intel_buffer_object structure; we need to call
intel_bufferobj_buffer() to ensure that the deferred allocation
occurs.
This parallels a similar fix for gen7 (see commit ba6f4c9).
Fixes piglit test EXT_transform_feedback/buffer-usage on gen6.
This is a candidate for the 8.0 release branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
It always had the same value as ctx->Extensions.EXT_framebuffer_sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Strictly speaking, it's not legal to expose EXT_texture_integer without
EXT_gpu_shader4. It might be even dangerous (apps can assume EXT_gpu_shader4
is available without checking for it).
The check in compute_version is removed as well, because that's already
covered by GLSLVersion >= 130.
Reviewed-by: Brian Paul <brianp@vmware.com>
- use OR to combine bind flags
- combine both conditionals into one
- move the ARB_fbo enable where it belongs
Reviewed-by: Brian Paul <brianp@vmware.com>
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
A current incomplete framebuffer was incorrectly used as a
st_framebuffer. When accessing st_framebuffer childs bad things happen:
e.g. st_framebuffer::iface was used to check whether its an incomplete
fb, instead we need to compare st_framebuffer::Base against
mesa_get_incomplete_framebuffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44919
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes Intel oglconform negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing Intel oglconform
negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is unprepared for handling integer (particularly, the
baseFormat of the TexFormat comes out as GL_RGBA, not GL_RGBA_INTEGER,
so the direct call of Driver.ReadPixels crashes due to the int vs
non-int error checking not having happened). I'm frankly tempted to
convert this code to MapRenderbuffer/MapTexImage rather than doing it
as meta ops, now that we have that support.
Improves the remaining crash in Intel oglconform for int-textures to
just a rendering failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This aborts and crashes in intel oglconform's int-textures into being
just rendering failures. Clamping isn't handled yet.
v2: Add missing "break".
v3: Drop the int/uint distinction, since they don't need different clamping.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Similarly to how we handle this in texstore, we have to remap height
to depth so that we MapTextureImage each image layer individually.
Fixes part of Intel oglconform's int-textures advanced.fbo.rtt
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a step toward fixing Intel oglconform's
int-textures advanced.fbo.rtt.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This doesn't result in correct rendering -- GL requires that logic ops
work, while the hardware specs say it doesn't do them. I'm not sure
how we would want to handle this.
NOTE: This is a candidate for the 8.0 branch.
When we're actually rendering into a texture, map the texture image
instead of the corresponding renderbuffer. Before, we just copied
a pointer from the texture image to the renderbuffer. This change
will make the code usable by hardware drivers.
ctx->Driver.MapTexture() always points to _swrast_map_texture().
We're already reaching into swrast from t_vb_program.c anyway.
This will let us remove the ctx->Driver.Map/UnmapTexture() functions.
These are temporary, actually, but they'll make follow-on work easier to
implement in a step-by-step manner. Eventually the Map and RowStrideBytes
fields will go into a new swrast_renderbuffer type, but adding that type
now would involve touching a _lot_ of code that'll eventually be removed.
The fields marked as obsolete will go away completely at some point.
That field is only used by swrast code so there's no reason to mess
with it in the gallium state tracker.
This also lets us remove the unused st_format_data() type function and
related code.
When ARB VAOs are used, glPopClientAttrib does not resurrect a deleted
VAO or VBO. This difference between the two spec is, unfortunately,
not very well spelled out in the specs.
Fixes oglc vao(advanced.pushPop.deleteVAO) and
vao(advanced.pushPop.deleteVBO) tests.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There are more differences between Apple and ARB than just requiring
that all arrays be stored in VBOs. Additional uses will be added in
following commits.
Also, set the flag at Bind time instead of Gen time. The ARB_vao spec
specifies that behavior.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a hack to work around drivers such as i965 that:
- Set _MaintainTexEnvProgram to generate GLSL IR for
fixed-function fragment processing.
- Don't call _mesa_ir_link_shader to generate Mesa IR from the
GLSL IR.
- May use swrast to handle glDrawPixels.
Since _mesa_ir_link_shader is never called, there is no Mesa IR to
execute. Instead do regular fixed-function processing.
Even on platforms that don't need this, the software fixed-function
code is much faster than the software shader code.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44749
At least one place, the _mesa_need_secondary_color function in
state.h, uses this to make decisions. The next patch in this series
will add another dependency. Ideally, this field would go away and be
replace by a flag or something.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When rowstride was negatie, unsigned promotion caused a segfault here:
299│ if (rb->Format == MESA_FORMAT_S8) {
300│ const GLuint rowStride = rb->RowStride;
301│ for (i = 0; i < count; i++) {
302│ if (x[i] >= 0 && y[i] >= 0 && x[i] < w && y[i] < h) {
303├> stencil[i] = *(map + y[i] * rowStride + x[i]);
304│ }
305│ }
306│ }
Fixes segfault in oglconform
separatestencil-neu(NonPolygon.BothFacesBitmapCoreAPI),
though test still fails.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
i965 processes assignments of whole structures using
vec4_visitor::emit_block_move, a recursive function which visits each
element of a structure or array (to arbitrary nesting depth) and
copies it from the source to the destination. Then it increments the
source and destination register numbers so that further recursive
invocations will copy the rest of the structure. In addition, it sets
the swizzle field for the source register to an appropriate value of
swizzle_for_size(...) for the size of each element being copied, so
that later optimization passes won't be fooled into thinking that
unused vector elements are live.
This all works fine. However, emit_block_move also contains an
assertion to verify, before setting the swizzle field for the source
register, that the source register doesn't already contain a
nontrivial swizzle. The intention is to make sure that the caller of
emit_block_move hasn't already done some swizzling of the data before
the call, which emit_block_move would then counteract when it
overwrites the swizzle field. But the assertion is at the lowest
level of nesting of emit_block_move, which means that after the first
element is copied, instead of checking the swizzle field set by the
caller, it checks the swizzle field used when moving the previous
element. That means that if the structure contains elements of
different vector sizes (which therefore require different swizzles),
the assertion will erroneously fire.
This patch moves the assertion from emit_block_move to the calling
function, vec4_visitor::visit(ir_assignment *). Since the caller is
non-recursive, the assertion will only happen once, and won't be
fooled by emit_block_move's modification of the swizzle field.
This patch also reverts commit fe006a7 (i965/vs: Fix swizzle related
assertion), which attempted to fix the bug by making the assertion
more lenient, but only worked properly for structures, arrays, and
matrices in which each constituent vector is the same size.
This fixes the problem described in comment 9 of
https://bugs.freedesktop.org/show_bug.cgi?id=40865. Unfortunately, it
doesn't fix the whole bug, since the test in question is also failing
due to lack of register spilling support in the VS.
Fixes piglit test vs-assign-varied-struct. No piglit regressions on
Sandy Bridge.
This is a candidate for the 8.0 release branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40865#c9
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to a commit that did the same for the FS.
Shaves several more instructions off of the VS in Lightsmark, but no
statistically significant performance difference (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaves a few instructions off of the VS in Lightsmark, but no
statistically significant performance difference on gen7 (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AC_CHECK_LIB has this nasty behavior, like the cflags tests, of
automatically putting the tested value into the global LIBS on
success. This caused -lexpat to end up in LIBS, but without the
--with-expat dir, so my 32-bit build on a 64 system using expat from a
custom prefix could only find the system expat and fail to link on the
one current consumer of the LIBS variable: the dri driver test link.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While reading through the simulator, I found some interesting code that
looks like it checks the sampler default color pointer against the bound
set in STATE_BASE_ADDRESS. On failure, it appears to program it to the
base address itself.
So I decided to try programming a legitimate bound, and lo and behold,
border color worked.
+92 piglits on Sandybridge. Also fixes Lightsmark on Ivybridge.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we now always build shared glapi, this exposes the fact that libOSMesa was
underlinked when glapi was built shared.
Fix this by doing the same thing as drivers/X11/Makefile already does, ensuring
that the library is linked with the shared glapi library.
(I'm not clear why we link with both glapi.a and glapi.so, so this may be all wrong)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Refine "always build shared dricore" so we don't build it if we don't need
it because we aren't actually building any dri drivers because of --disable-driglx-direct
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looks insane, but it does appear we need a full slot per input/output.
This fixes another 180 or so piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adds all the easier lowhanging opcodes.
Fixes ~3000 piglit tests with GLSL1.30 enabled on cayman.
This just leaves the mul/div/mod ops to fix up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
"If set, forces degamma on XYZ if format is
FMT_8_8_8_8, FMT_BC1, FMT_BC2, or FMT_BC3"
Don't claim support for sRGB on any other formts.
This fixes glean texture_srgb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't pass the piglit test, but it seems to be a lot closer
than it was before. I need to track down if there is another problem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the changes for multiple kcache banks support, now we are assigning
final SRCx_SEL values for kcache access at the later stage, when building the
bytecode. So we need to take into account kcache banks to distinguish
the constants with the same address but different bank index.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Same fix as previously done by Dave Airlie for r600/r700
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Clip planes are uploaded as a constant buffer and used by the vertex
shader to produce corresponding clip distances for hw clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for multiple kcache banks (constant buffers).
Lock the required lines only.
Allow up to 4 kcache line sets in the alu clause by using ALU_EXTENDED on eg+.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix this GCC warning on non-debug builds.
glsl_types.cpp: In member function 'gl_texture_index
glsl_type::sampler_index() const':
glsl_types.cpp:157: warning: control reaches end of non-void function
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable it in the evergreen_context_draw if needed.
Same as already done in the r600_context_draw for r6xx/r7xx.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BURST_COUNT is clipped with ARRAY_SIZE, so set it to the max value
to avoid clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
libglapi.so, libGL.so, libGLESv2.so, libGLESv1_CM.so must all
come from the same version of Mesa or bad things may happen.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
When the framebuffer has separate depth and stencil buffers, and HiZ is
not enabled on the depth buffer, mark the framebuffer as unsupported. This
happens when trying to create a framebuffer with Z16/S8 because we haven't
enabled HiZ on Z16 yet.
Fixes gles2conform test stencil8.
Note: This is a candiate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44948
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed--by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This loosens the format validation in glBlitFramebuffer. When blitting
depth bits, don't require an exact match between the depth formats; only
require that the two formats have the same number of depth bits and the
same depth datatype (float vs uint). Ditto for stencil.
Between S8_Z24 buffers, the EXT_framebuffer_blit spec allows
glBlitFramebuffer to blit the depth and stencil bits separately. So I see
no reason to prevent blitting the depth bits between X8_Z24 and S8_Z24 or
the stencil bits between S8 and S8_Z24. However, we of course don't want
to allow blitting from Z32 to Z32_FLOAT.
Fixes Piglit fbo/fbo-blit-d24s8 on Intel drivers with separate stencil
enabled.
The problem was that, on Intel drivers with separate stencil, the default
framebuffer has separate depth and stencil buffers with formats X8_Z24 and
S8. The test attempts to blit the depth bits from a S8_Z24 buffer into the
default framebuffer.
v2: Check that depth datatypes match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44665
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The nvc0 gallium driver is advertising 128 MAX_INTERLEAVED_COMPS
which made it always assert in the linker when TFB was used since
the Outputs array was smaller than that maximum.
v2: added assertions
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
So it appears R600s (except rv670) do AR handling different using a different
opcode. This patch fixes up r600g to work properly on r600.
This fixes ~100 piglit tests here (in GLSL1.30 mode) on rv610.
v3: add index_mode as per the docs.
This still fails any dst relative tests for some reason I can't quite see yet,
but it passes a lot more tests than without.
v4: add a nop after dst.rel this could be improved using a second pass,
where we only insert nops if two instructions are sure to collide.
The docs say r600, rv610, rv630 needs this, and not rv670, rs780, rs880,
need AMD to confirm rv620, rv635.
v5: add is_nop_inst.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use a bitmask approach to compute gl_array_object::_MaxElement.
To make this work correctly depending on the shader type actually used,
make use of the newly introduced typed bitmask getters.
With this change I gain about 5% draw time on some osgviewer examples.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Depending on the installed shader type, different arrays are used
from gl_array_object. Provide helper functions that compute
the bitmask of these arrays that are finally enabled for a given
shader type. The will be used in a followup change.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit ede60bc467 (glsl: Add isinf() and
isnan() builtins) uses "+INF" in the .ir file to represent infinity.
This worked on C99-compliant compilers, since the s-expression reader
uses strtod() to read numbers, and C99 requires strtod() to understand
"+INF". However, it didn't work on non-C99-compliant compilers such
as MSVC.
This patch modifies the s-expression reader to explicitly check for
"+INF" rather than relying on strtod() to support it.
This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44767
Tested-by: Morgan Armand <morgan.devel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To fix failed assertions when calling glCopyBufferSubData().
svga_texture() asserts that the resource is a texture. Simply move the
calls to svga_texture() after the code that handles non-texture copies
so that we don't call it with non-texture resources.
Fixes glean bufferObject failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Two assignments to num_immediates were missing in
get_pixel_transfer_visitor() and get_bitmap_visitor().
The uninitialized value led to valgrind errors and crashes in some
cases.
Added new assertions to catch future problems in this area. Also
changed num_immediates to unsigned to avoid signed/unsigned
comparison warnings.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The default access flags for OpenGL ES (via GL_OES_map_buffer) and
desktop OpenGL are different. The code previously tried to handle
this, but the decision was made at compile time. Since the same
driver binary can be used for both OpenGL ES and desktop OpenGL, the
decision must be made at run-time.
This should fix bug #44433. It appears that the test case does
various map and unmap operations and inspects the state of the buffer
object around each. When it sees that GL_BUFFER_ACCESS does not match
its expectations, it fails.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44433
If we don't find an exact PIPE_FORMAT_x for a GL_(COMPRESSED)_RED/RG format,
try uncompressed formats. We were already doing this for the RGB(A) formats.
Fixes piglit arb_texture_compression-internal-format-query test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
msg_type moved by a bit, so the message type was being disassembled
incorrectly. In particular, render target writes were showing up as
"OWORD block write".
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Compared to sampler_gen5, simd_mode shifted by a bit and msg_type grew
by a bit. So we were printing slightly incorrect numbers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Both the VF and VS share space in the URB. First, the VF stores
attributes (shader inputs) there. The VS then reads the attributes,
executes, and reuses the space to store varyings (shader outputs).
Thus, we need to calculate the amount of URB space necessary for inputs,
outputs, and pick whichever is greater.
The old VS backend correctly did this (brw_vs_emit.c:408), but the new
VS backend only considered outputs.
Fixes vertex scrambling in GLBenchmark PRO on Ivybridge.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41318
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the following scenario:
- CreateContext C1
- MakeCurrent C1
- DestroyContext C1 (does not actually destroy the first context, postponed
until the next MakeCurrent)
- CreateContext C2
- MakeCurrent C2
MakeCurrent will call flush on a half destroyed context, leading to crashes.
Since the other paths (destroy and makecurrent) already flush the context,
there is no need to flush here, so we remove this useless flush front call.
This fixes GPU crashes with Chrome and gallium drivers.
v2: Don't flag the format as being HiZ ready (there's DRI2 handshake
pain to go through).
Fixes piglit gl-3.0-required-sized-texture-formats
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is required for Z16 support for texturing, which is the first
thing to have a horizontal alignment of 8. Renderbuffers don't need
it, since they're always set up as the only mip level, but do it for
completeness anyway.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This field is actually set up above.
NOTE: This is a candidate for the 8.0 branch, to avoid conflicts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I copy-and-pasted the thing I was allocating for as the context, so
the first time it would be NULL (root of a ralloc context) and they'd
chain off each other from then on.
NOTE: This is a candidate for the 8.0 branch.
The legal range for the device is apparently [-16.0, +15.0].
Limiting the range to [-15, +15] fixes piglit's lodbias test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The interaction between the mipmap lod min/max limits and the texture
base/max level limits is kind of tricky. Changing the base level
didn't work as expected before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This makes lod clamping more consistent with other drivers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the dd.h docs to indicate that GL_MAP_INVALIDATE_RANGE_BIT
can be used with GL_MAP_WRITE_BIT when mapping renderbuffers and
texture images.
Pass the flag when mapping texture images for glTexImage, glTexSubImage,
etc. It's up to drivers whether to actually make use of the flag.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To try to use less tex memory and maybe get better performance.
Spotted by Roland Scheidegger.
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i965 driver advertises GL_ARB_texture_float and GL_ARB_texture_rg
support but the ctx->TextureFormatSupported[] table entries for
MESA_FORMAT_R_FLOAT32 and MESA_FORMAT_RGBA_FLOAT32 are false on gen 4
hardware. So the case for GL_R32F would fail and we'd print an
implementation error.
This patch adds more Mesa tex format options for GL_R32F and other R/G
formats so we fall back to 16-bit formats when 32-bit formats aren't
available.
Eric made the same fix in commit 6216a5b4 for the non R/G formats.
v2: try 16-bit formats before 32-bit formats and try RG formats before
RGBA where possible.
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=44039
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This enables linear gradients if we need a linear,
it also sets the flat shade flag for color/constant interpolations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When I originally implemented the hack to use GRFs 111+ as fake MRFs, I
did so purely to avoid rewriting all the code that dealt with MRFs.
However, it turns out that a similar hack is actually required.
Newly discovered language in the BSpec indicates that SEND instructions
with EOT set "should" use g112-g127 as their source registers. Based on
assertions in the simulator, this is actually a requirement on certain
platforms.
Since we're faking MRFs already, we may as well use the officially
sanctioned range. My guess is that we avoided this issue because we
seldom use m0: URB writes in the new VS backend start at m1, and RT
writes in the new FS backend start at m2.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we no longer generate Mesa IR from GLSL IR, it's impossible to
use the old vertex shader backend for GLSL programs. There's simply no
Mesa IR to codegen from.
Any attempt to do so would result in immediate GPU hangs, presumably due
to the driver uploading an empty program with no EOT message.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to Table 6.8 (Page 348) in the OpenGL 3.0 specification,
glGetVertexAttribiv supports GL_VERTEX_ATTRIB_ARRAY_INTEGER.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the following OGLConform tests on gen5:
depth-stencil(misc.state_on.depth_int)
fbo_db_ARBfp(basic.OnlyDepthBuffDrawBufferRender)
The problem was that, if the depth buffer's Mesa format was X8_Z24, then
we emitted the hardware format D24_UNORM_X8. But, on gen5, D24_UNORM_S8
must be emitted.
This bug was introduced by:
commit d84a180417
Author: Eric Anholt <eric@anholt.net>
i965: Base HW depth format setup based on MESA_FORMAT, not bpp.
v2: Deref 'intel' directly. Move the branch for newer chipset to top.
Quote the PRM. As requested by Ken.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43408
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The original R600 requires the UNCACHED_FIRST_INST bit
to be set in the PS.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Note: this is candidate for the stable branches.
With the conversion to automake in commit
e326480e4e, several additional build
artifacts are created:
src/mesa/drivers/dri/i965/.deps/
src/mesa/drivers/dri/i965/.libs/
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/i965/Makefile.in
src/mesa/drivers/dri/i965/i965_dri.la
src/mesa/drivers/dri/i965/i965_symbols_test
This patch adds all of these files to .gitignore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TestMipMaps() function in src/OGLconform/textureNPOT.c calls glTexImage2D()
with width = 0. Texture with zero size skips miptree allocation due to a
condition in function _mesa_store_teximage3d(). While calling glGetTexImage()
it results in assertion failure in intel_map_texture_image() due to null mt
pointer.
This patch fixes the issue by detecting the zero size texture early in
glGetTexImage and glGetCompressedTexImage functions. In such a case function
simply returns doing nothing.
Verified that below mentioned bug is fixed by this patch.
https://bugs.freedesktop.org/show_bug.cgi?id=42334
NOTE: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This does introduce a warning by the automake build system, that the
missing-symbols test build is non-portable. That's true -- Mac OS X
can't take something built as a loadable module and just link it as a
library. Of course, we aren't building this on OS X at all, so it
would be nice to be able to suppress it, but I haven't found a way.
Still, the build is going to be much quieter than we have ever had
before, so I think this is a fair tradeoff until we find a way to shut
that warning up.
v2: Put a link in /lib to avoid transition pains for people.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Nothing works if HiZ is enabled and the DDX is incapable of HiZ (that is,
the DDX version is < 2.16).
The problem is that the refactoring that eliminated
intel_renderbuffer::stencil_rb broke the recovery path in
intel_verify_dri2_has_hiz(). Specifically, it broke line
intel_context.c:1445, which allocates the region for
DRI_BUFFER_DEPTH_STENCIL. That allocation was creating a separate stencil
miptree, despite the buffer being a packed depthstencil buffer. Havoc
ensued.
This patch introduces a bool flag that prevents allocation of that stencil
miptree.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44103
Tested-by: Ian Romanick <idr@freedesktop.org>
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fix this GCC warning with non-LLVM builds.
sp_screen.c: In function ‘softpipe_get_shader_param’:
sp_screen.c:141:28: warning: unused variable ‘sp_screen’ [-Wunused-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Calling glXSwapBuffers with no bound context causes segmentation
fault in function intelDRI2Flush. All the gl calls should be
ignored after setting the current context to null. So the contents
of framebuffer stay unchanged. But the driver should not seg fault.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44614
Reported-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Yi Sun <yi.sun@intel.com>
Fix this GCC warning.
lp_test_round.c: In function ‘test_round’:
lp_test_round.c:126:13: warning: variable ‘packed’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fix this GCC 4.6 warning with 64-bit builds.
u_debug_stack.c: In function ‘debug_backtrace_capture’:
u_debug_stack.c:45:17: warning: variable ‘frame_pointer’ set but not
used [-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i915 GPU can't do A8 dst, so we abuse GREEN8 buffers for that
purpose. However, things get hairy as we start to do blending,
because then GL_DST_*_ALPHA should be replaced with GL_DST_*_COLOR.
This is what we do here.
Fixes piglt fbo-alpha.
v2: select the colors in the pixel shader
v3: fix rs state creation for pre-evergreen
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Create the video buffers in the format the driver preffers.
This temporary creates problems with decoder less VDPAU video playback.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Add a second extened constructor that takes plane
textures for the video buffer. Also provide a
function for texture templates.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This requires GLSL 1.30 enabled, which requires integer types enabled,
so don't bother doing an INT to FLT conversion on it.
We should probably remove the instance id flt->int conversion when
turning on native integers.
this passes the three piglit tests with GLSL 1.30 forced on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit rewrites a lot of the state_fb code to support
rendering to targets not aligned to 64 byte.
This allows us to drop the render temporaries as unaligned
targets are the only use-case where they are really needed. The
temporaries code was used for a lot of things more, but apparently
those also work without temps.
There is one regression in piglit fbo-clear-formats, but this will
be fixed with the use of real hardware clears and doesn't matter in
practice as no real application tries to scissor clear a 2x2 pixel
render target.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
Virtual address space put the userspace in charge of their GPU
address space. It's up to userspace to bind bo into the virtual
address space. Command stream can them be executed using the
IB_VM chunck.
This patch add support for this configuration. It doesn't remove
the 64K ib size limit thought this limit can be extanded up to
1M for IB_VM chunk.
v2: fix rendering
v3: fix rendering when using index buffer
v4: make vm conditional on kernel support add basic va management
v5: catch the case when we already have va for a bo
v6: agd5f: update on top of ioctl changes
v7: agd5f: further ioctl updates
v8: indentation cleanup + fix non cayman
v9: rebase against lastest mesa + improvement from Marek & Michel
v10: fix cut/paste bug
v11: don't rely on updated radeon_drm.h
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the comments precise. Explain why each branch is needed and correct.
Document the potential pitfall in the true-branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When using Mesa with a GLES API, calling _mesa_FramebufferRenderbuffer
with GL_DRAW_FRAMEBUFFER will report a 'user error' because
get_framebuffer_target validates that this enum from the framebuffer
blit extension is only used on GL. To work around it this patch makes
it use the GL_FRAMEBUFFER enum instead in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43418
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The gl_renderbuffer::Format field wasn't always set properly. This
didn't matter much in the past but with the recent swrast/renderbuffer
mapping changes, core Mesa will be directly touching OSMesa colorbuffers
so using the right MESA_FORMAT_x value is important.
Unfortunately, there aren't MESA_FORMATs for all the possible OSmesa
format/type combinations, such as GL_FLOAT / OSMESA_ARGB. If anyone
runs into these we can add new Mesa formats.
v2: add warnings for unsupported formats, fix ARGB_REV mix-up.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We always access pull constant buffers using the message types "OWord
Block Read" or "OWord Dual Block Read". According to the Sandy Bridge
PRM, Vol 4 Part 1, pages 214 and 218, when using these messages:
"the surface pitch is ignored, the surface is treated as a
1-dimensional surface. An element size (pitch) of 16 bytes is
used to determine the size of the buffer for out-of-bounds
checking if using the surface state model."
Previously we were setting the pitch for pull constant buffers to the
size of the whole constant buffer--this made no sense and would have
led to incorrect behavior if it were not for the fact that the pitch
is ignored.
For clarity, this patch sets the pitch for pull constant buffers to 16
bytes, consistent with the hardware's behavior.
v2: Clarify the meaning of the ignored values by writing them as (16 - 1).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9bdc44a528 (i965: Replace struct
with bit shifting for WM pull constant surfaces) accidentally
introduced off-by-one errors into the calculation of the surface
width, height, and depth. This patch restores the correct
computation.
The reason this wasn't noticed by Piglit tests is that the size of our
constant surfaces is always less than 2^20, therefore the off-by-one
error was causing the "depth" field of the surface to be set to all
1's. The hardware interpreted this as an extremely large surface, so
overflow checking was effectively disabled.
No Piglit regressions on Sandy Bridge.
NOTE: This is a candidate for the 7.11 and 8.0 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flat SHADE_MODEL still overrides any non-flat interpolation
qualifier, but pulling that state out of the rasterizer cso
isn't really worth the effort, is it ?
NOTE: This is a candidate for the 8.0 branch.
This fixes accum buffer operations. The accumulation buffer is the
only malloc-based renderbuffer for the intel drivers.
v2: apply x/y offset to returned pointer
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit EXT_framebuffer_multisample/negative-copypixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/negative-copyteximage.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample-negative-readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/renderbuffer-samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Previously, we were saying that everything from the starting tile to
region width+height was part of the limits of our depthbuffer, even if
the tile was near the bottom of the depthbuffer. This mean that our
range was not clipping to buffer buonds if the start tile was anything
but the start of the buffer.
In bebc91f0f3, this was changed to
saying that we're just rendering to a region of the size of the
renderbuffer. This is great -- we get a range that should actually
match what we want. However, the hardware's range checking occurs
after the X/Y offset addition, so we were clipping out rendering to
small depth mip levels when an X/Y offset was present. Just add
tile_x/y to the width in that case -- the WM won't produce negative
x/y values pre-offset, so we just need to get the left/bottom sides of
the region to cover our buffer.
Fixes the following Piglit regressions on gen7:
spec/ARB_depth_buffer_float/fbo-clear-formats
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 8.0 branch.
The array holds GLuint values so remove the float cast.
Note, however, that to compute the average of four GLuints we really
want to do (a+b+c+d)/4 but that could overflow. This change doesn't
address that for now.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In the first case, the newImage[] array contains GLuint values.
In the second case, the parameter type is GLuint, but the maxDepth
value is never used in this case (GL_FLOAT_32_UNSIGNED_INT_24_8_REV).
Pass ~OU just to be safe.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We include both imports.h and u_math.h in the state tracker. This
leads to multiple, conflicting definitions of ffs() with MSVC.
Use FFS_DEFINED to skip the ffs() in u_math.h.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
include mesa headers before gallium headers to avoid problem with
ffs() being defined in u_math.h and then again in imports.h
The next commit will add some #ifdefs to prevent multiple definitions
of ffs().
Call ffs() and ffsll() everywhere. Define our own ffs(), ffsll()
functions when the platform doesn't have them.
v2: remove #ifdef _WIN32, __IBMC__, __IBMCPP_ tests inside ffs()
implementation. The #else clause was recursive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Alexander von Gluck <kallisti5@unixzen.com>
Introduce vbo_get_minmax_indices() function to handle the min/max index
computation for nr_prims(>= 1). The old code just compute the first
prim's min/max index; this would results an error rendering if user
called functions like glMultiDrawElements(). This patch servers as
fixing this issue.
As when nr_prims = 1, we can pass 1 to paramter nr_prims, thus I made
vbo_get_minmax_index() static.
v2: per Roland's suggestion, put the indices address compuation into
vbo_get_minmax_index() instead.
Also do comination if possible to reduce map/unmap count
v3: per Brian's suggestion, use a pointer for start_prim to avoid
structure copy per loop.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, do the uniform setting and input / output mapping directly in
brw_link_shader. Hurray for not generating Mesa IR! However, once
the i965 driver stops calling _mesa_ir_link_shader, UsesClipDistance
and UsesKill are no longer set.
Ideally gen6_upload_vs_push_constants should use the
gl_shader_program, but I don't see a way to propagate the information
there. The other alternative, since this is the only usage, is to
move gl_vertex_program::UsesClipDistance to brw_vertex_program.
The compile (and precompile) stages use UsesKill to determine the
cache key for the shader. This is then used to determine whether or
not to compile the shader. Calculating this data during compilation
is too late.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This previously enabled some optimizations in the fragment shader
(interpolation, etc.) if some input components were always 0.0 or
1.0. However, this data was generated by analyzing Mesa IR. The
next patch in this series removes generation of Mesa IR for GLSL
paths. When we detect that case, just set the used mask to ~0 and
circumvent the optimizations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It used to be done in ir_to_mesa, and that was kind of a bad place.
I didn't change st_glsl_to_tgsi because there is some strange stuff
happening in the code that generates glDrawPixels shaders. It looked
like this would break horribly if I touched anything.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Track the calculated data in gl_shader_program instead of the
individual assembly shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than looking at the settings in individual assembly programs,
look at the settings in the top-level uniform values. The old code
was flawed because examining each shader stage in isolation could
allow inconsitent usage across stages (e.g., bind unit 0 to a
sampler2D in the vertex shader and sampler1DShadow in the fragment
shader).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the fixed-function fragment shader was tracked as a
gl_program. This means that it shows up in the driver as a Mesa IR
program instead of as a GLSL IR program. If a driver doesn't generate
Mesa IR from the GLSL IR, that program is empty. If the program is
empty there is either no rendering or a GPU hang.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Poking directly at the backing resources works only by luck. Core
Mesa code should only know about the gl_uniform_storage structure.
Soon other code that looks at samplers will use the gl_uniform_storage
structures instead of the data in the gl_program.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like AC_PROG_SED was added in 2.59b, and wasn't in the
original 2.59 in the original 2.59. Presumably that's why, though
it could've been an oversight.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
The gen7_urb atom depends on CACHE_NEW_VS_PROG and CACHE_NEW_GS_PROG,
causing gen7_upload_urb() to be called when switching to a new VS
program.
In addition to partitioning the URB space between the VS and GS,
gen7_upload_urb() also allocated space for VS and PS push constants.
Unfortunately, this meant that whenever CACHE_NEW_VS was flagged, we'd
reallocate the space for the PS push constants. According to the BSpec,
after sending 3DSTATE_PUSH_CONSTANT_ALLOC_PS, we must reprogram
3DSTATE_CONSTANT_PS prior to the next 3DPRIMITIVE.
Since our URB allocation for push constants is entirely static, it makes
sense to split it out into its own atom that only subscribes to
BRW_NEW_CONTEXT. This avoids reallocating the space and trashing
constants.
Fixes a rendering artifact in Extreme Tuxracer, where instead of a snow
trail, you'd get a bright red streak (affectionately known as the
"bloody penguin bug").
This also explains why adding VS-related dirty bits to gen7_ps_state
made the problem disappear: it made 3DSTATE_CONSTANT_PS be emitted after
every 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Both dri2_create_context_attribs and drisw_create_context_attribs call
dri2_convert_glx_attribs, expecting it to fill in *api on success.
However, when num_attribs == 0, it was returning true without setting
*api, causing the caller to use an uninitialized value.
Tested-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
svga_sampler_view contains a pointer to a pipe_resource (base class of
svga_texture) and svga_texture contains a pointer to an svga_sampler_view.
This circular dependency prevented the objects from ever being freed when
they pointed to each other. Make the svga_sampler_view::texture pointer
a "weak reference" (no reference counting) to break the dependency.
This is safe to do because the pipe_resource/texture always has a longer
lifespan than the sampler view so when svga_sampler_view stops referencing
the texture, the texture's refcount never hits zero.
Fixes a memory leak seen with google earth and other apps.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We were naively emitting each component at a time, even if we were
emitting the same value to multiple channels. Improves on a codegen
regression from the old VS to the new VS on some unigine shaders
(because we emit constant vecs/matrices as immediates instead of
loading them as push constants, so we had over 4x the instructions for
using them).
shader-db results:
Total instructions: 58594 -> 58540
11/870 programs affected (1.3%)
765 -> 711 instructions in affected programs (7.1% reduction)
WGL_ARB_extensions_string states that wglGetExtensionsStringARB should
return NULL for invalid HDCs. And some applications rely on it.
Reviewed-By: "Keith Whitwell" <keithw@vmware.com>
It caused an X protocol error in some (rare) situations.
This is a follow-on to the previous commits which fixes a bug reported
by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is the same fix as the previous commit, except it's for the gallium
glx/xlib state tracker.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
as we do in Fake_glXChooseVisual(). This registers the MesaGLX
extension on the display so we can clean up buffers, etc. when
the display connection is closed.
Fixes a bug reported by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Prevent any state from carrying over to a new translation in cases
where we assume that data is still zero from initial calloc (these
would require us to do individual zeroing before translation which
would be more code).
When creating an EGLImage from a struct wl_buffer * this ensures
that we create an XRGB8888 image if the wayland buffer doesn't have an
alpha channel. To determine if a wl_buffer has a valid alpha channel
this patch adds an internal wayland_drm_buffer_has_alpha() function.
It's important to get the internal format for an EGLImage right so that
if a GL texture is later created from the image then the GL driver will
know if it should sample the alpha from the texture or flatten it to
a constant of 1.0.
This avoids needing fragment program workarounds in wayland compositors
to manually ignore the alpha component of textures created from wayland
buffers.
krh: Edited to use wl_buffer_get_format() instead of wl_buffer_has_alpha().
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
No functional change. In the function
__indirect_glAreTexturesResident(), the variable cmdlen is only used
if USE_XCB is not defined. This patch avoids a compile warning in the
event that USE_XCB is defined.
v2: just move cmdlen declaration inside the #else part.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, we didn't do the limit check for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS until the end of the
store_tfeedback_info() function, *after* storing all of the transform
feedback info in the gl_transform_feedback_info::Outputs array. This
meant that the limit check wouldn't prevent us from overflowing the
array and corrupting memory.
This patch moves the limit check to the top of tfeedback_decl::store()
so that there is no risk of overflowing the array. It also adds
assertions to verify that the checks for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS are sufficient to avoid
array overflow.
Note: strictly speaking this patch isn't necessary, since the maximum
possible number of varyings is MAX_VARYING (16), whereas the size of
the Outputs array is MAX_PROGRAM_OUTPUTS (64), so it's impossible to
have enough varyings to overflow the array. However it seems prudent
to do the limit check before the array access in case these limits
change in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), we need to handle transform feedback of gl_ClipDistance
specially, to account for the fact that the hardware represents it as
an array of vec4's rather than an array of floats.
The previous way this was accounted for (translating the request for
gl_ClipDistance[n] to a request for a component of
gl_ClipDistanceMESA[n/4]) doesn't work when performing transform
feedback on the whole unsubscripted array, because we need to keep
track of the size of the gl_ClipDistance array prior to the lowering
pass. So I replaced it with a boolean is_clip_distance_mesa, which
switches on the special logic that is needed to handle the lowered
version of gl_ClipDistance.
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]-no-subscript".
Reviewed-by: Eric Anholt <eric@anholt.net>
The function tfeedback_decl::num_components() was not correctly
accounting for transform feedback of whole arrays and gl_ClipDistance.
The bug was hard to notice in tests, because it only affected the
checks for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS.
This patch fixes the computation, and adds an assertion to verify
num_components() even when MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS
and MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS are not exceeded.
The assertion requires keeping track of components_so_far in
tfeedback_decl::store(); this will be useful in a future patch to fix
non-multiple-of-4-sized gl_ClipDistance.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just fixes up the enables for native integers and EXT_texture_integer
support in st/mesa.
It also set the MaxClipPlanes to 8.
We should consider exposing caps for MCP vs MCD, but since core
mesa doesn't care yet maybe we can wait for now.
v2: use 32-bit formats as per Marek's mail.
v3: add calim's fix for INT_DIV_TO_MUL_RCP disabling.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't look like the GLSL compiler will produce sign op
for an unsigned anyways (seems insane anyways).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds integer version of SSG that GLSL 1.30 can produce.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This enables fragment clamping in softpipe, it passes more
tests than it did previously with no regressions, There are still
a couple of failures in the SNORM types to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a number of texelFetch swizzle tests, and consoldiates
the swizzle handling in a new function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for using the clipdistance instead of clip plane.
Passes all piglit clipdistance tests.
v2: fixup some comments from Brian in review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
softpipe always clipped using the position vector, however for unclipped
vertices it stored the position in window coordinates, however when position
and clipping are separated, we need to store the clip-space position and
the clip-space vertex clip, so we can interpolate both separately.
This means we have to take the clip space position and store it to use later.
This allows softpipe to pass all the clip-vertex piglit tests.
v2: fix llvm draw regression, the structure being passed into llvm needed
updating, remove some hardcoded ints that should have been enums while there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This required changing the system value semantics, so we stored
a system value per vertex, instance id is the only other system
value we currently support, so I span it across the channels.
This passes the 3 vertexid-* piglit tests + lots of instanceid tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If draw isn't using llvm we can support vertex texture and integers,
These will be fixed up later, but for now allow this check to happen
at run-time.
v2: since 3e22c7a253 we can ask draw for a non-llvm
context. Just track if ask and set the vars accordingly. This probably isn't perfect but should cover the cases we care about.
v3: use debug option, restructure to store in screen, as suggested by Jakob.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mesa shouldn't call into the drivers if there are no renderbuffers
bound to the attachments for the buffers to be cleared.
Fixes a number of the clearbuffer-* tests on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the test to allow cube/depth combinations on GL3
or EXT_gpu_shader4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We're not quite ready to actually support it in the implementation,
but at least this allows GL 3.0 API-reliant applications to hopefully
run successfully, though they won't get multisampling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The EXT_texture_array required only 64, but GL 3.0 required 256.
Since we're already exposing values that can get us way beyond our
ability to map the single object directly, go ahead and expose all the
way to hardware limits.
Tested with new piglit EXT_texture_array/maxlayers on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing the kill of the updated channels, then adding our copy
to the list of available stuff to copy. But if the copy was updating
its own source channels, we didn't notice, breaking this code:
R0.xyzw = arg0 + arg1;
R0.xyzw = R0.wwwx;
gl_FragColor.xyzw = clamp(R0.xyzw, 0.0, 1.0);
Fixes piglit glsl-copy-propagation-self-2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies all batches needed for HiZ. The batch length for
3DSTATE_HIER_DEPTH_BUFFER is also corrected from 4 to 3.
Performance +6.7% on Citybench.
num-frames: 400
resolution: 1918x1031
avg-hiz-off: 127.90 fps
avg-hiz-on: 136.50 fps
kernel: git://people.freedesktop.org/~anholt/linux.git branch=gen7-reset-sol sha=23360e4
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It is unwise to use a stencil region's size to determine its
renderbuffer's size, because at region creation we fudge the width and
height to accomodate interleaved rows. (See the comment for MESA_FORMAT_S8
in intel_miptree_create()). Most users of stencil_region->{width,height}
should be converted to use stencil_rb->{Width,Height}.
We have already done the replacement in several locations. This patch
continues the replacement in {brw,gen7}_emit_depthbuffer(). To make those
functions look consistent, I've also done the equivalent replacement for
the depth buffer.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It was named GEN6_WM_DEPTH_RESOLVE. Luckily, this caused no conflict,
because the value is identical for gen6 and gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't create clip outputs if no clip planes are enabled.
Move clip validation after program validation: we were calling
linkage validation in case the VP needed rebuilding before the
FP was validated.
The vertex program needs to be built first because when
ClipDistance is used we'll want to only enable those outputs that
are also written.
These initialization functions weren't initializing all the fields so
some had undefined values. The callers of these functions sometimes use
a structure assignment to initialize new objects from these templates
so we'd just propagate the undefined values. That made for some confusing
info when debugging, plus it could lead to bugs.
v2: fix surf pointer mix-up: "&surf" -> "surf"
Jakob Bornecrantz <jakob@vmware.com>
This code isn't used anymore in preference for DRI2 client side swap buffers
throttling or throttling done inside the xa or xorg driver.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This peice of code has been here since the inital commit (c5c5cd71) and the
code that used instance_id_index was removed in (caede752) by José.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
So the targets can drop the sw_wrapper winsys when no sw driver is being used.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This replaces the current code with an implementation compatible with
the new gallium interface. I've left some of the remains of the interface
intact so llvmpipe keeps building correctly, and I'll take a look at fixing
llvmpipe up later.
v2: fixup as per Brian's review
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces an unspecified interpolation paramter that is only allowed for
color semantics, so a specified GLSL interpolation will override the ShadeModel
specified interpolation, but not vice-versa.
This fixes a lot of the interpolation tests in piglit.
v2: rename from unspecified to color
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could lead to incorrect code when fixed regs are involved.
Surprisingly, the increased freedom actually leads to lower
register usage in some cases. Still want to find a better way
to treat constraints though ...
Always set position to insert before the current instruction,
the previous behaviour led to confusion (bug in checkPredicate
for BBs with only a single conditional branch).
Conflicts:
src/gallium/auxiliary/tgsi/tgsi_strings.c
src/mesa/state_tracker/st_atom_clip.c
commit d919791f2742e913173d6b335128e7d4c63c0840
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 17:59:22 2012 +0100
d3d1x: adapt to new clip state
commit cfec82bca3fefcdefafca3f4555285ec1d1ae421
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:16:51 2012 +0100
gallium/docs: update for clip state changes
commit c02bfeb81ad9f62041a2285ea6373bbbd602912a
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:21:43 2012 +0100
tgsi: add TGSI_PROPERTY_PROHIBIT_UCPS
commit d4e0a785a6a23ad2f6819fd72e236acb9750028d
Author: Brian Paul <brianp@vmware.com>
Date: Thu Jan 5 08:30:00 2012 -0700
tgsi: consolidate TGSI string arrays in new tgsi_strings.h
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
commit c28584ce0d8c62bd92c8f140729d344f88a0b3cd
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 12:48:09 2012 +0100
gallium: extend user_clip_plane_enable to apply to clip distances
commit f1d5016c07f786229ed057effbe55fbfd160b019
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 02:39:09 2012 +0100
nvfx: adapt to new clip state
commit 6f6fa1c26bd19f797c1996731708e3569c9bfe24
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 01:41:39 2012 +0100
st/mesa: fix DrawPixels with GL_DEPTH_CLAMP
commit c86ad730aa1c017788ae88a55f54071bf222be12
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:51:30 2012 +0100
nv50: adapt to new clip state
commit 3a8ae6ac243bae5970729dc4057fe02d992543dc
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:32:36 2012 +0100
nvc0: adapt to new clip state
commit 6243a8246997f8d2fcc69ab741a2c2dea080ff11
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Dec 29 01:32:51 2011 +0100
draw: initalize pt.user.planes in draw_init
This fixes a crash in glean/fpexceptions.
commit e3056524b19b56d473f4faff84ffa0eb41497408
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:26:55 2011 +0100
svga: adapt to new clip state
commit c5bfa8b37d6d489271df457229081d6bbb51b4b7
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:11:51 2011 +0100
r600g: adapt to new clip state
commit f11890905362f62627c4a28a8255b76eb7de7df2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:10:26 2011 +0100
r300g: adapt to new clip state
commit e37465327c79a01112f15f6278d9accc5bf3103f
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:39:16 2011 +0100
draw: adapt to new clip state
This adds a regression in the LLVM clipping path. Can anybody see anything
wrong with the code? It works for every other case, just glean/fpexceptions
crashes when doing the "Infinite clip plane test".
commit b474d2b18c72d965eefae4e427c269cba5ce6ba2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:14:59 2011 +0100
u_blitter: don't save/set/restore clip state
commit 9dd240ea91f523a677af45e8d0adb9e661e28602
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:11:56 2011 +0100
gallium: don't cso_save/set/restore clip state
The enable bits are in the rasterizer state.
commit a4f7031179f5f4ad524b34b394214b984ac950f6
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:58:55 2011 +0100
gallium: default depth_clip to 1
depth_clip = !depth_clamp
commit fe21147a00ab90e549d63fe12ee4625c9c2ffcc3
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:14:19 2011 +0100
trace,util: update state logging to new clip state
Also dump the other missing flags.
commit 2a3b96e84ac872dcc5bc1de049fe76bb58d64b23
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 10:43:43 2011 +0100
st/mesa: adapt to new clip state
commit b7b656a42fca19d7c85267f42649a206a85a2c72
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Dec 17 15:45:19 2011 +0100
gallium: move state enable bits from clip_state to rasterizer_state
This brings the code in sync with gen6_sf_state.c; presumably the
mistake was a botched rebase on initial Ivybridge bring-up patches.
Found by diffing batch buffer dumps and noticing the random values.
Thanks to Eric for catching the obvious mistake.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In f3e9ccb3b, I renamed gen6_upload_wm_constants to
gen6_upload_wm_push_constants, but neglected to update this comment.
I don't think there ever was a gen7_prepare_wm_constants function; it
was probably a search and replace error. Of course, "prepare" functions
died a while back as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
CACHE_NEW_SAMPLER doesn't cover max_wm_threads, but it does cover
brw->sampler.count. BRW_NEW_PS_BINDING_TABLE is obvious, but it's
probably worth adding a comment anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dirty bit was already correctly in place, but there was no comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also, annotate the use of _NEW_POINT as long as we're adding a comment.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to a comment in gen6_sf_state, calls to get_attr_override need
both _NEW_PROGRAM and _NEW_LIGHT. Since Gen7 reuses the same function,
the same dirty bits should apply.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_CURBE_OFFSETS dirty bit is only flagged by the
brw_curbe_offsets state atom which is only used on Gen4-5.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_URB_FENCE dirty bit is only flagged by the
brw_recalculate_urb_fence state atom which isn't used on Gen6+.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This brings the dirty bits in line with the comments.
This does /not/ need to be cherry-picked to stable branches because the
access requiring _NEW_BUFFERS was added in master as part of HiZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The trick was to produce an assignment in the IR along the lines of:
(assign (xyzw) (var_ref R0) (swiz wwww (var_ref R0) ))
which occurs only rarely even in code that looks like it should do
this, because of the assignment temporaries generated in ast_to_hir.
From the IR above, this optimization pass would then propagate
references of R0 into R0.wwww (seems reasonable), but without this
patch, a later reference of R0.wwww would see R0 first, turning that
into R0.wwww.wwww, which triggered opt_swizzle_swizzle, and then we
looped back to this code to do it again. Avoid that by skipping over
the usual ir_rvalue visitor's ir_swizzle hook, so that we get
handle_rvalue() on the ir_swizzle itself, not its referenced value.
Looking at only the swizzle will always optimize away at least as much
as looking at the swizzle's refererenced value.
We now still claim to propagate r0.w into r0.w, but at least we don't
trigger the loop.
v2: Rewrite commit message (changes by anholt)
Fixes piglit glsl-copy-propagation-self-1
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=34006
The r300 driver requires LLVM when building and other drivers that
depend on it for all TNL, like i915g will be a lot slower without it.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The optimization was supposed to turn an attribute component that was
always 1.0 into a mov of 1.0. But by leaving loop this patch removes
out of that test, we applied the projection correction to the 1.0 and
got some other value, breaking openarena once it was converted to
using the new compiler backend.
Originally this hunk was separate from the former loop to make the
generated instructions slightly better pipelined. We now have
automatic instruction scheduling to handle that, and the generated
instruction sequence looked the same to me after this change (except
for the bugfix).
Previous to this patch, if the client requested transform feedback
using a subscript, but the variable was not an array
(e.g. "gl_FrontColor[0]"), we would produce a bogus error message like
"Transform feedback varying gl_FrontColor[0] found, but it's an array
([] expected)".
Changed the error message to e.g. "Transfrorm feedback varying
gl_FrontColor[0] requested, but gl_FrontColor is not an array."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were only comparing the number of depth and stencil bits but the
extension spec actually says the formats must match:
The error INVALID_OPERATION is generated if BlitFramebufferEXT is
called and <mask> includes DEPTH_BUFFER_BIT or STENCIL_BUFFER_BIT
and the source and destination depth or stencil buffer formats do
not match.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit acf82657f4 supposedly enabled
SIMD16 dispatch, but neglected to set the "16 Pixel Dispatch Enable"
bit, so nothing actually got enabled.
Furthermore, it neglected to set up the Dispatch GRF Start Register for
kernel 2, which is the SIMD16 program.
Increases performance in Nexuiz by ~15% at 800x600 (n=3).
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
$ dict invarient
No definitions found for "invarient", perhaps you mean:
gcide: Invariant
wn: invariant
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Replace target, level parameters with gl_texture_image.
Add gl_renderbuffer parameter to indicate source buffer for the copy.
This removes some redundant code in the drivers to find the source
renderbuffer and the destination texture image (which we already had
in _mesa_CopyTexSubImage).
Signed-off-by: Brian Paul <brianp@vmware.com>
We were comparing 32-bit Z buffer values against 16-bit fragment values.
Need to do scaling like for the 24-bit case.
Triangle Z testing was OK since it didn't hit this code path.
If the assertion was hit, it probably meant that we were unable to allocate
or map a vertex buffer. Instead of dying in a debug build, issue a warning
and continue.
We need to pass the pre-projection matrix clip planes into the driver,
instead of the post for the case we have a vertex shader that writes clip
vertex.
Signed-off-by: Dave Airlie <airlied@redhat.com>
translate signed/unsigned integers to coresponding uint/sint r32g32b32a32 types.
This fixes a bunch of piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Brian mentioned that mesa-demos/reflect was broken on softpipe,
by my previous commit. The problem was were blindly translating none
to perspective, when color/pntc at least need it linear.
this is the final version that fixes the reflect regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The IR for mix(float, float, bool) was missing a write mask, causing the
IR reader to die horribly. Furthermore, I neglected to add any of the
new prototypes to the 1.30 profiles.
Fixes oglconform's glsl-bif-com advanced.mix test cases.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44477
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The various l-value errors this was designed to catch are now caught
by other means. Marking the temporaries as read-only now just
prevents sensible error messages from being generated. It's
0:0(0): error: function parameter 'out p' references the read-only variable '_post_incdec_tmp'
versus
0:13(5): error: function parameter 'out p' references a post-decrement operation
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were used by swrast to make a combined depth+stencil buffer look
like separate depth and stencil buffers. But that's no longer needed
after rewriting the depth/stencil code in swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions updated the gl_renderbuffer::_DepthBuffer and
_StencilBuffer fields. But those fields are no longer used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Everything about this that we have tests for works except for the
deprecated metaops. The conclusion we came to on IRC sounded like we
were OK with turning it on as long as core functionality works. The
remaining failures (copypixels, drawpixels) should just be a matter of
finishing the MapRenderbuffer for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes on i965:
ARB_depth_buffer_float/fbo-depthstencil-GL_DEPTH32F_STENCIL8-blit
ARB_depth_buffer_float/fbo-stencil-GL_DEPTH32F_STENCIL8-blit
Reviewed-by: Brian Paul <brianp@vmware.com>
We were converting our ubyte stencil value to a float. Just write it
as a uint, which overwrites the X24 part of X24S8 with 0 but shouldn't
matter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm so surprised that gcc didn't catch this that I feel like I must be
misreading. srcMap is what we initialize (along with dstMap) from
this map value right after this check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were meaning to do the same thing of memcpying rows, so just
write the code once.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm going to reuse this function from glBlitFramebuffer() handling,
which wants to do the same thing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The last major issue (intervening-read) is fixed, so let's turn this
on for real. The only other known issue is a hardware limitation for
tesselation with flat shading.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
We need the kernel to reset our pointers to 0 in between. Note that
the initialization of function pointer had to move to after
InitContext since we didn't have intel->gen set up yet.
Fixes piglit EXT_transform_feedback/immediate-reuse
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new kernel patch I submitted makes the interface opt-in, so all
batchbuffers aren't preceded by the 4 MI_LOAD_REGISTER_IMMs. This
requires the updated i915_drm.h present in libdrm 2.4.30.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing glsl_to_tgsi::remove_output_read pass did not work properly
when indirect addressing was involved; this commit replaces it with a
lowering pass that occurs before TGSI code generation.
Fixes varying-array related piglit tests.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is similar to Gallium's existing glsl_to_tgsi::remove_output_read
lowering pass, but done entirely inside the GLSL compiler.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes
draw-elements-base-vertex user_varrays
draw-elements-instanced-base-vertex user_varrays
for softpipe with no llvm support (DRAW_USE_LLVM=false)
I'm not sure if this is the correct answer, but these tests were showing
a max_index of 7, then trying to fetch up to 43, maybe it should be fixing
max_index earlier somewhere to take care of this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch silences these GCC warnings.
warning: unused variable 'texelBytes'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is not explicitly stated in the GL 3.0 spec that transform feedback
can be performed on a whole varying array (without supplying a
subscript). However, it seems clear from context that this was the
intent. Section 2.15 (TransformFeedback) says this:
When writing varying variables that are arrays, individual array
elements are written in order.
And section 2.20.3 (Shader Variables), says this, in the description
of GetTransformFeedbackVarying:
For the selected varying variable, its type is returned into
type. The size of the varying is returned into size. The value in
size is in units of the type returned in type.
If it were not possible to perform transform feedback on an
unsubscripted array, the returned size would always be 1.
This patch fixes the linker so that transform feedback on an
unsubscripted array is supported.
Fixes piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{4,8}]-no-subscript" and
"EXT_transform_feedback/output_type *[2]-no-subscript".
Note: on back-ends that set
gl_shader_compiler_options::LowerClipDistance (for example i965),
tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]" still fail. I hope to address this in
a later patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the addition of unit tests in commit
3ef3ba4d2e, several additional build
artifacts are created:
bin/depcomp
bin/missing
tests/Makefile
tests/Makefile.in
tests/glx/Makefile
tests/glx/Makefile.in
tests/glx/.deps/
tests/glx/.gitignore
This patch adds all of these files to .gitignore.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using
gl_transform_feedback_object::Buffers[i]->Name to service an indexed
get request for GL_TRANSFORM_FEEDBACK_BUFFER_BINDING. However, if no
buffer has been bound, gl_transform_feedback_object::Buffers[i] is
NULL, so this was causing a segfault.
This patch switches to using
gl_transform_feedback_object::BufferNames[i], which is equal to
gl_transform_feedback_object::Buffers[i]->Name if
gl_transform_feedback_object::Buffers[i] is not NULL, and 0 if it is
NULL.
Fixes piglit test "EXT_transform_feedback/get-buffer-state
indexed_binding".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), references to gl_ClipDistance (a float[8] array) will
be converted to references to gl_ClipDistanceMESA (a vec4[2] array).
This patch modifies the linker so that requests for transform feedback
of gl_ClipDistance are similarly converted.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipDistance".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using transform feedback, there are three circumstances in which
it is useful for Mesa to instruct a driver to stream out just a
portion of a varying slot (rather than the whole vec4):
(a) When a varying is smaller than a vec4, Mesa needs to instruct the
driver to stream out just the first one, two, or three components of
the varying slot.
(b) In the future, when we implement varying packing, some varyings
will be offset within the vec4, so Mesa will have to instruct the
driver to stream out an arbitrary contiguous subset of the components
of the varying slot (e.g. .yzw or .yz).
(c) On drivers that set gl_shader_compiler_options::LowerClipDistance,
if the client requests that an element of gl_ClipDistance be streamed
out using transform feedback, Mesa will have to instruct the driver to
stream out a single component of one of the gl_ClipDistance varying
slots.
Previous to this patch, only (a) was possible, since
gl_transform_feedback_info specified only the number of components of
the varying slot to stream out. This patch adds
gl_transform_feedback_info::ComponentOffset, which indicates which
components should be streamed out.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, on i965 Gen6 and above, we weren't allocating space for
gl_ClipVertex in the VUE, since the VS was automatically converting it
to clip distances. This prevented transform feedback from being able
to capture gl_ClipVertex.
This patch goes aheads and allocates space for gl_ClipVertex in the
VUE on Gen6 and above. The old behavior is retained on Gen5 and
below, since (a) transform feedback is not yet supported on those
platforms, and (b) those platforms don't currently support
gl_ClipVertex anyhow.
Note: this constitutes a slight waste of VUE space for shaders that
use gl_ClipVertex and don't use transform feedback to capture it.
However, that seems preferable to making the VUE map (and all of the
state that depends on it) dependent on transform feedback settings.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipVertex".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On i965 Gen6 and above, gl_PointSize is stored in component W of the
first VUE slot (which corresponds to VERT_RESULT_PSIZ in the VUE map).
Normally we store varying floats in component X of a VUE slot, so we
need special case logic for gl_PointSize.
For Gen6, we do this with a ".wwww" swizzle in the GS. For Gen7, we
shift the component mask by 3 to select the W component.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_PointSize".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9d36c96d6e (mesa: Fix
glGetTransformFeedbackVarying()) accidentally added an extra memset()
call to the store_tfeedback_info() function, causing
prog->LinkedTransformFeedback.NumBuffers to be erased.
This patch removes the extra memset and rearranges the other
operations in store_tfeedback_info() to be in the correct order.
Fixes piglit tests "EXT_transform_feedback/api-errors *unbound*"
Reviewed-by: Eric Anholt <eric@anholt.net>
The src/dst arrays would overlap but dst was less than src so a simple
version of memcpy() would do the right thing. But this isn't guaranteed
when memcpy() is optimized.
Fixes demos/copypix when the dest region was clipped by the left side of
the window.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is useful for apps which don't print FPS.
Only enabled in SwapBuffers.
v2: track state per drawable, use libGL prefix
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Do it after we check whether inst_end != -1.
Also move the code structure at the beginning of r300_fragment_shader_code
to detect underflows easily with valgrind.
Improves performance from cca 1 fps to 23 fps in Cogs.
This new codepath is not always used, instead, there is a heuristic which
determines whether to use it. Using translate for uploads is generally
slower than what we have had already, it's a win only in a few cases.
This is for GL_ARB_vertex_type_2_10_10_10_rev.
I just took the code from u_format_table.c. It's based on pack_rgba_float.
I had no other choice. The u_format hooks are not exactly compatible
with translate. The cleanup of it is left for future work.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The conversion is limited to only a few cases, because converting to any other
type shouldn't happen in any driver.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fetching int as float and vice versa is not allowed.
Fetching unsigned int as signed int and vice versa is not allowed either.
Doing conversions like that isn't allowed for samplers in OpenGL.
The three hooks could be consolidated into one fetch hook, which would fetch
uint as uint32, sint as sint32, and everything else as float. The receiving
parameter would be void*. This would be useful for implementing vertex fetches
for shader model 4.0, which has untyped registers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Please see the diff for further info.
This paves the way for moving user buffer uploads out of drivers and should
allow to clean up the mess in u_upload_mgr in the meantime.
For now only allowed for buffers on r300 and r600.
Acked-by: Christian König <deathsimple@vodafone.de>
We don't wanna convert per-instance or constant (zero-stride) attribs into
ordinary vertex attribs.
More importantly, the translation of instance attribs now finally works.
To match what transfer_map returns. Really, subtracting the offset leads
to bugs if someone expects it to work exactly like transfer_map.
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation was totally broken -- it was looking in an
unpopulated structure for varyings, and trying to do so using the
current list of varying names, not the list used at link time.
v2: Fix leaking of memory into the program per re-link.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 5a478976ae.
It broke the build. DRI drivers were no longer being installed by
`make install` (and probably not being built at all). It appears to be
due to a few small, subtle mistakes, and the fix isn't clear enough to
simply commit without going through review. In the meantime, revert it.
All other xorg modules require at least 2.60 (released in 2006), so we
may as well increase it to match. It's also doubtful anyone tests the
build with 2.59 (from 2003), so it may not even work anyway.
Adds two missing '|| srcFormat == GL_RG_INTEGER' in assertions and a
bunch of missing pixel converions cases.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0ed11e3331 fixed a "use after free"
bug by getting the next pointer before deleting the current node.
Unfortunately, it also made "next" never get updated if i->need != need.
Fixes infinite loops in piglit tests fbo-depth-array and fbo-depthtex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
textureSize() returns an int, ivec2, or ivec3, but never an ivec4.
Creating the destination register as an ivec4 triggered later failures,
even though the register did hold the proper values.
For example, piglit test vs-textureSize-compare calls textureSize on a
2D texture and compares the result to an expected value. Unfortunately,
our generated code also tried to compare the third and fourth components
which were undefined, and failed.
Fixes piglit test vs-textureSize-compare as well as 19 subcases of
oglconform's glsl-bif-tex-size test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit d45814c925 totally added a data
dependency on _NEW_TEXTURE, even including the comment, but didn't
actually add the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is also generated by BeginTransformFeedbackEXT
if no binding points would be used, either because no program object is
active or because the active program object has specified no varying
variables to record.
...
The error INVALID_VALUE is generated by BindBufferRangeEXT or
BindBufferOffsetEXT if <offset> is not word-aligned.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors no_prog_active
- EXT_transform_feedback/api-errors interleaved_no_varyings
- EXT_transform_feedback/api-errors separate_no_varyings
- EXT_transform_feedback/api-errors bind_offset_offset_1
- EXT_transform_feedback/api-errors bind_offset_offset_2
- EXT_transform_feedback/api-errors bind_offset_offset_3
- EXT_transform_feedback/api-errors bind_offset_offset_5
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is generated by
BeginTransformFeedbackEXT if any transform feedback buffer object
binding point used in transform feedback mode does not have a
buffer object bound.
This required adding a new NumBuffers field to the
gl_transform_feedback_info struct, to keep track of how many transform
feedback buffers are required by the current program.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors interleaved_unbound
- EXT_transform_feedback/api-errors separate_unbound_0_1
- EXT_transform_feedback/api-errors separate_unbound_0_2
- EXT_transform_feedback/api-errors separate_unbound_1_2
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Other parts of the compiler assume that expressions will have
well-formed types or the error type. Just using the type of the thing
being operated on can cause expressions like ~3.14 or ~false to not
have a well-formed type. This could then result in an assertion
failure in the context epxression handler.
If there is an error processing the expression, set the type of the IR
expression to error.
Fixes piglit's bit-not-0[789].frag tests.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42755
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
Detect whether a new enough version of XCB is installed at configure
time. If it is not, don't enable the extension and don't build the
unit tests.
v2: Move the AM_CONDIATION outside the case-statement so that it is
invoked even for non-GLX builds. This prevents build failures with
osmesa, for example.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Robert Hooker <robert.hooker@canonical.com>
The core Mesa code does the equivalent memory allocation, image mapping,
storing and unmapping. We just need to call prep_teximage() first to
handle the 'surface_based' stuff.
The other change is to always use the level=0 mipmap image when accessing
individual mipmap level images that are stored in resources/buffers.
Apparently, we were always using malloc'd memory for individual mipmap
images, not resource buffers, before.
Signed-off-by: Brian Paul <brianp@vmware.com>
This was disabled a year ago due to not having a story for handling
the blitter at the time. We're fine with using the blitter now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new assert in intelEmitCopyBlit() gets angry if we don't align to
dwords. Rather than make the assert have a special case for height ==
1 on the assumption that the hardware doesn't use it in that case,
just supply a correct pitch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43214
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We didn't consume these flags in any way that would produce a
functional difference, but we might have some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to match the visual against the default screen. If the
drawable is on a non-default screen then the appropriate visual might not
exist on the default screen. Conversely, if the same visual is
available on multiple screens then simply selecting for the right VID is
sufficient, since the server has promised that the same visual is
compatible with multiple screens.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are a couple scenarios where the source could be zero and the
operand could be either SRC_ALPHA or ONE_MINUS_SRC_ALPHA. For
example, if the source was ZERO. This would result in something like
(0).w, and a later call to ir_validate would get angry.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42517
Coverity reported a read from pointer after free defect in
src/mesa/drivers/dri/intel/intel_mipmap_tree.c. Bug# 44205
In intel_miptree_all_slices_resolve() function, i = i->next was
executing after freeing i. I have defined a temporary variable
(next) to store the value of i->next before freeing i
Reported-by: Vinson Lee <vlee@vmware.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the interest of softpipe preferring correctness over speed and passing more
piglit tests, set this to off by default. For speed you really want llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch remove the 32bits limitation. As a side effect, it bring the support for the GL_ARB_depth_buffer_float extension.
No regression have been found on piglit, and all tests for GL_ARB_depth_buffer_float pass successfully.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If glUniform1i and friends are going to dump data directly in
driver-allocated, the pointers have to be updated when the storage
moves. This should fix the regressions seen with commit 7199096.
I'm not sure if this is the only place that needs this treatment. I'm
a little uncertain about the various functions in st_glsl_to_tgsi that
modify the TGSI IR and try to propagate changes about that up to the
gl_program. That seems sketchy to me.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2:
Revalidate when shader_program is not NULL.
Update the pointers for all _LinkedShaders.
Init glsl_to_tgsi_visitor::shader_program to NULL in the
get_pixel_transfer_visitor & get_bitmap_visitor.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
A lot of tests in 'make check' will fail under these circumstances,
but at least the build should work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new tests directory at the top-level and some extra build
infrastructure. The tests use the Google C++ Testing Framework, and
they will only be built if configure can detect its availability. The
tests are automatically wired-in to run with 'make check'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Using 'new' as a function parameter name prevents including
glxclient.h the unit tests (future patch) that use the Google C++
Testing Framework.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is only enabled if the underlying driver advertises
support for OpenGL ES 2.0. This happens either through the getAPIMask
function in version 2 of the DRI2 extension or implicity through
version 2 of the DRISW extension.
Since there is no OpenGL ES 2.0 protocol, this extension is marked as
only available with direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRISW
version 3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This converts all of the GLX data from glXCreateContextAttribsARB to
the values expected by the DRI driver interfaces.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the function and modifies dri2CreateNewContextForAPI to call
it. At this point only version 2 of the DRI2 API is advertised to the
loader.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Note that these extensions are not automatically enabled for screens
capable of direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRI2 version
3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
__glX_send_client_info only supports XCB, so use that instead of
__glXClientInfo when USE_XCB is defined.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This function picks the correct client-info protocol (based on the
server's GLX version and set of extensions) and sends it to the
server.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This code was generating the gcc warning:
variable ‘clearValue’ set but not used [-Wunused-but-set-variable]
Reviewed-by: Brian Paul <brianp@vmare.com>
The were always zero. When doing a sub-texture replacement we account
for the dstX/Y/Zoffsets when we map the texture image. So no need to
pass them into the texstore code anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Buffers for shader based decoding can now be
released without its component still being around.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Acked-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Changing pipe_resource was wrong, because it can be used by other contexts
at the same time. This fixes the last possible race condition in r300g
that I know of.
This also fixes blitting NPOT compressed textures. Random pixels sometimes
appeared at the right-hand edge of the texture.
Finally, this removes r300_texture_desc::stride_in_pixels. It makes little
sense with sampler views and surfaces being able to override width0, height0,
and the format entirely.
This fixes a regresssion (broken cube maps) caused by the
ctx->Driver.TexImage parameter simplification commit. The target var
is always GL_TEXTURE_CUBE_MAP at this point so the Face field was always
getting set to zero.
These field assignments aren't needed anyway since core Mesa sets them.
This fixes the latc fetches for llvmpipe, fixes
fbo-generatemipmap-formats GL_ARB_texture_compression
fbo-generatemipmap-formats GL_ATI_texture_compression_3dc
fbo-generatemipmap-formats GL_EXT_texture_compression_latc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
As with previous commits, the target, level and texObj info can be
obtained through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As with TexSubImage(), the target, level and texObj values can be obtained
through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no need to pass the target, level and texObj parameters since
they can be easily obtained from the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the move to Map/UnmapTextureImage, the core mesa routines are
equivalent to what the state tracker was doing.
The TexImage functions can be replaced too, but there's a few differences
that will need to be handled.
inv_swizzles is used in lp_tile_soa.py to create lp_tile_soa.c, we overwrite swizzles if they are already set.
This results in the i8 format getting alpha instead of red, and the l8 format
getting blue instead of red.
Fixes fbo-alphatest-formats, fbo-alphatest-formats ARB_texture_float,
and fbo-alphatest-formats EXT_texture_snorm on llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
introduce vbo_sizeof_ib_type() function to return the index data type
size. I see some place use switch(ib->type) to get the index data type,
which is sort of duplicate.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On failure, intel_miptree_create() needs to *release* the miptree, not
just free it, so that the stencil_mt gets released too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This saves a couple of instructions on most programs with control
flow. More interestingly, 6 shaders from unigine sanctuary now fit
into 16-wide without register spilling.
We were printing out the line triggering the flush, but a variety of
different causes just printed the line number for intel_flush()'s call
of intel_batchbuffer_flush(). Plumb the line numbers from the caller
of intel_flush() on through.
Since the refactor in d7b33309fe, depth
in the miptree changed from 1 to 6, so we always decided it didn't
match, and we would relayout to something that would still not
"match".
Improves performance 23.8% (+/- 1.1%, n=4)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43329
Fixes this GCC warning.
arrayobj.c: In function '_mesa_update_array_object_max_element':
arrayobj.c:310: warning: implicit declaration of function 'ffsll'
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitset.h is still used by classic nouveau -- see `git grep '\<BITSET_'`
-- and the state stored is too big to fit in 64bit integers (it requires
approximately 87 bits), so there is no obvious alternative here.
This effecively reverts commit 196800d798.
Since commit 82b9661894 and
34eae1c72a vbo support
is mandatory for all drivers. So, remove the remaining
FEATURE_ARB_vertex_buffer_object guards.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Consider the following code:
MOV A.x, B.x
MOV B.x, C.x
After the first line, cur_value[A][0] == B, indicating that A.x's
current value came from register B.
When processing the second line, we update cur_value[B][0] to C.
However, for drect copies, we fail to reset cur_value[A][0] to NULL.
This is necessary because the value of A is no longer the value of B.
Fixes Counter-Strike: Source in Wine (where the menu rendered completely
black in DX9 mode), completely white textures in Civilization V, and the
new Piglit test glsl-vs-copy-propagation-1.shader_test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42032
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In this code, 'i' loops over the number of virtual GRFs, while 'j' loops
over the number of vector components (0 <= j <= 3).
It can't possibly be correct to see if bit 'i' is set in the destination
writemask, as it will have values much larger than 3. Clearly this is
supposed to be 'j'.
Found by inspection.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In Android IceCreamSandwich, SurfaceFlinger requires GL_OES_image_external
for basic compositing tasks. Without the extension, SurfaceFlinger fails
to start.
Despite the incompleteness of the extension's implementation introduced by
this patch, it is good enough to enable SurfaceFlinger and to unblock the
people who need to begin testing Mesa on IceCreamSandwich.
To enable the extension, set the environment variable
MESA_EXTENSION_OVERRIDE="+GL_OES_EGL_image_external". Ideally, Android
should set this in init.rc.
WARNING: This implementation of GL_OES_EGL_image_external is not complete.
Some of it is even incorrect. When we begin to really implement
GL_OES_EGL_image_external, much of the patch will need reverting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If the meta flag MESA_META_TEXTURE is present, then disable the texture
target GL_TEXTURE_EXTERNAL_OES.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
just noticed this in passing, not sure it actually fixes any issus.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmare.com>
Now the gl_array_object's layout matches the one used in
recalculate_input_bindings. Make use of this and remove the
bind_array_obj function.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmare.com>
This fixes a regression seen with the isosurf demo when switching between
glBegin/End and glDrawArrays (do it several times). The problem was the
driver wasn't getting _NEW_ARRAY when the arrays were subtly changed:
(vertex3f, normal3f) vs. (normal3f, vertex3f).
This patch fixes that by signaling _NEW_ARRAY whenever we transition
between glBegin/End and glDrawArrays mode and display lists.
The patch also fixes up the initialization of the map_vp_none[] array
to stop putting strange values in the last five elements of the array.
v2: remove DRAW_ELEMENTS, don't distinguish between glDrawArrays and
glDrawElements
v3: add DRAW_DISPLAY_LIST for the display list case, just to be safe.
Reviewed-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Tested-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Remove gl_light::_dli and gl_light::_sli.
Both are only used for a value previously used in
color indexed rendering. Also both variables are only used
and never written.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Here is the final patch to enable dynamic eu instruction store size:
increase the brw eu instruction store size dynamically instead of just
allocating it statically with a constant limit. This would fix something
that 'GL_MAX_PROGRAM_INSTRUCTIONS_ARB was 16384 while the driver would
limit it to 10000'.
v2: comments from ken, do not hardcode the eu limit to (1024 * 1024)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A single next_insn may change the base address of instruction store
memory(p->store), so call it first before referencing the instruction
store pointer from an index.
This the final prepare work to enable the dynamic store size.
v2: comments from Ken, define emit_endif as bool type
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, while after the brw_JMPI()
and before the brw_land_fwd_jump() function, the eu instruction store
base address(p->store) may change. Thus, the safe way to reference the
jmp instruction is by index instead of by the instruction address.
v2: comments from Eric, don't change the prototype of brw_JMPI
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, while after
the brw_IF/ELSE() and before the brw_ENDIF() function, the
eu instruction store base address(p->store) may change.
Thus let if_stack just store the instruction index. This is
somehow more flexible and safe than store the instruction
memory address.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This partially reverts commit 363ff84475.
It caused severe performance drops in Nexuiz. Reported by Phoronix.
Tested by me on r300g and by IRC people on r600g.
When updating SOL indices, we were accidentally putting the starting
index in dword 1 and the SVBI number to increment in dword 2--these
should be reversed. Usually both of these values are zero, so we
didn't see any problem. However, if a transform feedback operation
spans multiple batch buffers, the starting index will be nonzero.
Fixes piglit test "EXT_transform_feedback/intervening-read output".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering triangle strips, vertices come down the pipeline in the
order specified, even though this causes alternate triangles to have
reversed winding order. For example, if the vertices are ABCDE, then
the GS is invoked on triangles ABC, BCD, and CDE, even though this
means that triangle BCD is in the reverse of the normal winding order.
The hardware automatically flags the triangles with reversed winding
order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided
coloring can be adjusted to account for the reversed order.
In order to ensure that winding order is correct when streaming
vertices out to a transform feedback buffer, we need to alter the
ordering of BCD to BDC when the first provoking vertex convention is
in use, and to CBD when the last provoking vertex convention is in
use.
To do this, we precompute an array of indices indicating where each
vertex will be placed in the transform feedback buffer; normally this
is SVBI[0] + (0, 1, 2), indicating that vertex order should be
preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we
change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,
2), depending on the provoking vertex convention.
Fixes piglit tests "EXT_transform_feedback/tessellation
triangle_strip" on Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code for storing 1D, 2D and 3D tex images (whole or sub-images) was
all pretty similar. This consolidates those six paths.
v2: rework switch statement to catch unexpected targets
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For 1D arrays, map each slice separately. Note that this was handled
correctly in _mesa_store_teximage2d() but not here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets rid of another renderbuffer->PutRow() call and _DepthBuffer
usage. We always work with 32-bit uint Z values now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Use Map/UnmapRenderbuffer() for the special, optimized cases we care about.
Note that we're dropping some seldom-used cases in the new fast-path
code: as CI->RGB conversion and zooming.
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't want to call these functions where we'll be using
Map/UnmapRenderbuffer(). So push them further down in the drawpixels
cases so that we can switch over to Map/UnmapRenderbuffer() step by step.
Reviewed-by: Eric Anholt <eric@anholt.net>
Stop using deprecated renderbuffer PutRow() function. Note that we
aren't using Map/UnmapRenderbuffer() yet because this call is inside
a swrast_render_start/finish() pair.
v2: use _mesa_pack_uint_24_8_depth_stencil_row(), per Eric.
Hopefully glCopyPixels(GL_DEPTH_STENCIL) will be handled by the
fast copy function. Otherwise, just do the copy with separate
depth + stencil copies. That's effectively what the removed code
did anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
The functions that read depth/stencil values understand all (packed)
depth/stencil buffer formats now so there's no reason to use the
wrappers.
Also, improve the format checks in fast_copy_pixels() to catch mismatched
depth/stencil cases.
v2: fix the test for combined depth+stencil buffers, per Eric.
Stop using the deprecated renderbuffer Get/Put Row/Values functions.
Consolidate code paths, etc. The file is nearly half the size it used
to be!
Reviewed-by: Eric Anholt <eric@anholt.net>
Use format pack/unpack functions instead of deprecated renderbuffer
GetRow/PutRow functions.
v2: use get_stencil_address(), s/destVals/newVals/
Reviewed-by: Eric Anholt <eric@anholt.net>
The former was only used for clearing buffers. The later wasn't used
anywhere! Remove them and all implementations of those functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step toward getting rid of the renderbuffer PutRow/etc functions.
v2: fix assorted depth/stencil clear bugs found by Eric
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixed the build failure, fixed a warning where attributs and error arguments had
been
inverted and fixed another call that was missing an argument.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes almost all of the transform feedback piglit tests. Remaining
are a few tests related to tesselation for
quads/trifans/tristrips/polygons with flat shading.
v2: Incorporate Paul's feedback (squash with previous, state flag note,
static assert, update FINISHME)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The code was relying on gs.prog_data's copy of the
number-of-verts-per-prim, which segfaulted on gen7 since it doesn't
make a GS program. We can easily calculate that value right here.
v2: Fix svbi_0_starting_index regression.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although there is not much documentation of this fact, there are in
fact two separate VF caches:
- an "index-based" cache (described in the Sandy Bridge PRM, vol 2
part 1, section 2.1.2 "Vertex Cache"). This cache stores URB
handles of vertex shader outputs; its purpose is to avoid redundant
invocations of the vertex shader when drawing in random access mode
(e.g. glDrawElements()), and the same vertex index is specified
multiple times. It is automatically invalidated between
3D_PRIMITIVE commands and between instances within a single
3D_PRIMITIVE command.
- an "address-based" cache (mentioned briefly in vol 2 part 1, section
1.7.4 "PIPE_CONTROL Command"). This cache stores the data read from
vertex buffers; its purpose is to avoid redundant memory accesses
when doing instanced drawing or when multiple 3D_PRIMITIVE commands
access the same vertex data. It needs to be manually invalidated
whenever new data is written to a buffer that is used for vertex
data.
Previous to this patch, it was not necessary for Mesa to explicitly
invalidate the address-based cache, because there were no reasonable
use cases in which the GPU would write to a vertex data buffer during
a batch, and inter-batch flushing was taken care of by the kernel.
However, with transform feedback, there is now a reasonable use case:
vertex data is written to a buffer using transform feedback, and then
that data is immediately re-used as vertex input in the next drawing
operation. To make this use case work, we need to flush the
address-based VF cache between transform feedback and the next draw
operation. Since we are already calling
intel_batchbuffer_emit_mi_flush() when transform feedback completes,
and intel_batchbuffer_emit_mi_flush() is intended to invalidate all
caches, it seems reasonable to add VF cache invalidation to this
function.
As with commit 63cf7fad13 (i965: Flush
pipeline on EndTransformFeedback), this is not an ideal solution. It
would be preferable to only invalidate the VF cache if the next draw
call was about to consume data generated by a previous draw call in
the same batch. However, since we don't have the necessary dependency
tracking infrastructure to figure that out right now, we have to
overzealously invalidate the cache.
Fixes Piglit test "EXT_transform_feedback/immediate-reuse".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After creating new binding table entries for transform feedback, we
need to set the dirty flag BRW_NEW_SURFACES, so that a new binding
table pointer will be sent to the hardware. Otherwise the new binding
table entries will not take effect.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The surface states tracked by BRW_NEW_WM_SURFACES are no longer used
for just WM. They are also used for vertex texturing and transform
feedback. To avoid confusion, this patch renames BRW_NEW_WM_SURFACES
to BRW_NEW_SURFACES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
X8 depth formats weren't supported until Ironlake (Gen 5).
Fixes GPU hangs introduced in d84a180417.
One example test case was "fbo-missing-attachment-blit from".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit tests "EXT_transform_feedback/generatemipmap buffer" and
"EXT_transform_feedback/generatemipmap prims_written" on i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although i965 gen6 does not yet support ARB_transform_feedback2 or
NV_transform_feedback2, it needs to support pause/resume functionality
so that meta-ops will work correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When transform feedback is paused, it is legal to change programs or
to perform drawing operations using a drawing mode that doesn't match
the transform feedback mode.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a client calls BeginTransformFeedback(), then
PauseTransformFeedback(), then EndTransformFeedback(), we need to make
sure that the transform feedback object is not left in a "paused"
state, otherwise the next call to BeginTransformFeedback() will leave
transform feedback paused.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During meta-operations (such as _mesa_meta_GenerateMipmap()), we need
to be able to draw even if GL_RASTERIZER_DISCARD is enabled. This
patch causes _mesa_meta_begin() to save the state of
GL_RASTERIZER_DISCARD and disable it (so that drawing can be done
during the meta-op), and causes _mesa_meta_end() to restore it.
Fixes piglit test "EXT_transform_feedback/generatemipmap discard" on
i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This won't be used in the client-side libGL, but the xserver has to
generate a different protocol error depending on the reason context
creation failed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
There seems to have been two different ways to communicate the
profile. There were flags and there were profiles. I've opted to
remove the profile flags and use ST_PROFILE_DEFAULT (compatibility
profile) and ST_PROFILE_OPENGL_CORE (core profile) consistently
instead.
Also change the values of the ST_CONTEXT_FLAG_DEBUG and
ST_CONTEXT_FLAG_FORWARD_COMPATIBLE flags to match the WGL and GLX
values.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
If the server returned BadContext, the error would just get droped on
the floor.
Fixes the piglit test glx-import-context-single-process
NOTE: This is a candidate for the 7.11 branch, but it also requires
the previous patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Only initialize vlc in MPEG2 decoding once for all slices,
add more sanity checks to vlc decoding functions, support
multiple vlc input buffer, improve documentation of the
vlc functions.
v2: also implement multiple inputs for the vlc functions
v3: some bug fixes for buffer size and alignment corner cases
v4: rework of the patch, some more improvements
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Double free and array overflow, even if only 2 members are
used the last one needs to be set to NULL explicitly.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com
Hi all
This fixes a memory leak of 32 bytes on exit.
From 924f8fdccb41b011f372bc57252005bcdb096105 Mon Sep 17 00:00:00 2001
From: Lauri Kasanen <curaga@operamail.com>
Date: Thu, 22 Dec 2011 21:28:33 +0200
Subject: [PATCH] gallivm: Close a memory leak
As reported by "valgrind --leak-check=full glxgears".
Signed-off-by: Lauri Kasanen <curaga@operamail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In the case where a front and back output are specified, the draw code will
copy the back output into the front color slot and everything is happy.
However if no front is specified then the draw code will do a bad copy (separate patch), but also the frag shader won't pick up the color as there there is
no write to COLOR from the vertex shader just BCOLOR.
This patch fixes that problem so if it can't find a vertex shader output
for the front color slot, it will go and lookup and use one for the back color
slot.
Signed-off-by: Dave Airlie <airlied@redhat.com>
fixing these makes piglit fbo-integer pass on softpipe.
modified to re-order things, haven't addressed Eric's concerns,
can't find anything in spec that mentions sign extensions, it does say
integers aren't clamped or modified.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This makes it easier to keep track of which dirty bits correspond to
which pieces of context, since it makes _NEW_RASTERIZER_DISCARD
correspond with ctx->RasterDiscard.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously we were storing the RasterDiscard flag (for
GL_RASTERIZER_DISCARD) in gl_context::TransformFeedback. This was
confusing, because we use the _NEW_TRANSFORM flag (not
_NEW_TRANSFORM_FEEDBACK) to track state updates to it, and because
rasterizer discard has effects even when transform feedback is not in
use.
This patch makes RasterDiscard a toplevel element in gl_context rather
than a subfield of gl_context::TransformFeedback.
Note: We can't put RasterDiscard inside gl_context::Transform, since
all items inside gl_context::Transform need to be pieces of state that
are saved and restored using PushAttrib and PopAttrib.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously, we only enabled transform feedback when
MESA_GL_VERSION_OVERRIDE was 3.0 or greater, since transform feedback
support was not completely finished, so it didn't make sense to
advertise support for it unless absolutely necessary.
Now that transform feedback is fully implemented on gen6, we can
enable this extension unconditionally.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds software-based PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries that work by keeping
track of the number of primitives that are sent down the pipeline, and
adjusting as necessary to account for the way each primitive type is
tessellated.
In the long run we'll want to replace this with a hardware-based
implementation, because the software approach won't work with geometry
shaders or primitive restart. However, at the moment, we don't have
the necessary kernel support to implement a hardware-based query (we
would need the kernel to save GPU registers when context switching, so
that drawing performed by another process doesn't get counted).
Fixes Piglit tests EXT_transform_feedback/query-primitives_generated-*
and EXT_transform_feedback/query-primitives-written-*.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, i965 only supported two query types: GL_TIME_ELAPSED_EXT
and GL_SAMPLES_PASSED_ARB, and it distinguished between the two using
if/else statements that compared query->Base.Target to
GL_TIME_ELAPSED_EXT.
This patch changes the if/else statements to switch statements so that
we can add more query types without having to have a chain of
else-ifs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't currently have kernel support for saving GPU registers on a
context switch, so if multiple processes are performing transform
feedback at the same time, their SVBI registers will interfere with
each other. To avoid this situation, we keep a software shadow of the
state of the SVBI 0 register (which is the only register we use), and
re-upload it on every new batch.
The function that updates the shadow state of SVBI 0 is called
brw_update_primitive_count, since it will also be used to update the
counters for the PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is needed by i965 to ensure that transform feedback counters are
not incremented during meta-ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function computes the number of primitives that will be generated
when the given drawing operation is performed. It accounts for the
tessellation that is performed on line strips, line loops, triangle
strips, triangle fans, quads, quad strips, and polygons, so it is
suitable for implementing the primitive counters needed by transform
feedback.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It isn't necessary to call FLUSH_VERTICES from bind_buffer_range,
because transform feedback buffers are not allowed to be changed when
transform feedback is active.
Thanks to Marek Olšák for pointing out this bug.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch enables rasterizer discard functionality (a part of
transform feedback) in Gen6, by generating an alternate GS program
when rasterizer discard is active. Instead of forwarding vertices
down the pipeline, the alternate GS program uses a URB Write message
to deallocate the URB entry that was allocated by FF sync and
terminate the thread.
Note: parts of the Sandy Bridge PRM seem to imply that we could do
this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit,
and not allocating a URB entry at all. However, it's not clear how we
are supposed to terminate the thread if we do that. Volume 2 part 1,
section 4.5.4, says "GS threads must terminate by sending a URB_WRITE
message with the EOT and Complete bits set.", and my experiments so
far confirm that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A common use case for transform feedback is to perform one draw
operation that writes transform feedback output to a buffer, followed
by a second draw operation that consumes that buffer as vertex input.
Since vertex input is consumed at an earlier pipeline stage than
writing transform feedback output, we need to flush the pipeline to
ensure that the transform feedback output is completely written before
the data is consumed.
In an ideal world, we would do some dependency tracking, so that we
would only flush the pipeline if the next draw call was about to
consume data generated by a previous draw call in the same batch.
However, since we don't have that sort of dependency tracking
infrastructure right now, we just unconditionally flush the buffer
every time glEndTransformFeedback() is called. This will cause a
performance hit compared to the ideal case (since we will sometimes
flush the pipeline unnecessarily), but fortunately the performance hit
will be confined to circumstances where transform feedback is in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, the function intel_batchbuffer_emit_mi_flush()
was a bit of a misnomer. On Gen4+, when not using the blit engine, it
didn't actually flush the pipeline--it simply generated a PIPE_CONTROL
command with the necessary bits set to flush GPU caches. This was
usually sufficient, since in most situations where
intel_batchbuffer_emit_mi_flush() was called, all we really care about
was ensuring cache coherency.
However, with the advent of OpenGL 3.0, there are two cases in which
data output by one stage of the pipeline might be consumed, in a later
draw operation, by an earlier stage of the pipeline:
(a) When using textures in the vertex shader.
(b) When using drawing with a vertex buffer that was previously
generated using transform feedback.
This patch addresses case (a) by changing
intel_batchbuffer_emit_mi_flush() so that on Gen6+, it sets the
PIPE_CONTROL_CS_STALL bit (this forces the pipeline to actually
flush). (Case (b) will be addressed by the next patch in the series).
This is not an ideal solution--in a perfect world, the driver would
have some buffer dependency tracking so that we would only have to
flush the pipeline in the two cases above. Until that dependency
tracking is implemented, however, it seems prudent to have
intel_batchbuffer_emit_mi_flush() actually flush the pipeline, so that
we get correct rendering, at the expense of a (hopefully small)
performance hit.
The change is only applied to Gen6+, since at the moment only Gen6+
supports the OpenGL 3.0 features that make a full pipeline flush
necessary.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch advertises support for EXT_transform_feedback on Intel
Gen6.
Since transform feedback support is not completely finished yet, for
now we only advertise support for it when MESA_GL_VERSION_OVERRIDE is
3.0 or greater (since transform feedback is required by GL version
3.0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds basic transform feedback capability for Gen6 hardware.
This consists of several related pieces of functionality:
(1) In gen6_sol.c, we set up binding table entries for use by
transform feedback. We use one binding table entry per transform
feedback varying (this allows us to avoid doing pointer arithmetic in
the shader, since we can set up the binding table entries with the
appropriate offsets and surface pitches to place each varying at the
correct address).
(2) In brw_context.c, we advertise the hardware capabilities, which
are as follows:
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 4
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 16
OpenGL 3.0 requires these values to be at least 64, 4, and 4,
respectively. The reason we advertise a larger value than required
for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already
set aside 64 binding table entries, so we might as well make them all
available in both separate attribs and interleaved modes.
(3) We set aside a single SVBI ("streamed vertex buffer index") for
use by transform feedback. The hardware supports four independent
SVBI's, but we only need one, since vertices are added to all
transform feedback buffers at the same rate. Note: at the moment this
index is reset to 0 only when the driver is initialized. It needs to
be reset to 0 whenever BeginTransformFeedback() is called, and
otherwise preserved.
(4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
program to output transform feedback data as a side effect.
(5) In gen6_gs_state.c, we configure the geometry shader stage to
handle the SVBI pointer correctly.
Note: ordering of vertices is not yet correct for triangle strips
(alternate triangles are improperly oriented). This will be addressed
in a future patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch stores the geometry shader VUE map from a local variable in
compile_gs_prog() to a field in the brw_gs_compile struct, so that it
will be available while compiling the geometry shader. This is
necessary in order to support transform feedback on Gen6, because the
Gen6 geometry shader code that supports transform feedback needs to be
able to inspect the VUE map in order to find the correct vertex data
to output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Sandy Bridge PRM, volume 4, part 2, section 5.3.10 ("5.3.10
Register Region Restrictions") contains the following restriction on
the execution size and operand width of instructions:
"3. ExecSize must be equal to or greater than Width."
When emitting an IF instruction in single program flow mode on Gen6+,
we use an ExecSize of 1, therefore the Width of each operand must also
be 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_BindBufferRange(), we need to verify that the offset and size
specified by the client do not exceed the size of the underlying
buffer. We were accidentally doing this check using ">=" rather than
">", so we were generating a bogus error if the client specified an
offset and size that fit exactly in the underlying buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds two new fields to the gl_transform_feedback_info
struct:
- BufferStride records the total number of components (per vertex)
that transform feedback is being instructed to store in each buffer.
- Outputs[i].DstOffset records the offset within the interleaved
structure of each transform feedback output.
These values are needed by the i965 gen6 and r600g back-ends, so it
seems better to have the linker provide them rather than force each
back-end to compute them independently.
Also, DstOffset helps pave the way for supporting
ARB_transform_feedback3, which allows the transform feedback output to
contain holes between attributes by specifying
gl_SkipComponents{1,2,3,4} as the varying name.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Mapping to software and uploading again clearing is killing performance.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Valgrind complains about a definitely lost block allocated in
intelNewTextureImage(). This leak was apparently created by
6e0f9001fe, "mesa: move
gl_texture_image::Data, RowStride, ImageOffsets to swrast", as it
removes the free() from _mesa_delete_texture_image().
Put the free() back, fixes a Valgrind error.
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no point in having them when we distribute eglext.h.
As for unofficial extensions, there is a chance that we might remove some of
them evetually. Keeping the #ifdef's for now should make that easier.
Update to revision 15052.
EGL_MESA_drm_image is now official. But apparently we have our own extension
to it and we need this in eglmesaext.h:
#ifdef EGL_MESA_drm_image
/* Mesa's extension to EGL_MESA_drm_image... */
#ifndef EGL_DRM_BUFFER_USE_CURSOR_MESA
#define EGL_DRM_BUFFER_USE_CURSOR_MESA 0x0004
#endif
#endif
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we advertised 0 VS texture units. Now that we have proper
support for using the sampling engine in the VS, we can advertise 16,
which is conveniently the number required for OpenGL 3.0.
v2: Enable on Gen4. I hacked up my tests to not use flat ivec varyings
and they pass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This should avoid state-dependent FS recompiles when samplers that are
only used by the VS change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The idea is to reuse this for the VS and (in the future) GS as well.
v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The visit() half computes the values to put in the header based on the
IR and simply stuffs that in the vec4_instruction; the emit() half uses
this to set up the message header. This works out well since emit() can
use brw_reg directly and access individual DWords without kludgery.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll want to reuse this for the VS, and it's complex enough that I'd
rather not cut and paste it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This translates the GLSL compiler's IR into vec4_instruction IR,
generating code to load coordinates, LOD info, shadow comparitors, and
so on into the appropriate message registers.
It turns out that the SIMD4x2 parameters are identical on Gen 5-7, and
the Gen4 code is similar enough that, unlike in the FS, it's easy enough
to support all generations in a single function.
v2: Load zeros for missing coordinates (fixing vs-texelFetch-sampler1D
and 2D on G45), and fix G45 message length for shadow comparisons.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the part that takes the vec4_instruction IR and turns it into
actual Gen ISA.
v2: Add Gen4 messages, don't retype m0 to UW.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to Ironlake, cube maps were stored as 3D textures. In recent
refactoring, we removed a separate "layers" parameter in favor of using
depth. Unfortunately, depth was getting minified, which is only correct
for actual 3D textures.
Fixes piglit tests:
- bugs/crash-cubemap-order
- fbo/fbo-cubemap
- texturing/cubemap
Also changes texturing/cubemap npot from abort to fail.
This hasn't seen a full test run since Piglit on Mesa master hangs
GM45 a lot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the extensions require that both libGL and either the server or
the direct rendering driver (or both) enable the extension before it's
advertised. It seems safe to assume that none of the other components
on OS X will enable these extensions, so all the #ifdef blocks here
just clutter the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jeremy Huddleston <jeremyhu@apple.com>
There are a few unsupported extensions (e.g., the ATI and NV float
extensions) that are still in the list. There is some small chance
that these may be supported some day.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
__glXInitialize calls AllocAndFetchScreenConfigs.
AllocAndFetchScreenConfigs unconditionally sends a glXQuerySeverString
request to the server. This request is only supported with GLX 1.1 or
later, so we were already implicitly incompatible with GLX 1.0
servers. How many more similar bugs lurk in the code that nobody has
noticed in years?
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the share_xid was only set in the glXImportContextEXT path,
and it was left set to None in all of the other create-context paths.
Fixes the piglit test glx-query-context-info-ext.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Send the DestroyContext protocol immediately when glXDestroyContext is
called, and never call it when glXFreeContextEXT is called. In both
cases, either destroy the client-side structures or, if the context is
current, set xid to None so that the client-side structures will be
destroyed later.
I believe this restores the behavior of the original SGI code. See
src/glx/x11 around commit 5df82c8. The spec doesn't say anything
about glXDestroyContext not really destroying imported contexts (it
acts like glXFreeContextEXT instead), but that's what the original
code did. Note that glXFreeContextEXT on a non-imported context does
not destroy it either.
Fixes the piglit test glx-free-context.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the piglit test glx-get-context-id.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary problem was that the number of reply bytes read is clamped
to sizeof(propList), but the loop that processes the properties tries
to examine all of the properties sent by the server. If the server
sends 47,000 properties, we only read 3 but process all 47,000.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Each of the DRI, DRI2, and DRISW backends contain code like the
following in their create-context routine:
if (shareList) {
pcp_shared = (struct dri2_context *) shareList;
shared = pcp_shared->driContext;
}
This assumes that the glx_context *shareList is actually the correct
derived type. However, if shareList was created as an
indirect-rendering context, it will not be the expected type. As a
result, shared will contain garbage. This garbage will be passed to
the driver, and the driver will probably segfault. This can be
observed with the following GLX code:
ctx0 = glXCreateContext(dpy, visinfo, NULL, False);
ctx1 = glXCreateContext(dpy, visinfo, ctx0, True);
Create-context is the only case where this occurs. All other cases
where a context is passed to the backend, it is the 'this' pointer
(i.e., we got to the backend by call something from ctx->vtable).
To work around this, check that the shareList->vtable->destroy method
is the same as the destroy method of the expected type. We could also
check that shareList->vtable matches the vtable or by adding a "tag"
to glx_context to identify the derived type.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is not exposed generally yet because some of the swrast paths hit
in piglit (drawpixels, copypixels, blit) aren't yet converted to
MapRenderbuffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a little more unusual than the separate MESA_FORMAT_S8_Z24
support, because in addition to storing the real stencil data in a
MESA_FORMAT_S8 miptree, we also make the Z miptree be
MESA_FORMAT_Z32_FLOAT instead of the requested format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With separate stencil GL_DEPTH32F_STENCIL8, the miptree will have a
really different format (MESA_FORMAT_Z32_FLOAT) from the teximage
(MESA_FORMAT_Z32_FLOAT_X24S8).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The format handling here is tricky, because we're not actually
generating a Z32_FLOAT_X24S8 miptree, so we're guessing the format
that GL wants based on seeing Z32_FLOAT with a separate stencil.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the operations were just trying to get at irb->wrapped_depth->mt,
which is the same as irb->mt now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen7 only supports the non-packed formats, even if you associate a
real separate stencil buffer -- otherwise it's as if the depth test
always fails.
This requires a little bit of care in the match_texture_image case,
since the miptree format no longer matches the texture image format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This little bit of logic was duplicated, which isn't much, but I was
going to need to duplicate a bit of additional logic in the next
commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There were only two places it was really used at this point, which was
in the batchbuffer emit of the separate stencil packets for gen6/7.
Just write in the ->stencil_mt reference in those two places and ditch
all this flailing around with allocation and refcounts.
v2: Fix separate stencil on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
this mentions which channels are used for slice and depth comparison values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The 4th texcoord is used in this case for the comparison.
This fixes piglit glsl-fs-shadow2DArray* on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The code didn't handle the case where front wasn't specified in the vertex
shader outputs, but back was.
In that case we were doing a copy from back to non-existant front,
this code checks we have existant front/backs and only does the copy when
they both exist.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Sets rgba layer as zeroth layer if a custom background_surface is specified.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
It's harmless to add support for attributes we don't support,
since they require a feature enabled for them to affect
something. As long as they aren't enabled, nothing happens.
This enables support for custom colorspaces and background colors.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Currently only validating, since nothing else can be done with it yet
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
v2: removed check_video_surface
Signed-off-by: Christian König <deathsimple@vodafone.de>
This sample compare was always doing linear, and this makes the
glsl-fs-shadow1DArray test render like the Intel driver.
fix wrong 0->j from initial patch
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the first part of a fix to piglit glsl-fs-shadow1DArray
also fix the passing of unused r[2] in the normal 1D case.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The piglit draw-pixel-with-texture was asserting in the glsl->tgsi code,
due to 0 texture target, this makes sure the texture target is copied over
correctly when we copy instructions around.
v2: drive-by fix bitmap on the way past.
This avoids the assertion, have to contemplate fixing things as per the spec
later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be especially useful for loading texturing parameters, where I
need to (for example) reference m3.xz<D>.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copy and pasted from fs_inst::is_tex(), but without TXB.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll be reusing most of these for the VS shortly. The one exception is
TXB (texturing with LOD bias), which is explicitly forbidden in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a regression since d2235b0f46,
in my new textureSize sampler(1DArrayShadow|2DShadow|2DArrayShadow)
piglit tests, though I'm not honestly sure how this ever worked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Noticed a "warning: array subscript is above array bounds" given at one of
the existing sanity-check asserts. Turns out all the arrays of strings
haven't matched the corresponding enum values in a while, if ever.
I didn't know the proper names for any of these and couldn't find
them in the base specs aside from "result.pointsize" in
ARB_vertex_program, so I just filled in the enum's value
as was done with other slots.
Also add four STATIC_ASSERT()s to be sure and catch future additions
or bumps to MAX_VARYING/etc again, and some more non-static asserts
where there weren't any before.
(Note, the fragment enum that corresponded to result.color(half) was removed in
8d475822e6e19fa79719c856a2db5b6a205db1b9.)
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.1svn r145714 moved global variables into a new TargetOptions
class. TargetMachine constructor now needs a TargetOptions object as
well.
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This helper function is used during mipmap generation to prepare space
for the destination mipmap levels.
This improves/fixes two things:
1. If the texture object was created with glTexStorage2D, calling
_mesa_TexImage2D() to allocate the new image would generate
INVALID_OPERATION since the texture is marked as immutable.
2. _mesa_TexImage2D() always frees any existing texture image memory
before allocating new memory. That's inefficient if the existing
image is the right size already.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Namely:
- EXT_transform_feedback
- ARB_transform_feedback2
- ARB_transform_feedback_instanced
The old interface was not useful for OpenGL and had to be reworked.
This interface was originally designed for OpenGL, but additional
changes have been made in order to make st/d3d1x support easier.
The most notable change is the stream-out info must be linked
with a vertex or geometry shader and cannot be set independently.
This is due to limitations of existing hardware (special shader
instructions must be used to write into stream-out buffers),
and it's also how OpenGL works (stream outputs must be specified
prior to linking shaders).
Other than that, each stream output buffer has a "view" into it that
internally maintains the number of bytes which have been written
into it. (one buffer can be bound in several different transform
feedback objects in OpenGL, so we must be able to have several views
around) The set_stream_output_targets function contains a parameter
saying whether new data should be appended or not.
Also, the view can optionally be used to provide the vertex
count for draw_vbo. Note that the count is supposed to be stored
in device memory and the CPU never gets to know its value.
OpenGL way | Gallium way
------------------------------------
BeginTF = set_so_targets(append_bitmask = 0)
PauseTF = set_so_targets(num_targets = 0)
ResumeTF = set_so_targets(append_bitmask = ~0)
EndTF = set_so_targets(num_targets = 0)
DrawTF = use pipe_draw_info::count_from_stream_output
v2: * removed the reset_stream_output_targets function
* added a parameter append_bitmask to set_stream_output_targets,
each bit specifies whether new data should be appended to each
buffer or not.
v3: * added PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME for ARB_tfb2,
note that the draw-auto subset is always required (for d3d10),
only the pause/resume functionality is limited if the CAP is not
advertised
v4: * update gallium/docs
v5: * compactified struct pipe_stream_output_info, updated dump/trace
It's like DrawArrays, but the count is taken from a transform feedback
object.
This removes DrawTransformFeedback from dd_function_table and adds the same
function to GLvertexformat (with the function parameters matching GL).
The vbo_draw_func callback has a new parameter
"struct gl_transform_feedback_object *tfb_vertcount".
The rest of the code just validates states and forwards the transform
feedback object into vbo_draw_func.
Xa doesn't support it yet. Trying to do that would cause a segfault.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
When doing format conversion copies between a format without an
alpha channel and a format with an alpha channel, make sure the
destination alpha is set to 1.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Backends indicate that they support this extension by returning
EGL_TRUE when native_display::get_param() is called with
NATIVE_PARAM_PRESENT_REGION and NATIVE_PARAM_PRESERVE_BUFFER.
native_present_control is extended to include the region that should
be presented. When native_present_control::num_rects is zero,
the whole surface is to be presented.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
The comment said they deserved to be in emit_depthbuffer, and at this
point they were all there already.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have miptrees for everything, we can more easily test for
!has_separate_stencil completeness. Also, test for whether the
stencil rb is the wrong kind of format for separate stencil, or if we
are trying to do packed to different images of a single miptree.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now there's the thing that CALLOCs and sets up window system vtable,
and the thing that CALLOCs and sets up user renderbuffer vtable. The
user renderbuffer vtable gets replaced later by
intel_renderbuffer_update_wrapper for wrapped renderbuffers (things
with name == ~0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing it in the caller in the renderbuffer code, but it was
missed in the separate stencil creation for textures. Apparently our
testing was using renderbuffers or pre-aligned sizes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This used to be needed because irb->mt would be unset for fake packed
depth/stencil, but no longer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The cool part was that in the "fbo-depthstencil -drawpixels
GL_DEPTH24_STENCIL8 32F_24_8_REV" testcase, the shifting happened to
end up with a value awfully close to the expected value, except for
every other pixel being 0 (the stencil value, shifted away to
nothing).
Reviewed-by: Brian Paul <brianp@vmware.com>
vlVdpPresentationQueueDisplay shouldn't scale, so
use size of destination surface as source rectangle.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Correctly use destination_rect and destination_video_rect
in the mixer, and also use a dirty area tracking for output surfaces.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Take viewport and scissors into account and make
the dirty area a parameter instead of a member.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Invalid shaders containing the character % at an unexpected location
would cause Bison to call yyerror with a message of:
syntax error, unexpected '%'
Bison expects yyerror() to take a string, while _mesa_glsl_error() is a
printf-style function. This hit the classic printf string escape issue:
_mesa_glsl_error(loc, state, "unexpected '%'"); // invalid!
_mesa_glsl_error(loc, state, "%s", "unexpected '%'"); // correct.
This caused assertion failures after ralloc_asprintf_append called
vsnprintf to determine the length of the text that would be printed:
vsnprintf would see the invalid format and return -1, an invalid length.
The solution is to define a proper yyerror() wrapper function that calls
_mesa_glsl_error with the "%s". Since we compile with -p "_mesa_glsl",
yyerror is defined as:
#define yyerror _mesa_glsl_error
So we have to #undef yyerror in order to be able to declare it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43564
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
The versions in the xserver and in libGL have diverged enough that the
xserver doesn't want these.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
glext.h doesn't have GL_MIN_PROGRAM_TEXEL_OFFSET_EXT or
GL_MAX_PROGRAM_TEXEL_OFFSET_EXT. Using them in the XML causes code to
be generated for the xserver that won't compile. Use the names that
exist instead.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
That file was removed from the xserver with commit:
commit a80780a7638f847c3be20e5e0c7fe85e83d9bdd1
Author: Adam Jackson <ajax@redhat.com>
Date: Wed Nov 17 09:03:06 2010 -0500
glx: Remove swap barrier and hyperpipe support
Never implemented in any open source driver. The implementation
assumed explicit DDX driver knowledge of how the client-side driver
worked, since at the time the server's GL renderer was not a DRI driver.
But now, it is, so any implementation of these should be done with
additional DRI driver API, like the swap control extension.
Reviewed-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is only temporary until a better solution is available.
v2: print warnings and add gallium CAPs
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since gl_framebuffer::_DepthBuffer and _StencilBuffer are only used
by swrast, do the validation of those fields in swrast too.
The main/depthstencil.[ch] code is no longer used and will be removed
next.
Reviewed-by: Eric Anholt <eric@anholt.net>
These files are copies of main/depthstencil.[ch] with s/mesa/swrast/.
The main/depthstencil.[ch] will go away soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions update the gl_framebuffer::_DepthBuffer and _StencilBuffer
fields, possibly creating renderbuffer wrappers that make a shared
depth+stencil accessible as depth-only or stencil only.
This stuff is only used by swrast now and will be moved there next.
Reviewed-by: Eric Anholt <eric@anholt.net>
We're just looking at the depth/stencil renderbuffers to do error
checking. We don't need to look at the depth/stencil wrappers to do
that. Also, remove pointless readRb = depthRb = NULL assignments.
Reviewed-by: Eric Anholt <eric@anholt.net>
We never want to use the depth/stencil buffer wrappers so always just
use the attachment renderbuffers. This is a step toward removing the
_DepthBuffer, _StencilBuffer fields.
Reviewed-by: Eric Anholt <eric@anholt.net>
GLfloat doesn't have enough precision to exactly represent 0xffffff
and 0xffffffff. (and a reciprocal of those, if I am not mistaken)
If -ffast-math is enabled, using GLfloat causes assertion failures in:
- fbo-blit-d24s8
- fbo-depth-sample-compare
- fbo-readpixels-depth-formats
- glean/depthStencil
For example:
fbo-depth-sample-compare: main/format_unpack.c:1769:
unpack_float_z_Z24_X8: Assertion `dst[i] <= 1.0F' failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
fixes the following build error since
c83fb4d45f:
/usr/include/strings.h:46:13: error: expected declaration specifiers or
‘...’ before numeric constant
/usr/include/strings.h:46:13: error: conflicting types for ‘memset’
In file included from
../../../../src/gallium/winsys/g3dvl/xlib/xsp_winsys.c:34:0:
../../../../src/gallium/auxiliary/util/u_inlines.h: In function
‘pipe_buffer_create’:
../../../../src/gallium/auxiliary/util/u_inlines.h:189:4: error: too
many arguments to function ‘memset’
/usr/include/strings.h:46:13: note: declared here
bzero is defined in X11 as: #define bzero(b,len) memset(b,0,len)
including strings.h after the X11 header results in preprocessor
replacing 'bzero' in strings.h and generating unbuildable code.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
This fixes the segfault, and seems to put this closer to where other
properties are being set. Hopefully it still conforms.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the correct checks and asserts in the right places. This doesn't
fix all the tests that I've sent to piglit, need to add int paths to go alongside the uint paths that don't go via float to fix it up properly.
I'm not sure how much of that could be templated/shared will have a look
once I write it the long way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It sets the wrong values (GL_XXX_LEFT instead of GL_XXX), and no other
Mesa driver does this, given that Mesa sets the right draw/read buffers
provided the Mesa visual has the doublebuffer flag filled correctly
which is the case.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids forming invalid pointers needlessly, which even if
never dereferenced is undefined behavior. It also makes
_mesa_validate_pbo_access() more comprehensible.
Reviewed-by: Brian Paul <brianp@vmware.com>
NULL as an error indicator is meaningless, since it will return NULL
on success anyway if the caller passes in zero as the image's address
and asks to calculate the offset of the first pixel. For example,
_mesa_validate_pbo_access() does this.
This also matches the code in the non-GL_BITMAP codepath, which
already has an assert like this.
v2: Per Brian Paul's review, remove the function call entirely
and tighten the assert to only accept the two formats compatible with
GL_BITMAP. They always have one component per pixel.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The bug, reported to me by Vadim Girlin on IRC, was causing overzealous
elimination of code in parallel if statements such as the following:
if (x) {
r = false;
}
if (y) {
r = true;
}
Before this commit, the assignment inside the first if block would be
misdetected as dead code and removed.
Number of fragment shader variants is not very representative of the
memory used by LLVM, neither is number of shader instructions, as often
texture sampling constitutes most of the generated code.
This change adds an additional trim criteria: least recently used
fragment shader variants will be freed until the total number of LLVM IR
instruction falls below a specified threshold.
Reviewed-by: Brian Paul <brianp@vmware.com>
u_simple_list.h uses a sentinel element, and not a NULL element. So
ensure list is not empty when reducing the list of shader variants.
Something I noticed while trying to free variants more aggressively.
Reviewed-by: Brian Paul <brianp@vmware.com>
In a few places we need to allocate space for some number of generic
pixels. Use this new define instead of a magic number like 16 or
4 * sizeof(GLuint).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copying these files is the first step in moving the software buffer
code from main/renderbuffer.c to swrast/s_renderbuffer.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Implemented in terms of renderbuffer mapping/unmapping and format
packing/unpacking functions.
The swrast and state tracker code for implementing accumulation are
unused and will be removed in the next commit.
v2: don't use memcpy() in _mesa_clear_accum_buffer()
v3: don't allocate MAX_WIDTH arrays, be more careful with mapping flags
Reviewed-by: Eric Anholt <eric@anholt.net>
Change and document the interpretation of the color conversion matrix
in order to make the function more versatile and to simplify the
generated shader.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
In Gen6, transform feedback is accomplished by having the geometry
shader send vertex data to the data port using "Streamed Vertex Buffer
Write" messages, while simultaneously passing vertices through to the
rest of the graphics pipeline (if rendering is enabled).
This patch adds a geometry shader program that simply passes vertices
through to the rest of the graphics pipeline. The rest of transform
feedback functionality will be added in future patches.
To make the new geometry shader easier to test, I've added an
environment variable "INTEL_FORCE_GS". If this environment variable
is enabled, then the pass-through geometry shader will always be used,
regardless of whether transform feedback is in effect.
On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no
Piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
R02_PRIM_END and R02_PRIM_START don't actually refer to bits in DWORD
2 of R0 (as the name, and comments in the code, would seem to
indicate). Actually they refer to bits in DWORD 2 of the header for
URB_WRITE messages.
This patch renames the defines to reflect what they actually mean. It
also addes a define URB_WRITE_PRIM_TYPE_SHIFT, which previously was
just hardcoded in .c files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prior to this patch, in the Gen4 and Gen5 GS, we used GRF 0 (called
"R0" in the code) as a staging area to prepare the message header for
the FF_SYNC and URB_WRITE messages. This cleverly avoided an
unnecessary MOV operation (since the initial value of GRF 0 contains
data that needs to be included in the message header), but it made the
code confusing, since GRF 0 could no longer be relied upon to contain
its initial value once the GS started preparing its first message.
This patch avoids confusion by using a separate register ("header") as
the staging area, at the cost of one MOV instruction.
Worse yet, prior to this patch, the GS would completely overwrite the
contents of GRF 0 with the writeback data it received from a completed
FF_SYNC or URB_WRITE message. It did this because DWORD 0 of the
writeback data contains the new URB handle, and that neds to be
included in DWORD 0 of the next URB_WRITE message header. However,
that caused the rest of the message header to be corrupted either with
undefined data or zeros. Astonishingly, this did not produce any
known failures (probably by dumb luck). However, it seems really
dodgy--corrupting FFTID in particular seems likely to cause GPU hangs.
This patch avoids the corruption by storing the writeback data in a
temporary register and then copying just DWORD 0 to the header for the
next message. This costs one extra MOV instruction per message sent,
except for the final message.
Also, this patch moves the logic for overriding DWORD 2 of the header
(which contains PrimType, PrimStart, PrimEnd, and some other data that
we don't care about yet). This logic is now in the function
brw_gs_overwrite_header_dw2() rather than in brw_gs_emit_vue(). This
saves one MOV instruction in brw_gs_quads() and brw_gs_quad_strip(),
and paves the way for the Gen6 GS, which will need more complex logic
to override DWORD 2 of the header.
Finally, the function brw_gs_alloc_regs() contained a benign bug: it
neglected to increment the register counter when allocating space for
the "temp" register. This turned out not to have any effect because
the temp register wasn't used on Gen4 and Gen5, the only hardware
models (so far) to require a GS program. Now, all the registers
allocated by brw_gs_alloc_regs() are actually used, and properly
accounted for.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the GS is not in use, the entire URB space is available for the
VS. When the GS is in use, we split the URB space 50/50.
The 50/50 split is probably not optimal--we'll probably want tune this
for performance in a future patch. For example, in most situations,
it's probably worth allocating more than 50% of the space to the VS,
since VS space is used for vertex caching. But for now this is good
enough.
Based on previous work by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never filled this in before because we didn't care.
I'm skeptical these are correct; my sources indicate that both the VS
and GS # of entries are 256 on both GT1 and GT2.
I'm also loathe to change it and break stuff.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Normally when outputting instructions in SPF (single program flow)
mode, we convert IF and ELSE instructions to conditional ADD
instructions applied to the IP register. On platforms prior to Gen6,
flow control instructions cause an implied thread switch, so this is a
significant savings.
However, according to the SandyBridge PRM (Volume 4 part 2, p79):
[Errata DevSNB{WA}] - When SPF is ON, IP may not be updated by
non-flow control instructions.
So we have to disable this optimization on Gen6.
On later platforms, there is no significant benefit to converting flow
control instructions to ADDs, so for the sake of consistency, this
patch disables the optimization on later platforms too.
The reason we never noticed this problem before is that so far we
haven't needed to use SPF mode on Gen6. However, later patches in
this series will introduce a Gen6 GS program which uses SPF mode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, GS generation code contained a lookup table that mapped
primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped
LINESTRIP to LINELIST, and left all other primitives unchanged. This
was silly, because we never generate a GS program for those primitive
types anyhow.
This patch removes the unnecessary lookup table.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a new bit to the ctx->NewState bitfield,
_NEW_TRANSFORM_FEEDBACK, to track state changes that affect
ctx->TransformFeedback. This bit can be used by driver back-ends to
avoid expensive recomputations when transform feedback state has not
been modified.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When driCreateScreen calls driConvertConfigs to try to convert the
configs for swrast, it fails and returns NULL. Instead of checking,
it just clobbers psc->base.configs. Then, when the application asks
for the FBconfigs, there aren't any.
Instead, make the caller responsible for freeing the old modes lists
if both calls to driConvertConfigs succeed.
Without the second fix, glxinfo fails unless you run it with
LIBGL_ALWAYS_INDIRECT:
$ glxinfo
name of display: :0.0
Error: couldn't find RGB GLX visual or fbconfig
$ LIBGL_ALWAYS_INDIRECT=1 glxinfo
name of display: :0.0
display: :0 screen: 0
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
[...]
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch fixes the samplerCubeShadow support in GLSL shader compiler.
shader compiler was picking the 'r' texture coordinate for shadow comparison
when the expected behaviour is to use 'q' texture coordinate in case of cube
shadow maps.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes piglit tests fbo-array, fbo-depth-array, fbo-generatemipmap-array,
and array-texture, as well as the array variants of my new textureSize
and texelFetch tests.
Not a candidate for 7.11 because EXT_texture_array wasn't supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes many crashes on Ivybridge due to upload_sf_state calling
brw_depthbuffer_format without an actual depth buffer. This was a
recent regression on master.
+3992 piglits on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This evolved over several commits, and I also wanted to document some
new information about how we handle formats.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Now that all RBs have miptrees, and miptree mapping covered these last
two code paths, consistently use them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Right now the fake packed d/s RBs are creating two sub-renderbuffers
with their own storage, and the hardware setup and the mapping code
have been explicitly referencing them. By setting miptrees on them,
we'll be able to make our renderbuffer code for fake packed
depth/stencil more consistent with all our other renderbuffers.
The interesting new behavior here is that there is now a mt with a
non-depthstencil format (X8Z24) that has a stencil_mt field
associated. This looks like it should be safe, and we'll need to be
able to do this for floating point depth/stencil as well.
Before, we had an uncached read of S8 to untile, then a RMW (so
uncached penalty) of the packed S8Z24 to store the value, then the
consumer would uncached read that once per pixel. If data was written
to the map, we would then have to uncached read the written data back
out and do the scatter to the tiled S8 buffer (also uncached access
penalties, since WC couldn't actually combine). So 3 or 5 uncached
accesses per pixel in the ROI (and we we were ignoring the ROI, so it
was the whole image).
Now we get an uncached read of S8 to untile, and an uncached read of
Z. The consumer gets to do cached accesses. Then if data was
written, we do streaming Z writes (WC success), and scattered S8
tiling writes (uncached penalty). So 2 or 3 uncached accesses per
pixel in the ROI.
This should be a performance win, to the extent that anybody is doing
software accesses of packed depth/stencil buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't gripe about void * arithmetic for our driver, and this
prevents silly casting when assigning the result of mapping to
non-byte types.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We're going to want to reuse this logic in mapping of fake packed
miptrees wrapping separate depth/stencil miptrees.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This code will be incrementally moving to a model like intel_fbo.c's
renderbuffer mapping with helper functions, as I move that code here.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will be used for things like packed depth/stencil temporaries and
making LLC-cached temporary mappings using blits.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will let us share teximage mapping logic with renderbuffer
mapping, which has an intel_mipmap_tree but not a gl_texture_image.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This required is_hiz_depth_format to start returning true on S8_Z24 as
well, since that's the format we have here. The two previous callers
are only calling it on non-depthstencil formats.
This avoids us needing to have HiZ working on a new Z format
immediately upon exposing the format (particularly painful for
Z32_FLOAT_X24S8, which means all the fake packed depth/stencil paths).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Some hardware can't reinterpret the format of hardware buffers and thus
the X server needs to know the format when the buffer is created.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Daenzer <michel@daenzer.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
See intel_vertical_texture_alignment_unit() in intel_tex_layout.c;
certain surface types require setting this to VALIGN_4.
Analogous to commit dd0e46c410 on Gen6.
Fixes piglit test fbo-generatemipmap-formats with the
GL_ARB_depth_texture and GL_EXT_packed_depth_stencil arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The code forces single program flow to be enabled on Ironlake, or
equivalently, disables multiple program flow. The comment was reversed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This moves the detiling to the fbo mapping, r200 depth is always tiled,
and we can't detile it with the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could have been split up better, but the driver is just broken now,
so bisecting the brokenness is going to be painful no matter what.
This adds renderbuffer mapping/unmapping along with texture image allocation.
It drops all the old texture upload paths, some of which could possible be
reimplemented with the blitter later.
It also redoes the span code paths to use its own set of image mapping handlers,
along with removing the tiling decode paths for the color buffers, since
we now hope to use the blitter for this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I think there is a missing state update or flush somewhere, and every
so often PP_CNTL goes to the kernel with a texture enabled but no texture.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously a zero writemask would result in dst_chan == -1, meaning an
unnecessary MOV with the destination register dictated by undefined
memory contents would be emitted before returning. This caused
intermittent GPU hangs, e.g. with glean/texCombine.
Reviewed-by: Eric Anholt <eric@anholt.net>
Anything of less than (bw, bh) size is possible when you consider
rectangular textures, and this code is (now) safe for those. Even for
power-of-two textures, width could be 4 for FXT1 while not being
aligned to block size.
Fixes piglit compressedteximage GL_COMPRESSED_RGB_FXT1_3DFX
Reviewed-by: Brian Paul <brianp@vmware.com>
Generally this code works with width and height aligned to compressed
blocks, but at the 2x2 and 1x1 levels of a square texture (or height <
bh in general), we were skipping uploading our single row of blocks.
Fixes piglit compressedteximage GL_COMPRESSED_RGBA_S3TC_DXT5_EXT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the MapTextureImage changes on Intel, nwn had corruption in the
scrollbar at the load game menu, and corrupted ground textures in the
starting zone. Heroes of Newerth's intro screen was also thoroughly
garbled. A new piglit test "compressedteximage" was created to
regression test this.
The issue was this code now seeing dstRowStride aligned to hardware
requirements instead of a temporary buffer that gets uploaded to
hardware later. The existing code was just trying to memcpy
srcRowStride * height / bh, while the glCompressedTexSubImage2D()
storage code nearby did the correct walking by blockheight rows at a
time. Just reuse the subimage upload instead of duplicating that
logic.
v2: Update comment at the top of the function (suggestion by Joel
Forsberg)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41451
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We checked if srcType == GL_UNSIGNED_BYTE earlier so there was no
way to reach this code. This was left-over code from the GLchan
removal work.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If mode is not GL_POINT/LINE/FILL we'll have already reported the
error earlier in the function and returned so we can never get here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We shouldn't call _mesa_error() if the target is a proxy texture.
Errors are handled later in the function.
Fixes a Coverity warning.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Attempting to move an MRF to a MRF is not only pointless, it will fail
because MRFs are read-only, resulting in garbage in your register.
If we already set up a MRF source, there's nothing to resolve anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BITSET64_{TEST,SET,CLEAR}_RANGE macros only work on ranges
wither in the lower 32 or in the upper 32 bits of the bitset.
This change extends these macros to work on arbitrary ranges
possibly crossing the bitset word boundary.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add support for GL_OES_compressed_ETC1_RGB8_texture to core mesa. There is no
driver support yet.
Unlike desktop GL compressed texture formats, GLES compressed texture formats
usually can only be used with glCompressedTexImage2D. All other gl*Tex*Image*
functions are updated to check for that.
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture. These routines
will be used in the following commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
In swrast_map_renderbuffer negative strides lead to
render buffer map pointers that are off by 2^32.
Make sure that intermediate negative values are not
converted to an unsigned.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes these GCC warnings.
u_vbuf.c: In function ‘u_vbuf_draw_begin’:
u_vbuf.c:839:20: warning: ‘max_index’ may be used uninitialized in this function [-Wuninitialized]
u_vbuf.c:838:20: warning: ‘min_index’ may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We weren't doing the necessary byte swap.
v2: use same arithmetic as unpack_ARGB1555() to be consistent.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
In the refactor for handling user-defined out params, we failed to set
up the new color output tracking when there was no color drawbuffer in
place but alpha testing was on. Just always set up at least one when
handling gl_FragColor, since we won't make use of its value unless we
need to.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42806
In 6d874d0ee1, I checked whether a
register that had been stored was BAD_FILE (as opposed to a legitimate
GRF), but actually the unset register was ARF NULL because it had been
memset to 0. Finding BAD_FILE for unset values in debugging was my
intention with that file, so make it the case more often by
rearranging the enum. There was only one place we relied on the magic
enum register_file to hardware register file correspondance anyway.
It is useful to have this option for shader-db, and it was also good
at the time where we were rejecting shaders due to various internal
limits we hadn't supported yet. However, at this point the precompile
step takes extra time (since not all NOS is known at link time) and
spews misleading debug in the common case of debugging a real app.
This is left in place for VS, where we still have a couple of codegen
failure paths that result in link failure through precompile. Those
need to be fixed.
shader-db can still get at the debug info it wants using
"shader_precompile=true" driconf option. Long term, we can probably
build a good-enough app for shader-db to trigger real codegen.
When new MESA_FORMAT_x enums are added we need to add a new entry in
the table of texture fetch functions. In the past this has been
missed if swrast isn't actually tested. Using a static assertion
should help with that.
This can be used to check that tables have the right number of entries,
etc. at compile-time. This will hopefully catch things that are missed
if particular drivers aren't tested, for example.
v2: Simplify the macro to omit the extra line number info (the compiler
already indicates the line number). And wrap the macro for readability.
There was only one consumer of this API, meta.c, which was intending
to ask "is this format just stencil index (and nothing else)?".
Instead, if one tried to glDrawPixels of GL_DEPTH_STENCIL-type
formats, it would just try to draw the stencil parts. Nothing good
came of this.
This function looks rather silly at this point, but I'm leaving it in
place to be the obvious parallel API to _mesa_is_depth_format(). Note
that if you want the old behavior, you should use it as
(_mesa_is_stencil_format() || _mesa_is_depthstencil_format()) like is
commonly done for depth-related tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Asking for the datatype of MESA_FORMAT_Z32_FLOAT_X24S8 is a bit funny
-- there's a float depth channel, and a stencil channel that doesn't
have a particular GLenum associated with its type, so what's the
correct response?
Because there is no query for stencil, just make this format's
datatype be that of the depth channel. It fixes the depth query (and
thus a failure in piglit gl-3.0-required-sized-formats), and none of
the other consumers of the _mesa_get_format_datatype() API care.
v2: Add a comment for why the DataType is this way for this format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were already doing it through the shader (layered underneath
GL_EXT_texture_swizzle) in the shadow compare case. This avoids
having per-format logic for switching out the surface format dependent
on the depth mode.
v2: Also do the swizzling for DEPTH_STENCIL. oops.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tripped over this bug in the next commit, relying on our
EXT_texture_swizzle to do some shadow sampler-related swizzling. If a
writemask was masking out a channel of the destination that was a live
channel of the texture swizzle, it would read undefined values.
Fixes piglit ARB_fragment_program_shadow/masked.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will make handling new formats (like actually exposing Z32F)
easier and more reliable.
v2: Remove the check for hiz buffer -- the MESA_FORMAT should really
be giving us the value we want even for hiz.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Complicates Gallium3D development and doesn't seem to have active users.
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
For the non-separate-stencil-only case, we've been using a NULL
surface for depth, so we didn't have to care. However, to support
separate stencil with no depthbuffer, we have to make the depth
surface non-NULL or the stencil test always fails thanks to separate
stencil inheriting the surface type of depth.
Fixes hiz-depth-stencil-test-d0-s8.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Page 77 (page 91 of the PDF) says about glGetActiveAttrib:
"The returned attribute name can be the name of a generic
attribute or a conventional attribute (which begin with the prefix
"gl_", see the OpenGL Shading Language specification for a
complete list)."
Page 261 (page 275 of the PDF) says about glGetProgramiv:
"If pname is ACTIVE_ATTRIBUTES, the number of active attributes in
program is returned."
It doesn't say anything about built-in vs. user-defined attributes.
From the language around glGetActiveAttrib and the lack of an
exclusion of built-in attributes, which exists other places (e.g.,
around glBindAttribLocation), we can infer that GL_ACTIVE_ATTRIBUTES
should include the active attribute count. It should also be included
in the values returned by glGetActiveAttrib.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43138
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Yi Sun <yi.sun@intel.com>
To each switch statement in s_texfilter.c, add a break statement to the
default case.
Eliminates the Eclipse static analysis warning: No break at the end of
this case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Replace the distinct struct gl_client_array members in gl_array_object by
an array of gl_client_arrays indexed by VERT_ATTRIB_*.
Renumber the vertex attributes slightly to keep the old semantics of the
distinct array members. Make use of the upper 32 bits in VERT_BIT_*.
Update all occurances of the distinct struct members with the array
equivalents.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Make gl_program::InputsRead a 64 bits bitfield.
Adapt the intel and radeon driver to handle a 64 bits
InputsRead value.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Introduce a set of defines for VERT_ATTRIB_* and VERT_BIT_*
that will be used in the followup patches.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
According opengl spec 4.2.pdf table 6.12 (Vertex Array Object State) at
page 515, the element buffer object is listed in vertex array object.
So, move the ElementArrayBufferObj inside gl_array_object to make
element buffer object per-vao.
This would fix most of(3 left) intel oglc vao test fail
NOTE: this is a candidate for the 7.11 branch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The position of the red and green bits was misstated in the comments.
Arguably, the names of these formats should be changed to "GR" to reflect
the component ordering and to be consistent with other formats.
And warn loudly in case people want to use it. Too many tester report
gpu hangs on irc and we rootcause this ...
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If alpha test is enabled and there's no color buffers we still need the
fragment shader to emit a color.
v2: add _NEW_COLOR flag in _mesa_update_state_locked()
Fixes piglit fbo-alphatest-nocolor-ff failures with Gallium drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net> (i965)
The source array elements are 8-bytes (float + uint) so we need
to multiply the src index by 2 to get the right array stride.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This format is used in the ARB_texture_rgb10_a2ui spec.
It adds core mesa support, texformat + texstore support, format_unpack
and fbobject.c (all patches from list merged + fixed up).
also fixes some whitespace issues.
Parts were:
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These codepaths were missing the cases for BGR_INTEGER/BGRA_INTEGER.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
After reading ARB_texture_rgb10_a2ui it appears the packed formats
for integer types are only specified via this extension, and not via
the original ones. So condition the checks on this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MESA_FORMAT_RGBX8888_REV is one of the opaque pixel formats used on Android.
Thanks to texture-from-pixmap, drivers may actually see texture images with
this format on Android.
MESA_FORMAT_RGBX8888 is added only for completeness.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: Move the new formats after MESA_FORMAT_ARGB8888_REV in gl_format. I
accidentally moved them to the wrong place when preparing the patch.]
GLX functions are sometimes directly available in the current binary. In such
cases, we do not need any alternate library loaded using dlopen. Otherwise,
dlopen may find the wrong libGL library and get functions that conflicts with
the current loaded ones.
For example, on Debian Sid with nvidia binary drivers, using mesa's libEGL with
GLX driver leads to wrong glXGetFBConfigs symbol loaded (or loaded twice?),
which leads to "GLX: failed to create any config" error message as the
glXGetFBConfigs symbol seems to return garbage. If the binary is linked with
nvidia's libGL, the GLX symbols are already available.
Without this patch, convert_fbconfig (src/egl/drivers/glx/egl_glx.c:233) fails
for every config found, after glXGetFBConfigAttrib(... GLX_RENDER_TYPE, ...)
call, as the value returned has GLX_COLOR_INDEX_BIT and not GLX_RGBA_BIT.
[olv: initialize handle, prepend egl_glx to the commit log]
Also fix up Makefiles to use the default mesa compilation flags.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrants <jakob@vmware.com>
Also remove some unused variables in the st/xa makefile.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
With ICS (Android 4.0), several headers and structs are renamed. Define
ANDROID_VERSION so that we can choose a different path depending on the
platform version.
I've tested only softpipe and llvmpipe. r600g is also reported to work.
Enable the bit 3DSTATE_DEPTH_BUFFER.Tiled_Surface. From the Sandybridge
PRM, Volume 2, Part 1, Section 7.5.5.1.1 3DSTATE_DEPTH_BUFFER, Bit 1.27
Tiled Surface:
[DevGT+]: This field must be set to TRUE.
Fixes GPU hangs on the following Piglit tests:
hiz-stencil-test-fbo-d0-s8
hiz-stencil-read-fbo-d0-s8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
v2: Guard against rb->mt being NULL, since we may enter the draw
regions path before intel_prepare_render() has been called to set
them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
This is a no-op change on gen6, but should result in some
actually-unsupported formats on gen4 no longer being chosen (like
RGBA_FLOAT32 now being RGBA_FLOAT16).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3.0 specifies GL_RGB10_A2 as a required sized format for rendering
and texturing.
This introduces two piglit regressions: one due to fbo-mipmap-copypix
hitting swrast GetRow (we want to convert swrast to MapRenderbuffer),
and one due to fbo-blending-formats being too picky while leaving
dithering on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the rest of the driver is driven off of the surface
formats table, all we really need to do is add the mapping from
MESA_FORMAT to BRW_SURFACEFORMAT. However, we also add format
override for I16/L16 render targets at the same time, so that existing
users of I16 that were getting promoted to I32 and then getting the
I32->R32 override still get FBO support.
Fixes failures in piglit gl-3.0-required-sized-texture-formats, and
will prevent regressions in ARB_texture_float on gen4 when moving to
fully table-driven texture format setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until GL 3.0, there isn't any requirement on the actual sizes of
channels chosen. By falling back to 16 here, we can correctly support
ARB_texture_float on original i965 hardware, which can't correctly
filter 32-bit floats.
Not all i965 hardware can do RGB float16, and this will at least save
half the memory and have expected behavior in terms of precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be a no-op change. The initializers are reordered to
match the ordering of the enum, since there isn't a clearly sensible
ordering, but "the order they were added to the driver, sort of" is
definitely not one.
Also, the unsupported formats are explicitly initialized to 0, so it's
more obvious what we aren't claiming to support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is currently duplicated with intel_context.c's setup of the
formats table, and sets true for exactly the same set of formats on
gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've never seen a use for the thread ID value, but knowing the format
being rendered is kind of a big deal.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We are already testing this if appropriate in
intel_validate_framebuffer (FBO completeness), so no need to avoid
attaching the texture to the renderbuffer here.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to now be renderable as a texture
attachment on i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to go writing GetRow/PutRow for every format required by
GL 3.0, when it's very hard to get those functions called, and in
every case we want to make swrast do direct mapping through
MapRenderbuffer anyway.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to be considered complete on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves any chipset-dependent logic we want for render target
format choices to init time as well. There is still logic left at
state update for SRGB handling, where format choices change based on
GL state.
The brw_render_target_supported() function should now return correct
results, instead of relying on the limited results from
intel_span_supports_format() to avoid lying about FBO completeness.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to drive chosing formats and determining framebuffer
completeness, instead of the bunch of ad-hoc checks we have had until
now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The formats.c code's "datatype" value is "what does this value mean",
i.e. unorm or snorm or float, and is the return value from the
GL_TEXTURE_RED_TYPE class of queries. The depth formats were marked
as GL_UNSIGNED_INT, which is what we use for integer, and not what we
should be returning from the glGetTexLevelParameter.
In texstore, we were inappropriately using it as an argument to
_mesa_unpack_depth_span() that was expecting a value like
GL_UNSIGNED_INT or GL_UNSIGNED_SHORT. Just hardcode
_mesa_unpack_depth_span()'s arguments for now, though it looks like
the consumers of that interface would be happier with using
MESA_FORMAT.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_TEXTURE_WHATEVER_SIZE entrypoints were checking if the
specified base type of the texture allowed that channel to be present
before reporting the size of the channel, so that GL_RGB didn't end up
with an alpha size if the hardware driver had to store it that way.
The GL_TEXTURE_WHATEVER_TYPE entrypoints weren't checking it, so you
would end up with strange responses from the GL involving 0-bit
floating-point alpha components in GL_RGB32F, even though it says
GL_NONE as expected for other 0-sized channels.
Make _TYPE check _BaseFormat the same as _SIZE, which results in
fixing most of the GL_RGB* testcases of gl-3.0-required-sized-formats
pass on i965.
v2: Add a default case with a warning (suggestion by Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
The motivation behind this is to add some self-documentation in the code
about how each CAP can be used.
The idea is:
- enum pipe_cap is only valid in get_param
- enum pipe_capf is only valid in get_paramf
Which CAPs are floating-point have been determined based on how everybody
except svga implemented the functions. svga have been modified to match all
the other drivers.
Besides that, the floating-point CAPs are now prefixed with PIPE_CAPF_.
Regresses one Piglit test: bugs/fdo10370.
I'm not enabling HiZ for gen7 yet because it causes a mysterious
performance regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil renderbuffers, we were using separate stencil only if the
hardware required it. Since the performance gains from HiZ is so high, we
should always use separate stencil if the hardware supports it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I implemented functions for horizontal/vertical alignment units separately
because I find it easier to read that way...especially with all the
corner-cases.
[chad] Corrected the vertical alignment calculation by checking for
depthstencil formats.
v2:
- Fix typos in intel_horizontal_texture_alignment_unit():
s/height/width/ and s/VALIGN/HALIGN.
- Remove special case for compressed formats in
intel_get_texture_alignment unit(). Compressed formats are already
handled in the halign and valign functions.
- Replace check ``_mesa_is_depth_format(...) ||
_mesa_is_depthstencil_format(...)`` with explcitit checks against
GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows us to replace all the calls to
intel_get_texture_alignment_unit() with a single call at miptree creation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When a depth texture is first attached to framebuffer, allocate a HiZ
miptree for it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After brw_try_draw_prims() emits a batch, mark that the depth buffer needs
a depth resolve if the buffer was written to and if it has an accompanying
HiZ buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Resolve all buffers that will be mapped by intelSpanRenderStart. This
comprises resolving the depth buffer of each enabled texture and of the
read and draw buffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Factor the mapping loops from intelSpanRenderStart() into
intel_span_map_buffers(). This in preparation for the next commit,
which resolves the buffers before mapping.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Before emitting primitives in brw_try_draw_prims(), resolve the depth
buffer's HiZ buffer and resolve the depth buffer of each enabled depth
texture.
v2: [anholt] The driver no longer validates drm bo's, so update a comment
to reflect that.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
To do so, we must resolve all buffers on entering a glBegin/glEnd block.
For the detailed explanation, see the Doxygen comments in this patch.
v2:
- Fix typo: s/enusure/ensure/.
- In brwPrepareExecBegin(), do the same resolves as done by
brw_predraw_resolve_buffers().
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A lot of the state manipulation is handled by the meta-op state setup.
However, some batches need manual intervention.
v2:
Do not special-case the 3DSTATE_DEPTH_STENCIL.Depth_Test_Enable bit
for HiZ in gen6_upload_depth_stencil(). The HiZ meta-op sets
ctx->Depth.Test, just read the value from that.
v3:
Add a new dirty flag, BRW_STATE_HIZ, for brw_tracked_state. Flag it
immediately before and after executing the HiZ operation in
gen6_resolve_slice(). Add the flag to the the dirty bits for the
following state packets:
gen6_clip_state
gen6_depth_stencil_state
gen6_sf_state
gen6_wm_state
v4:
- Add BRW_NEW_STATE_HIZ to the dirty bit table in brw_state_upload.c.
This is needed for INTEL_DEBUG=state.
- Align brw dirty bit for gen6_depth_stencil_state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Some state batches also need to be manipulated. That's done in the next
commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
brw_context::hiz contains state needed to perform HiZ meta-ops and
indicates if a HiZ operation is currently in progress.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add the following functions:
intel_renderbuffer_resolve_hiz
intel_renderbuffer_resolve_depth
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add functions that
- set a miptree slice as needing a resolve
- resolve a single slice of a miptree
- resolve all slices of a miptree
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a map of miptree slices to needed resolves, implemented as
a linked list. A future commit will embed such a list in
intel_mipmap_tree.
If you think I'm crazy to put a list in a miptree, read the Doxygen in
this patch for intel_resolve_map.
v2: [anholt] Move Doxygen from functin prototypes to definitions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Now that intel_renderbuffer::region has been replaced with a miptree, the
HiZ functions region parameter must be replaced with a miptree parameter.
Change the return type from bool to void.
Rename the 'depth' parameter to 'layer', because it will correspond to
irb->mt_layer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Remove the following functions:
i830_hiz_resolve_noop
i915_hiz_resolve_noop
brw_hiz_resolve_noop
My original strategy for how intel->vtbl.resolve_*buffer was used has
substantially changed. The above functions are no longer called in the
current strategy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is required to correctly implement HiZ for mipmapped and
multi-layered textures.
v2: Accomodate refcount fixes in intel_process_dri2_buffer_*() that were
introduced in v2 of commit
intel: Replace intel_renderbuffer::region with a miptree [v2]
Reviewed-by: Eric Anholt <eric@anholt>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil textures using separate stencil, we embedded a stencil
buffer in intel_texture_image. The intention was that the embedded stencil
buffer would be the golden copy of the texture's stencil bits. When
necessary, we scattered/gathered the stencil bits between the texture
miptree and the embedded stencil buffer.
This approach had a serious deficiency for mipmapped or multi-layer
textures. Any given moment the embedded stencil buffer was consistent with
exactly one miptree slice, the most recent one to be scattered. This
permitted tests of type A to pass, but broke tests of type B.
Test A:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Read and test stencil data at (level=x1, layer=y1).
4. Upload data into (level=x2,layer=y2).
5. Read and test stencil data at (level=x2, layer=y2).
Test B:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Upload data into (level=x2,layer=y2).
4. Read and test stencil data at (level=x1, layer=y1).
5. Read and test stencil data at (level=x2, layer=y2).
v2:
Only allocate stencil miptree if intel->must_use_separate_stencil,
because we don't make the conversion from must_use_separate_stencil to
has_separate_stencil until commit
intel: Use separate stencil whenever possible
v3:
Don't call ChooseNewTexture in intel_renderbuffer_wrap_miptree() in
order to determine the renderbuffer format. Instead, pass the format as
a param to that function.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is in preparation for properly implementing glFramebufferTexture*()
for mipmapped depthstencil textures. The FIXME comments deleted by this
patch give a rough explanation of what was broken.
This refactor does the following:
- In intel_update_wrapper() and intel_wrap_texture(), change the
parameters to prepare to remove functions' dependency on
gl_texture_image.
- Move the call to intel_renderbuffer_set_draw_offsets() from
intel_render_texture() into intel_udpate_wrapper().
Each time I encounter those functions, I dislike their vague names.
(Update which wrapper? What is wrapped? What is the wrapper?). So, while
I was mucking around, I also renamed the functions.
v2:
In addition to the ``GLenum internal_format`` parameter to
intel_wrap_miptree(), add a ``gl_format format`` parameter. This
removes the need to recalculate for the true format from
internal_format with ChooseNewTextureFormat, which was just weird.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a small helper function that asserts that a given level and layer
are valid for a miptree. I will be extensively using it in the future
miptree HiZ functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Since the renderbuffer tracks the miptree level and layer that it wraps,
the 'tex_image' and 'zoffset' params are no longer needed to calculate the draw
offsets.
Not only are they no longer needed, but their presence would prevent
calculating the renderbuffer draw offsets in situations where there were
no texture image. Such situations will occur during the HiZ meta-op and
during scatter/gather of separate stencil textures.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
TODO: Make v2 for kwg.
Add two fields to intel_renderbuffer:
mt_level
mt_layer
Multiple renderbuffers may simultaneously wrap a single texture and each
provide a different view into that texture. [Consider
glFramebufferTextureLayer()]. The new fields indicate which slice of the
miptree is wrapped by the renderbuffer.
The buffer resolve operations, to be introduced in the future, require
these fields in order to resolve the correct slice in the miptree.
To add the fields, it was necessary to replace the type of some function
parameters from gl_texture_image to gl_renderbuffer_attachment.
v2: [kwg] Replace confusing condition `CubeMapFace > 0` with the more
sensible `Target == GL_TEXTURE_CUBE_MAP`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For all texture targets except GL_TEXTURE_CUBE_MAP, the 'nr_images' and
'depth' fields of intel_mipmap_level were identical. In the exceptional
case, nr_images == 6 and depth == 1.
It is simple to determine if a texture is a cube or not, so the presence
of two fields here was not helpful. Worse, it was confusing. When we
eventually implement GL_ARB_texture_cube_map_array, this mess would have
become even more confusing.
This patch removes 'nr_images' and assigns to 'depth' a consistent
meaning: depth is the number of 2D slices at each miplevel. The exact
semantics of depth varies according to the texture target:
- For GL_TEXTURE_CUBE_MAP, depth is 6.
- For GL_TEXTURE_2D_ARRAY, depth is the number of array slices. It is
identical for all miplevels in the texture.
- For GL_TEXTURE_3D, it is the texture's depth at each miplevel. Its
value, like width and height, varies with miplevel.
- For other texture types, depth is 1.
As a consequence, parameters were removed from the following function
signatures:
intel_miptree_set_level_info
Remove 'nr_images'.
i945_miptree_layout
brw_miptree_layout_texture
brw_miptree_layout_texture_array
Remove 'slices'.
v2:
- Replace "It's" with "Its".
- Remove all hunks in intel_fbo.c. The hunks were spurious and sneaked
in during a rebase.
- Remove unneeded hunk in intel_tex_map_image_for_swrast(). It was
a little refactor of the for-loop's upper bound.
v4:
In intel_miptree_get_image_offset(), document the conditions under
which different if-branches are taken.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
They're not supported by hw directly, but it's easy to emulate
them with a shader swizzling fixup.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
[danvet: The important thing is to write a 1 to the unused alpha
channel, the ddx is relying on this for render accel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the gallium driver doesn't support PIPE_FORMAT_R16G16B16A16_SNORM
the call to st_choose_renderbuffer_format() would fail and we'd generate
an GL_OUT_OF_MEMORY error. We'd never get to the subsequent code that
handles software/malloc-based renderbuffers.
Add a special-case check for PIPE_FORMAT_R16G16B16A16_SNORM which is used
for software-based accum buffers. This could be fixed in other ways but
it would be a much larger patch. st_renderbuffer_alloc_storage() could
be reorganized in the future.
This fixes accum buffer allocation for the svga driver.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Extract the body of the inner loop into a new function,
intel_miptree_copy_slice().
This is in preparation for adding support for separate stencil and HiZ to
intel_miptree_copy_teximage(). When copying a slice of a depthstencil
miptree that uses separate stencil, we will also need to copy the
corresponding slice of the stencil miptree. The easiest way to do this
will be to call intel_miptree_copy_slice() recursively. Analogous
reasoning applies to copying a slice of a depth miptree with HiZ.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new field, intel_mipmap_level::slice, and move the offset fields
into it. Also add some much needed documentation for these fields.
Before this patch, a separate array was allocated for the
intel_mipmap_level::{x,y}_offsets. This was just silly; it incurred an
extra call to malloc and diminished memory locality.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Essentially, this patch just globally substitutes `irb->region` with
`irb->mt->region` and then does some minor cleanups to avoid segfaults
and other problems.
This is in preparation for
1. Fixing scatter/gather for mipmapped separate stencil textures.
2. Supporting HiZ for mipmapped depth textures.
As a nice benefit, this lays down some preliminary groundwork for easily
texturing from any renderbuffer, even those of the window system.
A future commit will replace intel_mipmap_tree::hiz_region with a miptree.
v2:
- Return early in intel_process_dri2_buffer_*() if region allocation
fails.
- Fix double semicolon.
- Fix miptree reference leaks in the following functions:
intel_process_dri2_buffer_with_separate_stencil()
intel_image_target_renderbuffer_storage()
v3:
- [anholt] Fix check for hiz allocation failure. Replace
``if (!irb->mt)` with ``if(!irb->mt->hiz_region)``.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the following inline functions:
intel_get_rb_region
intel_framebuffer_has_hiz
A future commit will replace the renderbuffer's region with a miptree.
This small refactor will eliminate the need for intel_fbo.h to include
intel_mipmap_tree.h on that commit. I'd like to avoid the situation where
each header transitively includes every other header.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The only user of intel_framebuffer_get_hiz_region() was
intel_framebuffer_has_hiz(). So I folded the body of the former into the
latter.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A great refactor thrashing begins after this commit for HiZ and separate
stencil. Removing code for texture HiZ will make that refactoring easier,
because then we don't have to maintain that code during the refactor.
To disable HiZ for textures, I've removed the hook in
intel_update_wrapper() that allocates a HiZ buffer when attaching a depth
texture to a framebuffer.
HiZ was broken for textures anyway, so there's no regression here.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function gathered the stencil buffer into the depth buffer only when
the map mode contained the read bit. But we must do the gather even if the
map mode is write-only. If we do not, then, when the depth buffer's stencil
bits are scattered into the stencil buffer by intel_unmap_renderbuffer(),
some of the scattered stencil bits would be invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
1. Don't map the depthstencil buffer twice
Place a guard in intel_renderbuffer_map() to prevent a renderbuffer
from being mapped twice. This happened if a single buffer was attached to
the framebuffer's depth and stencil attachment points. (Interestingly,
because intel_map_renderbuffer_gtt() is idempotent, the double mapping did
not cause bugs for depthstencil buffers *without* separate stencil).
2. Stop overriding gl_framebuffer::_DepthBuffer,_StencilBuffer
Normally, if a depthstencil buffer is attached to the framebuffer's
depth attachment point, then _mesa_update_framebuffer() installs
a wrapper depth renderbuffer at gl_framebuffer::_DepthBuffer. Ditto for
the stencil attachment point and gl_framebuffer::_StencilBuffer
A depthstencil intel_renderbuffer with separate stencil contains hidden
depth and stencil renderbuffers, which are the *real* renderbuffers. In
order to force swrast to work, we were installing, in
brw_update_draw_buffer(), the hidden renderbuffers at
gl_framebuffer::_DepthBuffer and _StencilBuffer, thus overriding the
behavior of _mesa_update_framebuffer(). However, now that
intel_renderbuffer_map() is implemented with MapRenderbuffer(),
overriding _mesa_update_framebuffer's introduces bugs. This patch
removes the override code.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The special stencil span accessors, as set by intel_span_init_funcs.
perform software W detiling. Since intel_renderbuffer_map() now uses
MapRenderbuffer, rb->Data points to an *untiled* stencil buffer.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It's intended to indicate whether the driver/hardware supports reading
of the values written into shader outputs.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
texture_combine converts the result rgba to CHAN_TYPE from FLOAT. At the
same time, make sure the span->array->ChanType is changed, too.
v2: pick a nicer comment from Brian
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Parameter n and rgbaChan are both from structure span, thus using span
as paramter to simplify the prototype. Function texture_combine is only
used by _swrast_texture_span, so I guess it's safe to do so.
This patch is mainly for the next patch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And update r300g.
This is different from util_draw_max_index in how it obtains vertex elements
and that it doesn't have to call util_format_description due to additional
precomputed data in vertex elements.
This forks vbo_get_minmax_index. We need to know the index range when
translating non-native vertices into native ones. There is no other way
around it.
Previously we were mapping/unmapping the index buffer each time we
found the restart index in the buffer. This is bad when the restart
index is frequently used. Now just map the index buffer once, scan
it to produce a list of sub-primitives, unmap the buffer, then draw
the sub-primitives.
Also, clean up the logic of testing for indexed primitives and calling
handle_fallback_primitive_restart(). Don't call it for non-indexed
primitives.
v2: per Jose, only map the relevant part of the index buffer with
pipe_buffer_map_range()
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On original gen4, the surface format didn't determine the return data
type from sampling like it does on g45 and later.
Fixes GL_EXT_texture_integer/texture_integer_glsl130
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Merge may produce incorrect order of operations for r600-eg:
x: inst1 R0.x, ... ; //from current group
...
t: inst0 R0.x, ... ; //from previous group, same destination
Result of inst1 will be lost.
So compare destinations and don't allow this.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Previously a vertex shader that used no samplers would get updated (by
calling the driver's ProgramStringNotify) when a sampler in the
fragment shader was updated. This was discovered while investigating
some spurious code generation for shaders in Cogs. The behavior in
Cogs is especially pessimal because it ping-pongs sampler uniform
settings:
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
glUniform1i(sampler1, 1);
glUniform1i(sampler2, 0);
draw();
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
// etc.
ProgramStringNotify is still too big of a hammer. Applications like
Cogs will still defeat the shader cache. A lighter-weight mechanism
that can work with the shader cache is needed. However, this patch at
least restores the previous behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In slow_read_depth_stencil_pixels_separate() we might have separate
depth and stencil buffers or a combined buffer. In the later case,
don't map the buffer twice. This function is used when the depth
scale/bias pixel transfer values are not the defaults.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=42963
Reviewed-by: José Fonseca <jfonseca@vmware.com>
glspec doesn't say that we should skip the attenuation and spot
calculation for infinite light(Ppli.w == 0). Instead, it gives a same
formula to do the light calculation for both finite light and infinite
light(see page 62 of glspec 2.1.pdf)
Also from the formula (2.4) at page 62 of glspec 2.1.pdf, we can skip
attenuation calculation if Ppli.w == 0.
This would fix all the intel oglc l_sed fail subcases and introduces no
intel oglc regressions.
v2: fix an wrong intendation(comments from Brian).
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Make sure all lighting tables are updated before using the table to
calculate something, say using _SpotExpTable to calculate
_VP_inf_spot_attenuation.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression in piglit:
ARB_color_buffer_float/GL_RGBA16F-getteximage
ARB_color_buffer_float/GL_RGBA16F-readpixels
ARB_color_buffer_float/GL_RGBA32F-getteximage
ARB_color_buffer_float/GL_RGBA32F-readpixels
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes some spurious GL errors in the upcoming
gl-3.0-required-sized-formats piglit test.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
intelAllocateBuffer() was oblivious to separate stencil buffers. This
patch fixes it to allocate a non-tiled stencil buffer with special pitch,
just as the DDX does.
Without this, any app that attempted to create an EGL surface with stencil
bits would crash. Of course, this affected only environments that used the
builtin DRI2 backend, such as Android and Wayland.
Fixes GLBenchmark2.1 on Android on gen7.
Note: This is a candidate for the 7.11 branch.
Tested-by: Louie Tsaie <louie.tsai@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I changed the dimensions of the stencil buffer's region, as allocated by
the DDX, at xf86-video-intel commit
commit 3e55f3e88b40471706d5cd45c4df4010f8675c75
dri: Do not tile stencil buffer
But I forgot to make the analogous update to the Intel DRI2 glue in Mesa.
This patch makes that update.
Surprisingly, the mismatch did not cause any bugs. But the mismatch, if
left unfixed, *would* create bugs in the next commit.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When calculating the y offset needed for detiling window system stencil
buffers, replace the term
region->height * 2 + region->height % 2 - 1
with
rb->Height - 1 .
The two terms are incidentally equivalent due to some out-of-date,
incorrect code in the Intel DRI2 glue for DDX. (See
intel_process_dri2_buffer_with_separate_stencil(), line ``buffer_height /=
2;``).
Note: This is a candidate for the 7.11 branch (only the intel_span.c hunk).
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rather than redefining the BYTE/SHORT_TO_FLOAT macros, just define new
ones with different names. These macros preserve zero when converting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't assert/die if a VBO is too small. Return zero instead. For
debug builds, emit a warning message since this is an unusual situation
that might indicate that there's a bug in the app.
Note that util_draw_max_index() now returns max_index+1 instead of
max_index. This lets us return zero to indicate that one of the VBOs
is too small to draw anything.
Fixes a failure with the new piglit vbo-too-small test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The swrast ReadPixels code has no dependencies on swrast since moving
to Map/UnmapRenderbuffer(). We'll be able to remove s_readpix.c and
remove the state tracker's glReadPixels code next.
Acked-by: Eric Anholt <eric@anholt.net>
This was only used by the xlib driver to add an alpha channel to the
front/window color buffer. This was no longer going to work well with
the move to direct mapping of renderbuffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Seldom used and this won't work when we move to using Map/UnmapRenderbuffer
everywhere. This will let us remove a bunch of core Mesa code too.
Reviewed-by: Eric Anholt <eric@anholt.net>
For a depthstencil buffer with separate stencil,
intel_renderbuffer::region is null. (The regions are kept in hidden depth
and stencil buffers). Since the region is null, intel_map_renderbuffer()
assumed there was no data and returned a null map pointer, which in turn
was dereferenced (!) by MapRenderbuffer's caller.
This patch fixes intel_map_renderbuffer() to map the hidden depth buffer
through the GTT and return that as the mapped pointer. Also, the stencil
bits are scattered and gathered when needed.
Fixes the following Piglit tests on gen7:
fbo/fbo-readpixels-depth-formats
hiz/hiz-depth-read-fbo-d24s8
hiz/hiz-stencil-read-fbo-d24s8
EXT_packed_depth_stencil/fbo-clear-formats
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-blit
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-drawpixels
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-readpixels
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a window system stencil buffer had a region with odd height, then the
calculated y offset needed for software detiling was off by one. The bug
existed in intel_{map,unmap}_renderbuffer_s8() and in the intel_span.c
accessors.
Fixes the following Piglit tests on gen7:
general/depthstencil-default_fb-readpixels-24_8
general/depthstencil-default_fb-readpixels-FLOAT-and-USHORT
Fixes SIGABRT in the following Piglit tests on gen7:
general/depthstencil-default_fb-blit
general/depthstencil-default_fb-copypixels
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-drawpixels-FLOAT-and-USHORT
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When gathering the temporary buffer's pixles into the gem buffer, we had
the two buffers juxtaposed. Oops.
Fixes the following Piglit tests on gen7:
general/GL_SELECT - alpha-test enabled
general/GL_SELECT - depth-test enabled
general/GL_SELECT - no test function
general/GL_SELECT - scissor-test enabled
general/GL_SELECT - stencil-test enabled
Fixes SIGABRT in Piglit tests EXT_framebuffer_object/fbo-stencil-* on
gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function already implements 3 cases (map through GTT, blit to
a temporary, and detile stencil buffer to temporary), and a 4th will be
added soon: scatter/gather for depthstencil buffers using separate
stencil. For sanity's sake, this factors each case out into its own
function.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call set_unfiform_initializers if link failed, or it would trigger
a GL_INVALID_OPERATION error. That's not an expected behavior of
glLinkProgram function.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, we would fail to compile the following shader due to a bug
in lazy built-in importing:
#version 130
void main() {
float f = abs(5.0);
int i = abs(5);
}
The first call, abs(5.0), would fail to find a local signature, look
through the built-ins, and import "float abs(float)".
The second call, abs(5), would find the newly imported float signature
in the local shader, and settle for that. Unfortunately, it failed to
search the built-ins for the correct/exact signature, "int abs(int)".
Thus, abs(5) ended up being a float, causing a bizarre type error when
we tried to assign it to an int.
Fixes piglit test builtin-overload-matching.frag.
This is /not/ a candidate for stable branches, as it should only be
possible to trigger this bug using GLSL 1.30's built-in functions that
take integer arguments. Plus, the changes are fairly invasive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
match_function_by_name performs two fairly separate tasks:
1. Hunt down the appropriate ir_function_signature for the callee.
2. Generate the actual ir_call (assuming we found the callee).
Both of these are complicated. The first has to handle exact/inexact
matches, lazy importing of built-in prototypes, different scoping rules
for 1.10, 1.20+, and ES. Not to mention printing a user-friendly error
message with pretty-printed "maybe you meant this" candidate signatures.
The second has to deal with void/non-void functions, pre-call implicit
conversions for "in" parmeters, and post-call "out" call conversions.
Trying to do both in one function is just too unwieldy. Time to split.
This patch purely moves the code to generate an ir_call into a separate
function and reindents it. Otherwise, the code is identical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When matching function signatures across multiple linked shaders, we
often want to see if the current shader has _any_ match, but also know
whether or not it was exact. (If not, we may want to keep looking.)
This could be done via the existing mechanisms:
sig = f->exact_matching_signature(params);
if (sig != NULL) {
exact = true;
} else {
sig = f->matching_signature(params);
exact = false;
}
However, this requires walking the list of function signatures twice,
which also means walking each signature's formal parameter lists twice.
This could be rather expensive.
Since matching_signature already internally knows whether a match was
exact or not, we can just return it to get that information for free.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need something that looks like a compiler and not like some hacker
put some functions together. /rant
This is a band-aid for these two problems:
- The R600 and EG control-flow instructions appear in switch statements
next to each other, causing conflicts when adding new instructions.
- The ALU control-flow instructions are bitshifted by 3 (from CF_INST 26:29
to CF_INST 23:29, as is defined by r600 ISA) even for EG, where CF_INST
is 22:29.
To fix this mess, the 'inst' field is bitshifted to the left either by 22, 23,
or 26 (directly in the definitions), such that it can be just or'd when making
bytecode without any shifting. All switch statements have been divided into
two, one for R600 and the other for EG.
Of course, there is a better way to do this, but that is left for future
work.
Tested on RV730 and REDWOOD with no regressions.
v2: minor cleanup as per Alex's comment.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is also done in ir_to_mesa and st_glsl_to_tgsi, but that code
will be removed soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If they were disabled on entry, and we enabled one (like for
BlitFramebuffer), we wouldn't disable it on the way out. Retain the
attempted optimization here (don't keep calling to set each bit for
changes that won't matter) by just setting the bits directly with
appropriate flushing.
Fixes misrendering on the second draw of piglit fbo-blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's unlikely that we changed the object but no other texture
parameter, but be correct anyway. Noticed by inspection.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also, actually update const_storage_size, therefore avoiding to
unnecessarily reallocate aligned_constant_storage every single time
draw_vs_set_constants() is called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Although textureSize is represented as an ir_texture with op == ir_txs,
it doesn't have a coordinate, so normalizing it doesn't make sense.
Fixes crashes in oglconform glsl-bif-tex-size basic.samplerCube.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch solves three bugs.
1. When a texture was attached to the GL_DEPTH_STENCIL_ATTACHMENT point,
Mesa attached the texture only to the depth attachment point
gl_framebuffer::Attachment[BUFFER_DEPTH]
and failed to attach it to the stencil attachment point
gl_framebuffer::Attachment[BUFFER_STENCIL]
2. When a texture was attached to the GL_DEPTH_ATTACHMENT point and then
later attached to the GL_STENCIL_ATTACHMENT point, Mesa created two
separate renderbuffer wrappers. This caused a GL error in
glGetFramebufferAttachmentParameteriv().
3. Same as 2, but with depth and stencil juxtaposed.
Fixes Piglit test ARB_framebuffer_object/same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The compiler setup for these VF-uploaded attributes looks a little
cheesy with mixing system values and real VBO-sourced attributes. It
would be nice if we could just compute the ATTR[] map to GRF index up
front and use it at visit time instead of using ir->location in the
ATTR file. However, we don't know the reg_offset at
visit(ir_variable *) time, so we can't do the mapping that early.
Fixes piglit vertexid test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We only allow 16 vec4s of attributes in our GLSL/ARB_vp programs, and
1 more element will get used for gl_VertexID/gl_InstanceID. So it
should never have been possible to hit this fallback, unless there was
another bug. If you do hit this, you're probably using gl_VertexID
and falling back to swrast won't work for you anyway.
This also updates the limits for gen6+.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This used to be script-generated, but now it's just a bunch of static
variables in a .h file for no good reason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: check if pipe_buffer_map() returns NULL, and return NULL from
svga_vbuf_render_map_vertices(). Per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, if we failed to allocate a VBO (either for display list
compilation or immediate mode rendering) we'd eventually segfault
when trying to map the non-existant buffer or in a glVertex/Color/etc
call when we hit a null pointer.
Now we don't try to map non-existant buffers and if we do fail to
allocate a VBO we plug in no-op functions for glVertex/Color/etc
so we don't segfault.
None of the code in api_noop.c was used anymore. The new vbo_noop.c
functions are true no-ops. They'll be used to no-op glBegin/End functions
when we run out of VBO memory.
Only a handful of functions from api_noop.c are actually used by
the VBO module. Move them to the VBO module. With this change,
none of the code in api_noop.c is actually used anymore.
NEW_COLOR is only needed on Gen4-5 as brw_update_renderbuffer_surfaces
only uses ctx->Color when intel->gen < 6.
This should reduce unnecessary state updates.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Constant expressions which called GLSL's equal() and notEqual()
built-ins on bvecs would hit an assertion failure; we simply forgot to
implement them for booleans.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These simply don't exist in the 1.30 specification---none of the Offset
variants allow samplerCube. This must have been a cut and paste error
from textureGrad, which /does/ allow cubemaps.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Due to a cut and paste error, these were accidentally misnamed
textureProj() rather than textureProjOffset().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
From the GLSL 1.30 spec, section 8.7 "Texture Lookup Functions":
"In all functions below, the bias parameter is optional for fragment
shaders. The bias parameter is not accepted in a vertex shader."
This was a cut and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_wm_samplers actually enables any active samplers regardless of what
pipeline stage is using them, so it doesn't make much sense for it to be
WM-specific. So, rename it to "brw_samplers."
To properly generalize it, move sampler_count and sampler_offset from
brw_context::wm to a new brw_context::sampler that can be shared without
looking strange.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like for the WM pull constants, we can merge the former prepare/emit
stages into one tracked state atom. Furthermore, the code that used to
handle the binding table was removed in the last commit, leaving some
rather silly looking short functions that can easily be folded in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Although the hardware supports separate binding tables for each pipeline
stage, we don't see much advantage over a single shared table.
Consider the contents of the binding table:
- Textures (16)
- Draw buffers (8)
- Pull constant buffers (1 for VS, 1 for WM)
OpenGL's texture bindings are global: the same set of textures is
available to all shader targets. So our binding table entries for
textures would be exactly the same in every table.
There are only two pull constant buffers (not many), and although draw
buffers aren't interesting to the VS, it shouldn't hurt to have them in
the table. The hardware supports up to 254 binding table entries, and
we currently only use 26.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
First, the texturing setup code is relevant for all pipeline stages,
while renderbuffer surfaces are only used by the WM.
Secondly, renderbuffer and texture setup depends on a different set of
dirty bits. There's no reason to walk the array of textures when
changing draw buffers, or vice-versa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were only split for historical reasons: brw_wm_constants used to
be the "prepare" step, while brw_wm_constant_surface was "emit". Now
that both happen at emit time, it makes sense to combine them.
Call the newly combined state atom "brw_wm_pull_constants" to indicate
help distinguish it from the Gen6+ atoms that handle push constants.
Finally, remove the BRW_NEW_WM_CONSTBUF dirty bit entirely now that it's
never flagged nor used.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When reading the "brw_wm_constants" and "gen6_wm_constants" atoms
side-by-side, I initially failed to notice the crucial difference:
the Gen6 atoms are for Push Constants, while brw_wm_constants handles
Pull Constants. (Gen4/5 Push Constants are handled by "brw_curbe.")
Renaming these should clarify the code and save me from constant
confusion over the fact that "gen6_wm_constants" isn't just a newer
version of "brw_wm_constants."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This code is fairly fragile, as it depends on the ordering of the
entries in the binding table, which will change soon.
Also, stop listening on the BRW_NEW_WM_CONSTBUF dirty bit as it's no
longer required.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These fields control how many entries the hardware prefetches into the
state cache, so they only impact performance, not correctness. However,
it's not clear how to use this in a way that's beneficial.
According to the documentation, kernels "using a large number" of
entries may wish to program this to zero to avoid thrashing the cache;
it's unclear how many is too many. Also, Ironlake's WM was missing this
feature entirely---the count had to be zero.
The dirty bit tracking to handle this complicates the surface state
and binding table setup; removing it should simplify things and make
future refactoring easier. So just set 0 for the number of entries
rather than trying to compute and track it.
Appears to have no impact on Nexuiz and OpenArena on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The comment states that brw_update_vs_constant_surface produces a
CACHE_NEW_SURF_BIND dirty bit, but it doesn't. In fact, that bit
no longer even exists.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_vs_surfaces _produces_ the BRW_NEW_NR_VS_SURFACES dirty bit, so it
makes no sense for it to subscribe to it.
Fixes an assertion failure in many piglit tests when INTEL_DEBUG is set:
brw_state_upload.c:484: void brw_upload_state(struct brw_context *):
Assertion `!check_state(&examined, &generated)' failed.
One such piglit test is vs-uniform-array-mat2-col-rd.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Comparing brw_upload_vs_pull_constants and brw_upload_wm_pull_constants,
it became evident that something was amiss: the VS code had both
CACHE_NEW_VS_PROG and BRW_NEW_VERTEX_PROGRAM, while the WM code was
missing the CACHE_NEW_WM_PROG flag.
Not observed to fix anything, but likely necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we have vtable entries in place, we should use them. This
allows us to drop the cut and pasted Gen7 brw_tracked_state atoms as
they now do exactly the same thing as their brw_wm_surface_state
counterparts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen7+ SURFACE_STATE is different from Gen4-6, so we need separate
per-generation functions for creating and updating it. However, the
usage is the same, and callers just want to utilize the appropriate
functions with minimal pain. So, put them in the vtable.
Since these take a brw_context pointer and are only used on Gen4, just
add a forward declaration. This is the simplest (if not cleanest)
solution. It would be nicer to have a i965-specific vtable, but that's
a refactor for another day.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These two are fairly unique types so add specific cases for decoding them.
Passes piglit fbo-clear-format and fbo-generatemipmap-format tests for these
two extensions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the softpipe NV_conditional_render support to llvmpipe.
This passes the nv_conditional_render-* piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new function r600_need_cs_space. Currently, it's easy to overflow
the CS - queries are not counted in. I guess that's not the only case where
the driver may crap out.
This is the inverse operation to _mesa_pack_rgba_span_int. The 16-bit
code isn't done because of lack of testing and not being sure how sign
extension/clamping should be handled between, say, 16-bit int and
32-bit int or uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires using a new fragment shader to get the integer color
output, and a new vertex shader because #version has to match between
the two.
v2: Clarify that there's no need for BindFragDataLocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We're missing support for the software paths still, but basic
rendering is working.
v2: Override RGB_INT32/UINT32 to not be renderable, since the hardware
can't do it but we do allow texturing from it now. Drop the
DataType override, since the _mesa_problem() isn't in that path
any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, I was tracking the ir_variable * found for gl_FragColor or
gl_FragData[]. Instead, when visiting those variables, set up an
array of per-render-target fs_regs to copy the output data from. This
cleans up the color emit path, while making handling of multiple
user-defined out variables easier.
v2: incorporate idr's feedback about ir->location (changes by Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to integer color buffers, we need to be careful to use
MRFs of the correct type when emitting color writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_type_for_base_type returned UD for array variables,
similar to structures. For structures, each field may have a different
type, so every field access must explicitly override the register's type
with that field's type. We chose to return UD in this case since it was
the least common, so errors would be more obvious.
For arrays, it makes far more sense to return the type corresponding to
an element of the array. This allows normal array access to work
without the hassle of explicitly overriding the register's type.
This should obsolete a bunch of type overrides throughout the code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: s/GL_TRUE/true/, and re-enable RGB_INT32 based on discussion
yesterday about required RB formats vs texture formats.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This will let the feature be incrementally developed, hidden behind
the flag we're all using as we work on GL 3.0 support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While not required by any particular spec version, mplayer was asking
for L16 and hoping for actual L16 without checking. The 8 bits
allocated led to 10-bit planar video data stored in the lower 10 bits
giving only 2 bits of precision in video. While it was an amusing
effect, give them what they actually wanted instead.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41461
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to be able to support some formats for texturing that we can't
render to, which means that some choices for RenderbufferStorage end
up being incomplete (for example, L8 currently). For these, where we
don't render to them, we don't want to have to make up an rb->DataType
that's only used for GetRow()/PutRow().
Commit 1401b96b (radeon: cleanup radeon shared code after r300 and
r600 classic drivers removal) removed the file
src/mesa/drivers/dri/radeon/server/radeon.h, but it left behind the
symlink which was used to share that file into the
src/mesa/drivers/dri/r200/server directory.
This patch removes the dangling symlink.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
This patch modifies the GLSL linker to assign additional slots for
varying variables used by transform feedback, and record the varying
slots used by transform feedback for use by the driver back-end.
This required modifying assign_varying_locations() so that it assigns
a varying location if either (a) the varying is used by the next stage
of the GL pipeline, or (b) the varying is required by transform
feedback. In order to avoid duplicating the code to assign a single
varying location, I moved it into its own function,
assign_varying_location().
In addition, to support transform feedback in the case where there is
no fragment shader, it is now possible to call
assign_varying_locations() with a consumer of NULL.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
This just validates the input parameters so far.
Fixes piglit's bindfragdata-invalid-parameters test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Up until now modifying the GLSL compiler has been pretty straightforward.
This is where things get interesting. But still pretty straightforward.
Switch statements can be thought of a series of if/then/else statements.
Case labels are compared with the value of a test expression and the case
statements are executed if the comparison is true.
There are a couple of aspects of switch statements that complicate this simple
view of the world. The primary one is that cases can fall through sequentially
to subsequent case, unless a break statement is encountered, in which case,
the switch statement exits completely.
But break handling is further complicated by the fact that a break statement
can impact the exit of a loop. Thus, we need to coordinate break processing
between switch statements and loop statements.
The code generated by a switch statement maintains three temporary state
variables:
int test_value;
bool is_fallthru;
bool is_break;
test_value is initialized to the value of the test expression at the head of
the switch statement. This is the value that case labels are compared against.
is_fallthru is used to sequentially fall through to subsequent cases and is
initialized to false. When a case label matches the test expression, this
state variable is set to true. It will also be forced to false if a break
statement has been encountered. This forcing to false on break MUST be
after every case test. In practice, we defer that forcing to immediately after
the last case comparison prior to executing a case statement, but that is
an optimization.
is_break is used to indicate that a break statement has been executed and is
initialized to false. When a break statement is encountered, it is set to true.
This state variable is then used to conditionally force is_fallthru to to false
to prevent subsequent case statements from executing.
Code generation for break statements depends on whether the break statement is
inside a switch statement or inside a loop statement. If it inside a loop
statement is inside a break statement, the same code as before gets generated.
But if a switch statement is inside a loop statement, code is emitted to set
the is_break state to true.
Just as ASTs for loop statements are managed in a stack-like
manner to handle nesting, we also add a bool to capture the innermost switch
or loop condition. Note that we still need to maintain a loop AST stack to
properly handle for-loop code generation on a continue statement. Technically,
we don't (yet) need a switch AST stack, but I am using one for orthogonality
with loop statements, in anticipation of future use. Note that a simple
boolean stack would have sufficed.
We will illustrate a switch statement with its analogous conditional code that
a switch statement corresponds to by examining an example.
Consider the following switch statement:
switch (42) {
case 0:
case 1:
gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
case 2:
case 3:
gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
break;
case 4:
default:
gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
}
Note that case 0 and case 1 fall through to cases 2 and 3 if they occur.
Note that case 4 and the default case must be reached explicitly, since cases
2 and 3 break at the end of their case.
Finally, note that case 4 and the default case don't break but simply fall
through to the end of the switch.
For this code, the equivalent code can be expressed as:
int test_val = 42; // capture value of test expression
bool is_fallthru = false; // prevent initial fall through
bool is_break = false; // capture the execution of a break stmt
is_fallthru |= (test_val == 0); // enable fallthru on case 0
is_fallthru |= (test_val == 1); // enable fallthru on case 1
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
}
is_fallthru |= (test_val == 2); // enable fallthru on case 2
is_fallthru |= (test_val == 3); // enable fallthru on case 3
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
is_break = true; // inhibit all subsequent fallthru for break
}
is_fallthru |= (test_val == 4); // enable fallthru on case 4
is_fallthru = true; // enable fallthru for default case
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
}
The code generate for |= and &= uses the conditional assignment capabilities
of the IR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now tie the grammar to the ctors of the ASTs they reference.
This requires that we actually have definitions of the ctors.
In addition, we also need to define "print" and "hir" methods for the AST
classes. The Print methods are pretty simple to flesh out. However, at this
stage of the development, we simply stub out the "hir" methods and flesh
them out later.
Also, since actual class instances get returned by the productions in the
grammar, we also need to designate the type of the productions that
reference those instances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The grammar is modified to support switch statements. Rather than follow the
grammar in the appendix, which allows case labels to be placed ANYWHERE
as a regular statement, we follow the development of the grammar as
described in the body of the GLSL spec.
In this variation, the switch statement has a body which consists of a list
of case statements. A case statement is preceded by a list of case labels and
ends with a list of statements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Data structures for switch statement and case label are created that parallel
the structure of other AST data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is needed for nv50's new shader backend. With this change, both u_math.h
and imports.h in core mesa define the same function. I have to #undef log2f
here to avoid the conflict. Not sure if there is a better way to deal with
the situation.
Acked-by: José Fonseca <jfonseca@vmware.com>
Switch all of the code in ir_to_mesa, st_glsl_to_tgsi, glUniform*,
glGetUniform, glGetUniformLocation, and glGetActiveUniforms to use the
gl_uniform_storage structures in the gl_shader_program.
A couple of notes:
* Like most rewrite-the-world patches, this should be reviewed by
applying the patch and examining the modified functions.
* This leaves a lot of dead code around in linker.cpp and
uniform_query.cpp. This will be deleted in the next patches.
v2: Update the comment block (previously a FINISHME) in _mesa_uniform
about generating GL_INVALID_VALUE when an out-of-range sampler index
is specified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
_mesa_ir_link_shader needs to be called before cloning the IR tree so
that the var->location field for uniforms is set.
WARNING: This change breaks several integer division related piglit
tests. The tests break because _mesa_ir_link_shader lowers integer
division to an RCP followed by a MUL. The fix is to factor out more
of the code from ir_to_mesa so that _mesa_ir_link_shader does not need
to be called at all by the i965 driver. This will be the subject of
several follow-on patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These were both useful debugging aids while developing this code.
log_uniform will be used to keep the MESA_GLSL=uniform behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Connects all of the gl_program_parameter structures with the correct
gl_uniform_storage structures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These functions are used to create and destroy the connections between
a uniform and the storage used by the driver to hold its value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This function propagates the values from the backing storage of a
gl_uniform_storage structure to the driver supplied data locations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Remane class count_uniform_size based on feedback from Eric:
"Maybe just "count_uniform_size"? "usage" makes me think "way it's
dereferenced" or something."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update a comment block about the different treatment of
location=-1 based on feedback from Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Prepend _mesa_uniform_ to the names and rework the calling
convention. The calling convention was changed for a couple reasons.
1. Having a single variable named 'location' have completely different
meanings at different places in the function is confusing. Before
calling split_location_offset the location is the encoded value
returned by glGetUniformLocation. After calling split_location_offset
it's the index of the uniform in the gl_uniform_list::Uniforms array.
2. In a later commit the original value of 'location' is needed after
split_location_offset has been called.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update some comments based on feedback from Eric Anholt.
v3: Remove gl_uniform_storage::dirty field. Make
gl_uniform_storage::initialized be bool, and make
gl_uniform_storage::sampler be uint8_t.
v4: Include stdbool.h after Tom Stellard noticed a build failure that
was introduced by the changes in v2. Oops.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
There are cases where we might want to internally query the location
of a uniform in a shader that failed linking.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Some C code will want access to the glsl_base_type and
glsl_sampler_dim enums in the near future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
The spec says "Only ClearBufferiv should be used to clear
stencil buffers." and "Only ClearBufferfv should be used to clear
depth buffers." However, on the following page it also says:
"The result of ClearBuffer is undefined if no conversion between
the type of the specified value and the type of the buffer being
cleared is defined (for example, if ClearBufferiv is called for a
fixed- or floating-point buffer, or if ClearBufferfv is called
for a signed or unsigned integer buffer). *This is not an error.*"
Emphasis mine.
Fixes problems with piglit's clearbuffer-invalid-drawbuffer test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression from the recent glReadPixels changes found
with the piglit hiz tests.
Use either MESA_FORMAT_RGBA8888 or MESA_FORMAT_RGBA8888_REV for color
buffers depending on endian-ness. Before, the gl_renderbuffer::Format
field was MESA_FORMAT_RGBA8888 but the data was really stored as
MESA_FORMAT_RGBA8888_REV when using a little endian machine.
Getting this right matters now that we can access renderbuffer data
without going through the span functions (namely glReadPixels() +
MapRenderbuffer()).
These vars will just get overwritten when we call _mesa_add_renderbuffer()
anyway. We only need to set the InternalFormat field when we create the
software renderbuffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the glReadPixels() regression for reading from the front/back
color buffers.
Note, we only allow one mapping of an XImage/Pixmap renderbuffer
at any time. That might need to be revisited in the future.
One of the points of GL_ARB_texture_storage is to make it impossible
to have malformed mipmap stacks. If we know the texture object is
immutable, we can skip a bunch of size checking.
Commit a73c65c534 had a typo which
accidentally enabled the workaround-free Gen7 code on Gen6.
Fixes GPU hangs in anything using pow() or integer division/modulus.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the documentation, Ivybridge's math instruction works in
SIMD16 mode for the fragment shader, and no longer forbids align16 mode
for the vertex shader.
The documentation claims that SIMD16 mode isn't supported for INT DIV,
but empirical evidence shows that it works fine. Presumably the note
is trying to warn us that the variant that returns both quotient and
remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1
would be sechalf(dst), trashing half your results. Since we don't use
that variant, we don't care and can just enable SIMD16 everywhere.
The documentation also still claims that source modifiers and
conditional modifiers aren't supported, but empirical evidence and
study of the simulator both show that they work just fine.
Goodbye workarounds. Math just works now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It seems line loop stipple in hardware needs something I don't know, it might
need a proper geometry shader who knows.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shaders combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to other non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gbm_gallium does not depend on DRI, but its build rules depend on DRI_LIB_DEPS
being set. Output an error when the user enables gbm_gallium but disables
DRI. This is just a workaround.
If the VS has outputs that aren't consumed by the FS we were mapping
them all to one unused VS output index, but that's illegal. Instead,
map unused VS outputs to unique indexes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The code expects the geometry shader to be NULL.
We don't have geometry shaders now, but it's good to be prepared.
v2: check for support in the cso context
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shaders combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to other non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The readpixels microbenchmark in mesa-demos goes from 47Mpix/sec at
1000x1000 to 450Mpix/sec. The 10x10 sizes stay about the same.
Reviewed-by: Brian Paul <brianp@vmware.com>
Renderbuffer mapping handles flushing the batchbuffer if required, so
all we need to do is make sure any pending rendering has reached the
batchbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't appear to be any particular reason for this -- it's not
like the width is changing between the deref and the use.
Reviewed-by: Brian Paul <brianp@vmware.com>
In all of piglit, only two tests hit it (reading to RGBA float, where
GetRow would drop floats into place from R, RG, or RGB). Mostly this
is because _ColorReadClamp has been causing transferOps to always be
set, skipping any fast-paths anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
This may be a bit slower than before because we're switching from
per-format compiled loops in GetRow to
_mesa_unpack_rgba_block_unpack's loop around a callback to unpack a
pixel. The solution there would be to make _mesa_unpack_rgba_block
fold the span loop into the format handlers.
(On the other hand, function call overhead will hardly matter if
MapRenderbuffer means the driver gets the data into cacheable memory
instead of uncached).
The adjust_colors code should no longer be required, since the unpack
function does the 565 to float conversion in a single pass instead of
converting it (poorly) through 8888 as apparently happened in the
past.
Reviewed-by: Brian Paul <brianp@vmware.com>
This should be useful in making more generic fast paths in the pixel
paths.
v2: Add note about PACK_SWAP_BYTES, and fix up for endianness by
synchronizing with memcpy_texture paths in texstore.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
This introduces two new span helper functions we'll want to use in
several places as we move to MapRenderbuffer, which pull out integer
depth and stencil values from a renderbuffer mapping based on the
renderbuffer format.
v2: Use format_unpack helper for stencil read.
v3: Clean up comment after conversion to format_unpack.
Reviewed-by: Brian Paul <brianp@vmware.com>
This also makes it handle 24/8 vs 8/24, fixing piglit
depthstencil-default_fb-readpixels-24_8 on i965. While here, avoid
incorrectly fast-pathing if packing->SwapBytes is set.
v2: Move the unpack code to format_unpack.c, fix BUFFER_DEPTH typo
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids going through the wrapper that has to rewrite the data for
packed depth/stencil. This isn't done in _swrast_read_stencil_span
because we don't want to map/unmap for each span.
v2: Move the unpack code to format_unpack.c.
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
i965's MUL instruction can't take an immediate value as its first
argument. So normally, if constant propagation wants to propagate a
constant into the first argument of a MUL instruction, it swaps the
order of the two arguments.
This doesn't work for 32-bit integer (and unsigned integer)
multiplies, because the MUL operation is asymmetric in that case (it
multiplies 16 bits of one operand by 32 bits of the other).
Fixes piglit tests {vs,fs}-multiply-const-{ivec4,uvec4}.
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're drawing sprites and the fragment shader needs both auto-
generated texcoords and user-defined varying vars we need to use
this fallback path.
The reason is when we enable auto texcoord generation, it gets
enabled for all texcoord sets. And that clobbers the user-defined
varying vars.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If we use the draw-module for wide point/line/etc drawing we'll need
a fragment shader too (like we pass in the vertex shader).
This fixes sprite point rendering when forcing the swtnl path.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The state tracker may generate shaders that use generic vs outputs /
fs inputs like:
DCL IN[0], GENERIC[0]
DCL IN[1], GENERIC[10]
DCL IN[2], GENERIC[11]
This patch remaps 0, 10, 11 to small integers like 1, 2, 3 so that we
stay inside the SVGA3D limit (8).
The remapping is done to both the vertex shader outputs and the
fragment shader inputs. The same mapping must be used for a vs/fs
pair.
Note that 'union svga_compile_key' is now 'struct svga_compile_key'
because we needed to add the register remapping table. The change in
size isn't really significant though (it's not a search key).
Also, add assertions when building up SVGA3D src/dst registers to we
don't try to store too large of value for the bitfield size.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Missed this back in the arb_robustness branch
<6b329b9274b18c50f4177eef7ee087d50ebc1525>.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The returned value should be a texture target index, not a bit.
I spotted this from seeing a new compiler warning caused by the increase
in the number of texture targets. This has been broken for a long time.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This requires tracking a couple extra fields in ir_variable:
* A flag to indicate that a variable had an initializer.
* For non-const variables, a field to track the constant value of the
variable's initializer.
For variables non-constant initalizers, ir_variable::has_initializer
will be true, but ir_variable::constant_initializer will be NULL. The
linker can use the values of these fields to check adherence to the
GLSL 4.20 rules for shared global variables:
"If a shared global has multiple initializers, the initializers
must all be constant expressions, and they must all have the same
value. Otherwise, a link error will result. (A shared global
having only one initializer does not require that initializer to
be a constant expression.)"
Previous to 4.20 the GLSL spec simply said that initializers must have
the same value. In this case of non-constant initializers, this was
impossible to determine. As a result, no vendor actually implemented
that behavior. The 4.20 behavior matches the behavior of NVIDIA's
shipping implementations.
NOTE: This is candidate for the 7.11 branch. This patch also needs
the preceding patch "glsl: Refactor generate_ARB_draw_buffers_variables
to use add_builtin_constant"
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Incrementing "td" before initializing it is
pointless and just leads to an uninitialized
variable warning with MSVC.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This is an OpenGL ES specific extension. External textures are textures that
may be sampled from, but not be updated (no glTexSubImage* and etc.). The
image data are taken from an EGLImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
GL_TEXTURE_RECTANGLE_NV (and soon GL_TEXTURE_EXTERNAL_OES) is special. Handle
it in its own if-block. There should be no functional change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
This extension introduces a new sampler type: samplerExternalOES.
texture2D (and texture2DProj) can be used to do a texture look up in an
external texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
3-bit fields are used store texture target in several places. That will fail
when TEXTURE_EXTERNAL_INDEX, which happends to be the 9th texture target, is
added. Make them 4-bit fields.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
remove another long if condition test. I don't feel a strong need of
this patch. But for it make the code a little simpler(I do think so),
I send it out.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glRenderbufferStorage man page says:
GL_INVALID_VALUE is generated if either of width or height is negative,
or greater than the value of GL_MAX_RENDERBUFFER_SIZE.
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
EXT_framebuffer_object bspec says:
Get Value Type Get Command Initial Value
------------------------------- ------ ----------- -----------
RENDERBUFFER_INTERNAL_FORMAT_EXT Z+ GetRenderbufferParameterivEXT RGBA
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The ARB_texture_swizzle spec says:
The error INVALID_OPERATION is generated if TexParameteri,
TexParameterf, TexParameteriv, or TexParameterfv, parameter <pname>
is TEXTURE_SWIZZLE_R, TEXTURE_SWIZZLE_G, TEXTURE_SWIZZLE_B,
or TEXTURE_SWIZZLE_A, and <param> is not RED, GREEN, BLUE, ALPHA,
ZERO, or ONE.
The error INVALID_OPERATION is generated if TexParameteriv, or
TexParameterfv, parameter <pname> TEXTURE_SWIZZLE_RGBA, and the four
consecutive values pointed to by <param> are not all RED, GREEN, BLUE,
ALPHA, ZERO, or ONE.
So, the GL_TEXTURE_SWIZZLE* pname is legal for glTexParameterf(v)
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a vertex shader input attribute is declared with an integral type
(e.g. ivec4), we need to ensure that the generated vertex shader code
addresses the vertex attribute register using the proper register
type. (Previously, we assumed all vertex shader input attributes were
floating-point).
In addition, when uploading vertex data that was specified with
VertexAttribIPointer, we need to instruct the vertex fetch unit to
convert the data to signed or unsigned int, rather than float. And
when filling in the implied w=1 on a vector with less than 4
components, we need to fill it in with the integer representation of 1
rather than the floating-point representation of 1.
Fixes piglit tests vs-attrib-{ivec4,uvec4}-precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch ensures that gl_client_array::Integer is properly set to
GL_TRUE for vertex attributes specified using glVertexAttribIPointer,
and to GL_FALSE for vertex attributes specified using
glVertexAttribPointer, so that the vertex attributes can be
interpreted properly by driver back-ends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When converting an expression like "++x" to GLSL IR we were failing to
account for the possibility that x might be an unsigned integral type.
As a result the user would receive a bogus error message "Could not
implicitly convert operands to arithmetic operator".
Fixes piglit tests {vs,fs}-{increment,decrement}-uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TODO: check if GetImage works with passing the pitch as width, similar to PutImage,
which avoids the extra copy, ala dri_sw_displaytarget_display() in src/gallium/winsys/sw/dri/dri_sw_winsys.c
This is a cleanup of commit 02f1b50987.
Update tex buffer using a dri_drawable hook from implemented in sw/drisw.c.
This saves us the duplication of dri_drawable.c.
CC: Stuart Abercrombie <sabercrombie@chromium.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Certain exports (position, point size, etc.) are treated
specially by the shader and not counted as generic exports.
Note the exports and any relevant related state bits.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes issues with the code playing fast and loose with types of
buffers, and as a bonus avoids the wrappers that were previously used
to pull bits out of packed depth/stencil buffers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Some of the return values were u32, some were 24 bits, and z16
returned 16 bits. The caller would have to do all the work of
interpreting the format all over again. However, there are no callers
of this function at this point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Perhaps the easiest implementation, nouveau can directly map buffers
even if tiled, and uses separate surfaces for its texture
renderbuffers so we don't have to worry about that offset.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike intel, we do a blit to/from GTT memory in order to
untile/retile the renderbuffer data, since we don't have fence
registers for accessing it.
(There is software tiling code in radeon_tile.c, but it's unused and
doesn't support macro tiling)
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add separate stencil S8 W-tile swizzling/deswizzling. Tested for
the swizzling case with env INTEL_SEPARATE_STENCIL=1 INTEL_HIZ=1
./bin/hiz-depth-stencil-test-fbo-d24-s8
v3: Apply Chad's fix for S8 window system buffers.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Mesa core's is generic for things like osmesa.
For swrast_dri.so, we have to do Y flipping. The front-buffer path
isn't actually tested, though, because both before and after it fails
with a BadMatch in XGetImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit abaebcee78.
The assertion I made was that "the zero-copy code in validation" would
zero copy. Of course, I deleted that check back in January because
the two sites that would trigger it (glTexImage() and this one) both
immediately bound their mt to the object, making the other check
pointless.
Removes two extra blits in glx-tfp. Also fixed the Android home
screen, which wasn't rendering because the extra copy broke the
relationship between the texture and the eglimage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42152
Tested-by: Chad Versace <chad@chad-versace.us>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
This also fixes the build error due to missing link_uniforms.cpp in the source
lists.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: the missing link_uniforms.cpp was added before this patch is committed]
With the hope that Android.mk and SConscript can share the file to reduce
future breakage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
MinGW uses MSVC's runtime DLLs for most of C runtime's functions, and
there has same semantics for vsnprintf.
Not sure how this worked until now -- maybe one of the internal
vsnprintf implementations was taking precedence.
Previously, the vertex and fragment shader back-ends assumed that all
varyings were floats. In GLSL 1.30 this is no longer true--they can
also be of integral types provided that they have an interpolation
qualifier of "flat".
This required two changes in each back-end: assigning the correct type
to the register that holds the varying value during shader execution,
and assigning the correct type to the register that ties the varying
value to the rest of the graphics pipeline (the message register in
the case of VS, and the payload register in the case of FS).
Fixes piglit tests fs-int-interpolation and fs-uint-interpolation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This function is similar to get_base_type(), but when called on
arrays, it returns the scalar type composing the array. For example,
glsl_type(vec4[]) => float_type.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
i965 graphics hardware has two floating point modes: ALT and IEEE. In
ALT mode, floating-point operations never generate infinities or NaNs,
and MOV instructions translate infinities and NaNs to finite values.
In IEEE mode, infinities and NaNs behave as specified in the IEEE 754
spec.
Previously, we used ALT mode for all vertex and fragment programs,
whether they were GLSL programs or ARB programs. The GLSL spec is
sufficiently vague about how infs and nans are to be handled that it
was unclear whether this mode was compliant with the GLSL 1.30 spec or
not, and it made it very difficult to test the isinf() and isnan()
functions.
This patch changes i965 GLSL programs to use IEEE floating-point mode,
which is clearly compliant with GLSL 1.30's inf/nan requirements. In
addition to making the Piglit isinf and isnan tests pass, this paves
the way for future support of the ARB_shader_precision extension.
Unfortunately we still have to use ALT floating-point mode when
executing ARB programs, because those programs require 0^0 == 1, and
i965 hardware generates 0^0 == NaN in IEEE mode.
Fixes piglit tests "isinf-and-isnan fs_fbo", "isinf-and-isnan vs_fbo",
and {fs,vs}-{isinf,isnan}-{vec2,vec3,vec4}.
The implementations are as follows:
isinf(x) = (abs(x) == +infinity)
isnan(x) = (x != x)
Note: the latter formula is not necessarily obvious. It works because
NaN is the only floating point number that does not equal itself.
Fixes piglit tests "isinf-and-isnan fs_basic" and "isinf-and-isnan
vs_basic".
This patch adds the extension '.ir' to all the files in
src/glsl/builtins/ir/, and changes generate_builtins.py so that it no
longer globs on '*' to find the files to build. This prevents
spurious files (such as EMACS' infamous *~ backup files) from breaking
the build.
The implementation of ir_binop_nequal in constant_expression_value()
appears to have been copy-and-pasted from the implementation of
ir_binop_equal, but with all instances of '==' changed to '!='. This
is correct except for one minor flaw: one of those '==' operators was
in an assertion checking that the types of the two arguments were
equal. That one needs to stay an '=='.
Fixes piglit tests {fs,vs}-inline-notequal.
svga keeps a small queue of similar primitive draws in order to coalesce
them into a single draw primitive command.
But the buffers referred in primitives not yet emitted were being ignored
in the considerations to flush or not the context.
This fixes piglit vbo-map-remap, vbo-subdata-sync, vbo-subdata-zero, and
Seeker.
Based on investigation and patch from Brian Paul.
Reviewed-By: Brian Paul <brianp@vmware.com>
Forgot to destroy the pipe context on xa context destroy.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
pb_debug_manager_dump was trying to take a lock already
held by all callers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jos Fonseca <jfonseca@vmware.com>
This drops all the old drmSupports* checks since KMS does them all, and it
also drop R300_CLASS and R600_CLASS.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It caught one possible bug I recall in my time working on the driver,
and we haven't been setting it for non-fixed-function since the new FS
backend came along. The bug it caught was likely a confusion about
sampler mappings, which we have tests for these days.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This was the last prepare() function, and it's the first state atom,
so it must be ready to move.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I don't really want to touch this impenetrable code in this series, so
just call the one function from the other, since no other atom cares
about them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
While other units need to know about our constant buffer offsets,
nothing else cared about which particular BO other than the emit() half.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This rearranges the code a bit, and makes the upload of the binding
table take only as many surfaces as there are in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
These produce BRW_NEW_SURFACES (used by binding table emit()) and
BRW_NEW_NR_WM_SURFACES (used by WM unit emit()). Fixes a bug where
with no texturing and no color buffer, we wouldn't consider the null
renderbuffer in nr_surfaces. This was harmless because nr_surfaces is
only used for the prefetch info in the unit state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
These produce BRW_NEW_SURFACES (used by binding table emit()) and
BRW_NEW_NR_WM_SURFACES (used by WM unit emit()). Fixes a bug where
with no texturing and no color buffer, we wouldn't consider the null
renderbuffer in nr_surfaces. This was harmless because nr_surfaces is
only used for the prefetch info in the unit state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This is part of a series trying to eliminate the separate prepare()
hook in state upload. The prepare() hook existed to support the
check_aperture in between calculating state updates and setting up the
batch, but there should be no reason for that any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
As we move state to emit() time from prepare() time, a couple of the
places that flag fallbacks will move here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
We were doing the BO validate step in prepare() (brw_validate_state())
hooks of atoms so that we could check_aperture before emitting the
relocation trees during brw_upload_state() that would actually make
the batchbuffer reference too much memory to be executed. Now that
all relocations occur in the batchbuffer, we can instead
check_aperture after emitting our state into the batchbuffer, and
easily roll back, flush, and retry if we happened to go over the
limits.
This will let us remove the whole prepare() vs emit() split in our
state atoms, which is a source of tricky dependencies and duplicated
code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This will be used to avoid the prepare() step in the i965 driver's
state setup. Instead, we can just speculatively emit the primitive
into the batchbuffer, then check if the batch is too big, rollback and
flush, and replay the primitive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This could have broken always_flush_cache on i965, since
reserved_space doesn't reflect the size of the workaround flushes, and
we might run out of space. This should make always_flush_cache more
useful on pre-i965, anyway (since the point is to flush around each
draw call, even within a batchbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Replace pipe->flush() with pipe->texture_barrier() in
the texture upload path for the staging texture.
This should be enough to get data out of the gpu
caches ready to be read for texture fetch.
It's not useful for anything.
The rest of the patch is just a cleanup resulting
from some of the variables being no longer used.
There are no piglit regressions.
With the recent changes to interpolation stuff, we can now get the value
direct from the program instead of just being fail.
fixes some of the glsl-1.30 interpolation tests with softpipe
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRI2 supports this now - and already enables it explicitly - but drisw
does not and should not. Otherwise toolkits like clutter will only ever
SwapBuffers once and wait forever for an event that's not coming.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Previously check_resources could fail, but we'd still try to optimize
the shader, do device-specific code generation, etc. In some cases,
this could explode (especially in the device-specific code
generation). I haven't found that I could trigger this with the
current code. When too many samplers were used with the new uniform
handling code, I observed several crashes deep down in the driver.
NOTE: This is candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41609
Cc: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Previously a shader like
int X;
struct X { int i; };
void main() { gl_Position = vec4(0.0); }
would generate two error message:
0:2(19): error: struct `X' previously defined
0:2(20): error: incomplete declaration
The first one is the real error, and the second is spurious.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Other parts of the code already caught things like 'float x[4][2]'.
However, nothing caught 'float [4] x[2]'.
Fixes piglit test array-multidimensional-new-syntax.vert.
NOTE: This is candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The idea here is to set up the message header with the Sampler State
pointer which the hardware provides as part of the PS Thread Payload in
register g0.
Unfortunately, the existing code
fs_reg(GRF, 0, BRW_REGISTER_TYPE_UD))
actually references "virtual GRF 0" rather than the hardware g0. This
is just some arbitrary GRF temporary which will get register allocated.
So, we ended up setting up the header with garbage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the 1000000.0 overflow cases of piglit
GL_EXT_packed_float/pack.c
Reviewed-by: Marek Ol ák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_EXT_packed_float spec:
For an RGBA color, if <type> is not one of FLOAT,
UNSIGNED_INT_5_9_9_9_REV_EXT, or UNSIGNED_INT_10F_11F_11F_REV_EXT,
or if the CLAMP_READ_COLOR_ARB is TRUE, or CLAMP_READ_COLOR_ARB
is FIXED_ONLY_ARB and the selected color (or texture) buffer is
a fixed-point buffer, each component is first clamped to [0,1].
Then the appropriate conversion formula from table 4.7 is applied
the component."
(but we previously resolved that the CLAMP_READ_COLOR bit is not
relevant to glGetTexImage())
This fixes most of the cases in piglit GL_EXT_packed_float/pack.
Reviewed-by: Marek Ol ák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is only used in the code for packing to INF, and resulted in an
extra bit set that was set anyway, so it was harmless except for the
confusion caused.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From page 22 (28 of PDF) of GLSL 1.30 spec:
It is an error to provide a literal integer whose magnitude is too
large to store in a variable of matching signed or unsigned type.
Unsigned integers have exactly 32 bits of precision. Signed integers
use 32 bits, including a sign bit, in two's complement form.
Fixes piglit int-literal-too-large-0[123].frag.
v2: Take care with INT_MIN, use stroull, and make it a function.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no sense in building a broken driver. Previously, there was
the potential of building a DRI1-only driver that would work for DRI1
and fail on DRI2 because the newer libdrm code wasn't present. Now
the radeon build system should be matching intel and nouveau.
This can probably be reduced even further by moving this logic to the
scissor state update or just removing the logic entirely, but I don't
trust myself in radeon quite that much.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Diff produced by treating kernel_mm as true, deleting the DRI1 paths
that produce kernel_mm false, and deleting code.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Cleanup of the resulting dead code to follow.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
These are effectively doing type->get_base_type()->base_type, which is
equivalent to type->base_type. Just use that, as it's simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And not all existing queries. The only reason we have that list is to be able
to suspend and resume the active ones.
This reduces looping over queries when suspending and resuming.
The queries no longer have to track some of their states.
We weren't setting TEX_SEM_WAIT on instructions that read the value of a
TEX instruction and also wrote the same register as the TEX instruction.
This is the sequence we were miscompiling:
1: TEX temp[0], input[2].xy__, 2D[0]
...
16: src0.xyz = temp[22], src1.xyz = temp[0], src2.xyz = temp[19]
MAD temp[0].xyz, src0.xxx, src1.xyz, src2.xxx
https://bugs.freedesktop.org/show_bug.cgi?id=42090
This required the following changes:
- WM setup now makes the appropriate set of barycentric coordinates
(perspective vs. noperspective) available to the fragment shader,
based on whether the shader requires perspective interpolation,
noperspective interpolation, both, or neither.
- The fragment shader backend now uses the appropriate set of
barycentric coordiantes when interpolating, based on the
interpolation mode returned by
ir_variable::determine_interpolation_mode().
- SF setup now uses gl_fragment_program::InterpQualifier to determine
which attributes are to be flat shaded (as opposed to the old logic,
which only flat shaded colors).
- CLIP setup now ensures that the clipper outputs non-perspective
barycentric coordinates when they are needed by the fragment shader.
Fixes the remaining piglit tests of interpolation qualifiers that were
failing:
- interpolation-flat-*-smooth-none
- interpolation-flat-other-flat-none
- interpolation-noperspective-*
- interpolation-smooth-gl_*Color-flat-*
Reviewed-by: Eric Anholt <eric@anholt.net>
The name was misleading. The actual effect of the bit is to cause
the clipper to emit *non-perspective* barycentric coordinate
information (which is only needed when doing noperspective
interpolation).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes how fs_visitor::emit_general_interpolation()
decides what kind of interpolation to do. Previously, it used the
shade model to determine how to interpolate colors, and used smooth
interpolation on everything else. Now it uses
ir_variable::determine_interpolation_mode(), so that it respects GLSL
1.30 interpolation qualifiers.
Fixes piglit tests interpolation-flat-*-smooth-{distance,fixed,vertex}
and interpolation-flat-other-flat-{distance,fixed,vertex}.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible intepolation mode.
When setting up the WM, we intstruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the special case in
fs_visitor::split_virtual_grfs() that prevents splitting from being
applied to the delta_x/delta_y register pair (this register pair needs
to remain contiguous so that it can be used by the PLN instruction).
When gen>=6, this register pair is in a fixed location, not a virtual
register, so it was in no danger of being split. And
split_virtual_grfs' attempt not to split it was preventing some other
unrelated register from being split.
Reviewed-by: Eric Anholt <eric@anholt.net>
This function determines how a variable should be interpolated based
both on interpolation qualifiers and the current shade model.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we treated the 'smooth' qualifier as equivalent to no
qualifier at all. However, this is incorrect for the built-in color
variables (gl_FrontColor, gl_BackColor, gl_FrontSecondaryColor, and
gl_BackSecondaryColor). For those variables, if there is no qualifier
at all, interpolation should be flat if the shade model is GL_FLAT,
and smooth if the shade model is GL_SMOOTH.
To make this possible, I added a new value to the
glsl_interp_qualifier enum, INTERP_QUALIFIER_NONE.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch makes GLSL interpolation qualifiers visible to drivers via
the array InterpQualifier[] in gl_fragment_program, so that they can
easily be used by driver back-ends to select the correct interpolation
mode.
Previous to this patch, the GLSL compiler was using the enum
ir_variable_interpolation to represent interpolation types. Rather
than make a duplicate enum in core mesa to represent the same thing, I
moved the enum into mtypes.h and renamed it to be more consistent with
the other enums defined there.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without this it's possible to wind up in a draw call with the
glBegin/End VBO still in a mapped state. This is a problem for
the SVGA3D driver and probably not good for other HW drivers.
Now that texture borders are gone, we never need to allocate our
textures through non-miptrees, which simplifies some irritating paths.
v2: Remove the !mt support case from intel_map_texture_image()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This replaces software rendering of textures with the deprecated
1-pixel border (which is always bad, since mipmapping is rather broken
in swrast, and GLSL 1.30 is unsupported) with hardware rendering that
just pretends there was never a border (so you have potential seams on
apps that actually intentionally used the 1-pixel borders, but correct
rendering otherwise).
This doesn't regress any piglit tests on gen6 (since the texwrap
border/bordercolor cases already failed due to broken border color
handling), but regresses texwrap border cases on original gen4 since
those end up sampling the border color instead of the border pixels.
It's a small price to pay for not thinking about texture borders any
more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We wanted to reuse this in the Intel driver.
v2: Move the flag to ctx->Const
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
The intel driver (and gallium, it looks like, though it doesn't use
these texstore functions at this point) doesn't bother making storage
for textures with 0 width, height, or depth. This avoids them having
to deal with returning a mapping for that nonexistent data.
Fixes assertion failures with an upcoming intel driver change.
Reviewed-by: Brian Paul <brianp@vmware.com>
This can be useful if you want to create a bunch of temporary strings
with a common prefix. For example, when iterating over uniform
structure fields, one might want to create temporary strings like
"pallete.primary", "palette.outline", and "pallette.shadow".
This could be done by overwriting the '.' with a null-byte and calling
ralloc_asprintf_append, but that incurs the cost of strlen("pallete")
every time...when this is already known.
These new functions allow you rewrite the tail of the string, given a
starting index. If the starting index is the length of the string, this
is equivalent to appending.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Consider the following vertex shader and fragment shader:
// vertex shader
varying vec4 v;
uniform vec4 u;
void main() { gl_Position = vec4(0.0); v = u; }
// fragment shader
void main() { gl_FragColor = vec4(0.0); }
Since the fragment shader does not use 'v', it is demoted from a
varying to a simple global variable. Once that happens, the
assignment to 'v' is useless, and it should be removed. In addition,
'u' is no longer active, and it should also be removed.
Performing extra dead code elimination after demoting shader inputs
and outputs takes care of this. This elimination must occur before
assigning uniform locations, or the declaration of 'u' cannot be
removed.
This change *breaks* the piglit test getuniform-01, but that test is
already incorrect. The test uses a vertex shader that assigns to a
user-defined varying, but it has no fragment shader. Since Mesa does
not support ARB_separate_shader_objects (we only support the EXT
version), the linker correctly eliminates the user-defined varying.
The cascading effect is that the uniform queried by the C code of the
test is also (correctly) eliminated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980
Tested-by: Brian Paul <brianp@vmware.com>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Vinson Lee <vlee@vmware.com>
Cc: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
These should be useful for doing transform feedback on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are correct to the best of my knowledge, gleaned from a variety of
internal sources. Sadly, the Sandybridge PRM has incorrect limits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The inconsistency between vs_max_threads and max_vs_entries was rather
annoying. I could never seem to remember which one was reversed, which
made it harder to find quickly. "Max __ Threads" seems more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the docs for 3DSTATE_PS (Gen7+) and 3DSTATE_WM (Gen6),
there is a platform dependent value for the minimum number of pixel
shader threads. It may also vary based on whether WIZ Hashing is on.
For example, Ivybridge requires at least 4 threads if WIZ hashing is
disabled, and 8 if it's enabled. Programming it to use less threads is
illegal. Sandybridge appears to have similar restrictions.
So on newer platforms, INTEL_DEBUG=sing will probably just hang the GPU.
Rather than try to patch it up for newer platforms and extend it to
support geometry shaders, just remove it as it isn't that useful anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For GL_RGB_SCALE and GL_ALPHA_SCALE targets, the API wrapper code
attempts to ensure the parameter is 1.0, 2.0, or 4.0.
This is unnecessary: set_combiner_scale in texenv.c (called by
_mesa_TexEnvfv) already checks this and raises an appropriate error.
It's also incorrect: For glTexEnvx, the API validation code directly
compares the GLfixed input parameter with a floating point constant,
prior to converting fixed-point to floating point.
Fixes an issue in the OpenGL ES 1.1 conformance suite.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kill the code paths taken when src_mt is null. It is never null, otherwise
there would be a segfault on line 4 of this function:
GLuint width = src_mt->level[level].width;
(Some interleaved lines in the diff make the real diff non-obvious. All
I did was delete some code and then left-shifted what remained to correct
the indentation.)
Reviewed-by: Eric Anholt <eric@aholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A driver trying to set up builtin uniforms is faced with a problem:
How do I walk the ir_variable structure (representing an array of
structs, or array of matrices, or struct, or whatever), and set up
driver structures so that dereference of that uniform gets the
corresponding ParameterValues[] entry. The rule in general is that
each corresponding vector-sized field of an array of structs is one
builtin uniform state slot. i965 relied on another invariant: each
state slot has a number of unique channel swizzles corresponding to
the number of elements in the field's vector, to avoid needing to walk
the glsl_type in parallel to get at vector_elements.
All of the builtin uniforms followed this behavior, except for
gl_NormalMatrix. That's a mat3 (so 3 vec3s), but it was swizzled as 3
vec4s.
Fixes piglit glsl-fs-normalmatrix.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the new name for gl_MaxVaryingFloats now that non-float
varyings exist. Fixes piglit
glsl-1.30/execution/maximums/gl_MaxVaryingFloats
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 3e5d3626, Eric added a homebrew workaround to fix GPU hangs in
the Mesa "engine" demo and oglc's api-texcoord test.
Unfortunately, his PIPE_CONTROL contains a Depth Stall, which
necessitates the post-sync non-zero workaround,
Fixes GPU hangs in Civilization 4, PlaneShift, and 3DMMES.
Hopefully Heroes of Newerth as well, though I haven't tested that.
NOTE: This is candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40324
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41096
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Eric Anholt <eric@anholt.net>
I had a colleague hitting issues compiling with an old gcc3.2
system. These patches got them through.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Use VRAM for static and immutable buffers. This restores the
recently removed r600g winsys behaviour for memory locations.
This also improoves rendering times on the gpu for some
OpenSceneGraph based test cases by about 15%.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Make sure we do not run into the classic ABA problem on buffer object bind,
reusing this name and may be never rebind since we get an new name
that was just deleted and never rebound in between.
The explicit rebinding to the debault object in the current context
prevents the above in the current context, but another context
sharing the same objects might suffer from this problem.
Minor var renaming and comments edited by Brian.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
Buffer objects may be shared across contexts.
Rework the array attrib push/pop implementation
to be thread safe. Make use of more library functions
for this purpose.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the past, swrast_texture_image::Data has been overloaded. It could
either point to malloc'd memory storing texture data, or it could point
to a current mapping of GPU memory.
Now, Buffer always points to malloc'd memory (if we're not using GPU
memory) and Data always points to mapped memory. The next step would
be to rename Data -> Map.
This change also involves adding swrast functions for mapping textures
and renderbuffers prior to rendering to setup the Data pointer. Plus,
corresponding functions to unmap texures and renderbuffers. This is
very much like similar code in the dri drivers.
Only swrast and the drivers that fall back to swrast need these fields now.
This removes the last of the fields related to software rendering from
gl_texture_image.
In lp_build_stencil_op() the incoming 'stencil' var is a 2-element array.
There's a front-face writemask and a back-face writemask but we're ignoring
the later. This patch doesn't fix anything but at least points out the
problem.
v2: Avoid the C99 rounding functions, because I don't trust
get/setting the C99 rounding mode from inside our library not having
other side effects. Instead, open-code roundEven() behavior around
Mesa's IROUND, which we're already testing for C99 rounding mode
safety.
Fixes glsl-1.30/compiler/built-in-functions/round*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- Fix _GNUC__ typo in both checks
- Fix logic error in check for gcc < 3.4 that breaks for gcc 2.x & older
Without this fix, builds with gcc 3.4.x end up depending on undefined
_mesa_bitcount instead of gcc's __builtin_popcount.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should fall back to shader based decoding (g3dvl) for now.
This is probably broken on systems that support xvmc, because
nouveau_video_buffer_create has no way to know for what api
the buffer is created, so I think this call might need a
separate argument as workaround.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
The phase instance counts are not necessarily redeclared so with
the separation of declarations and instructions we wouldn't know
which instance count applies to which phase.
Correct linkage requires examining the signature itself, it cannot
be reconstructed from declarations only since unused registers may
have been omitted from them.
We don't want to clutter the code or handicap new hardware for
the sake of ancient GPUs on which d3d1x won't ever be used,
much less be fully compliant, anyway.
We were mis-computing the size of the user-space vertex buffer in
some circumstances. This led to a failed assertion at u_inlines.h:222
when using the VMware svga driver.
For example, if we had arrays such as:
array[0]: element_offset = 12, stride = 24
array[1]: element_offset = 0, stride = 24
We'd mistakenly compute 'bytes' to be 12 bytes too small.
I've reorganized the function too. By time it's called, we know that
we've got interleaved arrays either all in one VBO or all in user memory
and the stride is equal for all arrays.
Move the code that lived inside the attr==0 test after the loop.
In the loop we compute the true vertex size. That size factors into the
pipe->redefine_user_buffer() call later. Using the vertex size instead
of array[0]'s element_offset fixes the failed assertion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Setting MaxIfDepth to UINT_MAX effectively means "don't lower anything."
Explicitly checking for this common case allows us to avoid walking the
IR, computing nesting levels, and so on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
Commit 488fe51cf8 converted the EmitNoIfs
flag to MaxIfDepth, an unsigned integer saying "flatten if-statements
nested beyond this depth."
Unfortunately, i965 left this initialized to 0, which made ir_to_mesa
attempt to flatten all if-statements. We didn't notice right away
because we usually throw away ir_to_mesa's code in favor of the native
VS and FS backends...but this still creates a lot of unnecessary work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that this is identical to gen6_wm_constants, just use that instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it match gen6_prepare_wm_push_constants. For some reason, it
had been using AUB_TRACE_NO_TYPE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We definitely want CACHE_NEW_WM_PROG, not CACHE_NEW_VS_PROG.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The DDX may allocate a buffer with a too small size.
Instead of failing, let's pretend everything's alright.
Such bugs should be fixed in the DDX, of course.
NOTE: This is a candidate for the stable branches.
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes vs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes fs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When there is no ARB_vertex_program program enabled, the Current
pointer points at a default program, so we were always using
VERTEX_PROGRAM_TWO_SIDE, even for fixed function lighting.
Fixes piglit two-sided-lighting*
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 2.1 specification, page 114 (page 128 of the PDF):
"The version of PixelStore that takes a floating-point value
may be used to set any type of parameter; if the parameter is
boolean, then it is set to FALSE if the passed value is 0.0
and TRUE otherwise, while if the parameter is an integer, then
the passed value is rounded to the nearest integer."
Fixes piglit roundmode-pixelstore.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Simply generate GL_INVALID_OPERATION error at display list mode. As
explained by Brian, we are going to access PBO data at compile time.
No need to defer the error at execution time.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using IROUND() to convert a float depth value to a 32-bit uint Z value.
didn't work (it returns a signed value). Just use a cast instead
Fixes piglit fbo-depth-array failure with swrast.
Note: this is a candidate for the 7.11 branch.
This code isn't really relevant since the kernel takes care not
to destroy busy GMR buffers.
Also with the advent of fence objects, the code was incorrect since
it didn't refcount fence handles.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Wrap _mesa_unpack_bitmap to handle the case that data is stored in pixel
buffer object.
This would make calling Bitmap with data stored in PBO by display list work.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: quote the spec; explicitly exclude the GL_BITMAP case to make code
more readable. (comments from Ian)
v3: Cast the offset by GLintptr to remove the compile warning(comments
from Brian).
I also found that I should use _mesa_sizeof_packed_type() instead,
as it includes packed pixel type, like GL_UNSIGNED_SHORT_5_6_5.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The patch(based on the reading of the emulator) came from while I was
trying to fix the oglc pbo texImage.1PBODefaults fail. This case
generates a texture with the width and height equal to window's width
and height respectively, then try to texture it on the whole window.
So, it's exactly one texel for one pixel. And, the min filter and mag
filter are GL_LINEAR. It runs with swrast OK, as expected. But it failed
with i965 driver.
Well, you can't tell the difference from the screen, as the error is
quite tiny. From my digging, it seems that there are some tiny error
happened while getting tex address. This will break the one texel for
one pixel rule in this case. Thus the linear result is taken, with tiny
error.
This patch would fix all oglc pbo subcase fail with the same issue on
both ILK, SNB and IVB.
v2: comments from Ian, make the address_round filed assignment consistent.
(the sampler is alread memset to 0 by the xxx_update_samper_state
caller, so need to assign 0 first)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Generate the program parameters list by walking the IR instead of by
walking the list of linked uniforms. This simplifies the code quite a
bit, and is probably a bit more correct. The list of linked uniforms
should really only be used by the GL API to interact with the
application.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Having a few of these includes or forward declarations inside the
'extern "C"' block can cause problems later. Specifically, it
prevents C++ linkage functions from being added to ir_to_mesa.h and
makes G++ angry if 'struct foo' is seen both inside and outside an
'extern "C"'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fold _mesa_get_active_uniform into its only caller in the process.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplificiation was enabled by the earlier refactors that
eliminated the references to the assembly shaders stored in the
gl_shader_program structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Finding this bit in the documentation proved challenging. It wasn't in
the SEND instruction's message descriptor section, nor the data port
message descriptor section. It turns out to be part of the Render
Target Write message's control bits, and in the documentation is named
"Last Render Target Select".
Shaders that use Multiple Render Targets should set this bit on the last
RT write, but not on any prior ones.
The GPU does update the Pixel Scoreboard appropriately, but doesn't
document this bit as directly causing a scoreboard clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
After printing the details of a specific message, we always print out
the message length and response length with nice "mlen" and "rlen"
labels.
For Gen5+ URB writes, we were dumping mlen and rlen a second time:
urb 0 urb_write interleave used complete mlen 5, rlen 0 mlen 5 rlen 0
Also, for Gen6 data port messages, we were including mlen and rlen in
the tuple of undecipherable integers.
Both of these are completely redundant. So, remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Every brw_set_???_message function had duplicated code, per-generation,
to set the Message Descriptor and Extended Message Descriptor bits
(SFID, message length, response length, header present, end of thread).
However, these fields are actually specified as part of the SEND
instruction itself; individual types of messages don't even specify
them (except for header present, but that's in the same bit location).
Since these are exactly the same regardless of the message type, just
create a function to set them, using the generic message structs. This
not only shortens the code, but hides a lot of the per-generation
complexity (like the SFID being in destreg__conditionalmod) in one spot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code asserted that eot == 0, as it doesn't make sense for
a thread to sample a texture as the last thing it does.
It doesn't make much sense to pass around a dead parameter either.
Especially for a function which already has a long parameter list.
So, remove the parameter and just set EOT to 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When reading the data port code, it was not clear to me what these
values meant, nor where I could find them in the documentation.
Especially since the latest BSpec and older PRMs document them in
radically different places...neither of which are near the descriptions
of individual messages.
Cite the documentation, and rename them to SFID to signify that these
are Shared Function IDs that one can read about in the GPU overview,
rather than arbitrary bitfields. While we're add it, make them an enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we use the Render Cache for scratch access (read/write data)
and the Sampler Cache for all read only data (pull constants).
Reversing the condition here is clearer: if the caller requested the
Render Cache, use that. Otherwise, they requested the Data Cache
(which does not exist on Gen6) or Sampler Cache, so use the Sampler
Cache.
This should not change behavior in any way.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Using the constant cache for reads isn't going to work for scratch
reads (variably-indexed arrays or register spills), as these aren't
constant at all.
Also, in the new VS backend, use the proper message number for OWord
Dual Block Write messages. It's now 10, instead of 9.
+205 piglits.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While reviewing some compiler cleanups I'd sent out, Paul noticed that
tree grafting wasn't taking "out" parameters into account.
Further investigation revealed that it isn't strictly necessary: ir_call
ends basic blocks, and tree grafting currently only operates on basic
blocks. So calls already kill grafts.
However, just to be safe, this patch makes "out" parameters explicitly
kill grafts. Paul and I both prefer this. It's a bit clearer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The 'mode' param is a bitset of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT.
A future commit will perform buffer resolves in intel_region_map(). So,
even though the access mode is irrelevant to the GTT, the extra
information allows us to intelligently avoid unneccessary buffer resolves.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following to the vtbl:
hiz_resolve_depthbuffer
hiz_resolve_hizbuffer
For all drivers for which HiZ is not enabled, the methods are set to be
no-ops. If HiZ is enabled, the methods are currently to set to empty
stubs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
intel_context::gen field is set by intelInitContext(). So, by calling
intelInitContext() before initializing the vtable, we can can construct
different vtables for different gens.
Specifically, this allows us to set the HiZ operations to be no-ops for
contexts for which HiZ is not enabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
During anholt's MapTextureImage refactoring, the call to
intel_tex_image_s8z24_create_renderbuffers was missplaced. It needs to
occur *after* the miptree is allocated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Don't dereference the color buffer if one isn't attached.
This fixes the following Piglit tests in my experimental HiZ branch:
glean/logicOp
glean/paths
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is necessary because i965 will need to call vbo_bind_array() when
cleaning up after a buffer resolve meta-op.
Detailed Explanation
--------------------
The vbo module tracks vertex attributes separately from the gl_context.
Specifically, the vbo module maintins vertex attributes in
vbo_exec_context::array::inputs, which is synchronized with
gl_context::Array::ArrayObj::VertexAttrib by vbo_bind_array().
vbo_draw_arrays() calls vbo_bind_array() to perform the synchronization
before calling the real draw call, vbo_context::draw_arrays.
Intel hardware accomplishes buffer resolves with a meta-op. Frequently,
that meta-op must be performed within glDraw* in the moment immediately
before the draw occurs (The hardware designers hate us...). After
performing the meta-op, but before calling vbo_bind_array(), the
gl_context's vertex attributes will have been restored to their original
state (that is, their state before the meta-op began), but the vbo
module's vertex attribute are those used in the last meta-op. Therefore we
must manually synchronize the two with vbo_bind_array() before continuing
with the original draw command (that is, the one requested with glDraw*).
See brw_predraw_resolve_buffers(), which will be added in a future commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This hook allows the driver to prepare for a glBegin/glEnd.
i965 will use the hook to avoid avoid recursive calls to FLUSH_VERTICES
during a buffer resolve meta-op.
Detailed Justification
----------------------
When vertices are queued during a glBegin/glEnd block, those vertices must
of course be drawn before any rendering state changes. To enusure this,
Mesa calls FLUSH_VERTICES as a prehook to such state changes. Therefore,
FLUSH_VERTICES itself cannot change rendering state without falling into
a recursive trap.
This precludes meta-ops, namely i965 buffer resolves, from occuring while
any vertices are queued. To avoid that situation, i965 must satisfy the
following condition: that it queues no vertex if a buffer needs resolving.
To satisfy this, i965 will use the PrepareExecBegin hook to resolve all
buffers on entering a glBegin/glEnd block.
--------
v2: Don't add dd_function_table::CleanupExecEnd. Anholt and I discovered
that hook to be unnecessary.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In some cases, Intel hardware requires that depth and stencil buffers be
separate. To accommodate swrast, i965 resorts to hackery that causes
a segfault in the fastpaths of draw_depth_stencil_pixels() and
read_depth_stencil_pixels().
The hack is that i965 sets framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer
and framebuffer->Attachment[BUFFER_STENCIL].Renderbuffer to a dummy
renderbuffer for which the GetRow accessors and friends are null. The real
buffers are located at framebuffer->_DepthBuffer and framebuffer->_Stencilbuffer.
To fix the segault, this patch skips the fastpath if
framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer->GetRow is null.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When i965 uses (in the near future) meta-ops to perform buffer resolves,
the meta-op stack exceeds depth 2. I bumped it to 8 because... 8 is bigger
than 2, but not too big.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If this flag is set, then _mesa_meta_begin/end will save/restore the state of
GL_SELECT and GL_FEEDBACK render modes.
Intel's future buffer resolve meta-ops will require this, since buffer resolves
may occur when the GL_RENDER_MODE is GL_SELECT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required in order for meta-ops to save/restore the GL_RENDER_MODE
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad@chad-versace.us>
Acked-by: Paul Berry <stereotype441@gmail.com>
It was previously under gpu_shader4, but I'm pretty sure everyone's
going to be doing GLSL 1.30 first (since gpu_shader4 is basically 1.30
plus a bunch of extra stuff).
Fixes piglit glsl-1.30/texel-offset-limits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When saving the active program in _mesa_meta_begin, it was actually
saving the fragment program instead. This means that if the
application binds a program that only has a vertex shader then when
the meta saved state is restored it will forget the bound program.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41969
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
On converting fixed function programs to generate GLSL, the linker
became cranky that we were trying to make something that wasn't a
linked vertex+fragment program. Given that the Mesa GLES2 drivers
also support desktop GL with EXT_sso, just telling the linker to shut
up seems like the easiest solution.
As pointed out by Michel Dänzer, gcc -lstdc++ doesn't work on all systems,
because it may require other libraries which are only pulled in implicitly
by g++. And libstdc++ is available only with GNU compiler.
Use c++ compiler for linking and remove redundant LDFLAGS += -lstdc++
all over the tree.
In addition to setting up the flags correctly, this renames the
generated libraries to ensure they get 'Mangled' in the name.
This is very useful for distros and the like, where mangled Mesa
and non-mangled GL libraries typically need to be installed
side-by-side.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Scalar instruction that need to write to the xyz components of a
register must reserve the RGB instruction slot for a REPL_ALPHA
instruction. With this commit, the scheduler will attempt to free
the RGB slot by moving the write to the w component of a register.
Introuduce a simple function called copy_data to do the image data copy
stuff for all the save_CompressedTex*Image function. The function check
the NULL data case to avoid some potential segfault. This also would
make the code a bit simpler and less redundance.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fix is in {read,draw}_depth_stencil_pixels(). If depthRb == stencilRb,
then it is redundant to check depthRb->x *and* stencilRb->x.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
For glReadPixels, the user supplied pixels have format
GL_UNSIGNED_INT_24_8. But, when the depthstencil buffer's format was
MESA_FORMAT_S8_Z24, the fastpath read from the buffer without reordering
the depth and stencil bits. To fix this, this patch just skips the
fastpath when the format is not MESA_FORMAT_Z24_S8.
The problem and fix for glWritePixels is analagous.
Fixes the Piglit tests below on i965/gen6 and causes no regressions.
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required for an accurate implementation of d3d1x's
CheckFormatSupport query.
It also seems generally useful for state trackers, which could
choose alternative rendering paths or formats if blending would
come at a significant performance loss.
The texture semaphore allows for prefetching of texture data. On my
RV515, this increases the FPS of Lightsmark by 33% (This is with the
reg_rename pass enabled, which is enabled in the next commit).
There is a new env variable now called RADEON_TEX_GROUP, which allows
you to specify the maximum number of texture lookups to do at once.
The default is 8, but different values could produce better results
for various application / card combinations.
We no longer emit full instructions immediately after they have been
merged. Instead merged instructions are added to the ready list and
the scheduler can commit them whenever it wants.
This is supported by the pseudo-code on pages 27 and 28 (pages 41 and
42 of the PDF) of the OpenGL 2.1 spec. The last part of the
implementation of ArrayElement is:
if (generic attribute array 0 enabled) {
if (generic vertex attribute 0 array normalization flag is set, and
type is not FLOAT or DOUBLE)
VertexAttrib[size]N[type]v(0, generic vertex attribute 0 array element i);
else
VertexAttrib[size][type]v(0, generic vertex attribute 0 array element i);
} else if (vertex array enabled) {
Vertex[size][type]v(vertex array element i);
}
Page 23 (page 37 of the PDF) of the same spec says:
"Setting generic vertex attribute zero specifies a vertex; the
four vertex coordinates are taken from the values of attribute
zero. A Vertex2, Vertex3, or Vertex4 command is completely
equivalent to the corresponding VertexAttrib* command with an
index of zero."
Fixes piglit test attribute0.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't allow any "CPU" buffers to be allocated by the pb_fenced
buffer manager, since we can't protect against failures during
buffer validation.
Also, add an extra slab buffer manager to allocate buffers from
the kernel if there is a failure to allocate from our big buffer pool.
The reason we use a slab manager for this, is to avoid allocating
many very small buffers from the kernel.
v2: Increased VMW_MAX_BUFFER_SIZE and fixed some comments.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Returns a configuration that makes the dri state-tracker-manager
throttle.
Also disable kernel-based throttling.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Hooks up throttling if there is a configuration function present and
it indicates that throttling is desired.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Adds a possibility for the state tracker manager to query the
target for a specific configuration.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
But don't hook it up just yet until we figure out a good way to do that.
Also, we should, in the future, add driconf options to control what
throttling reasons should be honored, and the number of outstanding
swaps allowed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The X server has limited throttle support on the server side,
but doing this in the client has some benefits:
1) X server throttling is per client. Client side throttling can be done
per drawable.
2) It's easier to control the throttling based on what client is run,
for example using "driconf".
3) X server throttling requires drm swap complete events.
So implement a dri2 throttling extension intended to be used by direct
rendering clients.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
This change releases the stw_framebuffer::mutex past creation of
the pbuffer stw_framebuffer. Without this change the pbuffers
lock is never released. Since on win32 mutexes are recursive, this
does not hurt as long as all actions on a context are done from
the same thread. But if, for example, context creation happens in
a different thread than usage, every access to the context will
block for ever.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In commit 018ea68d87, when I
de-compacted clip planes on Gen6+, I updated both the old and new VS
back-ends to reflect the change in how clip planes are stored, but I
failed to change the code in gen6_vs_state.c that uploads clip plane
constants when using the old VS back-end.
As a result, if the set of enabled clip planes wasn't contiguous
starting with 0, then clipping would not occur properly. This patch
corrects gen6_vs_state.c to upload clip plane constants in the new
de-compacted form.
This only affects the old VS back-end (which is used for
fixed-function and ARB vertex programs, not for GLSL vertex shaders).
Fixes Piglit test fixed-clip-enables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41603
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a bug where we'd wind up emitting an invalid instruction like
MOVE R[0]., R[1]; - note the empty/zero writemask. If we don't write to
any dest register channels, cull the instruction.
v2: simply change/fix the existing test for instruction culling.
Instead of the renderbuffer pointer. In the future, attaching a texture
may not mean the renderbuffer pointer gets set too.
Plus, remove some commented-out assertions.
v2: add a 'reading' parameter to distinguish between reading and writing
to the renderbuffer (we don't want to check if _ColorReadBuffer is null
when we're about to draw). Eric found this mistake.
These functions were only called in framebuffer.c where they were defined.
Remove the unneeded attIndex parameter too.
Reviewed-by: Eric Anholt <eric@anholt.net>
What I would prefer to assert is that, for each region that is currently
mapped, no batch is emitted that uses that region's bo. However, it's much
easier to implement this big hammer.
Observe that this requires that the batch flush in intel_region_map() be
moved to within the map_refcount guard.
v2: Add comments (borrowed from anholt's reply) explaining why the
assertion is a good idea.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When updating a register reference to reflect the fact that we were
taking its absolute value, the fragment shader back-end failed to
clear the negate flag, resulting in abs(-x) getting computed as
-abs(x).
I also found (and fixed) a similar problem in brw_eu.h, but I'm not
aware of an actual manifestation of that problem.
Fixes piglit test glsl-fs-abs-neg-with-intermediate.
brw_set_compression_control took a GLboolean as an argument, then
promptly used a switch statement to compare it with various enumeration
values. Clearly it's not actually a boolean.
Introduce a new enumeration type, enum brw_compression, and use that.
Found by converting GLboolean to bool; clang then gave warnings about
switching on a boolean and ultimately duplicated case errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Neither OES_framebuffer_object nor EXT_framebuffer_object allow
querying the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously GL_DEPTH_BUFFER and GL_STENCIL_BUFFER were (incorrectly)
allowed for both. Those enums don't even really exist! Now GL_DEPTH
and GL_STENCIL are only allowed for the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support to the clear and tile caches for integer storage
and clearing, avoiding any floating paths.
Signed-off-by: Dave Airlie <airlied@redhat.com>
these are never USCALED, always UINT in reality.
taken from some work by Christoph Bumiller
v2: fixup formatting of table + tabs
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously it was getting set in draw_set_mapped_constant_buffer() but
if there were no shader constants, that function wasn't called. So the
pt.user.planes field was null and we died when we tried to access the
clip planes in the LLVM-generated code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41663
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of 12 use DRAW_TOTAL_CLIP_PLANES. The max number of user-defined
clip planes was increased to 8 so the total number of planes is 14.
This doesn't fix any specific bug, but clearly the old code was wrong.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For example, GL_TRIANLGES is converted to _3DPRIM_TRILIST.
The conversion is necessary because HiZ and MSAA resolve operations emit
a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum.
As a consequence, brw_gs_prog_key.primitive is also converted.
v2
----
- [anholt] Split brw_set_prim into brw/gen6 variants in previous commit,
since not much code is really shared between the two.
- [anholt] Replace switch statements with table lookups, since this is
a hot path.
Reviewed-by: Eric Anholt <eric@anho.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The "slight optimization to avoid the GS program" in brw_set_prim() is not
used by Gen 6, since Gen 6 doesn't use a GS program. Also, Gen 6 doesn't use
reduced primitives.
Also, document that intel_context.reduced_primitive is only used for Gen < 6
Reviewed-by: Eric Anholt <eric@anho.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
now that we have integer texture types I can drop this workaround so that
copies of values is done properly (as floats would fail on some corner cases).
Signed-off-by: Dave Airlie <airlied@redhat.com>
glDeleteProgram should only be able to remove the one refcount for the
user's reference to the program from the hash table (even though that
ref does live on in the hash table until the last other ref is
removed).
Fixes piglit ARB_shader_objects/delete-repeat.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
PIPE_CONTROL reported time stamp are 64 bits value incrementing every
80 ns, and only the low 32 bits are active (high 32 are always 0).
v2: Cleaned up whitespace, function arguments (anholt).
Fixes piglit EXT_timer_query/time-elapsed
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
The rest of the linker/glsl translation code checks for NULL, so I suppose we should check here too. Fixes crash on exit with i915g instanced drawing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If there is not enough space in pushbuffer for fence emission
(nouveau_fence_emit -> nv50_screen_fence_emit -> MARK_RING),
the pushbuffer is flushed, which through flush_notify ->
nv50_default_flush_notify -> nouveau_fence_update marks currently
emitting fence as flushed. But actual emission is done after this mark.
So later when there is a need to wait on this fence and pushbuffer
was not flushed in between, fence wait will never finish causing
application to hang.
To fix this, introduce new fence state between AVAILABLE and EMITTED,
set it before emission and handle it everywhere.
Additionally obtain fence sequence numbers after possible flush in
MARK_RING, because we want to emit fences in correct order.
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Note: This is a candidate for the 7.11 branch.
We need add a new set of fragment shader variants, along with new vertex
elements for signed and unsigned clears.
The new fragment shader variants are due to the integers values requiring
CONSTANT interpolation. The new vertex element descriptions are for passing
the clear color as an unsigned or signed integer value.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the various mesa->gallium and gallium->mesa format conversions
along with the GL->gallium texture choosers for integers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This add support for unsigned/signed integer types via adding a 'pure' bit
in the format description table. It adds 4 new u_format get/put hooks,
for get/put uint and get/put sint so that accessors can get native access
to the integer bits. This is used to avoid precision loss via float converting
paths.
It doesn't add any float fetchers for these types at the moment, GL doesn't
require float fetching from these types and I expect we'll introduce a lot
of hidden bugs if we start allowing such conversions without an API mandating
it.
It adds all formats from EXT_texture_integer and EXT_texture_rg.
0 regressions on llvmpipe here with this.
(there is some more follow on code in my gallium-int-work branch, bringing
softpipe and mesa to a pretty integer clean state)
v2: fixup python generator to get signed->unsigned and unsigned->signed
fetches working.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes up the integer format choosing to pick the closest mesa format
then the most likely fallback.
(the formatting in this file needs cleaning in another patch).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds a simple packing for GL_UNSIGNED_INT/GL_INT destination formats.
This is enough for at least the gallium drivers to pack both unsigned and signed types for read pixels.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of these functions used three spaces for the first level of
indentation, but four spaces for the next level. One used tabs and then
three spaces. Some used 3/4 in a then block but 3/3 in the else block.
Normally I try to avoid field days like this, but since the functions
were so inconsistent, even internally, it was making it difficult to
edit without introducing spurious whitespace changes.
So, just get it over with. git diff -b shows 0 lines changed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
i915 and i830 hardware doesn't have HiZ, so remove all HiZ related
assertions from *update_draw_buffer().
I've removed the dead format checks completely rather than replace them
with more appropriate checks. This doesn't reduce "assertion coverage",
however, because when I added these HiZ related assertions in c8fdf66
there were no pre-existing checks there.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Silences a warning about comparing to an unsigned variable. It looks like
the result of swizzle_for_size() is always assigned to unsigned vars.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This fixes failures found with the new piglit texsubimage test.
Two things were broken:
1. The dxt code doesn't handle sources images where width != row stride.
Check for that and take the _mesa_make_temp_ubyte_image() path to get
an image where width = rowstride.
2. If we don't take the _mesa_make_temp_ubyte_image() path we need to
take the source image unpacking parameters into account in order to
get the proper starting memory address of the source texels.
Note: This is a candidate for the 7.11 branch.
Docs say that default shader input color input need to be spec
as ARGB8888. And a clear rect prim essentially uses this value
instead of default diffuse. Depth on the other hands is an ieee
32 bit float. Clear stencil is U8.
Completely different are the clear values for zone init prims.
These are speced in the actual output pixel layout (and need
to be repeated for 16 bit formats).
Clear up the confusion by adding some comments.
v2: Retain the target swizzling support added by Stephan Marchesin.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously, if the user enabled a non-consecutive set of clip planes
(e.g. 0, 1, and 3), the driver would compact them down to a
consecutive set starting at 0. This optimization was of dubious
value, and complicated the implementation of gl_ClipDistance.
This patch changes the driver so that with Gen6 and later chipsets, we
no longer compact the clip planes. However, we still discard any clip
planes beyond the highest number that is in use, so performance should
not be affected for applications that use clip planes consecutively
from 0.
With chipsets previous to Gen6, we still compact the clip planes,
since the pre-Gen6 clipper thread relies on this behavior.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only remaining uses of brw_vs_prog_key::nr_userclip only occurred
when using clip planes (as opposed to gl_ClipDistance). This patch
renames the value to nr_userclip_planes and sets it to zero when
gl_ClipDistance is in use. This avoids unnecessary VS recompiles.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_compute_vue_map required an argument indicating the
number of clip planes in use, but all it did with it was check if it
was nonzero.
This patch changes brw_compute_vue_map to take a boolean instead.
This allows us to avoid some unnecessary recompilation of the Gen4/5
GS and SF threads.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, setup_uniform_clipplane_values() was setting
up clip plane uniforms based on ctx->Transform.ClipPlanesEnabled, a
piece of state not stored in the vertex shader cache key. As a
result, a change to this piece of state might not trigger a necessary
vertex shader recompile.
The patch adds a field to the vertex shader cache key,
userclip_planes_enabled, to store the current value of
ctx->Transform.ClipPlanesEnabled. Also, it changes
setup_uniform_clipplane_values() to read from this new field, so that
it's manifestly clear that the vertex shader isn't depending on state
not stored in the cache key.
Note: when the vertex shader uses gl_ClipDistance, the VS backend
doesn't need to know which clip planes are in use, so we leave the
field as zero in that case to avoid unnecessary recompiles.
Fixes Piglit test vs-clip-vertex-enables.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No functional change. This patch rearranges the struct
brw_vs_prog_key so that the two fields related to clipping are
together, and documents those fields. This should make the patches
that follow easier to comprehend, since they add additional
clipping-related fields to this structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The i965 driver already had a function to count bits in a 64-bit uint
(brw_count_bits()), but it was buggy (it only counted the bottom 32
bits) and it was clumsy (it had a strange and broken fallback for
non-GCC-like compilers, which fortunately was never used). Since Mesa
already has a _mesa_bitcount() function, it seems better to just
create a _mesa_bitcount_64() function rather than special-case this in
the i965 driver.
This patch creates the new _mesa_bitcount_64() function and rewrites
all of the old brw_count_bits() calls to refer to it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the ES1 conformance 'userclip' test, which broke when we increased
MAX_CLIP_PLANES to 8. Core Mesa already validates incoming values
against MAX_CLIP_PLANES; we just need the ES wrapper to pass everything
through.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's required for ES 1.0 and 1.1, and isn't specified for ES 2.
While the comment says Mesa depends on it internally, removing it from
ES2 doesn't seem to regress any Piglit or ES2 conformance tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In particular, drivers don't enable this in ES 1.1 contexts.
Prior to this, none of the OpenGL ES 1.1 conformance tests passed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since core Mesa no longer depends on gl_texture_image::Data pointing to
mapped texture buffers we don't have to mess with it all over the place
in the state tracker. Now Data is only used to point to malloc'd memory
that holds images which don't fit in the texture object's mipmap buffer.
These were used to find the start of a 3D image slice (or 2D array texture
slice) given a base address. Instead, use a simple array of address of
image slices instead.
This is a step toward getting rid of the gl_texture_image::ImageOffsets
field.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch implements proper support for gl_ClipVertex by causing the
new VS backend to populate the clip distance VUE slots using
VERT_RESULT_CLIP_VERTEX when appropriate, and by using the
untransformed clip planes in ctx->Transform.EyeUserPlane rather than
the transformed clip planes in ctx->Transform._ClipUserPlane when a
GLSL-based vertex shader is in use.
When not using a GLSL-based vertex shader, we use
ctx->Transform._ClipUserPlane (which is what we used prior to this
patch). This ensures that clipping is still performed correctly for
fixed function and ARB vertex programs. A new function,
brw_select_clip_planes() is used to determine whether to use
_ClipUserPlane or EyeUserPlane, so that the logic for making this
decision is shared between the new and old vertex shaders.
Fixes the following Piglit tests on i965 Gen6:
- vs-clip-vertex-const-accept
- vs-clip-vertex-const-reject
- vs-clip-vertex-different-from-position
- vs-clip-vertex-equal-to-position
- vs-clip-vertex-homogeneity
- vs-clip-based-on-position
- vs-clip-based-on-position-homogeneity
- clip-plane-transformation clipvert_pos
- clip-plane-transformation pos_clipvert
- clip-plane-transformation pos
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Before this patch, clip planes didn't work properly in Mesa when using
vertex shaders, because Mesa assigned both gl_ClipVertex and
gl_Position to the same gl_vert_result (VERT_RESULT_HPOS). As a
result, backends couldn't distinguish between the two variables, so
any shader that wrote different values to them would fail to work
properly.
This patch paves the way for proper support of gl_ClipVertex by
creating a new enumerated value in gl_vert_result for it
(VERT_RESULT_CLIP_VERTEX). After this patch, a back-end may add
support for gl_ClipVertex using the following algorithm:
- If using a user-supplied GLSL vertex shader:
- If the bit corresponding to VERT_RESULT_CLIP_VERTEX is set in
gl_program::OutputsWritten:
- Clip using the vertex shader output VERT_RESULT_CLIP_VERTEX and
the clip planes defined in gl_context::Transform.EyeUserPlane.
- Else:
- Clip using the vertex shader output VERT_RESULT_HPOS and the
clip planes defined in gl_context::Transform.EyeUserPlane.
- Else (either using fixed function or an ARB vertex program):
- Clip using the vertex shader output VERT_RESULT_HPOS and the clip
planes defined in gl_context::Transform._ClipUserPlane (*)
where (*) represents the normal Mesa behavior before this patch.
An example of implementing the above algorithm can be found in the
patch that follows this one, which implements gl_ClipVertex in i965
Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous change was not effective for lines, because there is no
4 planes 4x4 block rasterization path: it is handled by the 16x16 block
case too, and the 16x16 block was not being budged as it should.
This fixes assertion failures on line rasterization.
llvmpipe has a few special rasterization paths for triangles contained in
16x16 blocks, but it allows the 16x16 block to be aligned only to a 4x4
grid.
Some 16x16 blocks could actually intersect the tile
if the triangle is 16 pixels in one dimension but 4 in the other, causing
a buffer overflow.
The fix consists of budging the 16x16 blocks back inside the tile.
This just adds the entries to the table and fixes the asserts up.
The int32 one is definitely wrong, since it uses a float temp
which will lose precision, but its no worse than now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is taken from reading EXT_texture_integer + EXT_texture_rg in combination,
Comments on necessity of each format, naming of formats and bugs in the
formats tables please.
Is there any formats I've missed?
Eric looked over this to make sure its consistent at least.
As I've changed the ordering of things in the format table, the follow
patches are required to avoid regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>
As per Brian's suggestion we can generate this table at first start
to make sure its correct. This is a sad workaround for compilers which
don't support named initialiser. (its 2011).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit d1fda903 (radeon: Drop mapping we were doing around
glGetTexImage()) removed the common Radeon source file
radeon_tex_getimage.c, and pulled it out of the r200, r300, r600, and
radeon makefiles. But it left behind the symlinks that were being
used to share that file among the four directories.
This patch removes the dangling symlinks.
Reviewed-by: Brian Paul <brianp@vmware.com>
We want quad/pixel Z values to be interpolated exactly the same for
multi-pass algorithms. Because of how the optimized Z-test code is
written, we can't cull the first quad in a run even if it's totally
killed. See the comment for more info.
NOTE: This is a candidate for the 7.11 branch.
Instead of relying on the mirror in the Mesa IR assembly shader, just
use the variables actually stored in the GLSL IR. This will be a bit
slower, but nobody cares about the performance of glGetActiveAttrib.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds get_active_attrib into _mesa_GetActiveAttribARB
and moves the resulting function function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This currently mirrors the state tracking
gl_shader_program::Attributes, but I'm working towards eliminating
that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds bind_attrib_location into _mesa_BindAttribLocationARB
and moves the resulting function function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
hash_table_replace doesn't use get_node to avoid having to hash the key twice.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows querying the linked shader itself rather than the Mesa IR.
This is the first step towards removing gl_program::Attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The symbol table in the linked shaders may contain references to
variables that were removed (e.g., unused uniforms). Since it may
contain junk, there is no possible valid use. Delete it and set the
pointer to NULL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers in Mesa have supported this extension for eons. This
extension is an optional features in desktop OpenGL (via
GL_ARB_draw_buffers) and OpenGL ES 2.x (via GL_NV_draw_buffers).
The extension is not usable in OpenGL ES 1.x. There is no
glDrawBuffers* entry point in OpenGL ES 1.x contexts, and glGet*v
generate errors when MAX_DRAW_BUFFERS or DRAW_BUFFERi is queried.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This also moves ATI_draw_buffers. This is to facilitate enabling
NV_draw_buffers in OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Us poor souls who cross compile mesa want to be able to specify which pkg-config to pick, or at least just change one place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Until now, we've been treating 1D arrays as a single slice, and each
array slice is actually just a row of the 2D texture. While swrast
still stores them this way, hardware drivers think that 1D arrays have
actual separate slices not stored as contiguous rows.
Reviewed-by: Brian Paul <brianp@vmware.com>
The path for ->Data was failing to be called for the FBO draw offset
fallback, and also had mismatched compressed texture support code.
This drops the intel_prepare_render() in the blit path. We aren't
copying to/from a GL_FRONT buffer, so it doesn't matter.
Too many separate functions each called from one location (in
different files). This code should all die soon when swrast starts
using MapTextureImage.
Before, we were only allocating these from our TexImage, so if the
texture image was set up in any other way (non-accelerated
glGenerateMipmaps()), they'd be missing or wrong.
Now that we can zero-copy generate the mipmaps into brand new
glTexImage()-generated storage using MapTextureImage(), we no longer
need to allocate image->Data in mipmap generate. This requires
deleting the drivers' old overrides of the miptree tracking after
calling _mesa_generate_mipmap at the same time, or the drivers
promptly lose our newly-generated data.
Reviewed-by: Eric Anholt <eric@anholt.net>
Both POW and INT DIV need a message length of 2; previously, we only
checked for POW.
Also, BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER has a response
length of 2; previously, we only checked for SINCOS. We don't use this
message, but in case we ever decide to, we may as well fix it now.
While we're at it, just move these computations into
brw_set_math_message, since they're entirely based on the function.
This fixes it for both brw_math and the old backend's brw_math_16.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers implementing GLSL 1.30 want to do integer modulus, and until we
can stop generating code via ir_to_mesa, it's easier to make it silently
generate rubbish code. Multiply will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Classic compiler mistake. In the example below, the OMOD optimization
was combining instructions 4 and 10, but since there was an instruction
(#8) in between them that wrote to the same registers as instruction 10,
instruction 11 was reading the wrong value.
Example of the mistake:
Before OMOD:
4: MAD temp[0].y, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
10: MUL temp[2].x, const[1].y___, temp[0].y___;
11: FRC temp[5].x, temp[2].x___;
After OMOD:
4: MAD temp[2].x / 8, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
11: FRC temp[5].x, temp[2].x___;
https://bugs.freedesktop.org/show_bug.cgi?id=41367
Source swizzles for transcendent instructions were being stored in the X
channel regardless of what channel the instruction was writing.
This was causing problems for some helper functions that were expecting
source swizzles to occupy channels corresponding to the instruction's
writemask. This commit makes transcendent instructions follow the same
convention as normal instructions for representing source swizzles.
Previous behavior:
LG2 temp[0].y, input[0].x___;
Current behavior:
LG2 temp[0].y, input[0]._x__;
From the EXT_transform_feedback spec:
Primitives can be optionally discarded before rasterization by calling
Enable and Disable with RASTERIZER_DISCARD_EXT. When enabled, primitives
are discared right before the rasterization stage, but after the optional
transform feedback stage. When disabled, primitives are passed through to
the rasterization stage to be processed normally. RASTERIZER_DISCARD_EXT
applies to the DrawPixels, CopyPixels, Bitmap, Clear and Accum commands as
well.
And the GL 3.2 spec says it applies to ClearBuffer* as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit d631c19db4.
The commit was broken, and ended up returning false all the time
because nobody in the world binds every single possible vertex array.
On further reflection, we don't want to discount stride == 0: This
function is just used for deciding to calculate whether to compute the
bonuds on the index, and there's no sense in computing index bounds
when stride == 0.
For the separate question of "how much data do I upload for this
vertex element?", the i965 driver was fixed to upload the data.
Fixes a regression of about 2x in 3DMMES, and most importantly, makes
Hammerfight playable.
Commit d631c19db4 avoided this problem
by forcing the driver to get the min/max index, but that commit was
broken, so just fix the driver problem (confusion between "do I need
to upload any data?" and "do I need the index bounds in order to
upload any data?").
Generally we're using fragment programs in all our drivers, so wasting
4MB for code that's never called is pretty lame. Reduces i965 memory
allocation for a short shader program from 21,932,128B to 17,737,816B.
As innocuous as it seemed, ebca47a basically broke the world (e.g.,
>200 piglit regressions). In vec4_visitor::emit_block_move,
src->swizzle was expected to be BRW_SWIZZLE_NOOP before setting it to
a swizzle that would replicate the existing channels of the source
type to a vec4 (e.g., .xyyy for a vec2).
The original assertion seems to have been a little bogus. In addition
to being BRW_SWIZZLE_NOOP, src->swizzle might already be a swizzle
that would replicate the existing channels of the source type to a
vec4. In other words, it might already have the value that we're
about to assign to it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If GL_NV_texture_env_combine4 is not supported, setting the fourth
combiner term would generate a GL error.
Of course, I noticed this right after committing the previous patch
to use a loop in the first place. <sigh>
Note that GL_EXT_texture_env_combine is always supported so the first
three combiner terms are always accepted.
There's four combiner terms (not 3) with GL_NV_texture_env_combine4.
Use a loop to make the code a little more compact.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The drivers don't need to care about the domains. All they need to set
are the bind and usage flags. This simplifies the winsys too.
This also fixes on r600g:
- fbo-depth-GL_DEPTH_COMPONENT32F-copypixels
- fbo-depth-GL_DEPTH_COMPONENT16-copypixels
- fbo-depth-GL_DEPTH_COMPONENT24-copypixels
- fbo-depth-GL_DEPTH_COMPONENT32-copypixels
- fbo-depth-GL_DEPTH24_STENCIL8-copypixels
I can't explain it.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I have moved 'last_flush' and 'binding' from r600_bo to winsys/radeon.
The other members are now part of r600_resource.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We were checking whether render_condition is set. That was not reliable,
because it's always set with trace and noop regardless of driver support.
Reviewed-by: Brian Paul <brianp@vmware.com>
This removes:
- PIPE_CAP_MAX_TEXTURE_IMAGE_UNITS
- PIPE_CAP_MAX_VERTEX_TEXTURE_UNITS
in favor of the that new per-shader cap.
Reviewed-by: Brian Paul <brianp@vmware.com>
All drivers support it (well, except Cell). The boolean option is going away
from core Mesa too.
This is a follow-up to Ian Romanick's patch
"mesa: Remove ARB_texture_mirrored_repeat extension enable flag".
Reviewed-by: Brian Paul <brianp@vmware.com>
This is from a Coverity defect report.
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
1314 void
1315 vec4_visitor::emit_block_move(dst_reg *dst, src_reg *src,
1316 const struct glsl_type *type, bool
predicated)
...
1351 /* Do we need to worry about swizzling a swizzle? */
->1352 assert(src->swizzle = BRW_SWIZZLE_NOOP);
1353 src->swizzle = swizzle_for_size(type->vector_elements);
Reported-by: Vinson Lee <vlee@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40158
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As written, this test correctly raises an error for #elif being used
with an undefined macro (and not as an argument to "defined"). If the
preceding #if were '#if 1' then this diagnositc would correctly be
hidden. That allows code such as the following to not raise an error:
#ifndef MAYBE_UNDEFINED
#elif MAYBE_UNDEFINED < 5
...
#endif
So this test case is working as expected already. We add it here just
to improve test coverage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The specification reserves any macro name containing two consecutive
underscores, (anywhere within the name). Previously, we only raised
this error for macro names that started with two underscores.
Fix the implementation to check for two underscores anywhere, and also
update the corresponding 086-reserved-macro-names test.
This also fixes the following two piglit tests:
spec/glsl-1.30/preprocessor/reserved/double-underscore-02.frag
spec/glsl-1.30/preprocessor/reserved/double-underscore-03.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is as simple as abstracting one existing block of code into a
function call and then adding a single call to that function for the
case of a non-function-like macro.
This fixes the recently-added 097-paste-with-non-function-macro test
as well as the following piglit tests:
spec/glsl-1.30/preprocessor/concat/concat-01.frag
spec/glsl-1.30/preprocessor/concat/concat-02.frag
Also, the concat-04.frag test now passes for the right reason. The
test is intended to fail the compilation, but before this commit it
was failing compilation (and hence passing the test) for the wrong
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
Apparently we never implemented this, (but we've got a GLSL 1.30 test
in piglit that is exercising this case).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
There was already a loop here to look for multiple token pastes, but
it was mistakenly incrementing the iterator counter after performing
one paste.
Instead, leave the loop iterator in place to coalesce as many tokens
as necessary into one.
This fixes the recently add 096-paste-twice test as well as the
following piglit test:
spec/glsl-1.30/preprocessor/concat/concat-03.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is something that piglit is exercising that currently fails.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The GL spec says that luminance values are returned as (l, 0, 0, 1),
L/A values as (l, 0, 0, a) and intensity values as (i, 0, 0, 1).
Use the pixel transfer scale controls to implement that.
This fixes a few failures in the new piglit getteximage-formats
test when getting a compressed L or L/A image.
If color material mode is enabled, constant buffer entries related
to the material coefficients will depend on glColor. So add
_NEW_CURRENT_ATTRIB to the bitset returned for material-related
constants in _mesa_program_state_flags().
This fixes a bug exercised by the new piglit draw-arrays-colormaterial
test.
Note: This is a candidate for the 7.11 branch.
This hasn't been needed so far since none of the core Mesa code paths
that call ctx->Driver.AllocTextureImageBuffer() are used with the
state tracker. That will change in upcoming patches.
Note that this function duplicates some code seen in the st_TexImage()
function. That can be cleaned up later.
The target, level and texObj can be obtained through the texImage
parameter. We could make similar changes for the TexImage() hooks too.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a build error introduced with commit
"winsys/svga: Update to vmwgfx kernel module 2.1"
if both the svga driver and the xorg state tracker was enabled
at the same time.
If needed we can re-add a minimal target for basic functionality.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When an FBO is rendering to a texture (rather than a renderbuffer),
Gallium sets up an internal renderbuffer to handle the rendering, and
copies over enough texture state to make this work.
InternalFormat was missed out, causing glTexCopyImage to take a slow
path unnecessarily.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41263
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Introduces fence objecs and a size limit on query buffers.
The possibility to map the fifo from user-space is gone, and
replaced by an ioctl that reads the 3D capabilities.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecranz <jakob@vmware.com>
Don't store references to these on the surface but on the context.
References to transfers are still stored on the surface since we allow
only a single map of a surface at a time.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, and savage
(Savage3D and other pre-Savage4).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on i810, mach64, mga,
savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, or r128.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
It looks like the only hardware supported by Mesa that cannot do
ARB_texture_env_combine is pre-NV10 NVIDA chips. It appears that
these chips cannot do the GL_SUBTRACT mode. Based on looking at older
copies of nvOpenGLspecs.pdf found on the net, NVIDIA never supported
ARB_texture_env_combine on those chips either.
This extension was previously not supported on mach64, mga (G200),
r128, savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
This extension was previously not supported on mach64, mga (G200),
savage (Savage3D and other pre-Savage4), sis, and tdfx (Voodoo
Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_CLIENT_ACTIVE_TEXTURE in OpenGL ES
2.x). This patch does not change the situation in any way.
This extension was previously not supported on i810, mga (G200), or
tdfx (Voodoo Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can use the core Mesa code for glGetTexImage() since it handles the
image mapping/unmapping now. We'll keep the decompress_with_blit() path
in the hope that it's faster than core Mesa's software decompression code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41312
The formula we were previously using for asinh:
asinh x = ln(x + sqrt(x * x + 1))
is numerically unstable: when x is a large negative value, the quantity
x + sqrt(x * x + 1)
is a small positive value (on the order of 1/(2|x|)). Since the
logarithm function is very sensitive in this range, any error in the
computation of the square root manifests as a large error in the
result.
This patch changes to the equivalent formula:
asinh x = sign(x) * ln(abs(x) + sqrt(x * x + 1))
which is only slightly more expensive to compute, and is numerically
stable for all x.
Fixes piglit tests
spec/glsl-1.30/execution/built-in-functions/[fv]s-asinh-*.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the glsl-1.30/compiler/built-in-functions/trunc-* tests under 1.30.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Somehow we managed to get the unsigned int vectors, but not scalar.
Fixes _mesa_problem complaints in piglit's uint tests.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bitshifts are one of the rare places that GLSL allows mixed base types
without an implicit conversion occurring.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For hardware drivers, we only have ir_to_mesa called for the purposes
of potential swrast fallbacks (basically never on a 1.30 driver),
which we don't really care about. This will allow 1.30 to be
implemented without rewriting swrast for it.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On pre-GEN6 chips, the VUE slots set aside for clip distance aren't
actually used, so there is no reason for the clipper to waste time
interpolating them.
When commit 62bad54727 changed the enum
value used to represent these VUE slots, that caused the clipper to
start interpolating them as an accidental side effect. This patch
reverts to the old clipper behavior.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch corrects two errors in the computation of the psiz/flags
VUE slot on pre-GEN5 when using the new VS backend:
- The clip flags (which should be stored in the w component of the
first VUE slot) were being accidentally duplicated in all other
components of that VUE slot, causing partially clipped triangles to
sometimes disappear completely.
- The OR instruction wasn't being stored in "inst", causing the
BRW_PREDICATE_NORMAL flag to be applied to the wrong instruction.
This patch fixes regressions in clipping behavior when using shaders
on GEN4-5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This constructor was storing its argument in the wrong field of the
"imm" enum, resulting in it being converted to a float when it should
have remained an unsigned integer. This was preventing clipping from
working properly on pre-GEN6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In pre-GEN6, when using clip planes, both the vertex shader and the
clipper need access to the client-supplied clip planes, since the
vertex shader needs them to set the clip flags, and the clipper needs
them to determine where to insert new vertices.
With the old VS backend, we used a clever optimization to avoid
placing duplicate copies of these planes in the CURBE: we used the
same block of memory for both the clipper and vertex shader constants,
with the clip planes at the front of it, and then we instructed the
clipper to read just the initial part of this block containing the
clip planes.
This optimization was tricky, of dubious value, and not completely
working in the new VS backend, so I've removed it. Now, when using
the new VS backend, separate parts of the CURBE are used for the
clipper and the vertex shader. Note that this doesn't affect the
number of push constants available to the vertex shader, it simply
causes the CURBE to occupy a few more bytes of URB memory.
The old VS backend is unaffected. GEN6+, which does clipping entirely
in hardware, is also unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that i965 supports 8 clip planes instead of 6, the size of the
brw_vs_compile::userplane array needs to be increased to 8. Changed
the array size to MAX_CLIP_PLANES so that if the number changes again
in the future, this array size won't be missed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using user-defined clipping planes, the i965 driver compacts the
array of clipping planes so that disabled clipping planes do not
appear in it--this saves precious push constant space and makes it
easier to generate the pre-GEN6 clip program. As a result, when
enabling clipping planes in GEN6+ hardware, we always enable clipping
planes 0 through n-1 (where n is the number of clipping planes
enabled), regardless of which clipping planes the user actually
requested.
However, we can't do this when using gl_ClipDistance, because it would
be prohibitively complex to compact the gl_ClipDistance array inside
the user-supplied vertex shader. So, when enabling clipping planes in
GEN6+ hardware, if gl_ClipDistance is in use, we need to pass the
user-supplied enable flags directly through to the hardware rather
than just enabling the first n planes.
Fixes Piglit test vs-clip-distance-enables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the i965 driver supports 8 clipping planes now, we need 4 bits
to store the number of user clipping planes, not 3.
In theory this isn't strictly necessary, since brw_clip.h is only used
on pre-GEN6, and pre-GEN6 only advertises support for 6 clipping
planes, but it seems wise to err on the safe side.
In the process I removed the pad0 element of struct
brw_clip_prog_key--it doesn't seem necessary because the compiler
automatically inserts padding if needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Override the context's GLSL version if the environment variable
MESA_GLSL_VERSION_OVERRIDE is set. Valid values for
MESA_GLSL_VERSION_OVERRIDE are integers, such as "130".
MESA_GLSL_VERSION_OVERRIDE has the same behavior as INTEL_GLSL_VERSION,
except that it applies to all drivers, not just Intel's. Since the former
supercedes the latter, this patch disables the latter.
Reviewed-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Again, the check was needlessly specific: this works fine on Gen7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The check was designed to forbid it on old generations (Gen5/Ironlake),
not on new ones. It just works on Gen7/Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The mesa core code uses MapTextureImage() like we need now.
v2: Drop mapping around _mesa_generate_mipmap for compressed, since
the whole path ends up going through MapTextureImage(), and the
meta decompression code ended up causing us to lose track of the
region that was originally mapped and assertion fail.
v2: Changes by Brian to MapTexImage in the decompression path.
v3: Changes by anholt to fix srcRowStride for decompression of NPOT.
Tested-by: Brian Paul <brianp@vmware.com> (v2)
EXT_texture_integer also specifies border color should be a color
union, the values are used according to the texture sampler format.
(update docs)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is necessary to manually set the GL version to 3.0 in order to run
Piglit tests that use glGetUniform*().
This patch allows one to override the version of the OpenGL context by
setting the environment variable MESA_GL_VERSION_OVERRIDE.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is a follow-up to commit
2d686fe911, which added decoding of
GL_CLIP_DISTANCE[67] to the _mesa_set_enable() function. This patch
makes the following additional fixes:
- Uses GL_CLIP_DISTANCEi enums consistently within enable.c rather
than the deprecated GL_CLIP_PLANEi enums.
- Generates an error if the user tries to access a clip flag that is
unsupported by the hardware.
- Applies the same change to _mesa_IsEnabled(), so that querying clip
flags using glIsEnabled() works properly.
- Applies corresponding changes to get.c, so that querying clip flags
using glGet*() works properly.
Fixes piglit test clip-flag-behavior.
Reviewed-by: Brian Paul <brianp@vmware.com>
We get called for TexImage higher up, and in a relatively normal way
(pixels == NULL is common for FBO setup).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing in our normal texture path we need for this. We don't
PBO upload blit it. We don't need to worry about flushing because
MapTextureImage handles it. hiz scattergather doesn't apply, but MTI
handles it too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It's totally gratuitous -- the image's miptree will be checked for
binding to the object later, anyway, with zero-copy or blitting as
appropriate.
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_reference_renderbuffer already short-circuits equality, and
intel_miptree_release does nothing on NULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This mirrors the structure Eric used in the new VS backend, and seems
simpler. In particular, the math1/math2 split will avoid having to
figure out how many operands there are, as this is already known by the
caller.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
_swrast_choose_texture_sample_func() handles null texture object pointers
and will return the "null" sampler function which returns (0,0,0,1). This
fixes a minor regression from ce82914f5a
All drivers remaining in Mesa support this extension. This extension
is required in desktop OpenGL. The existing support is already partially
broken in Mesa (e.g., using format=GL_ABGR for glTexImage2D in OpenGL ES 2.x).
This patch does not change the situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
EXT_texture_format_BGRA8888 is mostly a subset of EXT_bgra. The only
difference seems to be that EXT_texture_format_BGRA8888 allows GL_BGRA
as an internal format to glTexImage2D and friends.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is always enabled, and drivers do not have
to option to disable it.
I kept this one separate from the others because I was a little
uncertain about the changes to get.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Mesa has never any portion of this extension, and neither has any
other vendor.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The year 2006 apparently came from the "Last Modified Date" in the
spec header. however, the revision history at the bottom say "2/22/00
mjk - added NVIDIA Implementation Details." From that we can safely
infer that the spec is from at least 2000, and it may even be older.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following extensions are always enabled, and drivers do not have
to option to disable them:
GL_ARB_multisample
GL_ARB_texture_compression
GL_ARB_vertex_buffer_object / GL_OES_mapbuffer
GL_EXT_copy_texture
GL_EXT_multi_draw_arrays / GL_SUN_multi_draw_arrays
GL_EXT_polygon_offset
GL_EXT_subtexture
GL_EXT_texture_edge_clamp / GL_SGIS_texture_edge_clamp
GL_EXT_vertex_array
GL_SGIS_generate_mipmap
This set was picked because the are all either required or optional
features in desktop OpenGL, OpenGL ES 1.x, and OpenGL ES 2.x. The
existing support for some is already partially broken in Mesa (e.g.,
proxy texture targets in OpenGL ES). This patch does not change the
situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes OpenArena on Gen7. Technically, adding only the first depth stall
fixes it, but the documentation says to do all three, and the Windows
driver seems to do it.
Not observed to fix anything on Gen6 yet.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38863
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It seems that GT1/GT2 sorts of variations are here to stay, and more
special cases will likely be required in the future. Checking by PCI ID
via the IS_xxx_GTx macros is cumbersome; introducing a new 'gt' field
analogous to intel->gen will make this easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Seeing as they were only used once (in the same function they were
defined), having them as context members seemed rather pointless.
Remove them entirely (rather than using local variables) since the
chipset generation checks are actually just as straightforward.
While we're at it, clean up the remainder of the if-tree that set them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
At one point, the documentation said that max thread count in 3DSTATE_PS
was at bit offset 23, but it's actually 24 on Ivybridge. Not only did
this halve our thread count, it caused us to write 1 into a bit 23, which
is marked as MBZ (must be zero). Furthermore, it made us write an even
number into this field, which is apparently not allowed. Apparently we
were just lucky it worked.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- first determine the buffer range to upload for each buffer by walking over
vertex elements
- take buffer_offset into account
- take src_offset into account
- take src_format into account in more places
- don't just blindly upload (stride*count) bytes
NOTE: This is a candidate for the 7.11 branch.
It can now override both buffer offsets and strides in additions to resources.
Overriding buffer offsets was kinda hackish and could cause issues with
non-native vertex formats.
intel_image->mt might be NULL, say with border width set. It then would
trigger a segfault at intel_map/unmap_texture_image function.
This would fix the oglc misctest(basic.textureBorderIgnore) fail.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fence emission can flush the push buffer, which through flush_notify
unreferences recently emitted fence. If ref count is increased after
fence emission, unreference deletes the fence, which causes SIGSEGV.
Backtrace:
nouveau_fence_del
nouveau_fence_ref
nouveau_fence_next
nouveau_pushbuf_flush
MARK_RING
nv50_screen_fence_emit
nouveau_fence_emit
nv50_flush
This bug manifested as an assertion failure in nouveau_fence.c, because
SIGSEGV handler tried to shutdown the application and used messed up
fence.
This issue was reported by Maxim Levitsky.
Note: This is a candidate for the 7.11 branch.
Without this we'd miss the last update in a sequence like {COLOR0, COLOR1},
{COLOR0}, {COLOR0, COLOR1}. I originally had a patch for this that called
updated_drawbuffers() when the buffer count changed, but later realized that
was wrong. The ARB_draw_buffers spec explicitly says "The draw buffer for
output colors beyond <n> is set to NONE.", and this is queryable state.
This fixes piglit arb_draw_buffers-state_change.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fix a "Unresolved Symbols" run time error when using G3DVL
through the VDPAU state tracker, by linking the vdpau targets with librt.
Reported by Arkadiusz Miśkiewicz.
Caused by this commit :
commit e911dbb563
Author: Emeric Grange <emeric.grange@gmail.com>
Date: Mon Sep 12 23:39:33 2011 +0200
Signed-off-by: Emeric Grange <emeric.grange@gmail.com>
This should bring g3dvl back to work until we figured out
how SCALED types should really work.
Signed-off-by: Christian König <deathsimple@vodafone.de>
There is no guarantee that the tokens TGSI will persist beyond the
create_fs_state. The pipe driver (and therefore the draw module) is
responsible for making copies of the TGSI tokens when it needs them.
Reviewed-by: Brian Paul <brianp@vmware.com>
i915_miptree_layout, i945_miptree_layout, and brw_miptree_layout always
just return GL_TRUE, so there's really no point to it. Change them to
void functions and remove the (dead) error checking code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For some reason I thought subexpressions were chained off the top-level
one. This isn't the case, so just create a temporary context and free
it. All of this memory would be eventually freed, but now is freed
much sooner.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Very simple shaders don't actually use GLSL built-ins. For example:
- gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
- gl_FragColor = vec4(0.0);
Both of the shaders used by _mesa_meta_glsl_Clear() also qualify.
By waiting to initialize the built-ins until the first time we need to
look for a signature, we can avoid the overhead entirely in these cases.
Makes piglit run roughly 18% faster (255 vs. 312 seconds).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we conditionally set up the SF pipline stage with a
urb_entry_read_offset of 2 when clipping was in use, and 1 otherwise,
causing the clip distance VUE slots to be skipped if present. This
was an extremely minor savings (it saved the SF unit from reading 2
vec4s out of the URB, but it didn't affect any computation, since we
only instruct the SF unit to perform interpolation on VUE slots that
are actually used by the fragment shader).
GLSL 1.30 requires an interpolated version of gl_ClipDistance to be
available for reading in the fragment shader, so we need the SF's
urb_entry_read_offset to be 1 when the fragment shader reads from
gl_ClipDistance.
This patch just unconditionally sets the urb_entry_read_offset to 1 in
all cases; this is sufficient to make gl_ClipDistance available to the
fragment shader when it is needed, and the performance loss should be
negligible when it isn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
When gl_ClipDistance is in use, the contents of the gl_ClipDistance
array just need to be copied directly into the clip distance VUE
slots, so we re-use the code that copies all other generic VUE slots
(this has been extracted to its own method). When gl_ClipDistance is
not in use, the vertex shader needs to calculate the clip distances
based on user-specified clipping planes.
This patch also removes the i965-specific enum values
BRW_VERT_RESULT_CLIP[01], since we now have generic Mesa enums that
serve the same purpose (VERT_RESULT_CLIP_DIST[01]).
Reviewed-by: Eric Anholt <eric@anholt.net>
When the vertex shader writes to gl_ClipDistance, we do clipping based
on clip distances rather than user clip planes, so don't waste push
constant space storing user clip planes that won't be used.
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 requires gl_ClipDistance to be formatted as an array of 2 vec4's
(as opposed to an array of 8 floats), so enable the lowering pass that
performs this conversion.
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to support 8 clip distances, we need to properly decode when
the user sets the GL_CLIP_DISTANCE6 and GL_CLIP_DISTANCE7 enable
flags.
For clarity, this patch changes the names GL_CLIP_PLANE[0-5] in the
switch statement to the equivalent names GL_CLIP_DISTANCE[0-5], since
the GL_CLIP_PLANE names are deprecated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch assigns enumerated values for gl_ClipDistance in the
gl_vert_result and gl_frag_attrib enums, so that driver back-ends can
assign gl_ClipDistance to the appropriate hardware registers. It also
adjusts the functions _mesa_vert_result_to_frag_attrib() and
_mesa_frag_attrib_to_vert_result() (which translate between the two
enums) to correctly translate the new enumerated values.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
GLSL 1.30 requires us to use gl_ClipDistance for clipping if the
vertex shader contains a static write to it, and otherwise use
user-defined clipping planes. Since the driver needs to behave
differently in these two cases, we need a flag to record whether the
shader has written to gl_ClipDistance.
The new flag is called UsesClipDistance. We initially store it in
gl_shader_program (since that is the data structure that is available
when we check to see whethe gl_ClipDistance was written to), and we
later copy it to a flag with the same name in gl_vertex_program, since
that is a more convenient place for the driver to access it (in i965,
at least).
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance
needs to be laid out as a pair of vec4's (the first containing clip
distances 0-3, and the second containing clip distances 4-7).
However, it is declared in GLSL as an array of 8 floats.
This lowering pass acts at the GLSL level, modifying the declaration
of gl_ClipDistance so that it is an array of vec4's rather than an
array of floats, and renaming it to gl_ClipDistanceMESA. In addition,
it modifies all accesses to the array so that they access the
appropiate component of one of the vec4's.
Since some hardware may not internally represent gl_ClipDistance as a
pair of vec4's, this lowering pass is optional. To enable it, set the
LowerClipDistance flag in gl_shader_compiler_options to true.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes a bug in ir_hirearchical_visitor: when traversing an
exec_list representing the formal or actual parameters of a function,
it modified base_ir to point to each parameter in turn, rather than
leaving it as a pointer to the enclosing statement. This was a
problem, since base_ir is used by visitor classes to locate the
statement containing the node being visited (usually so that
additional statements can be inserted before or after it). Without
this fix, visitors might attempt to insert statements into parameter
lists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only the XYZ components are checked to be negative by SVGA3DOP_TEXKILL.
GL_ARB_fp requires all four components be checked. Emit a second texkill
for W if needed.
We only need to do the divide by Q step for TXP instructions.
This fixes the incorrectly rendered soft shadow test in Lightsmark.
Along with the previous texture swizzle commit, this also fixes all
the piglit glsl-fs-shadow2d-XX.shader_test failures.
This exposes the GL_EXT_texture_swizzle extension and allows the various
depth texture modes to be implemented properly. This, plus a follow-on
texture/shadow change fixes quite a few piglit GLSL shadow sampler test
failures.
Emit the SVGA3D_RS_POINTSPRITEENABLE render state.
When sprite_coord_mode=PIPE_SPRITE_COORD_LOWER_LEFT emit extra frag
shader code to invert the Y coordinate of the incoming texcoord.
Accurately describe what operations are supported when a format caps
entry is not advertised by the host, and which formats are never
supported, instead of making ad-hoc and often incorrect assumptions.
It is sometimes useful to examine the first frame or and early frame of a
quickly executing and non-repeating application, this chain introduces a new
environment variable that is checked when creating contexts. If
GALLIUM_RBUG_START_BLOCKED is set, then each context that is created is started
in a blocked state. This allows time to connect rbug before anything is
rendered in the context.
There is already comments show how to detect a null texture. Fix the
code to match the comments.
This would fix the oglc divzero(basic.texQOrWEqualsZero) and
divzero(basic.texTrivialPrim) test case fail.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix the constant interpolation enable bit mask for flat light mode.
FRAG_BIT_COL0 attribute bit might be 0, in which case we need to
shift one more bit right.
This would fix the oglc specularColor test fail on both Sandybridge and
Ivybridge.
v2: move the constant interp bitmask setup code into for(; attr <
FRAG_ATTRIB_MAX; attr++) loop suggested by Eric.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Since the blit gets sequenced after other batchbuffer rendering like
normal, there's no need to push things out early.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All that matters here is the format of the texture, not the
internalformat (which might mean various different pixel formats). In
one case, the pbo upload for MESA_FORMAT_YCBCR would have swapped the
channels for MESA_FORMAT_YCBCR_REV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This also improves the debugging output in the failure paths so you
get more than just "failed", and don't get spammed with "failed" when
you didn't even have a PBO to try.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There were notes about the possibility of slowdowns due to zcopy from
a PBO due to thrashing around of the region. Slowdowns are even more
likely now that textures are generally tiled, which a zcopy wouldn't
get. Additionally, there were no checks on the buffer size to ensure
that the hardware-required rounding was present, which could result in
GPU hangs on large zcopy PBOs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This doesn't cover support for this format as a renderbuffer yet. The
spec allows implementations to not support it, though it is something
we do want to support.
Only one failure in piglit on gen6, which is texwrap with bordercolor
(as usual).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AFAIK, there are few users of this extension and I can see a couple
reasons why this is probably broken in Mesa anyway.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The pipe_sampler_view::format field should be prefered over the resource/
texture format. The former is used to override the texture format for
sRGB decode enable/disable, etc.
Also, use new util_format_is_srgb() helper to catch all sRGB formats.
This fixes the piglit tex-srgb test for GL_EXT_texture_sRGB_decode.
This fixes a regression from a8cf4b6acf
The problem occured when two successive glDrawArrays calls accessed
subsequent elements in user-space arrays. The user-space array
from the first call wasn't being grown to accomodate the second
draw call's elements.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
While the program won't successfully link in the end, this avoids
possible assertion failure in the driver during linking if
this->result isn't initialized with something already.
Fixes piglit:
vertex-program-two-side enabled front back front2 back2
vertex-program-two-side enabled front back
vertex-program-two-side enabled front2 back2
We now raise an GL_INVALID_ENUM in glBegin() if mode is illegal, as was
done in Yuanhan Liu's original patch.
Take geometry shaders support into account too.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Checking if the paints are opaque in renderer_validate_blend() does not
work. We could be drawing images. Remove the check from
renderer_validate_blend() and take image drawing into consideration in
blend_use_shader().
The bug was introduced by 3f0a966807,
which affects the lookup demo.
vg_context_is_object_valid() checks if a handle is valid by checking if
the handle is a valid key of the object hash table. However, the keys
of the object hash table were object pointers.
Fix vg_context_add_object() to use the handles as the keys so that
vg_context_is_object_valid() works. This bug was introduced by
99c67f27d3.
Use _mesa_set_enable() to avoid a redudant context lookup.
Need to disable the texture target in decompress_texture_image() so the
unit isn't still enabled after glGetTexImage() returns. Arguably, the
meta restore code should do this, but it doesn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're generating a mipmap for an sRGB texture we need to bypass
sRGB->linear conversion. Otherwise the destination mipmap level
(drawn with a textured quad) will have the wrong colors.
If we can't turn of sRGB->linear conversion (GL_EXT_texture_sRGB_decode)
we need to use the software fallback for mipmap generation.
Note: This is a candidate for the 7.11 branch.
The 1-bit alpha channel was incorrectly encoded. Previously, any non-zero
alpha value for the ubyte alpha value would set A=1. Instead, use the
most significant bit of the ubyte alpha to determine the A bit. This is
consistent with the other channels and other OpenGL implementations.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
builtin_stubs.cpp is only supposed to be used for builtin_compiler. It
contains a stub version of _mesa_glsl_initialize_functions() that does
nothing.
libglsl.a already contains builtin_function.cpp, the generated file that
contains a version of _mesa_glsl_initialize_functions() that actually
initializes all the built-in functions.
By mistakenly linking to builtin_stubs, glsl_compiler and glsl_test are
unable to compile any shaders that use built-in functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since Mesa is now capable of supporting up to 8 clipping planes
instead of 6, this patch updates Gallium internals to support 8
clipping planes as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
draw_pipe_clip.c contained an ifdef to ensure that its local
definition of MAX_CLIPPED_VERTICES would not take effect if the global
MAX_CLIPPED_VERTICES (defined in src/mesa/main/config.h) was already
defined. This was unnecessary because draw_pipe_clip.c doesn't
directly or indirectly include src/mesa/main/config.h. Removed the
ifdef to reduce confusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow drivers to increase ctx->Const.MaxClipPlanes to 8,
which is required for GLSL-1.30 compliance.
No driver behavior should be affected. However, many data structures
use MAX_CLIP_PLANES as an array size, so these arrays will get
slightly larger.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously this value was set to MAX_CLIP_PLANES, which is defined to
be 6. But MAX_CLIP_PLANES needs to be increased to 8 to support
GLSL-1.30-compliant drivers. This patch hard-codes the default value
of ctx->Const.MaxClipPlanes to 6, so that when MAX_CLIP_PLANES is
increased, it won't affect drivers that do not support 8 clip planes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch removes the assertion "MAX_CLIP_PLANES == 6" from the i965
driver. This assertion is unnecessary; nothing in the driver requires
MAX_CLIP_PLANES to be 6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To support GLSL 1.30, we will need to increase MAX_CLIP_PLANES to 8.
To avoid breaking drivers that do not yet support 8 clip planes, this
patch modifies the Mesa core code that pertains to clipping to use
ctx->Const.MaxClipPlanes rather than MAX_CLIP_PLANES, since
ctx->Const.MaxClipPlanes will remain 6 for drivers that only support 6
clip planes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We generate silly code for array access, and it's easier to generally
support the cleanup than to specifically avoid the bad code in each
place we might generate it.
Removes 4.6% of instructions from 41.6% of shaders in shader-db,
particularly savage2/hon and unigine.
v2: Fixes by Ken: Make is_zero/one member functions, and fix a
progress flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_NEW_WINDOW_POS wasn't a real Mesa state flag, but we were missing
_NEW_BUFFERS to update the stipple offset when FBO binding or window
size changed, and _NEW_POLYGON to update when stippling gets enabled.
Fixes oglconform's tristrip test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Because we skip the pattern upload when stippling is disabled, we need
to check again when it might have been turned on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Trigger GL_INVALID_ENUM error if the face paramter is not a valid value.
Trigger GL_INVALID_VALUE error if the GL_SHININESS value is out side
[0, ctx->Constant.MaxShiniess].
v2: fix the max shininess value.
v3: suggested by Brian, move the face check into glMaterialfv function
to reduce code duplicate. Also, refactor the error message.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The null platform has no window or pixmap surface (but pbuffer surface).
And the only valid display is EGL_DEFAULT_DISPLAY. It is useful for
offscreen rendering. It works everywhere becase no window system is
required.
All of the extensions actually supported by Mesa have been remapped by
remap.c for a long time. Emitting all of these data structures is
just clutter.
Drivers that need additional functions remapped, should add
'offset="assign"' to the function definition in the .xml file.
The changes to remap_helper.h are in a follow-on ~8700 line patch that
would surely be rejected by the mailing list.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Since GL_EXT_blend_logic_op is removed, _mesa_rgba_logicop_enabled(ctx)
just returns ctx->Color.ColorLogicOpEnabled. That seems kind of silly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Since GL_EXT_blend_logic_op is removed, _LogicOpEnabled and
ColorLogicOpEnabled always have the same value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Support is removed for four reasons:
1. The implementation was broken with respect to separate blend
equations. The GL_EXT_blend_equation_separate spec says:
"If EXT_blend_logic_op and EXT_blend_equation_separate are both
supported, the logic op blend equation should be supported separately
for RGB and alpha as with the other blend equation modes."
But Mesa's implementation of GL_LOGIC_OP specifically forbids this.
2. No hardware supported by Mesa can support separate blend equations
involving GL_LOGIC_OP.
3. No applications could be found that use this extension.
4. No other Linux OpenGL drivers support this extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Cc: Brian Paul <brianp@vmware.com>
The last user of this function was driInitExtensions, and that function
was removed in a previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for condtional rendering (Section 2.X), the error INVALID OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-while-active.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for condtional rendering (Section 2.X), the error INVALID OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
st_glsl_to_tgsi.cpp was completely ignored by makedepend because it was
not included in ALL_SOURCES, which caused that the file was not recompiled
when certain header files were changed (like glsl/ir.h).
The first part of this commit is just a consolidation.
The second part is the fix.
When copy propagating a value into an instruction that negates its
argument, we need to invert the sense of the value's "negate" flag, so
that -(+x) becomes -x and -(-x) becomes +x.
Previously, we were always setting the value's "negate" flag to true
in this circumstance, so that both -(+x) and -(-x) turned into -x.
Fixes Piglit test vs-double-negative.shader_test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was really broken before. A lot of the error checks were
done much later (too late), and some of the error checks would fail.
The underlying problem is that Mesa doesn't ever keep compressed paletted
textures in their original format. The textures are immediately
converted to some RGB or RGBA format.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39991
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Jin Yang <jin.a.yang@intel.com>
Accroding the man page, GL_INVALID_VALUE would generated if access has any
bits set other than those valid defined bits.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According the man page, GL_INVALID_OPERATION should generated if
glPixelZoom is executed between the execution of glBegin and the
corresponding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According the man page, GL_INVALID_OPERATION should be generated if
glIsEnabled is executed betwwen the execution of glBegin and the
correspoding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fix error handling while calling glTexEnv with invalid texture
environment parameters.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, it should trigger a GL_INVALID_OPERATION
while calling some glGet* functions inside glBegin and glEnd.
This patch dose handle the following functions:
glGetBooleanv
glGetFloatv
glGetIntegerv
glGetInteger64v
glGetDoublev
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According man page, trigger error when calling glEvalMesh1/2D inside
glBegin/glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We've had a hack to fix this in Gentoo on Solaris for a while.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This moves the gallium interface for clears from using a pointer to 4 floats to a pointer to a union of float/unsigned/int values.
Notes:
1. the value is opaque.
2. only when the value is used should it be interpretered according to
the surface format it is going to be used with.
3. float clears on integer buffers and vice-versa are undefined.
v2: fixed up vega and graw, dropped hunks that shouldn't have been in
patch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Do it during swrast state validation since the FetchTexel() functions
are only called from swrast now and not core Mesa.
Remove assertions in mipmap.c since they're no longer appropriate.
Pass an explicit surface format as we do with pipe_put_tile_rgba_format().
This fixes the piglit fbo-srgb-blit test. With GL_EXT_framebuffer_sRGB we
override the resource's format with an explicit format (linear vs. sRGB).
We need to do so both when getting and putting tiles.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40402
Reviewed-by: Dave Airlie <airlied@redhat.com>
We could constant interpolated values now and set have_perspective
if nothing else is set to avoid a GPU hang.
Signed-off-by: Dave Airlie <airlied@redhat.com>
TGSI CONSTANT interpolation is just flat, and we just read the values
direct from the LDS into the GPR without doing any interpolation on them.
This is needed to pass integer types into the fragment shader.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we get a scaled type assume its a real integer type (as textures are).
Also fixup the blend bypass and blend clamp flags on evergreen as per the
docs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
LLVM 3.0svn added SubtargetInfo as additional parameter to
createMCDisassembler() and createMCInstPrinter().
See revision 139237 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
If we're drawing to a luminance, luminance/alpha or intensity surface
we have to adjust (rebase) the fragment/quad colors before writing them
to the tile cache. The tile cache always stores RGBA colors but if
we're caching a L/A surface (for example) we need to be sure that R=G=B
so that subsequent reads from the surface cache appear to return L/A
We previously had a special case for RGB (no alpha) surfaces. This
change generalizes that for the other base formats.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40408, but sRGB
formats are still failing. That'll be addressed in a later patch.
When compiling glDrawPixels, glTexImage(), etc. and we're copying
the user's image we need to be careful about GL error checking.
Previously, we were incorrectly generating GL_OUT_OF_MEMORY in
unpack_image() if width <= 0 or height <= 0 or for invalid format/type
values. We now check those arguments in unpack_image() and return NULL
if there's a bad value. The command will get compiled with the
arguments as-is and image=NULL. Later, when the command is executed the
correct errors will be generated.
This issue was reported by Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
I'm not 100% sure about this, it may need a version check or it might
be completely wrong.
added multisample ones as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The array_lvalue field was attempting to enforce the restriction that
whole arrays can't be used on the left-hand side of an assignment in
GLSL 1.10 or GLSL ES, and can't be used as out or inout parameters in
GLSL 1.10.
However, it was buggy (it didn't work properly for built-in arrays),
and it was clumsy (it unnecessarily kept track on a
variable-by-variable basis, and it didn't cover the GLSL ES case).
This patch removes the array_lvalue field completely in favor of
explicit checks in ast_parameter_declarator::hir() (this check is
added) and in do_assignment (this check was already present).
This causes a benign behavioral change: when the user attempts to pass
an array as an out or inout parameter of a function in GLSL 1.10, the
error is now flagged at the time the function definition is
encountered, rather than at the time of invocation. Previously we
allowed such functions to be defined, and only flagged the error if
they were invoked.
Fixes Piglit tests
spec/glsl-1.10/compiler/qualifiers/fn-{out,inout}-array-prohibited*
and
spec/glsl-1.20/compiler/assignment-operators/assign-builtin-array-allowed.vert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents lockups with piglit tests draw-elements and draw-vertices using large
numbers of vertices.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alex.deucher@amd.com>
If we are called via the legacy DRI interface, and we don't support
legacy DRI (InitScreen is NULL), print a debug message, so it is easy
to see why the driver fails to initialize.
See https://bugs.freedesktop.org/show_bug.cgi?id=40437
Also includes loading of shared shader library code (used for f64
and integer division) and setting up the immediate array buffer
which is appended to the code.
Per the GL spec, clamp incoming colors prior to blending depending on
whether the destination buffer stores normalized (non-float) values.
Note that the constant blend color needs to be clamped too (we always
get the unclamped color from Mesa).
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40412
This introduces an UNCLAMPED_FLOAT_TO_UBYTE x 4 inline function, as
suggested by Brian. It uses it in a few places I noticed from previous
color changes, and also some core mesa places. I haven't updated other places
yet.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces a new gl_color_union union and moves the current
ClearColorUnclamped to use it, it removes current ClearColor completely and
renames CCU to CC, then all drivers are modified to expected unclamped floats instead.
also fixes st to use translated color in one place it wasn't.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_texture_integer issues says:
Should pixel transfer operations be defined for the integer pixel
path?
RESOLVED: No. Fragment shaders can achieve similar results
with more flexibility. There is no need to aggrandize this
legacy mechanism.
v2: fix comments, fix unpack paths, use same comment/code
v3: fix last comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since TGSI now has a UARL opcode that takes an integer as the source, it is
no longer necessary to hack around the lack of an integer ARL opcode using I2F.
UARL is only emitted when native integers are enabled; ARL is still used
otherwise.
Reviewed-by: Brian Paul <brianp@vmware.com>
==27540== Invalid read of size 4
==27540== at 0x96277B7: _mesa_make_extension_string (string3.h:144)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
==27540== Address 0xad35b30 is 3,688 bytes inside a block of size 3,691 alloc'd
==27540== at 0x4025315: calloc (vg_replace_malloc.c:467)
==27540== by 0x9627641: _mesa_make_extension_string (extensions.c:910)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
And:
==28351== Invalid write of size 2
==28351== at 0x4C087CC: _mesa_make_extension_string (string3.h:144)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f3 is 19 bytes inside a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351==
==28351== Invalid read of size 4
==28351== at 0x4C087EC: _mesa_make_extension_string (extensions.c:806)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f4 is 0 bytes after a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
The first part adds 2, because ' ' and '\0' may be written at the end
of the buffer.
These two functions were nearly the same with lots of duplicated code.
Now pass in a boolean 'elts' flag and use a few conditionals to implement
the linear vs. indexed cases.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
==5715== Invalid read of size 4
==5715== at 0x4AA590B: _mesa_make_extension_string (extensions.c:908)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
==5715== Address 0x4795730 is 0 bytes inside a block of size 1 alloc'd
==5715== at 0x4025315: calloc (vg_replace_malloc.c:467)
==5715== by 0x4AA5B4C: _mesa_make_extension_string (extensions.c:772)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
This fixes piglit/fbo-generatemipmap-array.
It looks like SQ_TEX_SAMPLER_WORD0_0.TEX_ARRAY_OVERRIDE should be set
for array textures in order to disable filtering between slices,
which adds a dependency between sampler views and sampler states.
This patch reworks sampler state updates such that they are postponed until
draw time. TEX_ARRAY_OVERRIDE is updated according to bound sampler views.
This also consolidates setting the texture state between vertex and
pixel shaders.
The only purpose this call served in the DRI swrast driver was to
initialize the remap table. Core Mesa already does the dispatch
offset remapping for every function that could possibly ever be
supported. There's no need to continue using that cruft in the
driver.
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_logic_op,
and EXT_blend_minmax are no longer advertised. These all resulted in
software fallbacks, so their loss will not be mourned.
EXT_blend_subtract is, however, explicitly added to the list.
GL_FUNC_SUBTRACT is fully accelerated, but GL_FUNC_REVERSE_SUBTRACT
(still) results in a software fallback.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Dave Airlie <airlied@redhat.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added
with a dependency on the drmSupportsBlendColor flag.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_minmax, and
EXT_blend_subtract are explicitly added to the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: Viktor Novotný <noviktor@seznam.cz>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not clear if these are acceptable cases so issue a one-time warning
in debug builds when we hit them.
Fixes segfault in piglit fbo-mipmap-copypix test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The test was of an enum, attIndex, which should be unsigned. The
explicit check for < 0 was replaced with a cast to unsigned in an
assertion that attIndex is less than the size of the array it will be
used to index.
Reviewed-by: Eric Anholt <eric@anholt.net>
Trivially silence the compiler by adding '(void) foo;' for each unused
parameter. These parameters could not be removed. They are part of
interface used elsewhere in Mesa, and some of the other customers
actually use these parameters.
The internalFormat, format, and type parameters were not used by
either try_pbo_upload or try_pbo_zcopy, so remove them. The width
parameter was also not used by try_pbo_zcopy (because it doesn't
actually copy anything), so remove it too.
Eric Anholt notes:
The current structure of this code is so hateful I can't bring
myself to say anything about whether changing the current code is
good or bad.
I have a dream that one call would try to make a surface
(miptree/region) out of the PBO, then we'd see about whether it
matches up nicely and zero-copy/blit using that. That would be
reusable for texsubimage, which is currently awful in this
respect.
At some point we should revisit this code with pitchforks and torches.
The depth0 parameter was not used in intel_miptree_create_for_region,
so remove it. All of the places that call this function, pass 1 for
that parameter, and the place where it looks like it should have been
used (the call to intel_miptree_create_internal) also had 1 hard
coded.
Reviewed-by: Eric Anholt <eric@anholt.net>
The GLenum target parameter was not used in intel_copy_texsubimage, so
remove it. Also remove the GLenum internalFormat parameter. Each
caller just copied this out of the intel_texture_image that is already
passed to intel_copy_texsubimage.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intel_context and tiling parameters were not used by any if the
i9[14]5_miptree_layout or the functions they call, and the tiling parameter was
not used by brw_miptree_layout. Remove the unnecessary parameters.
Also clean-up some of the naming, etc. in
intel_buffer_object_purgeable. 'intel' is usually used as the name of
an intel_context pointer, and intel_obj is usually used as the name of
an intel_*_obj pointer. These changes were suggested by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Remove the assertion in intel_batchbuffer_space:
assert((intel->batch.state_batch_offset - intel->batch.reserved_space)
>= intel->batch.used*4);
After reviewing all the places where this is called, I'm (fairly)
comfortable that this assertion was redundant. Having the assertion
adds ~20KiB to a driver build:
text data bss dec hex filename
903173 26392 1552 931117 e352d i965_dri.so
924093 26392 1552 952037 e86e5 i965_dri.so
Based on feedback from Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
This differs from the FS in that we track constants in each
destination channel, and we we have to look at all the swizzled source
channels. Also, the instruction stream walk is done in an O(n) manner
instead of O(n^2).
Across shader-db, this reduces 8.0% of the instructions from 60.0% of
the vertex shaders, leaving us now behind the old backend by 11.1%
overall.
Tracking virtual GRFs has tension between using a packed array per
virtual GRF (which is good for register allocation), and sparse arrays
where there's an element per actual register (so the first and second
column of a mat2 can be distinguished inside of an optimization pass).
The FS mostly avoided the need for this second sparse array by doing
virtual GRF splitting, but that meant that instances where virtual GRF
splitting didn't work, instructions using those registers got much
less optimized.
Now instead of env INTEL_NEW_VS=1 to get it, you need INTEL_OLD_VS=1
to not get it. While it's not quite to the same codegen efficiency as
the old backend, it is not regressing piglit on G965 and G45, and
actually fixing bugs on gen6, and the remaining codegen quality
regressions all appear tractable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't expect uniform accesses to generally go away from being dead
code at this point, and we will want to have uniforms packed before
spilling them out to pull constants when we are forced to do that.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The offset to the arrays after the first was mis-scaled, so we'd go
access off the end of the surface and read 0s. Fixes
glsl-vs-uniform-array-3.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
While we had nice debug output for most of the instruction stream, it
was terminated by a series of anonymous MOVs and a send.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It maps to MESA_FORMAT_RGBA8888_REV. Surfaces of the format can only be
sampled from but not render to.
Only i915 is tested.
Reviewed-by: Eric Anholt <eric@anholt.net>
[olv: add a check in intel_image_target_renderbuffer_storage]
Add a new format token, __DRI_IMAGE_FORMAT_ABGR8888, to __DRI_IMAGE. It
maps to MESA_FORMAT_RGBA8888_REV in core mesa or
PIPE_FORMAT_R8G8B8A8_UNORM in gallium. The format is used by
translucent surfaces on Android.
We were splitting on each side of an unlinked program, and the two
sides lost track of which variables they referenced, resulting in
assertion failure during validation. Fixes piglit
link-struct-uniform-usage.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it would produce:
Failed to compile FS: 0:6(7): error: non-lvalue in assignment
and now it produces:
Failed to compile FS: 0:5(7): error: whole array assignment is not
allowed in GLSL 1.10 or GLSL ES 1.00.
Also, add spec quotation to the two places we have code for array
lvalues in GLSL 1.10.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just want to mark the whole thing used, not mark from each element
the whole size in use. Fixes undefined URB entry writes on i965,
which blew up with debugging enabled.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Uses the new _mesa_decompress_image() function. Unlike the meta path
that uses textured quad rendering to do decompression, this works with
signed formats as well.
We'd still accept the GL_PALETTE[48]_* formats in glCompressedTexImage2D,
but they wouldn't be listed if you queried whether they were supported.
Signed-off-by: Adam Jackson <ajax@redhat.com>
From section 7.1 (Vertex Shader Special Variables) of the GLSL 1.30
spec:
"It is an error for a shader to statically write both
gl_ClipVertex and gl_ClipDistance."
Fixes piglit test mixing-clip-distance-and-clip-vertex-disallowed.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The check now applies both when explicitly declaring the size of
gl_TexCoord and when implicitly setting the size of gl_TexCoord by
accessing it using integral constant expressions.
This is prep work for adding similar size checks to gl_ClipDistance.
Fixes piglit tests texcoord/implicit-access-max.{frag,vert}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GLSL 1.30 spec, section 7.1 (Vertex Shader Special Variables):
The gl_ClipDistance array is predeclared as unsized and must be
sized by the shader either redeclaring it with a size or indexing it
only with integral constant expressions.
Fixes piglit tests clip-distance-implicit-length.vert,
clip-distance-implicit-nonconst-access.vert, and
{vs,fs}-clip-distance-explicitly-sized.shader_test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-g3 causes binaries to be 3x - 10x bigger, not only on MinGW w/ dwarf
debugging info, but linux as well.
Stick with -g, (which defaults to -g2), like autoconf does.
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha.
(Based on Chia-I Wu's patch)
[olv: remove the use of param_premultiplied_alpha from the original
patch]
Handle "format" events and return configs for the supported formats.
(Based on Chia-I Wu's patch)
[olv: update and explain why PIPE_FORMAT_B8G8R8A8_UNORM should not be
enabled without HAS_ARGB32]
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha. Currently, it means when argb32 and
argb32_pre are both supported.
When wl_drm is avaiable and enabled, handle "format" events and return
configs for the supported formats. Otherwise, assume all formats of
wl_shm are supported.
EGL does not export this capability of a display server. But wayland
makes use of EGL_VG_ALPHA_FORMAT to achieve it.
So, when the native display returns true for the parameter, st/egl will
set EGL_VG_ALPHA_FORMAT_PRE_BIT for all EGLConfig's with non-zero
EGL_ALPHA_SIZE. EGL_VG_ALPHA_FORMAT attribute of a surface will affect
how the surface is presented.
Because st/vega does not support EGL_VG_ALPHA_FORMAT_PRE_BIT,
EGL_OPENVG_BIT will be cleared.
Replace the parameters of native_surface::present by a struct,
native_present_control. Using a struct allows us to add more control
options without having to update each backend every time.
The opcodes and strings were reversed. Quotient means division, and
modulus means remainder.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Follow a subset of changes in 7b1d94e5d1.
There are known issues, but it works to a certain degree. Non-working
demos also fail gracefully. More importantly, it fixes the build.
The list of numbers in (constant type (<numbers>)) needs to contain
exactly type->components() numbers (16 for a mat4, 3 for a vec3, etc.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Each of these vecN constants only provided one component, which is
illegal. The printed IR is meant to contain exactly as many components
as are necessary; the IR reader does not splat single values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were failing to relocate, so on the first draw run our scratch
would tend to get written to 0x0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were passing an MRF as the source argument, instead of using the
implied move and putting the MRF number in the proper place in the
instruction encoding.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On the old backend, we used scalar mode because Mesa IR math is
result.xyzw = math(op0.xxxx), which matched up well. However, in GLSL
IR we do things like result.xy = math(op0.xy), so we want vector mode.
For the common case of result.x = math(op0.x), performance will be the
same (no cost for un-executed channels), though result.xyzw =
math(op0.xxxx) would be worse.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we tried to retype a brw_null_reg() in CMP(), the retyping didn't
take effect because HW_REG just ignores the type field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you get your total GRF count wrong, you write over some other
shader's g0, and the GPU fails shortly thereafter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need this for the upcoming fix for sw texture_from_pixmap.
Signed-off-by: Stuart Abercrombie <sabercrombie@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Current just the items that have been removed from Mesa are mentioned
in the release notes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL_COLOR_INDEX produced the same result (because GL_BITMAP is always
used for stencil glDrawPixels), but it was confusing to read. I spent
about 15 minutes wondering, "WTF?"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_make_temp_float_image can't work on color-index textures, but
there is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These sampling functions don't work on color-index textures, but there
is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These enums were only valid with the paletted texture extensions.
This allows a couple other trivial clean-ups.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing left that can call any of these functions. This also
removes the meta-ops code that implemented the first two.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was also discussed at XDS 2010. However, actually making the
change was delayed because several drivers still exposed these
extensions to significant benefit (e.g., tdfx). Now that those
drivers have been removed, this code can be removed as well.
v2: A lot of bits that were missed in the previous patch have been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we now lay out the VUE the same way regardless of whether
two-sided color is enabled, brw_compute_vue_map() no longer needs to
know whether two-sided color is enabled. This allows the two-sided
color flag to be removed from the clip, GS, and VS keys, so that fewer
GPU programs need to be recompiled when turning two-sided color on and
off.
Reviewed-by: Eric Anholt <eric@anholt.net>
When doing two-sided color on GEN6+, we use the SF unit's
INPUTATTR_FACING mode to cause front colors to be used on front-facing
triangles, and back colors to be used on back-facing triangles. This
mode requires that the front and back colors be adjacent in the VUE.
Previously, we would only place front and back colors adjacent in the
VUE when two-sided color was enabled. Now we place them adjacent in
the VUE whether two-sided color is enabled or not. (We still only
swizzle the colors when two-sided color is enabled, so there should be
no user-visible change).
This simplifies the implementation of the VUE map and reduces the
amount of code that is dependent on two-sided color mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous computation had two bugs: (a) it used a formula based on
Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact
that PSIZ is stored in the VUE header. Fortunately, both bugs caused
it to compute a URB size that was too large, which was benign. This
patch computes the URB size directly from the VUE map, so it gets the
result correct in all circumstances.
Reviewed-by: Eric Anholt <eric@anholt.net>
The variables offset[], idx_to_attr[], nr_bytes, nr_attrs, and
header_regs were all serving purposes which are now served by the VUE
map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_clip_interp_vertex() iterated only through the
"non-header" elements of the VUE when performing interpolation
(because header elements don't need interpolation). This code now
refers exclusively to the VUE map to figure out which elements need
interpolation, so that brw_clip_interp_vertex() doesn't need to know
the header size.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch replaces some ad-hoc computations using ATTR_SIZE and the
offset[] array to use the VUE map functions
brw_vert_result_to_offset() and brw_vue_slot_to_offset().
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we would examine the offset[] array (since an offset of 0
meant "not in use"). This paves the way for removing the offset[]
array.
Reviewed-by: Eric Anholt <eric@anholt.net>
The offsets within the VUE of HPOS and NDC are needed only in a few
auxiliary clipping functions. This patch moves computation of those
offsets into the functions that need them, and does the computation
using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes get_attr_override() (which computes the
relationship between vertex shader outputs and fragment shader inputs)
to use the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch removes the variables nr_attrs and nr_setup_attrs, whose
purpose is now being served by the VUE map. nr_attr_regs and
nr_setup_regs are still needed, however they are now computed using
the VUE map rather than by counting the number of vertex shader
outputs (which caused subtle bugs when gl_PointSize was written).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the SF used nr_setup_attrs to determine whether it was
looking at the last element of the VUE. Changed this code to use the
VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
These data structures were serving the same purpose as the VUE map,
but were buggy. Now that the code has been transitioned to use the
VUE map, they are not needed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, SF code used the idx_to_attr[] array to compute the
location of entries in the VUE map. This array didn't properly
account for gl_PointSize. Now we use the VUE map directly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, some of the code in SF erroneously used bitfields based on
the gl_frag_attrib enum when actually referring to vertex results.
This worked, because coincidentally the particular enum values being
used happened to match between gl_frag_attrib and gl_vert_result. But
it was fragile, because a future change to either gl_vert_result or
gl_frag_attrib would have made the enum values stop matching up. This
patch switches the SF code to use the correct enum.
Reviewed-by: Eric Anholt <eric@anholt.net>
The new function, called get_vert_result(), uses the VUE map to find
the register containing a given vertex attribute. Previously, we used
the attr_to_idx[] array, which served the same purpose but didn't
account for gl_PointSize correctly.
This fixes a bug on pre-Gen6 wherein the back side of a triangle would
be rendered incorrectyl if the vertex shader wrote to gl_PointSize.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch moves the computation of the SF URB entry read offset from
upload_sf_unit() to its own function, so that it can be re-used when
creating the gen4-5 SF program.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend computed the size of the URB entry by
counting the number of MRFs used in emitting the URB entry. Now it
just gets it straight from the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
max_usable_mrf has been carefully set such that (max_usable_mrf -
base_mrf) is a multiple of 2, so that an even number of VUE slots are
emitted with each URB write (which Gen6 requires). This patch adds an
assertion to confirm that this is the case, and moves the comment to
this effect to be near the assertion.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend used two functions,
emit_vue_header_gen6() and emit_vue_header_gen4() to emit the fixed
parts of the VUE, and then a pair of carefully-constructed loops to
emit the rest of the VUE, leaving out the parts that were already
emitted as part of the header.
This patch changes the new VS backend to use the VUE map to emit the
entire VUE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, emit_vue_header_gen4() used local variables to keep track
of which registers were storing the NDC and HPOS. This patch uses the
output_reg[] array instead, so that the code that manipulates NDC and
HPOS can be more easily refactored.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the old VS backend computed the URB entry size by adding
the number of vertex shader outputs to the size of the URB header.
This often produced a larger result than necessary, because some
vertex shader outputs are stored in the header, so they were being
double counted. This patch changes the old VS backend to compute the
URB entry size directly from the number of slots in the VUE map.
Note: there's a subtle change in that we no longer count header
registers towards the size of the VF input. I believe this is
correct, because the header is only emitted in the output of the VS
stage--it is not present in the input. (As evidence for this, note
that brw_vs_state.c sets urb_entry_read_offset to 0--it does not
include space for the header as part of the VS input).
Reviewed-by: Eric Anholt <eric@anholt.net>
Some parts of the i965 driver keep track of locations within the VUE
(vertex URB entry) using byte offsets. This patch adds inline
functions to compute these byte offsets using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Several places in the i965 code make implicit assumptions about the
structure of data in the VUE (vertex URB entry). This patch adds a
function, brw_compute_vue_map(), which computes the structure of the
VUE explicitly. Future patches will modify the rest of the driver to
use the explicitly computed map rather than rely on implicit
assumptions about it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, this conversion was duplicated in several places in the
i965 driver. This patch moves it to a common location in mtypes.h,
near the declaration of gl_vert_result and gl_frag_attrib.
I've also added comments to remind us that we may need to revisit the
conversion code when adding elements to gl_vert_result and
gl_frag_attrib.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just adds the opcodes for evergreen, need to work on r600 and cayman
implementations.
don't advertise nativeintegers yet until we work out all the regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds all the API check for vertex arrays using 2101010 types.
2101010 is also useable with GL_BGRA.
v2: fix whitespace.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the vertex processing paths for the 2101010 types. It converts
the attributes to floats for all the immediate entry points, some entrypoints
are normalised and the attrib APIs take a normalized parameter.
There are four main paths,
ui10 -> float unnormalized
i10 -> float unnormalized
ui10 -> float normalized
i10 -> float normalized
along with the ui2/i2 equivs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
add new APIs to the internal mesa driver interface + set funcs in vtxfmt.c
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are the new API entrypoints for ARB_vertex_type_2_10_10_10_rev
extension, along with the new INT_2_10_10_10_REV enum.
v2: fixup crazy whitespace cut-n-paste mess
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers supporting native integers set UniformBooleanTrue to the integer value
that should be used for true when uploading uniform booleans. This is ~0 for
Gallium and 1 for i965.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just reorgs one define in csv file, and adds all the new formats
that are needed for this extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes all but one of the piglit regressions from enabling native integers
in softpipe. The change to fix the last regression is still being discussed.
The preferred solution to keeping track of the picture structure
has been putting it in the state tracker, so use picture_structure
instead of frame_started to check if a frame needs to begin.
If picture_structure has been changed, end the frame and start again.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This could happen in 3 different cases, and ERRNO can explain what
happened. First case would be EIO (gpu hang), second EINVAL (something is
wrong inside the batch), and we also discovered that sometimes it happens
with ENOSPACE. All of those cases are different it it could be worth to at
least know what happened.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to the comment, we need to load /some/ push constants on
pre-Gen6 hardware or the GPU will hang. The existing code set these
bogus parameters to NULL pointers; unfortunately, the code in
brw_curbe.c that loads them dereferences those pointers. So, change
them to be pointers to an actual floating point value of 0.0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As per Brian's suggestion, add caps for drivers that support texture
offsets to advertise a min/max via TGSI, also use it in the state tracker.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds tokens for texture offsets, to store 4 * swizzled vec 3
for use in TXF and other opcodes.
It also contains TGSI exec changes for softpipe to use this code,
along with GLSL->TGSI support for TXF.
v2: add some more comments, add back padding I removed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the swrast failures for piglit's fbo-generatemipmap-formats
test (for uncompressed formats). At some point down the road this code
will go away so I haven't checked all the other store_texel() functions.
Simple demos such as test-opengl-gl_basic work. SurfaceFlinger does not
work yet due to missing GL_OES_draw_texture support (and maybe more).
Reviewed-by: Chad Versace <chad@chad-versace.us>
In preparation for porting i915 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android build from breaking as often.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a better, more fine-grained way of lowering if statements. Fixes the
game And Yet It Moves on nv50.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't want to set the pixmap bit in the EGL config if the DRI
config we're adding is a double buffered config. However, don't clear
any other bits the platform might pass in in the surface_type
argument.
Using multiply and reciprocal for integer division involves potentially
lossy floating point conversions. This is okay for older GPUs that
represent integers as floating point, but undesirable for GPUs with
native integer division instructions.
TGSI, for example, has UDIV/IDIV instructions for integer division,
so it makes sense to handle this directly. Likewise for i965.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add generator instructions for the scratch opcodes.
Add emit_before() for handling ->ir and ->annotation inheritance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This DP4 had one of its operands missing, so we were generating
garbage clip distances. Using the per-opcode instruction generators
made it obvious.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Set ctx->WindowRenderBuffer to EGL_BACK_BUFFER. As EGL_WINDOW_BIT of a
config is set only when there is dri_double_buffer, that makes sure
window surfaces are always double-buffered and contexts will render to
the back buffer.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Advertising different format support based on sample count was a
bad idea, it made resolve to window work, but resolve to anything
else would fail.
See 9f4998639c.
By emitting code before generate_code(), we ended up in align1 mode
where writemasks don't exist, so we rescaled gl_Vertex.w and things
went badly. By moving GL_FIXED support to the visitor, we end up with
normal codegen, and as a bonus the GL_FIXED setup ends up getting
printed appropriately in debug output.
Fixes gtf/GL2Tests/fixed_data_type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At some point we need to also move uniform accesses out to pull
constants when there are just too many in use, but we lack tests for
that at the moment.
Fixes glsl-vs-large-uniform-array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids the massive conditional move array access, and brings code
generation quality for the new VS backend into the realm of efficiency
of the old backend (roughly 20% more instructions generated than
before across shader-db, instead of assertion failing for generating
over 10,000 instructions on many shaders!).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We sometimes want to put an instruction somewhere besides the end of
the instruction stream, and we also want per-opcode instruction
generation to enable compile-time checking of operands.
We'll be using that to track things for the new VS backend, and this will
avoid cluttering brw_vs_surface_state.c for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were primarily failing to convert in the NativeIntegers case, which
this fixes. However, we were also just truncating float uniforms when
converting to integer, which does not appear to be the correct
behavior. Note, however, that the NVIDIA drivers also truncate
instead of rounding.
GL_DOUBLE return type is dropped because it was never used and
completely broken. It can be added when there's test code.
Fixes piglit ARB_shader_objects/getuniform
v2: This is a rewrite of my previous glGetUniform patch, which Ken
pointed out missed storage_type-based conversions to integer,
which was totally broken still thanks to a typo in the testcase.
v3: Quote the spec justifying the rounding behavior.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
At least for Intel, all our uniform components are of uint32_t size, either
float or signed or unsigned int. For uploading uniform data in the driver,
it's much easier to upload a full dword per uniform element instead of trying
to pick out the bool byte and then fill in the top 3 bytes of pad with 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Replace each occurence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dri_common is a static library that contains the sources in
src/mesa/drivers/dri/common. Each DRI driver should link to it.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In src/mesa/Android.mk, it is non-trivial to determine which variables are
imported by `include sources.mak`. So document them.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dricore.a is analogous to the libmesa.a built by the Autoconf
build.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In order that the Autoconf and Android build can share the same source
lists, move the lists from
src/mesa/drivers/dri/Makefile.defines
into
src/mesa/drivers/dri/common/Makefile.sources
I would like for Android to just reuse Makefile.defines, but the file is
unsuitable for reuse.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off: Chad Versace <chad@chad-versace.us>
driverfuncs.o is already contained in libmesa.a, so remove it from the
following source lists:
src/mesa/drivers/dri/Makefiles.defines:COMMON_SOURCES.
src/mesa/drivers/dri/swrast/Makefile:SWRAST_COMMON_SOURCES
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Remove defintion of COMMON_SOURCES from {r300,r660}/Makefile. The
defintion is a duplicate of that found in
src/mesa/drivers/dri/Makefile.defines.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The layersize calculation is slightly different on +evergreen.
This makes mpeg2 video decoding and piglits texture-packed-formats
test work correctly on this hardware.
I noticed that a thread was created for every time async flush was called, so I moved it and used some semaphores to synch.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This prevents null dereferences in validation of interdependent
state after a switch to a pipe context where we mark all state
as dirty but where not all state is valid / set yet.
The window system buffer will be BGRA and applications will try to
directly resolve to it, which would trigger an INVALID_OPERATION in
BlitFramebuffer if the multisample renderbuffer is RGBA.
All commonly used windows toolchains define wgl entrypoints in the windows
headers, and mesa_wgl.h not only is unnecessary but actually often stands
in the waydue to slight inconsistencies.
So remove it.
This is a port of vec4_visitor::try_rewrite_rhs_to_dst to fs_visitor.
Not only is this technique less invasive and more robust, it also
generates better code. Over and above the previous technique, this
reduced instruction count in shader-db by 0.28% on average and 1.4% in
the best case.
In no case did this technique result in more code than the prior method.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This reverts commit 53c89c67f3, along with
the subsequent this->result = reg_undef additions it required.
Both Eric and I agree that the way he did this is really fragile; if you
forget to add this->result = reg_undef before calling accept(), it may
end up using the same register for two separate things, breaking things
in strange and mysterious ways.
The next commit will port over the new VS backend's method for solving
this problem, which is simpler, less intrusive, and still manages to
avoid MOVs in the common case.
Nothing in Mesa supports color-index textures, and most of the other
infrastructure that could allow such support has already been removed.
This puts the final nail in the coffin.
Also clean out some GL_COLOR_INDEX comments in formats.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This came from the "kill it with fire" discussion at XDS 2010.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This continues to allocate texImage->Data as before, so
drivers calling these functions need to use that when present.
Reviewed-by: Brian Paul <brianp@vmware.com>
ctx->Driver.MapTextureImage() / UnmapTextureImage() will be called by
the glTex[Sub]Image(), glGetTexImage() functions, etc. when we're
accessing texture data, and also for software rendering when accessing
texture data.
Reviewed-by: Brian Paul <brianp@vmware.com>
All driver implementations of FreeTextureImageBuffer already check
that Data != NULL and free it. However, this means that we will also
free driver storage if the driver storage wasn't in the form of a Data
pointer.
This was produced by the following semantic patch:
@@
expression C;
expression T;
@@
- if (T->Data) {
- C->Driver.FreeTextureImageBuffer(C, T);
+ C->Driver.FreeTextureImageBuffer(C, T);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was produced by sed, except for one hunk in driverfuncs.c where
trailing whitespace was dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add platform_android.c that supports _EGL_PLAFORM_ANDROID. It works
with drm_gralloc, where back buffers of windows are backed by GEM
objects.
In Android a native window has a queue of back buffers allocated by the
server, through drm_gralloc. For each frame, EGL needs to
dequeue the next back buffer
render to the buffer
enqueue the buffer
After enqueuing, the buffer is no longer valid to EGL. A window has no
depth buffer or other aux buffers. They need to be allocated locally by
EGL.
Reviewed-by: Benjamin Franzke <benjaminfranzke@googlemail.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: with assorted minor changes, mostly suggested during the review]
Add rgba_masks to dri2_add_config. When it is non-NULL, the DRI config
is accepted only when the offsets and sizes of the its channels match
rgba_mask.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Quickly tested with 945GME. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work and some do not.
Quickly tested with VMWare Workstation 7.1.4 on Linux with GeForce
GT220. SurfaceFlinger (the display server and compositor) works. 2D
apps with RGB visual works. However, due to missing
PIPE_FORMAT_R8G8B8A8_UNORM support, those with RGBA visual do not.
Factor out C_SOURCES from Makefile to Makefile.sources, and let Makefile
and SConscript share it.
Note that
$(TOP)/src/glsl/ralloc.c and
$(TOP)/src/mesa/program/register_allocate.c
are removed from C_SOURCES in Makefile.sources and added back in
Makefile and SConscript. The idea is that they are not part of r300g.
But having them in libr300.a makes build non-GL targets such as the
compiler tests or g3dvl much easier. Also, for practical reason, TOP
would be an undefined variable in Makefile.sources.
drmVersion and driver specific ioctls are used to get the PCI ID from a
DRM fd. Eexpand the mechanism to nouveau and vmwgfx, except that for
nouveau, only the vendor ID is needed, and for vmwgfx, always assume
SVGA II.
When generating dispatch templates, emit the '(void) blah;' magic to
make GCC happy. This reduces a lot of warning spam if you build with
-Wunused-parameter or -Wextra.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
In preparation for porting i965 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android from breaking as often.
Acked-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If the state tracker tries to map the resource directly but we can't or don't
want to do that, fail to create a transfer.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Compiling some (large) files with i686-pc-mingw32-gcc 4.2.2 (at least)
and the -gstabs option triggers a compiler error. Use this work-around
to simply compile the effected files without -gstabs.
Make setting the quant matrixes a generic interface.
Also removes setting the quant matrix from the XvMC interface
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Make the picture_structure enum spec complient.
Also remove it from the compositor.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Revert back to a macroblock based interface. The structure used
tries to keep as close to the spec as possible.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Implement PIPE_CAP_NUM_BUFFERS_DESIRED giving the decoder control over
the number of buffers a state tracker should allocate.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
First of all get ride of the decode_buffer structure, while still giving
the decoder the ability to organize it's buffers depending on the needs
of the state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Instructions with 3 source operands have no write mask, so we may replace their
destinations with PV/PS in the next group even if their dst.write is 0.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to do full check when not all bank swizzles in the group are forced
(e.g. when trying to merge interp_* group with the next instruction)
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nothing in Mesa generates these opcodes, and i965 hardware cannot
support it natively. If support were ever added for this opcode in
Mesa, there had better be a lowering pass for hardware that doesn't
support it natively.
while debugging texelFetchOffset we kept hitting the assert.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we continue and hit the "Illegal formal parameter mode"
assertion.
Fixes negative compile test texelFetchOffset.frag in piglit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds texelFetch support to translate from GLSL to TGSI TXF opcode.
I've tested this works with an r600g and softpipe backend.
v2: drop comments, fix title,
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
This just calls the texel fetch functions directly bypassing the sampling,
notes:
1: loops inside switch should be more optimal.
2: borders can be sampled though only up to border depth, outside that
its undefined.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a straight texel fetch with no filtering or clamping. It uses
integers to specify the i/j/k (from EXT_gpu_shader4).
To enable this I had to add another hook into the tgsi sampler so that
we could easily bypass all the filtering sample does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds the get_dims callback that is called from the tgsi exec_txq.
It returns values as per EXT_gpu_program4.
v2: fix one indent + use a switch (slighty modified from Brian)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
this adds another callback in the sampler struct containing get_dims
entry point. This is used to query the driver for the texture resource
dimensions for the resource bound to the current sampler.
v2: remove unusued variable, fix indent
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's the same as GL_AMD_conservative_depth. The specs have slight
differences in wording, but don't differ in content or behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested with a Radeon HD 6250. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work but some don't (with serious rendering defects).
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Put restrict in the function definitions to silence MSVC warnings
about incompatible assignments in "func = lp_tile_foobar;" when func
was declared with restrict keywords but the rhs function wasn't.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch documents some Mesa coding style conventions that came up
during the discussion of commit 67b5a32 (Perform implicit type
conversions on function call out parameters).
Before, if we ended up here without a BO for our image, but did choose
a miptree that had active rendering in the command buffer, our
teximage data would jump ahead of the rendering using the old texture
contents.
This showed up as breakage in gen-teximage and friends in the
following commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Several drivers have these fields in their subclasses of gl_texture_image.
They'll be useful for core Mesa too...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The use of mmap() in winsys requires large file support. Not all OSes
have LFS so a wrapper should be used. In particular, os_mmap() should
call __mmap2() on Android.
Replace all calls to dd_function_table::MapBuffer with appropriate
calls to dd_function_table::MapBufferRange, then remove all the cruft.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code previously passed GL_DYNAMIC_DRAW for the access parameter.
By inspection, I believe that all drivers would treat this as
GL_READ_WRITE because it's not GL_READ_ONLY and it's not
GL_WRITE_ONLY.
It appears the i965 code wants GL_WRITE_ONLY (it's about to write a
bunch of data in, never read data), while the arrayelt code is
GL_READ_ONLY (just dereffed as arguments to CALL_Whatever*v).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unfortunately, since a previous efficiency improvement, we no longer
have any open-source testcases producing register spilling, so this
code was untested in the fragment shader path. That should change
when we get proper temporary array support in the fragment shader.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40194
Also, remove the BRW_SAMPLER_MESSAGE_SIMD8_RESINFO #define because
there totally isn't a SIMD8 variant.
Unfortunately, resinfo returns FLOAT32 on Broadwater/Crestline, unlike
G45 which returns a proper UINT32. This turns out to be simple,
however: when we emit MOVs to select the desired half of the SIMD16
result, we can simply override the register type to be float so it's
converted to an integer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Not all texturing operations return floating point data. For example,
the resinfo message (textureSize or TXS) returns integer data. In the
future, we'll also add integer texture support.
ir_texture's type field contains this information; use its base type to
appropriately type the destination register. We want to keep it as a
four component vector, however, since SIMD8 samplers always have a
response length of 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formats were based on a patch sent to xf86-video-nouveau by Bryan Cain
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
[Michel Dänzer: Add xorg_xvmc.c to SConscript.]
The driver may install its own vertex shader. _mesa_set_vp_override
must be called so that core mesa can generate correct fragment program..
Reviewed-by: Brian Paul <brianp@vmware.com>
Factor out source lists from Makefile to Makefile.sources, and let
Makefile, SConscript, and Android.mk share it.
Note that files in $(GENERATED_SOURCES) are removed from $(C_SOURCES).
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
ParseSourceList() can be used to parse a source list file and returns
the source files defined in it. It is supposed to be used like this
# get the list of source files from C_SOURCES in Makefile.sources
sources = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
The syntax of a source list file is compatible with GNU Make. This
effectively allows SConscript and Makefile to share the source lists.
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
There is no ir_hierarchical_visitor::visit(ir_if *) method, since ir_if
is not a leaf node. Instead, there are visit_enter and visit_leave
methods. Use visit_enter arbitrarily (either would work fine, though
visit_enter will catch errors sooner).
Found thanks to a warning emitted by Clang.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When intel_context requires separate stencil but the DRI2 separate stencil
handshake fails, then abort and emit an error instructing the user to
upgrade the DDX to 2.16.0.
CC: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Implement the any() part of the operation the same way regular ir_unop_any
is implemented.
This is a port of commit e7bf096e8b to glsl_to_tgsi, with added integer
support.
Logical-or is implemented using addition (followed by clamping to [0,1]) on
values of 0.0 and 1.0. Replacing the logical-or operators with addition gives
a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic value to
[0,1]. In a fragment shader, using a saturate on the add has the same effect.
Adding the saturate to the add is free, so (at least) one instruction is
saved. In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the SNE
instruction. It must be emulated using two SLT instructions and an ADD. On
these architectures, the single SLT saves two instructions.
Note that SNE is still used when integers are used for boolean values, since
there is no such thing as an integer saturate, and older shader architectures
without SNE don't support integers.
This is a port of commit 41f8ffe5e0 to glsl_to_tgsi with integer support
added.
Since this is the software path, set GRALLOC_USAGE_SW_WRITE_OFTEN when
PIPE_BIND_RENDER_TARGET, and set GRALLOC_USAGE_SW_READ_OFTEN when
PIPE_BIND_SAMPLER_VIEW.
libGLES_mesa with swrast should link in these libraries
libmesa_egl
libmesa_egl_gallium
libmesa_st_egl
libmesa_st_mesa
libmesa_glsl
libmesa_glsl_utils
libmesa_pipe_softpipe
libmesa_winsys_sw_android
libmesa_gallium
Reviewed-by: Chad Versace <chad@chad-versace.us>
This builds the static library libmesa_glsl and executable glsl_compiler
from glsl. glsl_compiler is only installed for engineering build.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is the first step to integrate Mesa into Android(-x86) build
system. You can git clone mesa under the external/ directory of Android
source tree and build Android with
$ make BOARD_GPU_DRIVERS=swrast
It will build libGLES_mesa that will be loaded by Android runtime.
libGLES_mesa is still a stub in this commit.
Both HW and SW rendering are supported for Android. For SW rendering,
we use the generic gralloc lock/unlock for mapping and unmapping color
buffers (in winsys/android).
For HW rendering, we need to know the real type of color buffers. This
backend works with drm_gralloc, where a color buffer is backed by a GEM
object.
On Android, color buffers are passed between server and clients as
opaque buffer_handle_t. This winsys makes use of gralloc, which
provides a generic way to map and unmap buffer_handle_t for CPU access.
Add EGL_ANDROID_image_native_buffer and EGL_ANDROID_swap_rectangle.
There is no spec for them though.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Android uses Linux kernel and its own C runtime. It resembles
PIPE_OS_LINUX a lot with some minor exceptions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move vbo_exec_FlushVertices_internal out of FEATURE_beginend.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Makes the new vertex shader backend work on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When ctx->Const.NativeIntegers is set, Core Mesa loads integer/boolean
uniforms directly, rather than loading the floating point equivalent.
So, when that's set, we don't need to perform any conversions.
Unfortunately, we can't properly support native integers with the old
vertex shader backend, so this patch leaves them disabled for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, native integer support was based on whether the driver
advertised GLSL 1.30 or not. However, drivers that natively support
integers may wish to do so for older GLSL versions as well. Adding this
new opt-in flag allows them to do so.
Currently disabled by default on all drivers, which was the existing
behavior (no drivers currently implement GLSL 1.30).
Fixes piglit tests on i965 with INTEL_GLSL_VERSION=130 set:
- spec/glsl-1.10/fs-uniform-int-110.shader_test
- spec/glsl-1.30/fs-uniform-int-130.shader_test
(it was doubly converting the data)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes vs-atan-* and several others. This is not the real solution we
eventually want, which will pack floats, vec2s, and vec3s into vec4
registers, but this code should provide the framework for that.
This is a rather pessimistic calculation, since it doesn't distinguish
individual channels of a vec4, or elements of an array, but should be
a minimum start for register allocation.
The areamap contains precomputed data on different aliasing types.
It is necessary for good performance.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The state tracker expects depth and stencil pixels interleaved.
Evergreen can bind an interleaved depth-stencil resource as a colorbuffer,
but not as a zbuffer.
The hardware can do the interleaving for us when decompressing.
Such that it actually works in apps which use both.
A separate buffer is allocated for stencil. The only exception is
the window-system-provided depth-stencil buffer, where depth and stencil
share the same buffer.
This fixes:
- fbo-depthstencil-GL_DEPTH24_STENCIL8-clear
- fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-FLOAT-and-USHORT
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
This was an unfinished to-do item before.
With this patch and the two preceeding patches, piglit's
fbo-generatemipmap-array test runs and passes instead of generating
a GL error and dying on an assertion.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We could do 1D/2D arrays with textured quad rendering, but it'll take
some work (as with 3D textures).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Q should not be significant for OPCODE_TEX, but it winds up getting
passed to the compute_lambda() function. Make sure it's 1.0 to
prevent garbage values, which is effectively what we get when the
swizzle is coord.xyzz (which is what GLSL gives us).
Part of the fix for piglit's fbo-generatemipmap-array test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Declare _mesa_meta_begin()/end() in meta.h so that drivers can write
custom meta-ops (such as HiZ resolves for i965).
This necessitates moving the the META_* macros into meta.h. To prevent
naming collisions, this commit renames each macro to be MESA_META_*.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit 6eff33dc (glapi: generate ES dispatch headers from core mesa)
replaced the autogenerated files
src/mapi/es1api/main/{dispatch,remap_helper}.h with new autogenerated
files src/mesa/main/api_exec_es{1,2}_{dispatch,remap_helper}.h. This
patch updates the .gitignore files to properly ignore the new
autogenerated files, and stop ignoring the old autogenerated files.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
The flush extensions flush call indicates end of frame and should only
be called once per frame. However, in the dri2SwapBuffer fallback
path, we call flush and then call dri2CopySubBuffer, which also calls
flush. Refactor the code to only call flush once.
Needed for GL3.
v2: evergreen support
I don't set PA_SU_SC_MODE_CNTL.MULTI_PRIM_IB_ENA.
piglit/primitive-restart does pass though. Tested on RV730 and EG-REDWOOD.
The MUL opcode does a 16bit * 32bit multiply, and we need to do the
MACH to get the top 16bit * 32bit added in.
Fixes fs-op-mult-int-*, fs-op-mult-ivec*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shader Model 3.0[1] requires that shaders be able to execute at least
65536 instructions. Bump Mesa maxExec to that limit. This allows
several vertex shaders in the OpenGL ES 2.0 conformance test suite to
run to completion.
1: http://en.wikipedia.org/wiki/High_Level_Shader_Language
Reviewed-by: Eric Anholt <eric@anholt.net>
This cleans up some code generated by the IR-to-Mesa pass for i915.
In particular, some shaders involving arrays of constant matrices
result in really bad code.
v2: Silence several warnings from merging the gl_constant_value work.
Fix DP[23] folding. Add support for a bunch more opcodes that appear
in piglit runs on i915.
Reviewed-by: Eric Anholt <eric@anholt.net>
!a && b occurs frequently when nexted if-statements have been
flattened. It should also be possible use a MAD for (a && b) || c,
though that would require a MAD_SAT.
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_all_equal is !(a.x != b.x || a.y != b.y || a.z
!= b.z || a.w != b.w). Logical-or is implemented using addition
(followed by clampling to [0,1]) on values of 0.0 and 1.0. Replacing
the logical-or operators with addition gives !bool((int(a.x != b.x) +
int(a.y == b.y) + int(a.z == b.z) + int(a.w == b.w)). This can be
implemented using a dot-product with a vector of all 1.0. After the
dot-product, the value will be an integer on the range [0,4].
Previously a SEQ instruction was used to clamp the resulting logic
value to [0,1] and invert the result. Using an SGE instruction on the
negation of the dot-product result has the same effect. Many older
shader architectures do not support the SEQ instruction. It must be
emulated using two SGE instructions and a MUL. On these
architectures, the single SGE saves two instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_any_nequal is (a.x != b.x) || (a.y != b.y) ||
(a.z != b.z) || (a.w != b.w), and that is the same as any(bvec4(a.x !=
b.x, a.y != b.y, a.z != b.z, a.w != b.w)). Implement the any() part
the same way the regular ir_unop_any is implemented.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just like the ir_binop_logic_or case. The operation
ir_unop_any is (a.x || a.y || a.z || a.w). Logical-or is implemented
using addition (followed by clampling to [0,1]) on values of 0.0 and
1.0. Replacing the logical-or operators with addition gives (a.x +
a.y + a.z + a.w). This can be implemented using a dot-product with a
vector of all 1.0.
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the
dot-product has the same effect. Adding the saturate to the
dot-product is free, so (at least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the dot-product
result has the same effect. Many older shader architectures do not
support the SNE instruction. It must be emulated using two SLT
instructions and an ADD. On these architectures, the single SLT saves
two instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Logical-or is implemented using addition (followed by clampling to
[0,1]) on values of 0.0 and 1.0. Replacing the logical-or operators
with addition gives a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the add has
the same effect. Adding the saturate to the add is free, so (at
least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the
SNE instruction. It must be emulated using two SLT instructions and
an ADD. On these architectures, the single SLT saves two
instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the inclusion of fpu_control.h from compiler.h. Since Bionic lacks
fpu_control.h, this fixes the Android build.
Also remove the sole use of the fpu_control bits, which was in debug.c.
Those were brianp's debug bits, and he approved of their removal.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
We can't just look at the instruction that happens to appear at the
start of the loop, because it might be some other exec size and cause
us to only loop on the first N channels. We always want 8 in our
current code (since 16 doesn't work so we don't do 16-wide fragment in
that case).
Fixes loop-03.vert, which was triggering the assertions.
Link failure is something that shouldn't happen, but we sometimes want
it during development. The precompile also allows analysis of shader
codegen with shader-db.
This fixes most of the regressions in the vs array test set from the
varying array indexing work, since the giant array that was originally
allocated in virtual GRF space never gets used and is only ever
read/stored from scratch space.
We keep building these strange interfaces for DP read/write where
there's a helper function with some partially-specific,
partially-general controls, which is used in exactly one place in code
generation. Making these public will let us set up those instructions
in the one place they're to be generated.
For structs/arrays/matrices, they were ending up as uint because we
forgot to set them. All varyings in GLSL 1.20 are of base type float,
so just force the matter here (which gets inherited at
emit_urb_writes() time).
Fixes vs-varying-array-mat2-col-rd.
The low-level IR is a mashup of brw_fs.cpp and ir_to_mesa.cpp. It's
currently controlled by the INTEL_NEW_VS=1 environment variable, and
only tested for the trivial "gl_Position = gl_Vertex;" shader so far.
This will be used by the new vertex shader backend. The scalarizing
passes are skipped for non-fragment, since vertex and geometry threads
are based on vec4s.
This patch fixes a bug when lowering an integer division:
x/y
to a multiplication by a reciprocal:
int(float(x)*reciprocal(float(y)))
If x was a plain int and y was an ivecN, the lowering pass
incorrectly assigned the type of the product to be float, when in fact
it should be vecN. This caused mesa to abort with an IR validation
error.
Fixes piglit tests {fs,vs}-op-div-int-ivec{2,3,4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I also needed to make some changes in u_vbuf_mgr in order to override
the caps from the driver and enable the fallback even though the driver
claims the format is supported.
It's going to flush client's commands in eglWaitClient(). Before this,
egl applications using pixmap or pbuffer flicker because of no flush.
Reviewed-by: Alan Hourihane
The vs-varying-array-mat2-col-row-wr test writes a mat2[3] constant to
a mat2[3] varying out array, and also statically accesses element 1 of
it on the VS and FS sides. At link time it would get trimmed down to
just 2 elements, and then codegen of the VS would end up generating
assignments to the unallocated last entry of the array. On the new
i965 VS backend, that happened to land on the vertex position.
Some issues remain in this test on softpipe, i965/old-vs and
i965/new-vs on visual inspection, but i965 is passing because only one
green pixel is probed, not the whole split green/red quad.
This patch extends ir_validate.cpp to check the following
characteristics of each ir_call:
- The number of actual parameters must match the number of formal
parameters in the signature.
- The type of each actual parameter must match the type of the
corresponding formal parameter in the signature.
- Each "out" or "inout" actual parameter must be an lvalue.
Reviewed-by: Chad Versace <chad@chad-versace.us>
These functions don't modify the target instruction, so it makes sense
to make them const. This allows these functions to be called from ir
validation code (which uses const to ensure that it doesn't
accidentally modify the IR being validated).
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When an out parameter undergoes an implicit type conversion, we need
to store it in a temporary, and then after the call completes, convert
the resulting value. In other words, we convert code like the
following:
void f(out int x);
float value;
f(value);
Into IR that's equivalent to this:
void f(out int x);
float value;
int out_parameter_conversion;
f(out_parameter_conversion);
value = float(out_parameter_conversion);
This transformation needs to happen during ast-to-IR convertion (as
opposed to, say, a lowering pass), because it is invalid IR for formal
and actual parameters to have types that don't match.
Fixes piglit tests
spec/glsl-1.20/compiler/qualifiers/out-conversion-int-to-float.vert and
spec/glsl-1.20/execution/qualifiers/vs-out-conversion-*.shader_test,
and bug 39651.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39651
Reviewed-by: Chad Versace <chad@chad-versace.us>
libGLw is an old OpenGL widget library with optional Motif support.
It almost never changes and very few people actually still care about
it, so we've decided to ship it separately.
The new home for libGLw is: git://git.freedesktop.org/mesa/glw/
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously if-statements were lowered from inner-most to outer-most
(i.e., bottom-up). All assignments within an if-statement would have
the condition of the if-statement appended to its existing condition.
As a result the assignments from a deeply nested if-statement would
have a very long and complex condition.
Several shaders in the OpenGL ES2 conformance test suite contain
non-constant array indexing that has been lowered by the shader
writer. These tests usually look something like:
if (i == 0) {
value = array[0];
} else if (i == 1) {
value = array[1];
} else ...
The IR for the last assignment ends up as:
(assign (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@20) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@22) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@24) ) (var_ref if_to_cond_assign_condition@26) ) ) ) ) (x) (var_ref value) (array_ref (var_ref array) (constant int (5)))
The Mesa IR that is generated from this is just as awesome as you
might expect.
Three changes are made to the way if-statements are lowered.
1. Two condition variables, if_to_cond_assign_then and
if_to_cond_assign_else, are created for each if-then-else structure.
The former contains the "positive" condition, and the later contains
the "negative" condtion. This change was implemented in the previous
patch.
2. Each condition variable is added to a hash-table when it is created.
3. When lowering an if-statement, assignments to existing condtion
variables get the current condition anded. This ensures that nested
condition variables are only set to true when the condition variable
for all outer if-statements is also true.
Changes #1 and #3 combine to ensure the correctness of the resulting
code.
4. When a condition assignment is encountered with a condition that is
a dereference of a previously added condition variable, the condition
is not modified.
Change #4 prevents the continuous accumulation of conditions on
assignments.
If the original if-statements were:
if (x) {
if (a && b && c && d && e) {
...
} else {
...
}
} else {
if (g && h && i && j && k) {
...
} else {
...
}
}
The lowered code will be
if_to_cond_assign_then@1 = x;
if_to_cond_assign_then@2 = a && b && c && d && e
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@2 = !if_to_cond_assign_then
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@1 = !if_to_cond_assign_then@1;
if_to_cond_assign_then@3 = g && h && i && j;
&& if_to_cond_assign_else@1;
...
if_to_cond_assign_else@3 = !if_to_cond_assign_then
&& if_to_cond_assign_else@1;
...
Depending on how instructions are emitted, there may be an extra
instruction due to the duplication of the '&&
if_to_cond_assign_{then,else}@1' on the nested else conditions. In
addition, this may cause some unnecessary register pressure since in
the simple case (where the nested conditions are not complex) the
nested then-condition variables are live longer than strictly
necessary.
Before this change, one of the shaders in the OpenGL ES2 conformance
test suite's acos_float_frag_xvary generated 348 Mesa IR instructions.
After this change it only generates 124. Many, but not all, of these
instructions would have also been eliminated by CSE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Now the condition (for the then-clause) and the inverse condition (for
the else-clause) get written to separate temporary variables. In the
presence of complex conditions, this shouldn't result in more code
being generated. If the original if-statement was
if (a && b && c && d && e) {
...
} else {
...
}
The lowered code will be
if_to_cond_assign_then = a && b && c && d && e;
...
if_to_cond_assign_else = !if_to_cond_assign_then;
...
Reviewed-by: Eric Anholt <eric@anholt.net>
EGL doesnt define howto manage different native platforms.
So mesa has a builtime configurable default platform,
whith non-standard envvar (EGL_PLATFORM) overwrites.
This caused unneeded bugreports, when EGL_PLATFORM was forgotten.
Detection is grouped into basic types of NativeDisplays (which itself
needs to be detected). The final decision is based on characteristcs
of these basic types:
File Desciptor based platforms (fbdev):
- fstat(2) to check for being a fd that belongs to a character device
- check kernel subsystem (todo)
Pointer to structuctures (x11, wayland, drm/gbm):
- mincore(2) to check whether its valid pointer to some memory.
- magic elements (e.g. pointers to exported symbols):
o wayland display stores interface type pointer (first elm.)
o gbm stores pointer to its constructor (first elm.)
o x11 as a fallback (FIXME?)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
GLESv1 and GLESv2 have their own dispatch.h and remap_helper.h. These
headers are only used by api_exec_es1.c and api_exec_es2.c in core mesa.
Move the rules to generate them from glapi to core mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to fix SCons build]
glapi_gen.mk is supposed to be included by glapi users to simplify
header generation. This commit also makes es1api, es2api, and
shared-glapi use it.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to prefix all variables in glapi_gen.mk by
glapi_gen]
glapi/gen-es/ defines two sets of GLAPI XMLs for OpenGL ES 1.1
(es1_API.xml) and 2.0 (es2_API.xml) respectively. They are used to
generate dispatch.h and remap_helper.h for GLES. Together with
gl_and_es_API.xml, we have to maintain three sets of GLAPI XMLs.
This commit makes dispatch.h and remap_helper.h for GLES be generated
from gl_and_es_API.xml.
Reviewed-by: Brian Paul <brianp@vmware.com>
add gl_api::filter_functions and gl_function::filter_entry_points to
filter out unwanted functions and entry points.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the list of entry points belong to GLES from mapi_abi.py to a new
file.
Until we figure out how to describe the APIs an entry point belongs to
in the XML file, and how to handle the case where an entry point others
alias is missing in some APIs, this is an easier solution than
maintaining another two sets of XMLs in glapi/gen-es/.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the 'f' suffix from a float literal.
- .float 0.0f+1.0
+ .float 1.0
This fixes the following compile error with clang:
error: unexpected token in directive
.float 0.0f+1.0
^
Note: This is a candidate for the stable branches.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Optional parallel rendering of spans using OpenMP.
Initial implementation for aa triangles. A new option for scons is
also provided to activate the openmp support (off by default).
Signed-off-by: Brian Paul <brianp@vmware.com>
After copy buffer on preGEN6, it is necessary to wait for the blit to
complete before returning data to the user.
This should fix the piglit test: copy_buffer_coherency (pre-GEN6).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"reg" was set in only one case, virtual GRFs pre register allocation,
and would be unset and have hw_reg set after allocation. Since we
never bothered with looking at virtual GRF number after allocation
anyway, just use the same storage and avoid confusion.
Besides separating out a logical step of the giant register allocator
function, this now communicates a bunch of the allocator information
through entries in brw_context, which will make this code partially
reusable for caching the expensive allocator setup.
It's fewer pointers to track, and when we start caching the register
set, should be algorithmically better in the cache hit case (lookup in
a byte-per-register array, instead of a linear walk through
desctiption of register classes to find how to translate that class).
This was a debugging aid at one point -- virtual grf 0 should never be
allocated, and it would be used if undefined register access occurred
in codegen. However, it made the confusing register allocation code
even more confusing by indexing things off of 1 all over.
At least one of the invariants verified by IR validation concerns the
relative ordering of toplevel constructs in the IR: references to
global variables must come after the declarations of those global
variables.
Since linking affects the ordering of toplevel constructs in the IR,
it's possible that a bug in the linker will cause invalid IR to be
generated, even if all the pre-linked shaders are valid. (In fact,
such a bug was fixed by the previous commit.)
Bugs like this are easily masked by further optimization passes,
particularly inlining. So to make them easier to track down, this
patch addes an IR validation step right after linking, and before
final optimization occurs. The validation only occurs on debug
builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When link_functions.cpp adds a new function to the final linked
program, it needs to add it after any global variable declarations
that the function refers to, otherwise the IR will be invalid (because
variable declarations must occur before variable accesses). The
easiest way to do that is to have the linker emit functions to the
tail of the final linked program.
The linker used to emit functions to the head of the final linked
program, in an effort to keep callees sorted before their callers.
However, this was not reliable: it didn't work for functions declared
or defined in the same compilation unit as main, for diamond-shaped
patterns in the call graph, or for some obscure cases involving
overloaded functions. And no code currently relies on this sort
order.
No Piglit regressions with i965 Ironlake.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
process_array_type() contains an assertion to verify that no IR
instructions are generated while processing the expression that
specifies the size of the array. This assertion needs to happen
_after_ checking whether the expression is constant. Otherwise we may
crash on an illegal shader rather than reporting an error.
Fixes piglit tests array-size-non-builtin-function.vert and
array-size-with-side-effect.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rearranged the logic for converting the ast for a function call to
hir, so that we constant fold before emitting any IR. Previously we
would emit some IR, and then only later detect whether we could
constant fold. The unnecessary IR would usually get cleaned up by a
later optimization step, however in the case of a builtin function
being used to compute an array size, it was causing an assertion.
Fixes Piglit test array-size-constant-relational.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38625
The ast-to-hir conversion needs to emit function signatures in two
circumstances: when a function declaration (or definition) is
encountered, and when a built-in function is encountered.
To avoid emitting a function signature in an illegal place (such as
inside a function), emit_function() checked whether we were inside a
function definition, and if so, emitted the signature before the
function definition.
However, this didn't cover the case of emitting function signatures
for built-in functions when those built-in functions are called from
inside the constant integer expression that specifies the length of a
global array. This failed because when processing an array length, we
are emitting IR into a dummy exec_list (see process_array_type() in
ast_to_hir.cpp). process_array_type() later checks (via an assertion)
that no instructions were emitted to the dummy exec_list, based on the
reasonable assumption that we shouldn't need to emit instructions to
calculate the value of a constant.
This patch changes emit_function() so that it emits function
signatures at toplevel in all cases.
This partially fixes bug 38625
(https://bugs.freedesktop.org/show_bug.cgi?id=38625). The remainder
of the fix is in the patch that follows.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
opt_dead_functions contained a shortcut to skip processing the first
function's body, based on the assumption that IR functions are
topologically sorted, with callees always coming before their callers
(therefore the first function cannot contain any calls).
This assumption turns out not to be true in general. For example, the
following code snippet gets translated to IR that violates this
assumption:
void f();
void g();
void f() { g(); }
void g() { ... }
In practice, the shortcut didn't cause bugs because of a coincidence
of the circumstances in which opt_dead_functions is called:
(a) we do inlining right before dead function elimination, and
inlining (when successful) eliminates all calls.
(b) for user-defined functions, inlining is always successful, because
previous optimization passes (during compilation) have reduced
them to a form that is eligible for inlining.
(c) the function that appears first in the IR can't possibly call a
built-in function, because built-in functions are always emitted
before the function that calls them.
It seems unnecessarily fragile to have opt_dead_functions depend on
these coincidences. And the next patch in this series will break (c).
So I'm reverting the shortcut. The consequence will be a slight
increase in link time for complex shaders.
This reverts commit c75427f4c8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts an unnecessary part of commit 4683529048 and fixes misrendering
and an assertion failure in Cogs.
Fixes freedesktop.org bug 39888.
Reviewed-by: Brian Paul <brianp@vmware.com>
If there are any cases left where the st thinks that RGBA -> BGRA
will swap components, it will get what it deserves.
Now the GPU's 2D engine goes unused. What a shame.
validate_program relies on validate_shader_program to fill in errMsg;
empirically, there exist cases where that doesn't happen.
While tracking those down may be worthwhile, initializing the string so
we don't try to ralloc_strdup random garbage also seems wise.
Fixes issues caught by valgrind while running some test case.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
DRI2 will throw BadRequest for this when the client is not local, but
DRI2 is an implementation detail and not something callers should have
to know about. Silently swallow errors in this case, and just propagate
the failure through DRI2Connect's return code.
Note: This is a candidate for the stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28125
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This saves both register space and upload bandwidth for unused values.
Note that previously we were relying on the visitor not initially
generating references to different sets of uniforms between the 8-wide
and 16-wide code generation, and now we're relying on them dead-code
eliminating the same stuff, too.
We should remove the relocations which caused a validation failure
from the list, so that the kernel receives only the validated ones.
NOTE: This is a candidate for the 7.11 branch.
That code drops performance in Unigine Heaven and Tropics
by a factor of 10. That's too crazy even for a debug build.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike C++, empty declarations such as
float;
should be valid. The spec is not explicit about this actually.
Some apps that generate their shader sources may rely on this. This was
noted when porting one of them to Linux from Windows.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Note: this is a candidate for the 7.11 branch.
Resolve via glBlitFramebuffer allows resolving a sub-region of a
renderbuffer to a different location in any mipmap level of some
other texture, and, with a new extension, even scaling. Therefore,
location and size parameters are needed.
The mask parameter was added because resolving only depth or only
stencil of a combined buffer is possible as well.
Full information about the blit operation allows the drivers to
take the most efficient path they possibly can.
This avoids the following runtime error with EGL on platforms that
require linking with libm for nontrivial math functions:
failed to load module: /xorg/lib64/gbm/gbm_gallium_drm.so: undefined
symbol: powf
(Based on Kristóf RALOVICHs patch and Ian's suggestions in
http://lists.freedesktop.org/archives/mesa-dev/2011-August/010036.html)
Use backend_map kernel query if supported, otherwise analyze ZPASS_DONE
results to get the mask.
Fixes lockups with predicated rendering due to incorrect query buffer
initialization on some cards.
Note: this is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These looked more like copy-and-paste to me than the others (which
looked more like possibly someone forgot to write some code in a
refactor), so I didn't verify where they came from.
This makes piglit a lot more happy. The errors are logged when
INTEL_DEBUG=fallbacks because the application is about to hit a big
software fallback. We frequently ask people to run applications that
are hitting software fallbacks with INTEL_DEBUG=fallbacks so the we
can help them debug the reason for the software fallback.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This can only happen in GLSL shaders because assembly shaders that use
too many temps are rejected by core Mesa. It is easiest to make this
happen with shaders that contain flow-control that could not be lowered.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rely on the driver to do the right thing. This probably means falling
back to software. Page 88 of the OpenGL 2.1 spec specifically says:
"A shader should not fail to compile, and a program object should
not fail to link due to lack of instruction space or lack of
temporary variables. Implementations should ensure that all valid
shaders and program objects may be successfully compiled, linked
and executed."
There is no provision for saying "No" to a valid shader that is
difficult for the hardware to handle, so stop doing that.
On i915 this causes a large number of piglit tests to change from FAIL
to WARN. The warning is because the driver still emits messages to
stderr like "i915_program_error: Unsupported opcode: BGNLOOP".
It also fixes ES2 conformance CorrectFull_frag and CorrectParse1_frag
on i915 (and probably other hardware that can't handle loops).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This prevents assertion failures in ralloc_strcat. The ralloc_free in
_mesa_free_shader_program_data can be omitted because freeing the
gl_shader_program in _mesa_delete_shader_program will take care of
this automatically.
A bunch of this code could use a refactor to use ralloc a bit more
effectively. A bunch of the things that are allocated with malloc and
owned by the gl_shader_program should be allocated with ralloc (using
the gl_shader_program as the context).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
linker_warning is a new function. It's identical to linker_error
except that it doesn't set LinkStatus=false and it prepends "warning: "
on messages instead of "error: ".
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the other places that set LinkStatus to false since they all
immediately follow a call to linker_error. The function linker_error
was previously known as linker_error_printf. The name was changed
because it may seem surprising that a printf function will set an
error flag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For power-of-two sizes, h0 == mt->height0 since it's already a multiple
of two. However, for NPOT, they're different; h1 should be computed
based on the original size.
Fixes piglit test "cubemap npot" and oglconform test "textureNPOT".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Before, if any uniform or constant array was accessed with indirect
addressing, st_translate_program() would emit uniform constants in the place
of immediates. This behavior was unavoidable with ir_to_mesa/mesa_to_tgsi, but
glsl_to_tgsi can work around it since the GLSL IR backend and the TGSI
emission are both inside the state tracker.
Fixes a regression unintentionally introduced by "glsl_to_tgsi: fix shaders with
indirect addressing of temps" that caused missing leaves in 3dmark01 test 4 (Nature)
and missing/displaced textures on human models in Counter-Strike: Source.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Thanks to Kenneth Graunke for pointing out that glsl_type::get_instance(base, 4, 1)
is the same as glsl_type::get_vec4_type(base).
The function was only used in st_glsl_to_tgsi, and this commit replaces that usage
with get_instance.
Disabled by default on all drivers. To enable it, change ctx->GLSLVersion to 130
in st_extensions.c. Currently, softpipe is the only driver with integer support.
The functionality is not used by anything yet, and the glUniform functions will
need to be reworked before this can reach its full usefulness. It is
nonetheless a step towards integer support in the state tracker and classic drivers.
It is still a work in progress at this point, but it produces working and
reasonably well-optimized code.
Originally based on ir_to_mesa and st_mesa_to_tgsi, but does not directly use
Mesa IR instructions in TGSI generation, instead generating TGSI from the
intermediate class glsl_to_tgsi_instruction. It also has new optimization
passes to replace _mesa_optimize_program.
The previous formula for atan(x,y) returned a value of +/- pi whenever
|x|<0.0001, and used a formula based on atan(y/x) otherwise. This
broke in cases where both x and y were small (e.g. atan(1e-5, 1e-5)).
This patch modifies the formula so that it returns a value of +/- pi
whenever |x|<1e-8*|y|, and uses the formula based on atan(y/x)
otherwise.
The previous formula for asin(x) was algebraically equivalent to:
sign(x)*(pi/2 - sqrt(1-|x|)*(A + B|x| + C|x|^2))
where A, B, and C were arbitrary constants determined by a curve fit.
This formula had a worst case absolute error of 0.00448, an unbounded
worst case relative error, and a discontinuity near x=0.
Changed the formula to:
sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3))
where A and B are arbitrary constants determined by a curve fit. This
has a worst case absolute error of 0.00039, a worst case relative
error of 0.000405, and no discontinuities.
I don't expect a significant performance degradation, since the extra
multiply-accumulate should be fast compared to the sqrt() computation.
Fixes piglit tests {vs,fs}-asin-float and {vs,fs}-atan-*
The function used a variable named 'score', which was an outright lie.
A signature matches or it doesn't; there is no fuzzy scoring.
Change the return type of parameter_lists_match() to an enum, and
let ir_function::matching_sigature() switch on that enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Array constructors obey narrower conversion rules than other constructors
[1] --- they use the implicit conversion rules [2] instead of the scalar
constructor conversions [3]. But process_array_constructor() was
incorrectly applying the broader rules.
[1] GLSL 1.50 spec, Section 5.4.4 Array Constructors, page 52 (58 of pdf)
[2] GLSL 1.50 spec, Section 4.1.10 Implicit Conversions, page 25 (31 of pdf)
[3] GLSL 1.50 spec, Section 5.4.1 Conversion, page 48 (54 of pdf)
To fix this, first check (with glsl_type::can_be_implicitly_converted_to)
if an implicit conversion is legal before performing the conversion.
Fixes:
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bool-float.vert
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bvec*-vec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The function is no longer used and has been replaced by
glsl_type::can_implicitly_convert_to().
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Context
-------
In ast_function_expression::hir(), parameter_lists_match() checks if the
function call's actual parameter list matches the signature's parameter
list, where the match may require implicit conversion of some arguments.
To check if an implicit conversion exists between individual arguments,
type_compare() is used.
Problems
--------
type_compare() allowed the following illegal implicit conversions:
bool -> float
bvecN -> vecN
int -> uint
ivecN -> uvecN
uint -> int
uvecN -> ivecN
Change
------
type_compare() is buggy, so replace it with glsl_type::can_be_implicitly_converted_to().
This comprises a rewrite of parameter_lists_match().
Fixes piglit:spec/glsl-1.20/compiler/built-in-functions/outerProduct-bvec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This method checks if a source type is identical to or can be implicitly
converted to a target type according to the GLSL 1.20 spec, Section 4.1.10
Implicit Conversions.
The following commits use the method for a bugfix:
glsl: Fix implicit conversions in non-constructor function calls
glsl: Fix implicit conversions in array constructors
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is part of fixing a ~1% performance regression in OpenArena when
changing the fixed function fragment shader to using the new backend.
Right now this just avoids the LINTERP of the projector, not the math
using it.
Certain attributes (position, psize, etc.) don't
count as params; they are handled separately by the hw.
However, the VS is required to export at least one param
and r600_shader_from_tgsi() takes care of adding a dummy
export if there is none. Make sure the VS param export
count in the SPI properly accounts for this.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was done in the old codegen path, but not the new one. Caught by
piglit fbo tests after the conversion to GLSL ff_fragment_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The FF VS generation happens just after the FF FS generation in
state.c, so the ctx->VP._Current value is for the previous state
update's vertex shader, not the one that will be chosen as a result of
this state update. The vertexShader and vertexProgram variables
should be accurately telling us whether there's going to be a
ctx->VP._Current (except on _MaintainTnlProgram drivers, where it's
always true).
The glsl-vs-statechange-1 test was created to test for this, but it
turns out that the bug is hidden by the fact that we call
_mesa_update_state() twice per draw call -- once from
_mesa_valid_to_render() and once from vbo_draw_arrays(), and the
second one was fixing up the first one.
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make it through this loop processing the color multiple
times, so we can't go overwriting it on our first color buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
When we do a glReadPixels into the temporary buffer, we don't want to
use GL_LUMINANCE, GL_LUMINANCE_ALPHA or GL_INTENSITY since they will
compute L=R+G+B which is not what we want.
This bug has existed all along but was only exposed by the elimination
of the driver hook for glCopyTexImage() in
5874890c26.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=39604
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit removed the last use of this field.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The purpose of the (irb->draw_offset & 4095) != 0 check was to ensure
that we don't have XYy offsets into a tile, since Gen4 hardware doesn't
support that. However, it's insufficient: there are cases where
draw_offset & 4095 is 0 but we still have a Y-offset. This leads to an
assertion failure in brw_update_renderbuffer_surface with tile_y != 0.
Instead, simply call intel_renderbuffer_tile_offsets to compute the
actual X/Y offsets and check if either are non-zero. This makes both
the workaround and the assertion check the same things.
Fixes piglit test fbo-generatemipmap-formats, and should also fix
bugs #34009 and #39487.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34009
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39487
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We were neglecting to load dvdx and dvdy. v is not optional.
Fixes glslparsertests tex-grad-0[12345].frag on Broadwater/Crestline.
(We still need an execution test using sampler1D.)
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The constant used in the radians() function didn't have enough
precision, causing a relative error of 1.676e-5, which is far worse
than the precision of 32-bit floats. This patch reduces the relative
error to 1.14e-9, which is the best we can do in 32 bits.
Fixes piglit tests {fs,vs}-radians-{float,vec2,vec3,vec4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
What a beast.
r300g doesn't depend on files from r300c anymore, so r300c is now left
to its own fate. BTW 'make test' can be invoked from the gallium/r300
directory to run some compiler unit tests.
The implementation deviated slightly from the GL_EXT_texture_sRGB spec
and from other implementations. A giant comment block was added to
justify the somewhat odd behavior of this function.
In addition, the interface had unnecessary cruft. The 'all' parameter
was false at all callers, so it has been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
If an application requests a generic compressed format for a texture
and the driver does not pick a specific compressed format, return the
generic base format (e.g., GL_RGBA) for the GL_TEXTURE_INTERNAL_FORMAT
query.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=3165
Reviewed-by: Brian Paul <brianp@vmware.com>
lower_variable_index_to_cond_assign runs until it can't make any more
progress. It then returns the result of the last pass which will
always be false. This caused the lowering loop in
_mesa_ir_link_shader to end before doing one last round of
lower_if_to_cond_assign. This caused several if-statements (resulting
from lower_variable_index_to_cond_assign) to be left in the IR.
In addition to this change, lower_variable_index_to_cond_assign should
take a flag indicating whether or not it should even generate
if-statements. This is easily controlled by
switch_generator::linear_sequence_max_length. This would generate
much better code on architectures without any flow contol.
Fixes i915 piglit regressions glsl-texcoord-array and
glsl-fs-vec4-indexing-temp-src.
Reviewed-by: Eric Anholt <eric@anholt.net>
The index buffer state emit only occurred if there was an IB in place
and we were in either a new batch or a new IB state. But because we
only flagged new IB state if IB state changed from the last IB state
we calculated, we could simply never emit IB state after batchbuffer
wraps if the first draw didn't use the IB and we didn't actually
change the IB.
Fixes piglit glx-multi-context-ib-1.
Fixes user-clip on 965 with 3D clears enabled. I created a separate
flag because I wanted to avoid the overhead of the matrix operations
in this path.
Reviewed-by: Brian Paul <brianp@vmware.com>
It turns out that internally the texture cache gets flushed in a
couple of cases, particularly around 2D operations mixed with 3D. In
almost all cases one of those happens between rendering to an
FBO-attached texture and rendering from that texture. However, as of
the next patch, glean tfbo (and the new fbo-flushing-2 test) would
manage to get stale texture values because one of those flushes didn't
occur. The intention of this code was always to get the render cache
cleared and ready to be used from the sampler cache (and it does on <=
gen4), so this just catches gen5 up.
This patch was also tested to fix fbo-flushing on gen7.
When emitting a MAC instruction in a vertex shader, brw_vs_emit()
calls accumulator_contains() to determine whether the accumulator
already contains the appropriate addend; if it does, then we can avoid
emitting an unnecessary MOV instruction.
However, accumulator_contains() wasn't checking the val.negate or
val.abs flags. As a result, if the desired value was the negation, or
the absolute value, of what was already in the accumulator, we would
generate an incorrect shader.
Fixes piglit test vs-refract-vec4-vec4-float.
Tested on Gen5 and Gen6.
Reviewed-by: Eric Anholt <eric@anholt.net>
On Ivybridge, the shadow comparitor goes in the first slot, rather than
at the end. It's not necessary to send u, v, and r.
Fixes tests texturing/texdepth and glean/fbo.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 53c89c67f3 ("i965: Avoid generating
MOVs for assignments of expressions.") added the line "this->result =
reg_undef" all over the code. Unfortunately, since Eric developed his
patch before I landed Ivybridge support, he missed adding it to
fs_visitor::emit_texture_gen7() after rebasing.
Furthermore, since I developed TXD support before Eric's patch, I
neglected to add it to the gradient handling when I rebased.
Neglecting to set this causes the visitor to use this->result as storage
rather than generating a new temporary. These missing statements
resulted in the same register being used to store several different
values.
Fixes the following piglit tests on Ivybridge:
- glsl-fs-shadow2dproj.shader_test
- glsl-fs-shadow2dproj-bias.shader_test
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The blend_quad function clobbers the actual render target color/alpha
values while applying the destination blend factor, which results in
restoring the wrong value during the masking stage for write-disabled
channels.
Reviewed-by: Brian Paul <brianp@vmware.com>
Just like the non-constant array index lowering pass, compare all N
indices at once. For accesses to a vec4, this saves 3 comparison
instructions on a vector architecture.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the code would just look at deref->array->type to see if it
was a constant. This isn't good enough because deref->array might be
another ir_dereference_array... of a constant. As a result,
deref->array->type wouldn't be a constant, but
deref->variable_referenced() would return NULL. The unchecked NULL
pointer would shortly lead to a segfault.
Instead just look at the return of deref->variable_referenced(). If
it's NULL, assume that either a constant or some other form of
anonymous temporary storage is being dereferenced.
This is a bit hinkey because most drivers treat constant arrays as
uniforms, but the lowering pass treats them as temporaries. This
keeps the behavior of the old code, so this change isn't making things
worse.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
Reviewed-by: Eric Anholt <eric@anholt.net>
Leaving the unused registers with other values caused assertion
failures and other problems in places that blindly iterate over all
sources.
brw_vs_emit.c:1381: get_src_reg: Assertion `c->regs[file][index].nr !=
0' failed.
Fixes i965 piglit:
vs-uniform-array-mat[234]-col-row-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-row-rd
vs-uniform-mat[234]-col-row-rd
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes many cases of accessing arrays of matrices using
non-constant indices at each level.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
Fixes swrast piglit:
fs-temp-array-mat[234]-index-col-rd
fs-temp-array-mat[234]-index-col-row-rd
fs-temp-array-mat[234]-index-col-wr
fs-uniform-array-mat[234]-index-col-rd
fs-uniform-array-mat[234]-index-col-row-rd
fs-varying-array-mat[234]-index-col-rd
fs-varying-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-rd
vs-varying-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condititon on that assignment would be lost.
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condititon on that assignment would be lost.
Fixes i965 piglit:
fs-temp-array-mat[234]-col-row-wr
fs-temp-array-mat[234]-index-col-row-wr
fs-temp-array-mat[234]-index-col-wr
fs-temp-array-mat[234]-index-row-wr
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous implementation could easily get tricked if the LHS of an
assignment included a non-constant index that was "inside" another
dereference. For example:
mat4 m[2];
m[0][i] = vec4(0.0);
Due to the way it tracked whether the array was being assigned, it
would think that the non-constant index was in an r-value. The new
code fixes that by tracking l-values and r-values differently. The
index is also replaced by cloning the IR and replacing the index
variable instead of the odd way it was done before.
v2: Apply some simplifications suggested by Eric Anholt. Making
assignment_generator::rvalue be ir_dereference instead of ir_rvalue
simplified the code a bit.
Fixes i965 piglit fs-temp-array-mat[234]-index-wr and
vs-varying-array-mat[234]-index-wr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34691
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no reason for it to be there, and another class that may not
have access to the visitor will need it soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
Not sure how I computed these, but they were wrong (which explains why
bumping the polynomial order before never improved precision).
This allows to pass the EXP test cases of PSPrecision/VSPrecision DCTs.
Add an iteration step, which makes rqsqrt precision go from 12bits to
24, and fixes RSQ/NRM test case of PSPrecision/VSPrevision DCTs.
There are no uses of this function outside shader translation.
These tests invoke do_lower_jumps() in isolation (using the glsl_test
executable) and verify that it transforms the IR in the expected way.
The unit tests may be run from the top level directory using "make
check".
For reference, I've also checked in the Python script
create_test_cases.py, which was used to generate these tests. It is
not necessary to run this script in order to run the tests.
Acked-by: Chad Versace <chad@chad-versace.us>
This patch adds a new build artifact, glsl_test, which can be used for
testing optimization passes in isolation.
I'm hoping that we will be able to add other useful standalone tests
to this executable in the future. Accordingly, it is built in a
modular fashion: the main() function uses its first argument to
determine which test function to invoke, removes that argument from
argv[], and then calls that function to interpret the rest of the
command line arguments and perform the test. Currently the only test
function is "optpass", which tests optimization passes.
This patch moves the following functions from main.cpp (the main cpp
file for the standalone executable that is used to create the built-in
functions) to standalone_scaffolding.cpp, so that they can be re-used
in other standalone executables:
- initialize_context()*
- _mesa_new_shader()
- _mesa_reference_shader()
*initialize_context contained some code that was specific to main.cpp,
so it was split into two functions: initialize_context() (which
remains in main.cpp), and initialize_context_from_defaults() (which is
in standalone_scaffolding.cpp).
Several Mesa headers redundantly define the INLINE macro. Adding this
guard prevents the compiler from complaining about macro redefinition.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is an alternative to the draw module's polygon stipple stage.
The softpipe implementation here is just a test. The advantange of
using the new polygon stipple utility module (with other drivers)
is we can avoid software vertex processing in the draw module and
get much better performance.
Polygon stipple doesn't require special vertex processing like
the other draw module stage.
u_vbuf_upload_buffers modifies the buffer offsets. If they are not
restored, and any of the vertex formats is not supported natively, the
next u_vbuf_mgr_draw_begin call will translate the vertex buffers with
incorrect buffer offsets.
ES 2.0.25 page 127 says:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
querying any other pname will generate INVALID_ENUM.
See also:
b9e9df78a0
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL 1.20 and later specs say:
"Recursion is not allowed, not even statically. Static recursion is
present if the static function call graph of the program contains
cycles."
Recursion is detected and rejected both a compile-time and at
link-time. The complie-time check happens to detect some cases that
may be removed by various optimization passes. The spec doesn't seem
to allow this, but other vendors (e.g., NVIDIA) appear to only check
at link-time after all optimizations.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33885
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The behavior of flushes in the hardware is a maze of twisty passages,
and strangely the VS constants appear to be loaded during a pipeline
flush instead of at the time of the packet emit according to the
simulator. On moving the STATE_BASE_ADDRESS packet to where it really
needed to live (in order for data loads by other packets to be
correct), we sometimes no longer got a flush between those packets
where we apparently needed it. This replicates the flushes implied by
a STATE_BASE_ADDRESS update, fixing the GPU hangs in OGLC and the
"engine" demo.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36821
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39257
Tested-by: Keith Packard <keithp@keithp.com> (bzflag and etracer fixed)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's scary stuff going on in PIPE_CONTROL internals, and if the
BSpec says to do this to make PIPE_CONTROL work, I'll go ahead and do
it because we'll probably never be able to debug it after the fact.
v2: Use stall at scoreboard instead of depth stall, as noted by Ken.
For this and occlusion queries, we're trying to avoid setting
I915_GEM_DOMAIN_RENDER for the write domain, because the data written
is definitely not going through the render cache, but we do need to
tell the kernel that the object has been written. However, with using
I915_GEM_DOMAIN_GTT, the kernel on retiring the batchbuffer sees that
the w/a BO has a write domain of GTT, and puts it on the flushing
list. If something tries to wait for that BO to finish rendering
(such as the AUB dumper reading the contents of BOs), we get into
wait_request (since obj->active) but with a 0 seqno (since the object
is on the flushing list, not actually on a ringbuffer), and BUG_ONs.
To avoid the kernel bug (which I'm hoping to delete soon anyway), just
use I915_GEM_DOMAIN_INSTRUCTION like occlusion queries do. This
doesn't result in more flushing, because we invalidate INSTRUCTION on
every batchbuffer now that we're state streaming, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This cuts out a large portion of the overhead of glClear() from
resetting the texenv state and recomputing the fixed function
programs. It also means less use of fixed function internally in our
GLES2 drivers, which is rather bogus.
Reviewed-by: Brian Paul <brianp@vmware.com>
When parsing S-Expressions, we need to store nul-terminated strings for
Symbol nodes. Prior to this patch, we called ralloc_strndup each time
we constructed a new s_symbol. It turns out that this is obscenely
expensive.
Instead, copy the whole buffer before parsing and overwrite it to
contain \0 bytes at the appropriate locations. Since atoms are
separated by whitespace, (), or ;, we can safely overwrite the character
after a Symbol. While much of the buffer may be unused, copying the
whole buffer is simple and guaranteed to provide enough space.
Prior to this, running piglit-run.py -t glsl tests/quick.tests with GLSL
1.30 enabled took just over 10 minutes on my machine. Now it takes 5.
NOTE: This is a candidate for stable release branches (because it will
make running comparison tests so much less irritating.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 1a339b6c71 made
st_ChooseTextureFormat map GL_RGBA with type GL_UNSIGNED_BYTE
to PIPE_FORMAT_A8B8G8R8_UNORM.
The image format for ARGB pixmaps is PIPE_FORMAT_B8G8R8A8_UNORM
however. This mismatch caused the texture to be recreated in
st_finalize_texture.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39209
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This fixes a regression introduced by commit
a26121f375 (fd.o bug #39219).
Since the __glXInitialize() call should be unnecessary anyway, this is
probably a nicer fix for the original problem too.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: padfoot@exemail.com.au
Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
W-Major Tile Format is used for separate stencil.
The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.
This fix touches the following portions of code:
- In intel_allocate_renderbuffer_storage(), allocate the stencil
buffer with I915_TILING_NONE.
- In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
not tiled.
- In the stencil buffer's span functions, the tile's layout must be
decoded in software.
This commit mutually depends on the xf86-video-intel commit
dri: Do not tile stencil buffer
Author: Chad Versace <chad@chad-versace.us>
Date: Mon Jul 18 00:38:00 2011 -0700
On Gen6 with separate stencil enabled, fixes the following Piglit tests:
bugs/fdo23670-drawpix_stencil
general/stencil-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
spec/EXT_packed_depth_stencil/readpixels-24_8
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn introduced a new type system. It defines a new way to create
named structs and removes the (now not needed) LLVMInvalidateStructLayout
function. See revision 134829 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
In a rare case of building gallium only, we need to
check if the required packages are available
libdrm_[intel|nouveau] - gallium[i915 i965|nouveau]
v2: r300g and r600g do not need libdrm_radeon
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Including the full "3DSTATE_VF_STATISTICS" should make it easier to
cross-reference the code and documentation.
Also, move the 965/GM45 suffix to the beginning for consistency with
newer #defines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The documentation uses 3DSTATE_DRAWING_RECTANGLE, and we already had it
defined in brw_defines.h; we were simply using an old #define from
intel_reg.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is useful for shadow map generation. Tested with glsl-bug-22603,
which rendered the depth textures with fallbacks before.
Acked-by: Chad Versace <chad@chad-versace.us>
We were updating our new viewport using the old buffers' _WindowMap.m.
We can do less math and avoid using that deprecated matrix by just
folding the viewport calculation right in to the driver.
Fixes piglit fbo-depthtex.
i915_update_draw_buffers() already handles the fallback bit for
missing stencil region, so here we just need to handle whether the GL
thinks we have stencil data or not (and disable the test if so).
We were disabling it once at the moment we changed draw buffers, but
later enabling of depth test could turn it back on. Fixes
fbo-nodepth-test.
Note that ctx->DrawBuffer has to be checked because during context
create we get called while it's still unset. However, we know we'll
get an intel_draw_buffer() after that, so it's safe to make a silly
choice at this point.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30080
The illusion of shared code here wasn't fooling anybody. It was
tempting to keep i830 and i915 still shared, but I think I actually
want to make them diverge shortly.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This brings us into compliance with page 17 (page 22 of the PDF) of
the GLSL 1.20 spec:
"[Sampler types] can only be declared as function parameters or
uniform variables (see Section 4.3.5 "Uniform"). ... [Samplers]
cannot be used as out or inout function parameters."
The spec isn't explicit about whether this rule applies to
structs/arrays containing shaders, but the intent seems to be to
ensure that it can always be determined at compile time which sampler
is being used in each texture lookup. So to avoid creating a
loophole, the rule needs to apply to structs/arrays containing shaders
as well.
Fixes piglit tests spec/glsl-1.10/compiler/samplers/*.frag, and fixes
bug 38987.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38987
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The new location, as a member function of glsl_type, is more
consistent with queries like is_sampler(), is_boolean(), is_float(),
etc. Placing the function inside glsl_type also makes it available to
any code that uses glsl_types.
These happen to work because their values are the same as the equivalent
PIPE_TRANSFER_* flags, but it's still misleading.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The GLSL spec says:
"If a built-in function is redeclared in a shader (i.e., a
prototype is visible) before a call to it, then the linker will
only attempt to resolve that call within the set of shaders that
are linked with it."
This patch enforces this behavior. When a function call is processed
a flag is set in the ir_call to indicate whether the previously seen
prototype is the built-in or not. At link time a call will only bind
to an instance of a function that matches the "want built-in" setting
in the ir_call.
This has the odd side effect that first call to abs() in the shader
below will call the built-in and the second will not:
float foo(float x) { return abs(x); }
float abs(float x) { return -x; }
float bar(float x) { return abs(x); }
This seems insane, but it matches what the spec says.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31744
The following resolves the build issues and missing symbols
Add "xvmc-nouveau/target.c" - missing symbol "driver_description"
Add "drivers/nvc0/libnvc0.a" - missing symbol "nvc0_screen_create"
Remove "drivers/softpipe/libsoftpipe.a" - unnessecary dependency
resolves build (when building without swrast)
Add "drivers/trace/libtrace.a" in Makefile
Note: With/without those patches xvmc-nouveau still segfaults
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Use all zpass data for predication instead of the last block only.
Use query buffer as a ring instead of reusing the same area
for each new BeginQuery. All query buffer offsets are in bytes
to simplify offsets math.
commit 1856230d9fa61710cce3e152b8d88b1269611a73
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:41:27 2011 +0100
make: Use better var names on packaging.
commit d1ae72d0bd14e820ecfe9f8f27b316f9566ceb0c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:38:21 2011 +0100
make: Apply several of Dan Nicholson's suggestions.
commit f27cf8743ac9cbf4c0ad66aff0cd3f97efde97e4
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 14:18:20 2011 +0100
make: Put back the tar.bz2 creation rule.
Removed by accident.
commit 34983337f9d7db984e9f0117808274106d262110
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:59:29 2011 +0100
make: Determine tarballs contents via git ls-files.
The wildcards were a mess:
- lots of files for non Linux platforms missing
- several files listed and archived twice
Using git-ls-files ensures things are not loss when making the tarballs.
commit 34a28ccbf459ed5710aafba5e7149e8291cb808c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:07:14 2011 +0100
glut: Remove GLUT source.
Most distros ship freeglut, and most people don't care one vs the other,
and it hasn't been really maintained.
So it is better to have Mesa GLUT be revisioned and built separately
from Mesa.
commit 5c26a2c3c0c7e95ef853e19d12d75c4f80137e7d
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:31:02 2011 +0100
Ignore the tarballs.
commit 26edecac589819f0d0efe2165ab748dbc4e53394
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:30:24 2011 +0100
make: Create the Mesa-xxx-devel symlink automatically.
Also actually remote the intermediate uncompressed tarballs.
this moves getting the context into the debug in this function,
just spotted it trawling callgrind traces for other things.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The forward references to video enum types in p_context.h causes
a massive number of compiler warnings (ISO C forbids forward references
to ‘enum’ types).
By putting the new video enums in a separate header that can be included
by p_context.h and p_screen.h we can avoid this.
Acked-by Christian König <deathsimple@vodafone.de>
inline the hotpath of the reference remaining the same. This shouldn't
penalise the slow path at all but improve the hot path so we don't have
to jump to the function.
It also moves some assert checks under an #ifndef NDEBUG.
Minor clean-ups added by Brian.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Because we don't support them.
For instance, R32G32B32 is not R32G32B32X32 as was assumed.
Add support for R8G8B8X8_UNORM instead of R8G8B8_UNORM surfaces.
I think the past are those times when the gallium interface was changed all
the time. Now it is not, so there is no reason to always compile the libs
if they are not needed.
There seems to be a bug in r600g when uploading more than one layer of a
3D resource at once with a hardware blit.
So just do them one at a time to workaround this.
Bitmap caching shouldn't affect the results of the queries and
conditional render.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We were failing at rounding, misplacing the non-baselevels. Fixes:
3DFX_texture_compression_FXT1/fbo-generate-mipmaps
ARB_texture_compression/fbo-generate-mipmaps
EXT_texture_compression_s3tc/fbo-generate-mipmaps
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The first rendering after context create didn't know of the color
buffer yet, triggering a sw fallback. The intel_prepare_render() from
intelSpanRenderStart then found the buffer and turned off fallbacks,
but intelSpanRenderFinish was never called and things were left
mapped. By checking buffers before making the call on whether to do
the fallback pipeline or not, we avoid the fallback change inside of
the rendering pipeline.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31561
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In some cases _mesa_create_context() can return NULL an in the mesa
state tracker, we do not concider the case, which may cause issues
within st_create_context_priv()
This patch adds a simple check (similar to the one in the dri drivers)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
As Chia-I Wu said 'There are two libGL providers, Xlib and DRI based
they cannot coexist'
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This version is mostly Dan's post to the mesa-dev mailing list on
6/22/2011.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
According to the GLSL 1.20 specification, "it is a semantic error if
there are multiple ways to apply [implicit] conversions [...] such that
the call can be made to match multiple signatures."
Fixes a regression caused by 60eb63a855,
which implemented the wrong policy of finding a "closest" match.
However, this is not a revert, since the original code failed to
continue looking for an exact match once it found two inexact matches.
It's OK to have multiple inexact matches if there's also an exact match.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38971
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is exactly analogous to Eric's Gen6 change in commit
6861a70177. His explanation:
"This is just like PointSprite overrides, but it's always on for that
attribute."
Fixes glsl-fs-pointcoord and gtf/point_sprites.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
f304bb8a5d. His explanation:
"We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases."
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
e7280b16d6.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is just barely more pretty-printing than we previously had, but
at least it doesn't leave out unit states in the log.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is quite a bit of spam, but I think it's useful to have in a full
INTEL_DEBUG=batch dump. And a lot of this spam on glxgears is just
because we're awful at handling our constants :/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous brw_state_dump output was rather useless -- last used
program per batch, and just the hex. Now we dump all programs (since
we don't know which were used), and disassemble them. But that's a
ton of spam, and usually when looking into program contents we use
INTEL_DEBUG={vs,wm,misc,other} and when looking into state updates we
use INTEL_DEBUG=batch, so this dump usually just massively clutters up
the output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we're using state base addresses for most things, we're less
interested in the absolute address of the state, and more in its
offset from the state base address (start of batchbuffer). Also,
reorder the printout so it looks more like the batchbuffer dump.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to make brw_state_dump.c handle more than just the last
statechange, so I want to keep track of what's in the batch state. By
using AUB file numbering for most of these packets, this may be
reusable for aub dumping.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will let me hang cached compiler structs off of the context
without having to worry about cleaning them up at destroy time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no pretty way to avoid the overwriting of the src operands, so
just use a temporary destination and rely on the MOV optimization.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were stomping over the source for the body of the LIT instruction
when doing the MOV of 1.0 to the uninteresting channels.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From ARB_framebuffer_object:
If a buffer is specified in <mask> and does not exist in both the
read and draw framebuffers, the corresponding bit is silently
ignored.
Using GL_NONE as DataType of Z32_FLOAT_X24S8, not sure what I should put there.
The spec says the type is n/a.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code was missing GL_DEPTH_COMPONENT32, resulting in it
wrongly returning the color buffer instead of the depth buffer.
Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls
CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
format, which (prior to this patch) resulted in an attempt to copy
ARGB8888 to X8_Z24.
Instead of adding the missing enumeration directly, convert the code to
use _mesa_is_depth_format() and _mesa_is_depthstencil_format() as these
should catch any newly added depth formats in the future.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was tricky. We were doing a use-before-initialize of
grf_reg_count, but the value usually got overwritten anyway -- when we
didn't have to do a relocation (typical), or on gen5 when we didn't
have relocations at all.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b46dc45cee claimed that
NEW_POLYGONSTIPPLE is gratuitous, but somehow just changed comments
and whitespace instead of actually removing the flag.
While we're at it, 3DSTATE_PS doesn't appear to need NEW_LINE or
NEW_POLYGON either (those are in 3DSTATE_WM). Also, 3DSTATE_WM
doesn't appear to need BRW_NEW_NR_WM_SURFACES or BRW_NEW_CURBE_OFFSETS
either (those are in 3DSTATE_PS).
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When switching channels with xine it sometimes happens that xine
destroys the drawable before we get a chance to call
DRI2DestroyDrawable, resulting in an x error.
SUB & LRP instructions should toggle NEG bit instead of setting it,
otherwise e.g. "SUB a,b,-1" is translated as "ADD a,b,-1"
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
For 0^0 case result of "LOG_CLAMPED ...,0" is -MAX_FLOAT, and then result of
"MUL_LIT ...,0,-MAX_FLOAT,..." is -MAX_FLOAT instead of 0 because of special
src1 checks for -MAX_FLOAT. So swap src0/1:
"MUL_LIT ...,-MAX_FLOAT,0,..." to get expected 0, then result of
"EXP_IEEE ...,0" is 1 as expected for LIT.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Create a new GLX drawable struct to track client related info, and add a
wrap counter to it drawable and track it as we receive events. This
allows us to support the full 64 bits of the event structure we pass to
the client even though the server only gives us a 32 bit count.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Normally lower_jumps.cpp doesn't need to lower a break instruction
that occurs at the end of a loop, because all back-ends can produce
proper GPU instructions for a break instruction in this "canonical"
location. However, if other break instructions within the loop are
already being lowered, then a break instruction at the end of the loop
needs to be lowered too, since after the optimization is complete a
new conditional break will be inserted at the end of the loop.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_breaks_6.
Previously, lower_jumps.cpp would break out of its loop after lowering
a jump instruction in just the then- or else-branch of a conditional,
and it would fail to lower a jump instruction occurring in the other
branch.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_returns_4.
The visitor class in lower_jumps.cpp never removes or replaces the
instruction being visited, but it frequently alters or removes the
instructions that follow it. Therefore, to make sure the altered IR
is visited, it needs to iterate through exec_lists using foreach_list
rather than visit_exec_list().
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Also, certain invariants assumed by lower_jumps.cpp may fail to hold,
causing assertion failures.
Fixes unit tests test_lower_pulled_out_jump,
test_lower_unified_returns, test_lower_guarded_conditional_break,
test_lower_return_non_void_at_end_of_loop, and test_lower_returns_3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, lower_jumps.cpp would only lower return and continue
statements that appeared inside conditionals. This patch makes it
lower unconditional returns and continue statements that occur inside
a loop.
Such unconditional flow control statements would be unlikely to be
explicitly coded by a reasonable user, however they might arise as a
result of other optimizations.
Without this patch, lower_jumps.cpp might not lower certain return and
continue statements, causing some backends to fail.
Fixes unit tests test_lower_return_void_at_end_of_loop and
test_remove_continue_at_end_of_loop.
Previously, lower_jumps.cpp only lowered return statements that
appeared inside of an if statement.
Without this patch, lower_jumps.cpp might not lower certain return
statements, causing some back-ends to fail (as in bug #36669).
Fixes unit test test_lower_returns_1.
Previously, do_lower_jumps.cpp determined whether to lower return
statements in ir_lower_jumps_visitor::should_lower_jumps(). Moved
this logic to ir_lower_jumps_visitor::visit(ir_function_signature *),
so that it can be used in determining whether to lower a return
statement at the end of a function.
Previously ir_reader was only able to handle return of non-void.
This patch is necessary in order to allow optimization passes to be
tested in isolation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn changes pretty rapidly. The change in
Target->createMCInstPrinter() signature which inspired commits
40ae214067 and
92e29dc5b0 has been reverted.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Otherwise PIPE_FORMAT_X8B8G8R8_UNORM and friends would fail.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
When the state tracker adds a front buffer, nothing triggers a validate
drawable call, since the state tracker manager is never notified.
Force a validate drawable call by invalidating the framebuffer's stamp, so
that the window system's renderbuffer (if any) is picked up.
This fixes bug 38988
https://bugs.freedesktop.org/show_bug.cgi?id=38988
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes segfault when running cubemap demo on i945. This happened
when intel_region_reference() was called in i915_set_draw_region()
with depth_region=NULL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Even if we don't have a current context, if we're freeing the rb we
should free its region (and BO). The renderbuffer unreference checks
appear to be just cargo-cult from the region unreference code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30217
Reviewed-by: Chad Versace <chad@chad-versace.us>
As a result of this cleanup, a bug in
intel_process_dri2_buffer_no_separate_stencil() became quite apparent.
We were associating the NULL pointer after an unreference with the
STENCIL attachment -- clarify the logic and attach the right region.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This should help us avoid leaking regions in region reference code by
making the API more predictable.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This prevents developer surprise at seeing a GL_DEPTH_COMPONENT
texture have stencil bits, and avoids the metaops path accidentally
copying stencil bits around in glCopyTexImage(GL_DEPTH_COMPONENT) (and
being broken because swrast's glReadPixels(GL_UNSIGNED_INT_24_8) is
broken).
Acked-by: Chad Versace <chad@chad-versace.us>
We simply emit these using OUT_BATCH and bitshifting, as it results in
better compiled code than packed structures. Since our documentation
is public, it's not terribly useful to keep these around for reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also rename it from CMD_STATE_INSN_POINTER to CMD_STATE_SIP to match the
documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a little different from most because it's a single DWord;
there's no length field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm not sure about this one. The current code actually follows the spec, but
considering the spec is supposed to be written against GL 3.2 I'd say the spec
is broken. I filled out a spec feedback form over a month ago, but either the
form is broken, or nobody cares.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is probably nicer if the array size ever changes.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The total number of units used by a shader is limited to MAX_TEXTURE_UNITS,
but the actual indices are only limited by MAX_COMBINED_TEXTURE_IMAGE_UNITS,
since they're shared between vertex and fragment shaders.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not supposed to do conversion, but st sometimes asks us to.
Sometimes conversion is even wrong (e.g. between UNORM and SRGB).
This should now include all formats the 2D engine supports.
Fixes an assertion failure in the piglib out-01.frag
ARB_explicit_attrib_location test. The locations set via the layout
qualifier in fragment shader were not being applied to the shader
outputs. As a result all of these variables still had a location of
-1 set.
This may need some more work for pre-3.0 contexts. The problem is
dealing with generic outputs that lack a layout qualifier. There is
no way for the application to specify a location
(glBindFragDataLocation is not supported) or query the location
assigned by the linker (glGetFragDataLocation is not supported).
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
And don't delete them. Let ralloc clean them up. Deleting the
temporary IR leaves dangling references in the prog_instruction. That
results in a bad dereference when printing the IR with MESA_GLSL=dump.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38584
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using GLuint pointers worked when the pixel size was four bytes
or the row stride was a multiple of four but was otherwise broken.
Fixes failures found with the piglit fbo-stencil test.
This helps to fix https://bugs.freedesktop.org/show_bug.cgi?id=38729
NOTE: This is a candidate for the 7.11 branch.
The existing error result doesn't appear in the GL 2.1 or 3.2
compatibility specs, and triggers an unexpected GL error in Intel's
oglconform when it tries to reset the feedback state after usage so
that the "diff the state at error time vs. context init time" code
doesn't generate spurious diffs. The unexpected GL error then
translates into testcase failure. Brian wants the safety check on
buffer = NULL, though, so that people can't as easily set up a broken
buffer.
Fixes a bug caught by oglconform, and now piglit
ARB_vertex_program/getenv4d-with-error. The wrapping of an existing
GL function made it so that we couldn't distinguish an error in
looking up our arguments from an existing error. Instead, make a
helper function to choose the param, and use it from multiple callers.
v2: Move the success case line into the conditional, use COPY_4V more.
Commit 6750226e6d bumped the base MRF to
m2 instead of m0, but failed to adjust inst->mlen, which was being set
to the highest MRF. Subtracting the base MRF solves the issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This fixes a regression introduced with commit
"st-api: Rework how drawables are invalidated v3"
where the glx state tracker manager would invalidate a drawable each time it
checks the drawable dimensions, even during a validate call, which
resulted in an endless loop, since the state tracker would immediately
detect the new invalidation and rerun the validate...
This change marks the drawable invalid only if the drawable dimensions actually
changed during the validate, which will result in at most a single
unnecessary validate by the context running a validate during which the
dimensions changed.
To avoid unnecessary validates altogether, we need to implement yet another
st-api change: Returning the current time stamp from the validate function,
as suggested by Chia-I Wu. The glx state tracker manager could then return
the stamp resulting from the last drawable dimension check.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
It makes things too random, as settings for temporary trials get stored
permannently, and it make difficult to build several platforms from the
same tree.
So disable it, again.
If a user-buffer was referenced twice by a draw command, the affected ranges
were uploaded separately, with only the last one being referenced by the
hardware. Make sure we upload only a single range.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
We currently always treat contents of user-buffers as volatile so
we don't need to take any particular action when the state tracker
announces that the contents has changed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Viewperf uses some unusual vertex arrays where the stride is less
than the element size. In this case, the stride was 4 while the
element size was 12. The difference of 8 bytes causes us to miss
uploading the tail bit of the array data.
Typically the stride is >= the element size so there was no problem
with other apps.
Stream user buffer contents rather than trying to maintain persistent
host / hardware copies.
Resulting negative array offsets are not allowed by the hardware,
(well, at least not according to header files), so adjust index bias
to make all array offsets positive.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Make sure that the upload manager doesn't upload data that's not
dirty. This speeds up the viewperf test proe-04/1 a factor 5 or so on svga.
Also introduce an u_upload_unmap() function that can be used
instead of u_upload_flush() so that we can pack
even more data in upload buffers. With this we can basically reuse the
upload buffer across flushes.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
libdrm is used in multiple places. Always check for it and set
have_libdrm. Each user can then check the variable.
This is useful when only EGL and DRI drivers are needed.
The idea is that DRI driver, libGL and libOSMesa are libraries that can
be independently enabled, yet --with-driver does not allow us to easily
do that, if not impossible. This also matches what
--enable-{egl,xorg,d3d1x} do for the respective libraries.
There are two libGL providers: Xlib-based and DRI-based. They cannot
coexist. To be able to choose between them, --enable-xlib-glx is also
added.
With this commit, --with-driver=dri can be replaced by
$ ./configure --enable-dri --enable-glx --disable-osmesa
--with-driver=xlib can be replaced by
$ ./configure --disable-dri --enable-glx --enable-osmesa \
--enable-xlib-glx
and --with-driver=osmesa can be replaced by
$ ./configure --disable-dri --disable-glx --enable-osmesa
Some combinations that cannot be supported with --with-driver will
produce errors at the moment. But in the future, we would like to
support, for example,
$ ./configure --enable-dri --disable-glx --enable-egl
(build libEGL and DRI drivers, but not libGL)
Note that this commit still keeps --with-driver for transitional
purpose.
- Copy i915c's support for phases, that should allow us to run a coupe more shaders.
- Fix the error messages.
- Still try to proceed when we get a shader that's too long.
MOD_TO_FRACT was designed to lower the GLSL 1.20 mod() function, which
operates on floating point values. However, we also use ir_binop_mod
for GLSL 1.30's % operator, which operates on integers.
For now, make MOD_TO_FRACT only apply to floating-point mod operations.
In the future, we may want to add a lowering pass for integer-based mod.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, it would simply say "type error" in three different cases:
- The LHS is not an integer
- The RHS is not an integer
- The LHS and RHS have different base types (int vs. uint)
Now the error messages state the specific problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, ir_function::matching_signature had a fatal bug: if a
function had more than one non-exact match, it would simply return NULL.
This occured, for example, when looking for max(uvec3, uvec3):
- max(vec3, vec3) -> score 1 (found first)
- max(ivec3, ivec3) -> score 1 (found second...used to return NULL here)
- max(uvec3, uvec3) -> score 0 (exact match...the right answer)
This did not occur for max(ivec3, ivec3) since the second match found
was an exact match.
The new behavior is to return a match with the lowest score. If there
is an exact match, that will be returned. Otherwise, a match with the
least number of implicit conversions is chosen.
Fixes piglit tests max-uvec3.vert and glsl-inexact-overloads.shader_test.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No MOV is necessary since signed/unsigned integers share the same
bit-representation; it's simply a question of interpretation. In
particular, the fs_reg::imm union shouldn't need updating.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mesa IR actually stores all numbers as floating point, so this is
totally a farce, but we may as well keep it going.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reverts commit f41e1db327
"fix conversions from uint to bool and from float/bool to uint"
f2i, b2i, and b2i should not accept uint types. Use i2u and u2i.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are necessary to handle int/uint constructor conversions. For
example, the following code currently results in a type mismatch:
int x = 7;
uint y = uint(x);
In particular, uint(x) still has type int.
This commit simply adds the new operations; it does not generate them,
nor does it add backend support for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We almost never want to specify a condition, and when we do we're
already thinking about it (because we're writing a lowering pass
generating the condition), so a default argument should make the code
more pleasant to read.
NOTE: This is a candidate for the 7.11 branch (we want to be able to
cherry-pick future code).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our copy propagation tends to be bad at handling the later array
accesses of the matrix argument we moved to a temporary. Generally we
don't need to move it to a temporary, though, so this avoids needing
more copy propagation complexity.
Reduces instruction count of some Unigine Tropics and Sanctuary
fragment shaders that do operations on uniform matrix arrays by 5.9%
on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were constrained to using temporaries because we were assuming
variables all over. This simplifies things a bit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This awkward typing was to avoid shadowing the function argument (the
matrix) with the temporary deref (the column) before the
get_column()/get_element()s were moved into the expression/assignment
constructors. They're about to become not-variables, so the current
names had to go. This change is almost mechanical (other than
column_expr), so it should make the next diff clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think this makes the code more obvious by moving the declarations to
their single usage (now that we aren't using them to get at the ->type
field for expression constructors).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move defintion of M_PI (for the benefit of <math.h> which do not define it), to
before the first use of it
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 1a339b6c(st/mesa: prefer native texture formats when possible)
introduced two new arguments to the st_choose_format() functions.
This patch fixes the order and passes the correct internal_target
rather than GL_NONE
NOTE: This is a candidate for the 7.11 branch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The api and the state tracker manager code as well as the state tracker code
assumed that only a single context could be bound to a drawable. That is not
a valid assumption, since multiple contexts can bind to the same drawable.
Fix this by making it the state tracker's responsibility to update all
contexts binding to a drawable
Note that the state trackers themselves don't use atomic stamps on
frame-buffers. Multiple context rendering to the same drawable should
be protected by the application.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Instead of using a chain of manually maintained if/else blocks to
handle "#extension" directives, we now consult a table that specifies,
for each extension, the circumstances under which it is available, and
what flags in _mesa_glsl_parse_state need to be set in order to
activate it.
This makes it easier to add new GLSL extensions in the future, and
fixes the following bugs:
- Previously, _mesa_glsl_process_extension would sometimes set the
"_enable" and "_warn" flags for an extension before checking whether
the extension was supported by the driver; as a result, specifying
"enable" behavior for an unsupported extension would sometimes cause
front-end support for that extension to be switched on in spite of
the fact that back-end support was not available, leading to strange
failures, such as those in
https://bugs.freedesktop.org/show_bug.cgi?id=38015.
- "#extension all: warn" and "#extension all: disable" had no effect.
Notes:
- All extensions are currently marked as unavailable in geometry
shaders. This should not have any adverse effects since geometry
shaders aren't supported yet. When we return to working on geometry
shader support, we'll need to update the table for those extensions
that are available in geometry shaders.
- Previous to this commit, if a shader mentioned
ARB_shader_texture_lod, extension ARB_texture_rectangle would be
automatically turned on in order to ensure that the types
sampler2DRect and sampler2DRectShadow would be defined. This was
unnecessary, because (a) ARB_shader_texture_lod works perfectly well
without those types provided that the builtin functions that
reference them are not called, and (b) ARB_texture_rectangle is
enabled by default in non-ES contexts anyway. I eliminated this
unnecessary behavior in order to make the behavior of all extensions
consistent.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These were previously 1-bit-wide bitfields. Changing them to bools
has a negligible performance impact, and allows them to be accessed by
offset as well as by direct structure access.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the OpenGL docs for GL_ARB_explicit_attrib_location:
This extension provides a method to pre-assign attribute locations to
named vertex shader inputs and color numbers to named fragment shader
outputs.
This was accidentally implemented for fragment shader inputs. This
patch fixes it to apply to fragment shader outputs.
Fixes piglit tests
spec/ARB_explicit_attrib_location/1.{10,20}/compiler/layout-{01,03,06,07,08,09,10}.frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
The scissor state was incorrectly in a .prepare function instead of
.emit, so the packet would end up in the batch before the
STATE_BASE_ADDRESS. It appears that this doesn't actually hurt, as
the scissor address gets dereferenced according to the current SBA at
draw time.
All it's going to do is generate lots and lots and lots of
'warning: visibility attribute not supported in this configuration; ignored'
warnings
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Considering fbdev as an in-kernel window system,
- opening a device opens a connection
- there is only one window: the framebuffer
- fb_var_screeninfo decides window position, size, and even color format
- there is no pixmap
Now EGL is built on top of this window system. So we should have
- the fd as the handle of the native display
- reject all but one native window: NULL
- no pixmap support
modeset support is still around, but it should be removed soon.
The system routine requires m0 be reserved for saving off architectural
state. Moved the allocation to start at 2 instead of 0.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if max_depth were 1, the following code would see the
first if-statement (correctly) not get flattened, but the second
if-statement would (incorrectly) get flattened:
void main()
{
if (a)
gl_Position = vec4(0);
if (b)
gl_Position = vec4(1);
}
This is because the visit_leave(ir_if*) method would not decrement the
depth before returning on the first if-statement.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Typically this was done by having a surface creation function fail if
the format was not supported.
However, in some situations when changing hardware surface formats,
it's desirable to do this check before attempting costly readback operations.
Also updated the surface_redefine interface.
Bump minor.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Blending and maybe even alpha-test don't work with those formats.
Only supporting RGBA, BGRA, RGBX, BGRX.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Remove set_event_handler() and pass the event handler with
native_get_XXX_platform(). Add init_screen() so that the pipe screen is
created later. This way we don't need to pass user_data to
create_display().
Throttle pretty hard in order to prioritize user-space interactivity over
3D application speed. May revisit this later.
Signed-off-by: Thomas <thellstrom@vmware.com>
Based uppon a patch from Pali Rohár <pali.rohar@gmail.com>.
This seems to get at least YUV->RGB conversion working.
So a simple "mplayer -vo vdpau" now seems to work fine.
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
# Configuration for profiling on Linux with x86 optimizations with gprof
include $(TOP)/configs/linux-x86-static
CONFIG_NAME = linux-x86-profile
OPT_FLAGS = -pg -g -O2
DEFINES += -DNDEBUG
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.