GLX_INTEL_swap_event is broken on the server side, where it's
currently unconditionally enabled. This completely breaks
systems running on drivers which don't support that extension.
There's no way to test for its presence on this side, so instead
of disabling it unconditionally, just disable it for drivers
which are known to not support it. It makes sense because
most drivers do support it right now.
We'll be able to remove this once Xserver properly advertises
GLX_INTEL_swap_event.
Note: This is a candidate for stable branches.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60052
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 076403c30d)
In GLSL, sampler indices are allocated contiguously from 0. But in the
case of ARB_fragment_program (and possibly fixed function), an app that
uses texture 0 and 2 will use sampler indices 0 and 2, so we were only
allocating space for samplers 0 and 1 and setting up sampler 0. We
would read garbage for sampler 2, resulting in flickering textures and
an angry simulator.
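A minimal sketch of the sizing problem, not the driver's actual code. It
assumes the used samplers are tracked as a bitmask (an assumption made
here for illustration); both helper names are hypothetical. The table
has to cover every index up to the highest one set, not just the number
of bits set.

```c
#include <stdint.h>

/* Hypothetical helper: number of sampler slots needed for a bitmask of
 * used sampler indices. The table must reach the highest index in use,
 * so this is the position of the highest set bit plus one. */
static unsigned sampler_slots_needed(uint32_t samplers_used)
{
    unsigned slots = 0;

    while (samplers_used != 0) {
        slots++;
        samplers_used >>= 1;
    }
    return slots;
}

/* The naive count: number of samplers in use. This is only sufficient
 * when the indices are contiguous from 0, as they are in GLSL. */
static unsigned sampler_count_in_use(uint32_t samplers_used)
{
    unsigned n = 0;

    for (; samplers_used != 0; samplers_used &= samplers_used - 1)
        n++;
    return n;
}
```

For an app using textures 0 and 2 the mask is 0x5: two samplers are in
use, but three slots are needed to reach sampler 2.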
Fixes bad rendering in 0 A.D. and ETQW. This was fixed for pre-gen7 by
28f4be9eb9.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=25201
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58680
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable branches.
(cherry picked from commit 5bb05c6e6d)
Before, we'd spill one reg, then continue on without actually register
allocating, then assertion fail when we tried to use a vgrf number as a
register number.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d4bcc65918)
This should have been picked when 9237f0e was picked.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59700
Ivybridge doesn't appear to have the same errata as Sandybridge; no
corruption was observed by setting it to more than the minimal correct
value. It's possible that we were simply lucky, since the URB entries
are 1024-bit on Ivybridge vs. 512-bit on Sandybridge. Or perhaps the
underlying hardware issue is fixed.
Either way, we may as well program the minimum value since it's now
readily available, likely to be more efficient, and possibly more
correct.
v2: Use GEN7_SBE_* defines rather than GEN6_SF_*. (A copy and paste
mistake.) They're the same, but using the right names is better.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 44aa2e15f6)
(This commit message was primarily written by Paul Berry, who explained
what's going on far better than I would have.)
Previous to this patch, we thought that the only restrictions on
3DSTATE_SF's URB read length were (a) it needs to be large enough to
read all the VUE data that the SF needs, and (b) it can't be so large
that it tries to read VUE data that doesn't exist. Since the VUE map
already tells us how much VUE data exists, we didn't bother worrying
about restriction (a); we just did the easy thing and programmed the
read length to satisfy restriction (b).
However, we didn't notice this erratum in the hardware docs: "[errata]
Corruption/Hang possible if length programmed larger than recommended".
Judging by the context surrounding this erratum, it's pretty clear that
it means "URB read length must be exactly the size necessary to read all
the VUE data that the SF needs, and no larger". Which means that we
can't program the read length based on restriction (b)--we have to
program it based on restriction (a).
The URB read size needs to precisely match the amount of data that the
SF consumes; it doesn't work to simply base it on the size of the VUE.
Thankfully, the PRM contains the precise formula the hardware expects.
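The commit message does not quote the formula. As a hedged sketch,
assuming the read length is counted in units that each hold two
attributes and is derived from the maximum SF source attribute, the
computation would look roughly like this (the function name is
hypothetical, and the unit size is an assumption to check against the
PRM):

```c
/* Hypothetical helper. Assumption: each unit of URB read length holds
 * two attributes, so the length is ceil((max_source_attr + 1) / 2). */
static unsigned urb_read_length(unsigned max_source_attr)
{
    unsigned attr_count = max_source_attr + 1;

    return (attr_count + 1) / 2;   /* round up to whole units */
}
```

Under that assumption, a maximum source attribute of 2 (three
attributes) gives a read length of 2, regardless of how large the VUE
is.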
Fixes random UI corruption in Steam's "Big Picture Mode", random terrain
corruption in PlaneShift, and Piglit's fbo-5-varyings test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56920
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60172
Tested-by: Jordan Justen <jordan.l.justen@intel.com> (v1/Piglit)
Tested-by: Martin Steigerwald <martin@lichtvoll.de> (PlaneShift)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 09fbc29828)
The maximum SF source attribute is necessary to compute the Vertex URB
read length properly, which will be done in the next commit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 5e9bc7bd12)
The next patch will benefit from easy access to the source attribute
number and whether or not we're swizzling. It doesn't want the final
attr_override DWord form, however.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b3efc5bea8)
This optimized filter (when using repeat wrap modes,
linear min/mag/mip filters, pot textures) only applies to 2d textures,
but nothing prevented it from being used for other textures (likely
leading to very bogus sample results).
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 66b6d51214)
We weren't emitting the SVGA_RS_OUTPUTGAMMA state so sRGB rendering
didn't work properly.
Fixes piglit's framebuffer-srgb test.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit ff60509157)
glRasterPos doesn't exist in the core profile.
NOTE: This is a candidate for the stable branches (9.0 and 9.1).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit cc5fdaf2dc)
Conflicts:
src/mesa/main/extensions.c
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 8f3c81d018)
Conflicts:
src/mesa/main/extensions.c
Check the return value of calls to u_upload_alloc() and
u_upload_data() and return early if needed.
Since we don't have a way to propagate errors all the way up to
Mesa through pipe_context::draw_vbo(), call debug_warn_once() so
the user might have some clue about OOM errors.
Note: This is a candidate for the 9.0 branch.
(cherry picked from commit b13c534f14)
Conflicts:
src/gallium/auxiliary/util/u_vbuf.c
We weren't properly checking the return value of these calls (and
calls to u_upload_data()) to detect OOM errors.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 8c3f9ea073)
Some callers of this function were checking the 'ptr' result to see if
the function failed. But the correct way is to check the regular
return value for PIPE_ERROR_x. Now we initialize all the returned
values at the top of the function in case we do hit an error (like OOM).
Callers are more likely to detect OOM conditions now. But there
are some callers which don't do any error checking...
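A sketch of the pattern described above, using a hypothetical
allocator rather than the real u_upload_alloc() signature. The status
enum, the function names and the 'limit' parameter (used only to
simulate OOM) are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

enum upload_status {
    UPLOAD_OK = 0,
    UPLOAD_ERROR_OUT_OF_MEMORY = -1
};

/* Hypothetical allocator. 'limit' stands in for available memory. */
static enum upload_status
upload_alloc(size_t size, size_t limit, size_t *out_offset, void **out_ptr)
{
    /* Initialize every output first so they are defined on error. */
    *out_offset = ~(size_t)0;
    *out_ptr = NULL;

    if (size == 0 || size > limit)
        return UPLOAD_ERROR_OUT_OF_MEMORY;

    void *p = malloc(size);
    if (p == NULL)
        return UPLOAD_ERROR_OUT_OF_MEMORY;

    *out_offset = 0;
    *out_ptr = p;
    return UPLOAD_OK;
}

/* Caller checks the status code, not 'ptr'. Returns 1 if the draw
 * went ahead, 0 if it returned early because of an allocation error. */
static int draw_with_upload(size_t size, size_t limit)
{
    size_t offset;
    void *ptr;

    if (upload_alloc(size, limit, &offset, &ptr) != UPLOAD_OK)
        return 0;   /* a real driver might also warn once here */

    /* ... fill 'ptr' and issue the draw ... */
    free(ptr);
    return 1;
}
```

Because the outputs are reset at the top, a caller that still only
looks at 'ptr' sees NULL on failure instead of a stale value.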
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 68a097596e)
Conflicts:
src/gallium/auxiliary/util/u_upload_mgr.c
In my testing I haven't found any cases where we get a null context
pointer, but it might still be possible. Check for null just to be safe.
Note: This is a candidate for the stable branches.
(cherry picked from commit a4311054c7)
We were warning when there was no current context and we're about
to delete a renderbuffer, but that happens fairly often and isn't
really a problem.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=57754
Note: This is a candidate for the stable branches.
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 006918c0db)
the critical error would use driverName.
Found by internal RH coverity scan.
Note: This is a candidate for stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a0ec9185eb)
Conflicts:
src/glx/dri_glx.c
The use-after-free happened when the renderbuffer was shared by multiple
contexts and we tried to delete the renderbuffer using a context which
was previously deleted.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 4cedb65a43)
We sometimes need a rendering context when deleting renderbuffers.
Pass it explicitly instead of trying to grab a current context
(which might be NULL). The next patch will make use of this.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit c73245882c)
Conflicts:
src/mesa/swrast/s_renderbuffer.c
_mesa_delete_renderbuffer() should free the mutex (though that may be a
no-op) and then free the renderbuffer object itself. Subclasses of
gl_renderbuffer can use this function too.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit 8472bb4508)
GLX uses mapi/glapi/libglapi.la, which is only built for OpenGL.
If the user specified --enable-xlib-glx --disable-opengl, error out, as these
cannot both be satisfied at the same time. If the user just specified
--disable-opengl but not --disable-glx, print a warning and disable GLX as
well.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59364
Tested-by: Tom Stellard <thomas.stellard@amd.com>
(cherry picked from commit 3b888f534c)
The gallium docs for pipe_screen::is_format_supported() say that
samples==0 or samples==1 both mean that multisampling is not supported.
Return GL_MAX_SAMPLES==0 instead of 1 for consistency with other drivers.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
(cherry picked from commit d60da27273)
The swrast fragment program interpreter has trouble computing the
right texture LOD because it doesn't have easy access to input
derivatives. This causes the GLSL-based meta generate mipmap code
to fetch texels from the wrong mipmap level.
One possible fix would be to set the GL_TEXTURE_MIN/MAX_LOD parameters
to limit sampling from the right level. But let's just use the
_mesa_generate_mipmap() fallback since it's a lot faster than using
the fragment shader interpreter.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=54240
Note: This is a candidate for the 9.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit 89551ae04f)
See previous commit for more info.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit 2180f32972)
Only drivers supporting DRI2 version >=4 support GLX_INTEL_swap_event.
So let's mark it as such; otherwise applications which use this extension
(i.e. everything based on Clutter, e.g. gnome-shell) break horribly on
drivers supporting DRI2 versions only up to 3.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit dbb2d192de)
configure should warn if libxml2 is not found.
libxml2 is needed by glapi/gen.
Fixes error during build in src/mapi/glapi/gen:
ImportError: No module named libxml2
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31598
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 410b58c7bf)
The GLSL 1.40 spec says:
"Uniform block names and variable names declared within uniform
blocks are scoped at the program level."
Track the block name in the symbol table and emit errors when conflicts
exist.
Fixes es3conform's uniform_buffer_object_block_name_conflict test, and
fixes the piglit block-name-clashes-with-{variable,function,struct}.vert
tests.
NOTE: This is a candidate for the 9.0 branch.
v2: Fix bad constructor initialization. Noticed by Topi Pohjolainen.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 4f29169913)
Effectively this path would always assert. Move the break statement to
the (probable) intended place.
Note: This is a candidate for the stable branches.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 06f3a1f792)
Fixes piglit glx-dont-care-mask test.
Note: This is a candidate for the stable branches.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit fe90762414)
Fixes piglit glx-dont-care-mask test.
Note: This is a candidate for the stable branches.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 46bad058eb)
If the call fails, we should return NULL from XMesaCreateVisual().
This was found when Waffle tried to create a visual with depth/stencil
bits = -1. That's an illegal value for glXChooseFBConfig() and we should
return NULL in that situation.
Note: This is a candidate for the stable branches.
(cherry picked from commit 05cd6cfd5f)
No piglit regressions and now passes glsl-uniform-out-of-bounds-2.
validate_uniform_parameters now checks that the array index is
valid. This means if an index is out of bounds, glGetUniform* now
fails with GL_INVALID_OPERATION, as it should.
_mesa_uniform and _mesa_uniform_matrix also call
validate_uniform_parameters so the bounds checks there became
redundant and were removed.
The test in glGetUniformLocation is modified to check array bounds
so it now returns GL_INVALID_INDEX (-1) if you ask for the location
of a non-existent array element, as it should.
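A simplified sketch of the array bounds check, not Mesa's actual
location encoding. It assumes element locations are the base location
plus the element index, and that array_size == 0 marks a non-array
uniform; the function name and both conventions are hypothetical.

```c
/* Hypothetical lookup: returns the location of element 'index' of a
 * uniform, or -1 if that element does not exist. */
static int uniform_element_location(int base_location,
                                    unsigned array_size,
                                    unsigned index)
{
    if (array_size == 0)                 /* not an array */
        return index == 0 ? base_location : -1;

    if (index >= array_size)             /* out of bounds */
        return -1;

    return base_location + (int)index;
}
```

With a four-element array at base location 10, asking for element 3
yields 13 and asking for element 4 yields -1.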
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
(cherry picked from commit 46e3aeb077)
Previously, Mesa code assumed that glReadBuffer(GL_NONE) was only
valid for user-created framebuffer objects. However, the spec is
quite clear that it should also be valid for the default framebuffer.
From section 18.2.1 ("Obtaining Pixels from the Framebuffer") of the
GL 4.3 spec:
"When READ_FRAMEBUFFER_BINDING is zero, i.e. the default
framebuffer, src must be one of the values listed in table 17.4,
including NONE."
Similar language exists in the GLES 3.0 spec, and in desktop GL all
the way back to ARB_framebuffer_object.
Partially fixes GLES3 conformance test "CoverageES30.test".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit cf5632094b)
In exec_prepare() we were comparing pointers to see if the fragment
shader variant had changed before calling tgsi_exec_machine_bind_shader().
This didn't work reliably when there was a lot of shader token malloc/
freeing going on because the memory might get reused.
Instead, bind the shader variant during regular state validation.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=40404
(fixes a couple of piglit's glsl-max-varyings tests)
Note: This is a candidate for the stable branches.
(cherry picked from commit 18ef8f83b2)
The old call to tgsi_exec_machine_bind_shader() in
softpipe_delete_fs_state() was never called since the shader's original
tokens are never passed to the tgsi interpreter (only shader _variant_
tokens are). Now, unbind the variant's tokens from the tgsi interpreter
when we free the variant.
This doesn't fix any known bugs but it's the right thing to do.
Note: This is a candidate for the stable branches.
(cherry picked from commit fddcc67f5c)
The GL 3.1 and ES 3.0 specs say of glGetActiveUniformsiv:
"If an error occurs, nothing will be written to params."
So, make a pass through the indices and check that they're valid before
the pass that actually writes to params. Checking pname happens on the
first iteration of the second loop.
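The two-pass structure can be sketched as follows. This is an
illustration with a hypothetical query function and plain arrays, not
the Mesa implementation, and it leaves out the pname check.

```c
/* Hypothetical query: writes one value per requested uniform index
 * into params. Returns 0 on success. Returns -1 and writes nothing if
 * any index is out of range. */
static int get_uniform_sizes(const int *sizes, unsigned num_uniforms,
                             const unsigned *indices, unsigned count,
                             int *params)
{
    unsigned i;

    /* Pass 1: validate every index before touching params. */
    for (i = 0; i < count; i++) {
        if (indices[i] >= num_uniforms)
            return -1;
    }

    /* Pass 2: all indices are known to be valid, so write results. */
    for (i = 0; i < count; i++)
        params[i] = sizes[indices[i]];

    return 0;
}
```

A single loop that validates and writes as it goes would already have
stored params[0] by the time it hit a bad index in indices[1].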
Fixes es3conform's getactiveuniformsiv_for_nonexistent_uniform_indices
test.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 11cea47246)
Otherwise the driver announces 4096 vertex shader constants and other
way too high limits.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 654a945f4d)
Conflicts:
src/mesa/drivers/dri/r200/r200_context.c
NOTE: This is a candidate for stable release branches.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f6a4e1bc1e)
Conflicts:
src/mesa/drivers/dri/radeon/radeon_context.c
Fixes flat shading for AA lines. demos/src/trivial/line-smooth is a
test case which hits this.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit d2c7fe5389)
We need to clamp vertex buffer fetch based on its size, not based on the
user specified max index hint.
This matches draw_pt_fetch_run() above.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 7da3a947c7)
The NV formulation of primitive restart is turned on/off with
glEnableClientState/glDisableClientState. These two functions don't
exist in core contexts, which means that GL_NV_primitive_restart is
essentially useless...even broken.
However, leaving it on causes oglconform's primitive-restart-nv tests to
run in OpenGL 3.1 contexts, which results in them all failing. This
patch causes 29 subtests to go from "fail" to "not run".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 32c6db3978)
Only fail if GLX_SAMPLE_BUFFERS_ARB or GLX_SAMPLES_ARB are non-zero.
We were already doing this in the older swrast/glx code.
This fixes a piglit/waffle problem where we'd always fail to get a
visual/config and report the test as "skip".
Note: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c6d74bfaf6)
We need to rebase colors (ex: set G=B=0) when getting GL_LUMINANCE
textures in the following cases:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
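As an illustration of what "rebase" means here, a minimal sketch that
only does the G=B=0 step named above on a float RGBA buffer. The
struct and function name are hypothetical.

```c
#include <stddef.h>

typedef struct {
    float r, g, b, a;
} rgba_f;

/* Hypothetical helper: keep only the red channel as the luminance
 * value by zeroing green and blue. Alpha is left untouched. */
static void rebase_to_luminance(rgba_f *texels, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        texels[i].g = 0.0f;
        texels[i].b = 0.0f;
    }
}
```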
A similar fix was pushed by Brian Paul for uncompressed textures
in commit: f5d0ced.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=47220
Observed no regressions in piglit and ogles2conform due to this fix.
This patch will cause failures in intel oglconform pxconv-gettex,
pxstore-gettex and pxtrans-gettex test cases. The cause of failures
is a bug in test cases. Expected luminance value is calculated
incorrectly in test cases: L = R+G+B.
V2: Set G = 0 when getting a RG texture but returning luminance.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
(cherry picked from commit 9ab896243c)
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 13f9012ad3)
v2: Perform this count the same way as elsewhere in this file, per
Brian Paul's review.
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 7c2060f0f0)
4-bit and 3-bit quantization values differ significantly for
values other than 0 and 1.
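A small worked example of that difference, assuming plain
round-to-nearest unorm conversion (the helper names are hypothetical
and this is not the code touched by the patch):

```c
/* Hypothetical helpers for n-bit unsigned normalized values. */
static unsigned float_to_unorm(float v, unsigned bits)
{
    unsigned max = (1u << bits) - 1u;

    if (v <= 0.0f)
        return 0;
    if (v >= 1.0f)
        return max;
    return (unsigned)(v * (float)max + 0.5f);   /* round to nearest */
}

static float unorm_to_float(unsigned q, unsigned bits)
{
    unsigned max = (1u << bits) - 1u;

    return (float)q / (float)max;
}
```

0.5 quantizes to 8 of 15 with 4 bits (about 0.533 when expanded) and
to 4 of 7 with 3 bits (about 0.571), while 0 and 1 map to the same
values at either width.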
Fixes piglit draw-pixels for softpipe/llvmpipe.
NOTE: Probably a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
(cherry picked from commit 0cb0c38cce)
v2: Fix mangled sentence in the comment, and make the loop exit early.
Fixes assertion failures in Piglit's spec/ARB_occlusion_query2/render
test as well as the game PlaneShift.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
(cherry picked from commit e755c1a36b)
We are now seeing command streams (cs) that can go over the vram+gtt
size. To avoid failing, flush early any cs that goes over 70% of
(gtt+vram) usage. 70% is used to allow for some fragmentation.
The idea is to compute a gross estimate of memory requirement of
each draw call. After each draw call, memory will be precisely
accounted. So the uncertainty is only on the current draw call.
In practice this gave a very good estimate (+/- 10% of the target
memory limit).
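A rough sketch of the early-flush decision, with hypothetical names
and plain byte counts. The real accounting lives in the driver and
winsys; this only shows the 70% threshold applied to the precisely
accounted usage plus the gross estimate for the current draw.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if the cs should be flushed before
 * adding the current draw call.
 *   cs_bytes      - memory already accounted for in this cs
 *   draw_estimate - gross estimate for the draw about to be added */
static int cs_should_flush(uint64_t cs_bytes, uint64_t draw_estimate,
                           uint64_t vram_size, uint64_t gtt_size)
{
    /* 70% of total memory, leaving headroom for fragmentation. */
    uint64_t limit = (vram_size + gtt_size) / 10 * 7;

    return cs_bytes + draw_estimate > limit;
}
```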
v2: Remove left over from testing version, remove useless NULL
checking. Improve commit message.
v3: Add comment to code on memory accounting precision
This version is a backport for mesa 9.0
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The bug: The printed horizontal stride was the numerical value of the
BRW_HORIZONTAL_$N enum.
The fix: Translate the enum before printing.
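A sketch of such a translation, assuming the encoding is 0, 1, 2, 3
for strides of 0, 1, 2 and 4 elements. That mapping and the function
name are assumptions here, not taken from the patch.

```c
/* Hypothetical translation from the encoded horizontal stride field
 * to a stride in elements. Assumed encoding: 0 -> 0, 1 -> 1, 2 -> 2,
 * 3 -> 4. */
static unsigned hstride_elements(unsigned encoded)
{
    return encoded == 0 ? 0u : 1u << (encoded - 1);
}
```

Printing the raw field would show 3 for a stride of 4 elements, which
is the kind of mismatch the fix removes.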
Note: This is a candidate for the stable releases.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit ca7d332253)
The sampler appears to ignore writemasks (even when correcting the
WRITEMASK_XYZW in brw_vec4_emit.cpp to the proper writemask) and just
always writes all four values.
To cope with this, just texture into a temporary, then MOV out into a
register that has the proper number of components.
NOTE: This is a candidate for stable branches.
Fixes es3conform's shadow_execution_vert.test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
(cherry picked from commit f0dbd9255b)
Previously it was left undefined, causing us to select a random LOD.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
(cherry picked from commit aeff9a0d98)
This is purely a refactor. However, in a moment, we'll want to set
lod_type to float for ir_tex, where ir->lod_info.lod is NULL.
NOTE: This is a candidate for stable branches (for the next patch).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
(cherry picked from commit 56ce55d198)
This is needed to compute render_to_fbo. It even has the comment.
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit bd87441ac0)
Fixes oglconform shad-compiler advanced.TestLessThani.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48629
NOTE: This is a candidate for the 9.0 branch.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0482998ccc)
Conflicts: fs_inst doesn't have a "predicate" field on the 9.0 branch,
so convert it to "predicated = true". See 54679fcbca.
The maximum number of URB entries comes from the 3DSTATE_URB_VS and
3DSTATE_URB_GS state packet documentation; the thread count information
comes from the 3DSTATE_VS and 3DSTATE_PS state packet documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
(cherry picked from commit 9add4e8038)
We weren't lowering textureGrad() with samplerCubeShadow because I
couldn't figure out the LOD calculations. It turns out they're easy:
you just have to use 1 for the depth. This causes it to pass
oglconform's four tests.
(cherry picked from commit 613e64060c)
When cherry-picking this to stable, I've reordered this before the
Haswell commit that would have introduced a regression.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com> (original)
Tested-by: Ian Romanick <idr@freedesktop.org> (original)
Haswell supports EXT_texture_swizzle and legacy DEPTH_TEXTURE_MODE
swizzling by setting SURFACE_STATE entries. This means we don't have to
bake the swizzle settings into the shader code by emitting MOV
instructions, and thus don't have to recompile shaders whenever the
swizzles change.
Unfortunately, we can't handle GL_ALPHA this way: unlike all the others,
which store the comparison result in the .r channel (and possibly others
as well), GL_ALPHA puts it in the .a channel. The GLSL 1.30+ style
functions which return a float always simply return the .r channel,
which would be zero if we handled this as a surface override. In this
case, fall back to doing it the old way. DEPTH_TEXTURE_MODE = GL_ALPHA
isn't an interesting performance path anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6d6aef7974)
It's going to be reused in a second place soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b5a042a657)
Haswell moved the "Cut Index Enable" bit from the INDEX_BUFFER packet to
a new 3DSTATE_VF packet, so we need to emit that. Also, it requires us
to specify the cut index rather than assuming it's 0xffffffff.
This adds a new Haswell-specific tracked state atom to gen7_atoms.
Normally, we would create a new generation-specific atom list, but since
there's only one difference over Ivybridge so far, I chose to simply
make it return without doing any work on non-Haswell systems.
Fixes five piglit tests:
- general/primitive-restart-DISABLE_VBO
- general/primitive-restart-VBO_COMBINED_VERTEX_AND_INDEX
- general/primitive-restart-VBO_INDEX_ONLY
- general/primitive-restart-VBO_SEPARATE_VERTEX_AND_INDEX
- general/primitive-restart-VBO_VERTEX_ONLY
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 815d9d405c)
Otherwise, we crash when the callback is executed, since the dri2_surf
pointer may point to invalid data.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
(cherry picked from commit 810345492e)
Fixes a hang on the following piglit test on my rv770
./bin/ext_timer_query-time-elapsed -auto -fbo
Tom Stellard:
-Keep --with-libclc-path and mark it deprecated.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
(cherry picked from commit 959e83d650)
Also remove the recently added and overloaded LLVM_CXXFLAGS from CXXFLAGS.
Note: This is a candidate for the stable branches.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
(cherry picked from commit 21694b8eac)
Conflicts:
src/gallium/auxiliary/Makefile
command mistakenly used vector instead of scalar emit (the more or less
identical code in radeon is already correct).
Seems like it would be broken ever since kms probably.
Should fix bugs 22576, 26809.
(cherry picked from commit 320d531373)
I erroneously added this back in January 2011 in commit 88421589.
Looking at the commit message, I have no idea why I added it. It only
added non-array structure fields to the symbol table, so array structure
fields are treated correctly.
Fixes piglit tests structure-and-field-have-same-name.vert and
structure-and-field-have-same-name-nested.vert. It should also fix
WebGL conformance tests shader-with-non-reserved-words.
NOTE: This is a candidate for the stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57622
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit ed3f237e09)
Coverity pointed out this uninitialised class member.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f3476ec8fa)
Coverity pointed out this field was being used uninitialised.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 906670a790)
To fix a pipe_context::surface_destroy() use-after-free problem.
We previously added pipe_sampler_view_release() for similar reasons.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 51223784d6)
Until we have proper 'make dist' this is an improvement of the current
situation, because each time some old Makefiles got converted to automake
we had to update the tarballs target.
NOTE: This is a candidate for the 9.0 branch.
Cc: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 0f5e2ce854)
Conflicts:
Makefile.am
Commit 774fb90db3 introduced a ralloc context to
each user of struct brw_compile, but for this one a NULL context was used,
causing the later ralloc_free(mem_ctx) to not do anything.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55175
NOTE: This is a candidate for the stable branches.
(cherry picked from commit 59bfd66a61)
The bug was found by Coverity.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 10f214e5b2)
For GLES1 and GLES2, brwCreateContext neglected to validate the requested
context version received from the DRI layer. If DRI requested an OpenGL
ES2 context with version 3.9, we provided it one.
Before this fix, the switch statement that validated the requested GL
context flavor was an ugly #ifdef copy-paste mess. Instead of reproducing
the copy-paste mess for GLES1 and GLES2, I first refactored it. Now the
switch statement is readable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 243cf7a924)
It seems that -NDEBUG and other flags might still be leaked through
those variables, so strip those off there as well.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
(cherry picked from commit ddb901fbf4)
The diff looks funny, but it's moving the integer vs non-integer check
below the _mesa_source_buffer_exists() check that ensures
_ColorReadBuffer is non-null, so we get a GL_INVALID_OPERATION instead
of a segfault. This looks like it had regressed in the
_mesa_error_check_format_and_type() changes, which removed the first of
the two duplicated checks for the source buffer. Fixes segfault in the
new piglit ARB_framebuffer_object/negative-readpixels-no-rb.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45877
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 5c99697f74)
MaxLog2 led to bugs, because it didn't work well with 1D and 3D textures.
NOTE: This is a candidate for the stable branches.
v2: correct the comment at MaxNumlevels
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8111342e81)
We were accidentally setting bit 14 in DWord 2 (which is Reserved/MBZ)
rather than bit 14 in DWord 3 (which is AA Line Distance Mode).
There's also no reason to ever set it to legacy mode; the bit is only
used when drawing antialiased lines anyway. Set it unconditionally.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit e639385064)
Previously we were accepting garbage after #else and #endif tokens when
the previous preprocessor conditional evaluated to false (e.g., #if 0).
When the preprocessor hits a false conditional, it switches the lexer
into the SKIP state, in which it ignores non-control tokens. The parser
pops the SKIP state off the stack when it reaches the associated #elif,
#else, or #endif. Unfortunately, that meant that it only left the SKIP
state after lexing the entire line containing the #token and thus
would accept garbage after the #token.
To fix this we use a mid-rule, which is executed immediately after the
#token is parsed.
NOTE: This is a candidate for the stable branch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56442
Fixes: preprocess17_frag.test from oglconform
Reviewed-by: Carl Worth <cworth@cworth.org> (glcpp-parse.y)
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 060e696799)
The GL_POINT_BIT state attribute GL_POINT_SPRITE_COORD_ORIGIN
is only supported on OpenGL-2.0 or later. Prevent glPopAttrib()
from trying to restore it on OpenGL-1.4 implementations which
support GL_ARB_POINT_SPRITE, as otherwise the sequence...
glPushAttrib(GL_POINT_BIT);
glPopAttrib();
throws a GL_INVALID_ENUM error in glPopAttrib().
See also commit f778174ea1
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit eabbe5c45f)
Since cf438f5375e242, we store actual integers for the attribute data.
We just need to reinterpret the GLfloat array as a GLint/GLuint array
so we can read the proper data.
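
As a rough illustration of reinterpreting rather than converting, the
sketch below writes integers into float-typed storage and reads them back
through an integer view of the same bytes. It is written in Python for
brevity; store_ints and load_ints are hypothetical names, not Mesa
functions, and the driver itself does this with C casts.

```python
from array import array

def store_ints(slot, values):
    """Copy integer bit patterns into float-typed storage without conversion."""
    view = memoryview(slot).cast("B").cast("i")  # same bytes, seen as C ints
    for i, value in enumerate(values):
        view[i] = value

def load_ints(slot):
    """Read the storage back as integers by reinterpreting the bytes."""
    return list(memoryview(slot).cast("B").cast("i"))

attrib = array("f", [0.0] * 4)       # declared as floats, like the attribute array
store_ints(attrib, [1, 2, 3, 4])
print(load_ints(attrib))             # reinterpreting recovers the stored integers
print([int(x) for x in attrib])      # converting the float values does not
```
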
Fixes oglconform's glsl-vertex-attrib/basic.VertexAttribI[1234][u]i
subtests (after fixing an unrelated bug in those test cases).
v2: Use the COPY_4V macro to be concise.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com> [v1]
(cherry picked from commit c299f44782)
I've no idea why there isn't a piglit that triggers this behaviour,
but while enabling TBOs for softpipe and r600g, I noticed all the
integer tests failed. I tracked it back to the TXF returning a float
when it should be returning an int. This fixed it and I haven't
seen any regressions in a full piglit run on softpipe.
http://bugs.freedesktop.org/55010
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 9785ae0973)
Integer textures generate invalid operation in glGenerateMipmap.
So, the code related to integer textures is now redundant.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit a196f43596)
Khronos has reached a conclusion and disallowed the following texture
formats in glGenerateMipmap():
(a) ASTC textures
(b) integer internal formats (e.g., RGBA8UI, RG16I)
(c) textures with stencil formats (e.g., STENCIL_INDEX8)
(d) textures with packed depth/stencil formats (e.g., DEPTH24_STENCIL8)
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9471
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c0a78d7d7b)
This is part of fixing gl-3.1/genned-names.
v2: Fix a missing return value.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 947d8ff4a7)
Tested with a modified glean tstencil2 test.
NOTE: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit f69fc36127)
This fixes an issue where glsl_to_tgsi_visitor::get_opcode() would emit the
wrong opcode because the register type was GLSL_TYPE_ARRAY/STRUCT instead of
GLSL_TYPE_FLOAT/INT/UINT/BOOL, so the function would use the float opcodes for
operations on integer or boolean values dereferenced from an array or
structure. Assertions have been added to get_opcode() to prevent this bug
from reappearing in the future.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
(cherry picked from commit 170f0459a2)
Consider the following code, which reinterprets a register as a
different type:
mov(8) g6<1>F g1.4<0,4,1>.xF
and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD
Copy propagation would notice that we can replace the use of g6 with
g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD
type, incorrectly generating:
and(8) g5<1>.xUD g1.4<0,4,1>.xF 0x7fffffffUD
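
A minimal sketch of the intended behaviour, assuming a simplified operand
model that ignores regions and swizzles. Operand and propagate_copy are
hypothetical names used only for illustration, not the vec4 backend's
data structures.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Operand:
    reg: str    # register name, e.g. "g6" or "g1.4"
    type: str   # type suffix, e.g. "F" or "UD"

def propagate_copy(use, mov_dst, mov_src):
    """Rewrite `use` to read the MOV's source, keeping the use's own type."""
    if use.reg != mov_dst.reg:
        return use  # this operand does not read the MOV's destination
    return replace(mov_src, type=use.type)

use = Operand("g6", "UD")
print(propagate_copy(use, Operand("g6", "F"), Operand("g1.4", "F")))
```

The register comes from the MOV source, while the type stays that of the
instruction being rewritten.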
Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode
test with my new Mesa IR -> Vec4 IR translator.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 03ea156f1b)
Consider the following code sequence:
mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mov.sat(8) m1<1>.xyF g4<4,4,1>F
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions. Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination. While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.
When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source. It ensures that all
necessary channels are available (possibly written by several
instructions). In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise AND of both writemasks should accomplish that.
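
The effect of combining the masks can be sketched as follows. The bit
values chosen for x, y, z and w are an assumption made for illustration,
and rewritten_writemask is a hypothetical helper, not the pass itself.

```python
X, Y, Z, W = 1, 2, 4, 8          # illustrative one-bit-per-channel encoding
XYZW = X | Y | Z | W

def rewritten_writemask(producer_mask, mov_mask):
    """Channels the rewritten producer may write: only those the MOV wrote."""
    return producer_mask & mov_mask

def mask_name(mask):
    """Render a writemask as a string such as "xy"."""
    return "".join(name for name, bit in zip("xyzw", (X, Y, Z, W)) if mask & bit)

# Both mul instructions write all four channels of g4; each mov.sat only
# forwards half of them to m1.
print(mask_name(rewritten_writemask(XYZW, X | Y)))
print(mask_name(rewritten_writemask(XYZW, Z | W)))
```

With the masks intersected, the two rewritten instructions write disjoint
halves of m1 instead of clobbering each other.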
This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code. However, it should be possible
with GLSL programs as well.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 10ff6772c8)
Previously, we used lookahead patterns to differentiate:
#define FOO(x) function macro
#define FOO (x) object macro
Unfortunately, our rule for function macros:
{HASH}define{HSPACE}+/{IDENTIFIER}"("
relies on infinite lookahead, and apparently triggers a Flex bug where
the generated code overflows a state buffer (see YY_STATE_BUF_SIZE).
There's no need to use infinite lookahead. We can simply change state,
match the identifier, and use a single character lookahead for the '('.
This apparently makes Flex not generate the giant state array, which
avoids the buffer overflow, and should be more efficient anyway.
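
The distinction being made can be sketched with a regular expression that
matches the identifier and then peeks at exactly one following character.
This is only a stand-in for the idea; it is not the Flex rule from the
patch, and macro_kind is a hypothetical name.

```python
import re

# Match "#define NAME" and capture an immediately following "(" if present.
_DEFINE = re.compile(r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_][A-Za-z0-9_]*)(\()?")

def macro_kind(line):
    """Classify a #define line as a "function" or "object" macro."""
    match = _DEFINE.match(line)
    if match is None:
        return None
    return "function" if match.group(2) else "object"

print(macro_kind("#define FOO(x) x"))   # "(" directly follows the name
print(macro_kind("#define FOO (x)"))    # whitespace separates name and "("
```
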
Fixes piglit test 17000-consecutive-chars-identifier.frag.
NOTE: This is a candidate for every release branch ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit 9142ade154)
We should use the latter since we're freeing the memory with free(),
not the gallium FREE() macro.
This fixes a mismatch when using the gallium debug memory functions.
NOTE: This is a candidate for the 9.0 branch.
(cherry picked from commit bb93439873)
MaxIfDepth of 0 means "flatten all the time", not "never flatten".
This is only desirable on hardware that can't support control flow;
software rasterization and most hardware drivers want this.
This alters behavior for swrast as well as i915. Tested on i915.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 56705cd36b)
Commit a010215463 removed the ES2-specific dispatch table and
remap_helper. Since we now use dispatch.h, which is generated from
gl_and_es_API.xml, we need to generate a matching remap_helper from the
same XML.
Note: This is a candidate for the 9.0 branch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 60565b564b)
Only the first 'nr_cbufs' color buffers in the pipe_framebuffer_state are
valid. The rest of the color buffer pointers might be uninitialized.
Fixes a regression in the piglit fbo-srgb-blit test since changes in the
gallium blitter code.
NOTE: This is a candidate for the 9.0 branch (just to be safe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 60a9390978)
To validate this code, I ran piglit -t vs quick.tests with the "go spill
everything" debugging code enabled. There was only one regression:
glsl-vs-unroll-explosion simply ran out of registers. This should be
fine in the real world, since no one actually spills every single
register.
NOTE: This is a candidate for the 9.0 branch. Even if it proves to have
bugs, it's likely better than simply failing to compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 9237f0ea8d)
move_grf_array_access_to_scratch() calculates scratch buffer offsets in
bytes. However, emit_scratch_read/write() expects the base_offset
parameter to be measured in OWords.
As a result, a shader using a scratch read/write offset greater than
zero (in practice, a shader containing more than one variable in
scratch) would use too large an offset, frequently exceeding the
available scratch space.
This patch corrects the mismatch by removing spurious conversion from
OWords to bytes in move_grf_array_access_to_scratch().
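
A small sketch of the unit mismatch, assuming variables are packed
back-to-back in scratch space. An OWord is 16 bytes; the function names
are hypothetical and do not correspond to the driver's code.

```python
OWORD_BYTES = 16  # one OWord is 128 bits

def scratch_offsets(sizes_in_owords):
    """Offsets in OWords, the unit the scratch read/write helpers expect."""
    offsets, next_free = [], 0
    for size in sizes_in_owords:
        offsets.append(next_free)
        next_free += size
    return offsets

def offsets_with_spurious_conversion(sizes_in_owords):
    """The same offsets scaled to bytes, then misread as OWords downstream."""
    return [offset * OWORD_BYTES for offset in scratch_offsets(sizes_in_owords)]

sizes = [2, 2, 4]
print(scratch_offsets(sizes))
print(offsets_with_spurious_conversion(sizes))
```

Only the first variable, at offset zero, is unaffected by the scaling,
which matches the observation that shaders with a single scratch variable
did not hit the problem.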
This is based on a patch by Paul Berry.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 46e529672b)
Since wl_display_dispatch_queue() returns the number of processed events
or -1 on error, only cancel the roundtrip if -1 is returned.
This also fixes a potential memory corruption bug happening when the
roundtrip does an early return and the callback later writes to the then
out of scope stack allocated `done' parameter.
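
The loop condition can be sketched as below. The real code lives in the
EGL Wayland platform and is not quoted in this message, so the shape of
the loop, the state dictionary, and fake_dispatch are all assumptions
made for illustration.

```python
def roundtrip(dispatch_queue, state):
    """Dispatch until the sync callback has fired or dispatch reports an error."""
    ret = 0
    # A positive return value is an event count, not a failure, so only -1
    # ends the loop early.
    while ret != -1 and not state["done"]:
        ret = dispatch_queue(state)
    return ret

def fake_dispatch(state):
    """Stand-in for dispatching a queue: processes two events per call."""
    state["pending"] -= 1
    if state["pending"] == 0:
        state["done"] = True  # the sync callback ran during this dispatch
    return 2

state = {"pending": 3, "done": False}
print(roundtrip(fake_dispatch, state), state["done"])
```
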
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
When a client frame callback is executed and the client starts rendering
again, the egl event queue might not have been dispatched so that the
buffer release event for the previous frame hasn't been processed. In
that case a third buffer is allocated, even though it would be possible
to reuse the buffer that was just released.
The wl_display_dispatch_queue_pending() entry point is available from
wayland-client 1.0.2, so require that in configure.ac. Also, just
let the pkg-config macro throw its own error, which will show what version
we were looking for and failed to find.
Note: This is a candidate for stable branches.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Commit ca3ed3e024 fixed the problem where
eglMakeCurrent would trigger a getbuffer callback that then breaks the
following wl_egl_window_resize() call. However, we still need to
invalidate buffers in eglSwapBuffers, since in wayland we always swap
buffers, so the dri driver needs to come out and ask us for the next buffer
after each swapbuffer.
Note: this is a candidate for stable branches.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We used to invalidate the drawable after a call to eglSwapBuffers(),
so that a wl_egl_window_resize() would take effect for the next frame.
However, that leads to calling dri2_get_buffers() when eglMakeCurrent()
is called with the current context and surface, and a later call to
wl_egl_window_resize() would not take effect until the next buffer
swap.
Instead, add a callback from wl_egl_window_resize() back to the wayland
egl platform, and invalidate the drawable only when it is resized.
This solves a bug on wayland clients when going back to windowed mode
from fullscreen when clicking a pop up menu, where the window size
after this would be the fullscreen size.
Note: this is a candidate for stable branches.
CC: wayland-devel@lists.freedesktop.org
i.e. we have to allocate a temporary tiled resource if dst isn't tiled.
This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.
NOTE: This is a candidate for the stable branches.
(cherry picked from commit 9c6410e5c3)
Conflicts:
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_texture.c
The burst was incorrectly used, because ELEM_SIZE was always 0.
I don't know if the burst works, because I don't know of any test
which uses it.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6a2ec765bd)
Conflicts:
src/gallium/drivers/r600/r600_shader.c
While developing cube map array support I found that we didn't
support this properly; piglit also didn't test for it at all.
I've submitted a test to piglit to check for this, and this
fixes explicit lod and lod bias with cube maps.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 037b4f8038)
This fixes graphics corruption in the case where the DISCARD_RANGE flag
is used to map a buffer.
NOTE: This is a candidate for the stable branches.
(cherry picked from commit cff4c948ed)
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e7dde5c8fb)
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6dd839f23a)
It was pretty broken with array textures, where the array size (height or
depth depending on the target) shouldn't be magnified.
The guessing also doesn't fail with 1D and cube textures.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c06258dd02)
Conflicts:
src/mesa/state_tracker/st_cb_texture.c
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 985f2aec4a)
Conflicts:
src/mesa/main/texstorage.c
MaxLog2 led to bugs, because it didn't work well with 1D and 3D textures.
NOTE: This is a candidate for the stable branches.
v2: correct the comment at MaxNumlevels
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8111342e81)
Conflicts:
src/mesa/main/teximage.h
The functions were broken, because they converted ints to floats.
Now we can finally advertise OpenGL 3.0. ;)
In this commit, the vbo module also tracks the type for each attrib
in addition to the size. It can be one of FLOAT, INT, UNSIGNED_INT.
The little ugliness is the vertex attribs are declared as floats even though
there may be integer values. The code just copies integer values into them
without any conversion.
This implementation passes the glVertexAttribI piglit test which I am going
to commit in piglit soon. The test covers vertex arrays, immediate mode and
display lists.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: cosmetic changes as suggested by Brian
(cherry picked from commit acf438f537)
This is a regression since b3921e1f53.
The array stores VS outputs, not FS inputs.
Now llvmpipe can do 32 varyings too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 183e122bdf)
And the clear color too, though that may be an issue only with GL_RGB if it's
actually RGBA in the driver.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: The types of st_translate_color parameters were changed to gl_color_union
and pipe_color_union as per Brian's comment.
(cherry picked from commit 2bbd307fa6)
For precise lts support I had to do some magic with the library names, which works fine
as long as the libraries from pkg-config are used.
The parts with src/gallium/targets/va-*/Makefile will not apply on the master branch,
but do apply to the 9.0 branch.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 4f0537e645)
fixes errors ./configure and make were complaining about
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 8a9f0fdeab)
fixes errors ./configure was complaining about
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit bc08f26485)
fixes errors ./configure was complaining about
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit a0a90ea920)
If a frame callback is not destroyed when destroying a surface, its
handler function will be invoked if the surface was destroyed after the
callback was requested but before it was invoked, causing a write on
freed memory.
This can happen if eglDestroySurface() is called shortly after
eglSwapBuffers().
Note: This is a candidate for stable branches.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit a3b6b2d305)
Global initializers using the ?: operator with at least one non-constant
operand generate ir_if statements. For example,
float foo = some_boolean ? 0.0 : 1.0;
becomes:
(declare (temporary) float conditional_tmp)
(if (var_ref some_boolean)
((assign (x) (var_ref conditional_tmp) (constant float (0.0))))
((assign (x) (var_ref conditional_tmp) (constant float (1.0)))))
This pattern is necessary because the second or third arguments could be
function calls, which create statements (not expressions).
The linker moves these global initializers into the main() function.
However, it incorrectly had an assertion that global initializer
statements were only assignments, calls, or temporary variable
declarations. As demonstrated above, they can be if statements too.
Other than the assertion, everything works fine. So remove it.
Fixes new Piglit test condition-08.vert, as well as an upcoming
game that will be released on Steam.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b45a68eebf)
Previously, if the server didn't send a GLX_FRAMEBUFFER_SRGB_CAPABLE_EXT
tag, it would still be set to GLX_DONT_CARE (which is -1). Set it to
GL_FALSE instead.
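
A minimal sketch of the fix-up, assuming the config starts out with every
field set to "don't care" and is then overwritten by whatever tags the
server sends. parse_fbconfig and the dictionary layout are hypothetical.

```python
GLX_DONT_CARE = -1  # as noted above, GLX_DONT_CARE is -1

def parse_fbconfig(server_tags):
    """Build a config from server tags, defaulting a missing sRGB tag to False."""
    config = {"sRGBCapable": GLX_DONT_CARE}  # initial "unknown" value
    config.update(server_tags)
    if config["sRGBCapable"] == GLX_DONT_CARE:
        config["sRGBCapable"] = False        # server said nothing: not capable
    return config

print(parse_fbconfig({}))
print(parse_fbconfig({"sRGBCapable": True}))
```
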
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Maciej Wieczorek <maciej.t.wieczorek@intel.com>
(cherry picked from commit 7b0f912e70)
This is a squash of the following two commits:
ralloc: Annotate printf functions with PRINTFLIKE(...)
Catches problems such as (in the gles3 branch)
glcpp-parse.y: In function '_glcpp_parser_handle_version_declaration':
glcpp-parse.y:1990:39: warning: format '%lli' expects argument of type
'long long int', but argument 4 has type 'int' [-Wformat]
As a side-effect, remove ralloc.c's likely/unlikely macros and just use
the ones from main/compiler.h.
NOTE: This is a candidate for the release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 41b14d1251)
and
src/glsl/tests/Makefile.am: Specify -I... in AM_CPPFLAGS
When specifying per-target CFLAGS (e.g., ralloc_test_CFLAGS) AM_CFLAGS
are not used. AM_CPPFLAGS should be used for includes anyway.
Fixes a build problem since 41b14d125:
CC ralloc_test-ralloc.o
In file included from ../../../src/glsl/ralloc.c:42:0:
../../../src/glsl/ralloc.h:57:27: fatal error: main/compiler.h: No such file or directory
Acked-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit 67f1e7bf5f)
Fixes the problem where configure from the tarball would report missing
files:
$ ./configure
configure: error: cannot find install-sh, install.sh, or shtool in bin
NOTE: This is a candidate for the 9.0 branch.
(cherry picked from commit ec57fbbc72)
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit a30d14635d)
Due to a string mismatch, INTEL_swap_event wasn't listed among GLX
extensions for the connection, even when present on both client and
server. That is, glXQueryServerString and glXGetClientString reported the
extension, but glXQueryExtensionsString did not.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56057
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 1d0c621121)
Version 12 of the EGL_KHR_create_context spec changed this behavior.
NOTE: This is a candidate for the 9.0 branch
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 587d5db11d)
It doesn't provide the cross-process buffer sharing that a window system
pixmap could otherwise support and we don't have anything left that uses
this type of surface.
The 0.99.0 Wayland release changes the event API to provide a thread-safe
mechanism for receiving events specific to a subsystem (such as EGL) and
we need to use it in the EGL platform.
The Wayland protocol now also requires a commit request to make changes
take effect; issue that from eglSwapBuffers.
We need to create bos suitable for cursor usage that we can map and
write data into. The kms dumb ioctls are all we need for this, so drop
the dependency on libkms.
This was introduced by commit 24db6d6 (cherry-picked from a683012). The
original patch fixed potential GPU hangs on SNB, and it caused some
rendering regressions there. The benefits outweigh the costs.
However, the work-around is not necessary for pre-SNB chipsets.
Applying the work-around there gives rendering regressions with no
benefit. This patch disables the work-around on pre-SNB chipsets.
Without the original patch, the piglit test
depthstencil-render-miplevels would reliably hang an SNB GPU. On ILK
this test would not hang, and it does not hang with this patch.
NOTE: This is a candidate for the 8.0 and 9.0 branches
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
The layer dimension of array textures is not subject to mipmap minification.
OTOH we were missing an assertion for the depth dimension.
Fixes assertion failures with piglit {f,v}s-textureSize-sampler1DArrayShadow.
For some reason, they only resulted in piglit 'warn' results for me, not
failures.
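
The rule in the first sentence can be sketched as follows: width and
height shrink with each mipmap level, while the layer count stays fixed.
minify and level_dimensions are hypothetical helper names used only for
illustration.

```python
def minify(size, level):
    """Halve a dimension once per level, never going below 1."""
    return max(1, size >> level)

def level_dimensions(width, height, layers, level):
    """Dimensions of one mipmap level of an array texture."""
    # The layer count is not a spatial dimension, so it is not minified.
    return (minify(width, level), minify(height, level), layers)

for level in range(4):
    print(level, level_dimensions(8, 8, 6, level))
```
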
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56211
NOTE: This is a candidate for the stable branches.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
(cherry picked from commit eee1ff423c)
This is a squash of:
mesa: add get-pick-list.sh script into bin/
NOTE: This is a candidate for the stable branches.
(cherry picked from commit 2d95db660e)
This is the 2nd commit message:
mesa: simplify get-pick-list.sh script
and add a description for the script
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit fa27a0db43)
This is the 3rd commit message:
mesa: optimize get-pick-list.sh script
cuts down the while loop iterations from 4600 to 380 commits at the
moment
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit b2991526ed)
This is the 4th commit message:
mesa: grep for commits with cherry picked in commit message only once
and save them temporarily in already_picked
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 135ec3a1db)
This is the 5th commit message:
mesa: fix indentation in get-pick-list.sh script
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 3e3ff4cd73)
Commit 006c1a3c65 introduced a call to
clock_gettime, but failed to include <time.h>, breaking the build in
some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 7cb8764ca3)
This got broken by:
22b7ddc7f glapi: rename/move GL_POLYGON_OFFSET_BIAS to its extension
section
Fix it by appending the _EXT suffix to the enum in the test too.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The glGet hash was initialized only once for a single GL API, even if
the application later created a context for a different API. This
resulted in glGet failing for otherwise valid parameters in a context
if that parameter was invalid in another context created earlier.
Fix this by using a separate hash table for each API.
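
The idea can be sketched as one lazily built lookup table per API instead
of a single table built for whichever API came first. The API names and
parameter names below are placeholders, not real enums, and get_table and
is_valid are hypothetical.

```python
# Placeholder validity lists; the real tables are generated from the enum list.
VALID_PARAMS = {
    "API_A": {"PARAM_COMMON", "PARAM_ONLY_A"},
    "API_B": {"PARAM_COMMON", "PARAM_ONLY_B"},
}

_tables = {}  # one table per API, built on first use

def get_table(api):
    """Return the lookup table for `api`, building it the first time."""
    if api not in _tables:
        _tables[api] = {name: i for i, name in enumerate(sorted(VALID_PARAMS[api]))}
    return _tables[api]

def is_valid(api, param):
    return param in get_table(api)

print(is_valid("API_A", "PARAM_ONLY_A"))
print(is_valid("API_B", "PARAM_ONLY_A"))
print(is_valid("API_B", "PARAM_ONLY_B"))
```

Creating a context for API_A first no longer affects which parameters
API_B accepts.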
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
This should be named GL_POLYGON_OFFSET_BIAS_EXT and listed under the
EXT_polygon_offset section. (Solution by Ian Romanick)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
These enums are valid only in ES1 and ES2. So far they were marked valid
incorrectly, depending on the previous API mask in the enum list.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
For glCompressedTexSubImage, width or height = 0 is legal.
Fixes a failure in piglit's s3tc-errors test.
This is for the 9.0 and 8.0 branches. Already fixed on master.
This simply avoids some failed assertions but there's no reason to
call the driver hooks for storing a tex image if its size is zero.
Note: This is a candidate for the stable branches.
(cherry picked from commit 91d8409649)
This is a squash for the following 7 commits. The first introduces the
functionality, and the remaining six fix various bugs.
Patch 1:
_mesa_meta_GenerateMipmap: Support all texture targets by generating shaders at runtime
glsl path of _mesa_meta_GenerateMipmap() function would require different fragment
shaders depending on the texture target. This patch adds the code to generate
appropriate fragment shader programs at run time.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54296
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
Now using ralloc_asprintf in setup_glsl_generate_mipmap().
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 299acac849)
Patch 2:
_mesa_meta_GenerateMipmap: Generate separate shaders for glsl 120 / 130
glsl version of _mesa_meta_GenerateMipmap() would require separate
shaders for glsl 120 and 130.
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 15bf3103b4)
Patch 3:
meta: Add on demand compilation of per target shader programs
A call to glGenerateMipmap() follows the generation of a relevant
shader program in setup_glsl_generate_mipmap().
To support all texture targets and to avoid compiling shaders
every time, per target shader programs are compiled on demand
and saved for the next call.
Fixes float-texture(mipmap.manual):
See Comment 6: https://bugs.freedesktop.org/show_bug.cgi?id=54296
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit eb1d87fb94)
Patch 4:
meta: make mem_ctx non-global.
I can't see any external users, and this is a global symbol.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 36639ec6e9)
Patch 5:
meta: Remove unsafe global mem_ctx pointer
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit ab097dde0c)
Patch 6:
meta: Rearrange shader creation in setup_glsl_generate_mipmap
The diff looks weird, but this moves the code from the first 'if
(ctx->Const.GLSLVersion < 130)' block down into the second block. It
also moves some variable declarations closer to their use.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 3308c079bd)
Patch 7:
meta: Don't use GLSL 1.30 shader on OpenGL ES 2
Fixes GLES2 CoverageGL conformance test.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 0242381f06)
Somehow I only hit this issue with my latest libdrm changes.
This won't be needed with DB texturing.
NOTE: This is a candidate for the 9.0 branch.
(cherry picked from commit 9dfca930d7)
A compressed texture image size doesn't have to be a multiple of the
compressed block size (only sub-images do). Fixes issues when building
compressed mipmaps because we often wind up with non-block-size images
for the higher mipmap levels.
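
To see why small mipmap levels are not block-aligned, the sketch below
computes storage for each level by rounding up to whole blocks. The
defaults assume 4x4 blocks of 8 bytes (as in DXT1);
compressed_image_bytes is a hypothetical helper.

```python
def compressed_image_bytes(width, height, block_w=4, block_h=4, block_bytes=8):
    """Bytes needed for an image, rounding each dimension up to whole blocks."""
    blocks_wide = (width + block_w - 1) // block_w
    blocks_high = (height + block_h - 1) // block_h
    return blocks_wide * blocks_high * block_bytes

# Mipmap chain of a 16x16 texture: the 2x2 and 1x1 levels are smaller than
# a block, yet each still occupies one full block.
size = 16
while size >= 1:
    print(size, compressed_image_bytes(size, size))
    size //= 2
```
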
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=55445
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Sven Arvidsson <sa@whiz.se>
(cherry picked from commit df4a88ac43)
From SandyBridge PRM, volume 2 Part 1, section 12.2.3, BLEND_STATE:
DWord 1, Bit 30 (AlphaToOne Enable):
"If Dual Source Blending is enabled, this bit must be disabled"
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit ea0d088727)
The idea here is to not flag _NEW_VARYING_VP_INPUTS when shaders (either
GLSL or ARB vp/fp) are in use. If either TNL or TexEnv programs are
active, at least one stage is using fixed function.
On Pineview, fixes 20 Piglit, 60 oglconforms, and 7 ES 1.1 conformance
tests, as well as missing textures in Xonotic. These were all
regressions since commit fb4a34e60e.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49127
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54807
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 7fa0f10cd8)
This function is only present in GLES1 and in the OpenGL compatibility
profile.
Fixes the following "make check" failure:
[----------] 1 test from DispatchSanity_test
[ RUN ] DispatchSanity_test.GLES2
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn
compression/decompression unavailable
dispatch_sanity.cpp:122: Failure
Value of: table[i]
Actual: 0x4de54e
Expected: (_glapi_proc) _mesa_generic_nop
Which is: 0x41af72
i = 321
[ FAILED ] DispatchSanity_test.GLES2 (4 ms)
[----------] 1 test from DispatchSanity_test (4 ms total)
NOTE: This is a candidate for stable release branches.
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Tested-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 8f0b81bf7d)
The EGL_NOK_swap_region2 spec states that the rectangles are specified
with a bottom-left origin within a surface coordinate space also with a
bottom left origin, so this patch ensures the rectangles are flipped
before passing them on to dri2_copy_region.
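A minimal sketch of the flip, assuming the consumer of the rectangles expects a top-left origin. The struct and function names are hypothetical and only illustrate the arithmetic:

```c
#include <assert.h>

/* Hypothetical rectangle type: (x, y) is the corner nearest the origin. */
struct rect {
    int x, y, width, height;
};

/* Convert a rectangle between bottom-left and top-left origin conventions
 * inside a surface of the given height. The same formula works both ways. */
struct rect flip_rect_y(struct rect r, int surface_height)
{
    assert(r.y >= 0 && r.height >= 0 && r.y + r.height <= surface_height);
    r.y = surface_height - r.y - r.height;
    return r;
}
```

Applying the flip twice returns the original rectangle, which is a handy sanity check.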
Fixes piglit's egl-nok-swap-region test.
Tested-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 0a523a8820)
(cherry picked from commit 837f06b42f)
The only symbols that need to be public (those in intel_screen.c that the
loader looks for) are already marked public. Saves 100k of compiled driver
size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I think libtool should be handling this for us, but the build fails for
Jordan because libdricommon (a static library, which uses expat) appears
before -lexpat on the linker command.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 31ab61cac1)
Conflicts:
src/mesa/drivers/dri/i965/Makefile.am
If the destination texture image doesn't exist we'd hit an assertion
(or crash in a release build). The piglit/s3tc-errors test hits this.
This has already been fixed in master by the error checking code
consolidation.
Note: This is a candidate for the 8.0 branch.
In commit 091eb15b69, Jordan changed get_temp_image_type() to use
_mesa_get_format_datatype() instead of returning GL_FLOAT. That has
several possible return values: GL_FLOAT, GL_INT, GL_UNSIGNED_INT,
GL_SIGNED_NORMALIZED, and GL_UNSIGNED_NORMALIZED.
We do want to use GL_INT/GL_UNSIGNED_INT for integer formats. However,
we want to continue using GL_FLOAT for the normalized fixed-point types.
There isn't any code in pack.c to handle GL_(UN)SIGNED_NORMALIZED.
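The intended mapping can be sketched as below. The enum is a stand-in for the GL datatype enums and the function name is hypothetical; this is not the actual body of get_temp_image_type().

```c
#include <assert.h>

/* Stand-in values for illustration only; real code uses the GL enums. */
enum datatype {
    DT_FLOAT,
    DT_INT,
    DT_UNSIGNED_INT,
    DT_SIGNED_NORMALIZED,
    DT_UNSIGNED_NORMALIZED
};

/* Pick the temporary image type: keep integer types as they are and fall
 * back to float for everything else, including normalized fixed-point. */
enum datatype temp_image_type(enum datatype format_datatype)
{
    switch (format_datatype) {
    case DT_INT:
    case DT_UNSIGNED_INT:
        return format_datatype;
    default:
        return DT_FLOAT;
    }
}
```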
Fixes oglconform's fboarb advanced.blit.copypix, which was regressed by
commit 091eb15b69.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 3767b25bd3)
This failed when all the uploads to occur were uniform-type vertex data (like
glColor4f being active across a DrawArrays), because it would upload 1 element
instead of 1 element per vertex. There was no citation for how this code
helped any particular application, and it breaks ETQW, so just remove it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47170
NOTE: This is a candidate for the 9.0 and 8.0 branches.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0334e8dc25)
GL_TEXTURE_1D, GL_TEXTURE_3D, GL_TEXTURE_RECTANGLE, and
GL_TEXTURE_GEN_S/T/R/Q don't exist in ES 1 contexts, so any meta ops
that used _mesa_meta_begin with MESA_META_TEXTURE would trigger GL
errors. One such operation is _mesa_meta_Clear().
On ES 1, we want to disable GL_TEXTURE_GEN_STR_OES instead.
Fixes the ES1 conformance test miplin.c, which was regressed by commit
08be1d288f.
NOTE: This is a candidate for the 9.0 branch.
v2: Also blacklist GL_TEXTURE_3D, per Brian's comment.
v3: Disable GL_TEXTURE_GEN_STR_OES, per Ian's comment.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54297
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 679c93ff89)
If a subtexture region isn't aligned to the compressed block size,
return GL_INVALID_OPERATION, not GL_INVALID_VALUE.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 1f586684d6)
This squashes two commits from master:
i965: Don't free the intel_context structure when intelCreateContext fails.
intelDestroyContext will eventually be called, and it will clean things
up. The call to brwInitVtbl is moved earlier so that
intelDestroyContext can call the device-specific destructor. This also
makes the code look more like the i915 code.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54301
(cherry picked from commit 87f26214d6)
And:
i965: brwInitVtbl needs to know the chipset generation
Fixes major regressions since de958de.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit e87c63f288)
The second commit message should have read 'since 87f2621', of course.
This is a squash of 2 commits from master.
The first commit is:
i965/blorp: Add support for blits between SRGB and linear formats.
Fixes colorspace issues in L4D2 when multisampling is enabled (the
scene was far too dark, but the flashlight area was way too bright).
The nVidia and AMD binary drivers both allow this kind of blit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e2249e8c4d)
The second commit is:
i965/blorp: Fix sRGB MSAA resolves.
Commit e2249e8c4d (i965/blorp: Add
support for blits between SRGB and linear formats) changed blorp to
always configure surface states in linear format (even if the
underlying surface is sRGB). This allowed sRGB-to-linear and
linear-to-sRGB blits to occur without causing the image to be
inappropriately brightened or darkened.
However, it broke sRGB MSAA resolves, since they rely on the
destination buffer format being sRGB in order to ensure that samples
are averaged together in sRGB-correct fashion.
This patch fixes the problem by instead configuring the source buffer
to use the *same* format as the destination buffer. This ensures that
the image won't be brightened or darkened, but preserves proper sRGB
averaging.
Fixes piglit tests "EXT_framebuffer_multisample/accuracy srgb".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55265
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 124b214f09)
Fixes an assertion failure when compiling certain shaders that need both
pull constants and register spilling:
brw_eu_emit.c:204: validate_reg: Assertion `execsize >= width' failed.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit ab5ce2789f)
This patch is a band-aid fix for a bug in commit 5fd67fa (i965/blorp:
Reduce alignment restrictions for stencil blits), which causes
multisampled stencil blits to work incorrectly on Sandy Bridge.
When blitting to or from a normal stencil buffer, we have to use a
coordinate transformation that swizzles coordinates to account for the
fact that stencil buffers use W tiling, but the most similar tiling
format available for textures and render targets is Y tiling. The
differences between W and Y tiling cause pixels to be scrambled within
a block of size 8x4 (width x height) as measured relative to a W tile,
or 16x2 as measured relative to a Y tile. So in order to make sure
that pixels at the edges of the blit aren't lost, we need to align the
rendering rectangle (and the buffer sizes) to multiples of the 8x4
block size. This alignment happens in the brw_blorp_blit_params
constructor, whereas the determination of how to swizzle the
coordinates happens during code generation, in the
brw_blorp_blit_program class.
When blitting to or from a multisampled stencil buffer, the coordinate
swizzling is more complex, because it has to account for the
interleaving pattern of samples, which uses 4x4 blocks for 4x MSAA and
8x4 blocks for 8x MSAA. The end result is that if multisampling is in
use, the 16x2 block size (relative to a Y tile) needs to be expanded
to 16x4, and the corresponding size relative to a W tile expands to
8x8.
The problem doesn't affect Ivy Bridge severely enough to crop up in
Piglit tests because on Ivy Bridge we have to disable multisampling
when blitting *to* a multisampled stencil buffer (the blorp compiler
generates code to compensate for the fact that multisampling is
disabled). However I suspect a bug is still present because we don't
disable multisampling when blitting *from* a multisampled stencil
buffer.
This patch fixes the problem by doubling the vertical alignment
requirement when blitting to or from a multisampled stencil buffer,
provided multisampling has not been disabled.
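The effect of the doubled vertical alignment can be sketched as follows, with coordinates measured relative to a W tile (8x4 blocks normally, 8x8 when multisampling is in use). The struct and function names are hypothetical; the real logic lives in the brw_blorp_blit_params constructor and is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle; x1/y1 are exclusive. */
struct blit_rect {
    unsigned x0, y0, x1, y1;
};

unsigned round_down(unsigned v, unsigned a) { return v - v % a; }
unsigned round_up(unsigned v, unsigned a) { return round_down(v + a - 1, a); }

/* Expand the rectangle outward to whole swizzle blocks. */
struct blit_rect align_stencil_rect(struct blit_rect r, bool multisampled)
{
    const unsigned x_align = 8;
    const unsigned y_align = multisampled ? 8 : 4; /* doubled for MSAA */

    r.x0 = round_down(r.x0, x_align);
    r.y0 = round_down(r.y0, y_align);
    r.x1 = round_up(r.x1, x_align);
    r.y1 = round_up(r.y1, y_align);
    return r;
}
```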
In the long run I would like to rework the brw_blorp_blit_params
constructor--it's difficult to follow and has had several subtle bugs
like this one. However this band-aid fix should be suitable for
cherry-picking to release branches.
Fixes Piglit tests "unaligned-blit {2,4} stencil {msaa,upsample}" on
Sandy Bridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a33ce665a5)
Previously, we aligned all stencil blit operations to multiples of the
size of a tile, since stencil buffers use W-tiling, and blorp has to
approximate this by configuring the 3D pipeline for Y-tiling and
swizzling coordinates.
However, this was unnecessarily conservative; it turns out that the
differences between W-tiling and Y-tiling are confined to 32-byte
sub-tiles within the 4k tiling pattern; the layout of these 32-byte
sub-tiles within the larger 4k tile is the same (8 sub-tiles across by
16 sub-tiles down, in column-major order). Therefore we only need to
align stencil blit operations to multiples of the sub-tile size.
Note: although the performance improvement of this change is probably
quite small, the fact that W-tiling and Y-tiling formats only differ
within 32-byte sub-tiles will be essential in a future patch to ensure
that stencil blits work correctly between parts of the miptree other
than level/layer 0. Making this change provides handy documentation
(and validation) of this fact.
Acked-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 5fd67fac14)
When blitting to a stencil buffer, we need to align the rectangle we
send down the rendering pipeline, to account for the fact that the
stencil buffer uses a W-tiled layout, but we are configuring its
surface state as Y-tiled.
Previously, when the stencil buffer was multisampled, we assumed that
we could reduce the amount of alignment that was necessary, since each
pixel occupies a block of 2x2 or 4x2 samples in the stencil buffer.
That would have been correct if the coordinates we were adjusting were
measured in pixels. However, the conversion from pixel coordinates to
coordinates within the interleaved buffer has already been done;
therefore the full alignment restriction applies.
Note: the reason this mistake wasn't previously uncovered by piglit
tests is because it is being masked by another mistake: the blorp
engine is using overly conservative alignment restrictions when doing
stencil blits. The overly conservative alignment restrictions will be
removed in the patch that follows. Doing this fix now will prevent
the subsequent patch from introducing regressions.
Acked-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 1a75063d5f)
This patch modifies intel_region_get_aligned_offset() to make the
appropriate calculation when the blorp engine sets up a W-tiled
stencil buffer using a Y-tiled SURFACE_STATE.
Acked-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b760c9913d)
When the blorp engine is performing a blit from one stencil buffer to
another, it sets up the surface state for these buffers as Y-tiled, so
it needs to be able to force intel_region_get_tile_masks() to return
the appropriate masks for a Y-tiled region.
Acked-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 50dec7fc2d)
Previously, when performing a blit using the blorp engine, we failed
to account for the level and layer of the source and destination. As
a result, all blits would occur between miplevel 0 and layer 0 of the
corresponding textures, regardless of which level/layer was bound to
the framebuffer.
This patch passes the correct level and layer through
brw_blorp_miptrees() into the brw_blorp_blit_params data structure.
Further patches in the series will adapt
gen{6,7}_blorp_emit_surface_state to make use of these parameters.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3123f06215)
Currently, gen{6,7}_blorp_emit_surface_state assumes that the src and
dst surfaces are mapped to miplevel 0 and layer 0 (thus no surface
offset is required). This is a bug, since the user might try to blit
to and from levels/layers other than 0.
To fix this bug, it will not be sufficient to have
gen{6,7}_blorp_emit_surface_state look up the surface offset at the
time they set up the surface state, since these offsets will need to
be tweaked when blitting stencil buffers (due to the fact that stencil
buffer blits have to swizzle between W and Y tiling formats).
So, to pave the way for the bug fix, this patch causes the x and y
offsets to be computed during blit setup and stored in
brw_blorp_mip_info.
As a result of this change, brw_blorp_mip_info doesn't need to store
the level and layer anymore.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit c130ce7b2b)
Previously, gen{6,7}_blorp_emit_surface_state would look up the width
and height of the surface at the time they set up the surface state,
and then tweak it if necessary (it's necessary when a W-tiled surface
is being mapped as Y-tiled). With this patch, we look up the width
and height when setting up the blit, and store them in
brw_blorp_mip_info. This allows us to do the necessary tweak in the
brw_blorp_blit_params constructor (where it makes more sense). It
also reduces the need to keep track of level and layer in
brw_blorp_mip_info, so that a future patch can eliminate them
entirely.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 09b0fa8499)
This makes it more convenient for blorp functions to get access to
Intel-specific data inside the renderbuffer objects.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit e14b1288ef)
Also add a clarifying comment for why the width/height doesn't need
adjustment for Gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 32c7b2769c)
Since Gen6+ stencil buffers use W-tiling (a tiling arrangement which
drm and the kernel are not aware of) we need to round up the width and
height of a stencil buffer to multiples of the W-tile size (64x64)
before allocating a stencil buffer. Previously, we rounded up the
size of the base miplevel, and then computed the miptree layout based
on the rounded up size. This was incorrect, because it meant that the
total size of the miptree would not be properly W-tile aligned, and
therefore we would not always allocate enough pages.
(Note: even though the GL API doesn't allow creation of mipmapped
stencil textures, it does allow mipmapping of a combined depth/stencil
texture, and on Gen6+, a combined depth/stencil texture is internally
implemented as a pair of separate depth and stencil buffers.)
For example, on Sandy Bridge, when allocating a mipmapped stencil
texture of size 128x128, we would first round up to the nearest
multiple of 64x64 (causing no change to the size), and then compute
the miptree layout (whose size worked out to 128x196). Then we would
request an allocation of 128*196 bytes (6.125 pages), causing 7 pages
to be allocated to the texture. However, the texture needs 8 pages,
since each W-tile occupies a page, and it takes 2 W-tiles to cover a
width of 128 and 4 W-tiles to cover a height of 196.
This patch changes the order of operations so that the miptree layout
is computed first and then the total size of the miptree is rounded up
to be W-tile aligned.
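The arithmetic in the Sandy Bridge example can be checked with a small sketch. It assumes one byte per stencil sample and 4096-byte pages, as in the example above; the function names are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES 4096u
#define W_TILE_DIM 64u /* a W tile is 64x64 bytes, i.e. one page */

unsigned align_to(unsigned v, unsigned a)
{
    return (v + a - 1) / a * a;
}

unsigned pages_for_bytes(unsigned bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Old behavior: allocate based on the raw miptree size. */
unsigned stencil_pages_unaligned(unsigned width, unsigned height)
{
    return pages_for_bytes(width * height);
}

/* New behavior: round the total miptree size up to whole W tiles first. */
unsigned stencil_pages_w_aligned(unsigned width, unsigned height)
{
    return pages_for_bytes(align_to(width, W_TILE_DIM) *
                           align_to(height, W_TILE_DIM));
}
```

For the 128x196 miptree, the first function yields 7 pages and the second yields the 8 pages that the W-tiled layout actually needs.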
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit bde833c9d0)
v2: Allow GL_ARB_shader_objects functions in core profile because we
still expose the extension string there. Don't allow
glBindFragDataLocation in GLES3 because it's not part of that API.
Based (mostly) on review comments from Eric Anholt.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit be66cf950e)
This isn't used by this patch, but it will be necessary for several
follow-on patches. Separating this out will make it easier to reorder
patches later.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3ef9e43865)
This function is not the same as glGetProgramiv.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 7f7268d385)
This was already (correctly) supported for glGetSamplerParameter paths.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit ae3023e967)
This fixes glGetStringi(GL_EXTENSIONS, ...) for core contexts. Previously,
all extension names returned would be NULL.
NOTE: This is a candidate for release branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d30a7d2eb4)
MSAA resolves and other blit-like operations ignore SRGB state anyway,
so we should be able to safely allow resolves between compatible
SRGB/linear formats like SRGBA8 and RGBA8888.
This matches the behavior of the nVidia and AMD binary drivers.
Fixes completely black rendering when using multisampling in L4D2.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c96828ecb4)
Commit 42723d88d intended to override an S3TC internalFormat to a
generic compressed format when the application requested online
compression of uncompressed data. Unfortunately, it also broke
pre-compressed textures when libtxc_dxtn isn't installed but the
extensions are forced on.
Both glCompressedTexImage2D() and glTexImage2D() call teximage(), which
calls _mesa_choose_texture_format(), hitting this override code. If we
have actual S3TC source data, we can't treat it as any other format, and
need to avoid the override.
Since glCompressedTexImage2D() passes in a format of GL_NONE (which is
illegal for glTexImage), we can use that to detect the pre-compressed
case and avoid the overrides.
Fixes a regression since 42723d88d3.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 328961d955)
GL_TEXTURE_1D, GL_TEXTURE_3D, GL_TEXTURE_RECTANGLE, and GL_TEXTURE_GEN_*
don't exist in ES 1 contexts, so any meta ops that used _mesa_meta_begin
with MESA_META_TEXTURE would trigger GL errors. One such operation is
_mesa_meta_Clear().
Fixes the ES1 conformance test miplin.c, which was regressed by commit
08be1d288f.
NOTE: This is a candidate for the 9.0 branch.
v2: Also blacklist GL_TEXTURE_3D, per Brian's comments.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54297
Cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
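The general hazard can be sketched as follows: once an array may be moved by realloc(), any pointer previously taken to one of its elements may dangle, so it is safer to remember an index and re-derive the pointer on use. All names here are hypothetical and unrelated to the driver's actual data structures.

```c
#include <assert.h>
#include <stdlib.h>

struct cache_item {
    int value;
};

struct item_array {
    struct cache_item *items;
    size_t count, capacity;
};

/* Append a value and return its index, or (size_t)-1 on allocation failure.
 * The storage may move, so callers must not keep element pointers around. */
size_t item_array_push(struct item_array *a, int value)
{
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        struct cache_item *p = realloc(a->items, new_cap * sizeof *p);
        if (!p)
            return (size_t)-1;
        a->items = p;
        a->capacity = new_cap;
    }
    a->items[a->count].value = value;
    return a->count++;
}

/* Re-derive the element pointer from a stable index each time. */
struct cache_item *item_array_at(struct item_array *a, size_t index)
{
    assert(index < a->count);
    return &a->items[index];
}
```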
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
(cherry picked from commit 9aa8bac98b)
pipe_draw_info::indexed determines if it should be indexed and not
the presence of an index buffer.
This fixes crashes in r300g.
NOTE: This is a candidate for the stable branches.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 2988fa940e)
This symbol escapes from dricore into the global namespace; it's too
generic, so we should prefix it with something just to be nice.
Should be applied to stable + 9.0.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 88b0790b1a)
glcpp tried to work around yylex in its own way, but failed; do it
properly.
This fixes another crash found after fixing the first crash.
This is a candidate for the 9.0 and stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 53d46bc787)
This avoids creating a global yylex symbol, which would interfere with
all sorts of apps.
With libdricore, which currently can't do symbol visibility, we pollute
the namespace with this.
This is a candidate for 9.0 & stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cc943c8470)
The current code is duplicated in two places and relies on `uname` to
detect the flags. This is no good for cross-compiling, and the current
logic uses -m64 for the x32 ABI which breaks things.
Unify the code in one place, avoid `uname` completely, and add support
for the new x32 ABI.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
ARB fragment programs use texture unit numbers directly, unlike GLSL
which has an extra indirection. If a fragment program only uses one
texture assigned to GL_TEXTURE1, SamplersUsed will only contain a single
bit, which would make us only upload a single surface/sampler state
entry. However, it needs to be the second entry.
Using _mesa_fls() instead of _mesa_bitcount() solves this. For ARB
programs, this makes num_samplers one more than the index of the highest
texture unit used. Since GLSL uses consecutive integers assigned by the
linker, _mesa_fls() should give the same result as _mesa_bitcount().
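The difference between the two counts is easy to see in a standalone sketch. The functions below are simple portable stand-ins written for illustration, not Mesa's _mesa_fls() and _mesa_bitcount() implementations.

```c
#include <assert.h>

/* Number of set bits in v. */
unsigned bit_count(unsigned v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* 1-based position of the most significant set bit, or 0 if v is 0. */
unsigned find_last_set(unsigned v)
{
    unsigned pos = 0;
    while (v) {
        pos++;
        v >>= 1;
    }
    return pos;
}
```

A program that only uses GL_TEXTURE1 has a mask of 0x2: the bit count is 1, but two sampler entries are needed, which is what the last-set position reports. For a contiguous mask such as 0x7 both give 3.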
Fixes a regression since 85e8e9e000,
which caused GPU hangs in ETQW (and probably others), as well as
breaking piglit test fp-fragment-position.
v2: Add a comment, as suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54098
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54179
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: meng <mengmeng.meng@intel.com>
(cherry picked from commit 28f4be9eb9)
ffs() finds the least significant bit set; _mesa_fls() finds the /most/
significant bit.
v2: Make it an inline function in imports.h, per Brian's suggestion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 0fc163408e)
Use 1/256 for R6xx/7xx, 1/4096 for evergreen, instead of default 1/16.
Helps to pass some piglit tests (fbo, multisample).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f44bda17f5)
From ARB_sync spec:
If the value of <timeout> is zero, then ClientWaitSync does not
block, but simply tests the current state of <sync>. TIMEOUT_EXPIRED
will be returned in this case if <sync> is not signaled, even though
no actual wait was performed.
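The quoted behavior can be sketched like this. The types, result names, and the wait callback are hypothetical placeholders rather than the r600g or Gallium interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the GL return codes. */
enum wait_result {
    WAIT_ALREADY_SIGNALED,
    WAIT_TIMEOUT_EXPIRED,
    WAIT_CONDITION_SATISFIED
};

struct sync_object {
    bool signaled;
};

/* Hypothetical blocking wait supplied by the driver; returns true if the
 * sync object became signaled before the timeout elapsed. */
typedef bool (*wait_fn)(struct sync_object *sync, uint64_t timeout_ns);

enum wait_result client_wait(struct sync_object *sync, uint64_t timeout_ns,
                             wait_fn wait)
{
    if (sync->signaled)
        return WAIT_ALREADY_SIGNALED;
    if (timeout_ns == 0)
        return WAIT_TIMEOUT_EXPIRED; /* only test the state, never block */
    return wait(sync, timeout_ns) ? WAIT_CONDITION_SATISFIED
                                  : WAIT_TIMEOUT_EXPIRED;
}
```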
Fixes random fails of the arb_sync-timeout-zero piglit test on r600g.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b05a1fc156)
This is a follow-on to 1f5b1f9846.
Generate GL errors for ordinary invalid parameters for proxy
targets the same as for non-proxy targets. Only texture size and OOM
errors should be handled specially for proxies.
Note: This is a candidate for the stable branches.
(cherry picked from commit 35c75f6777)
Turns out we weren't doing any format checking before. Now check
the internal format and, in particular, make sure that unsized internal
formats aren't accepted.
Note: This is a candidate for the stable branches.
(cherry picked from commit 2e4fc54977)
From the GL 4.3 spec, section 18.3.1 "Blitting Pixel Rectangles":
If SAMPLE_BUFFERS for either the read framebuffer or draw
framebuffer is greater than zero, no copy is performed and an
INVALID_OPERATION error is generated if the dimensions of the
source and destination rectangles provided to BlitFramebuffer are
not identical, or if the formats of the read and draw framebuffers
are not identical.
It is not clear from the spec whether "dimensions" should mean both
sign and magnitude, or just magnitude.
Previously, Mesa interpreted "dimensions" as meaning both sign and
magnitude, so any multisampled blit that attempted to flip the image
in the X and/or Y direction would fail.
However, Y flips are likely to be commonplace in OpenGL applications
that have been ported from DirectX applications, as a result of the
fact that DirectX and OpenGL differ in their orientation of the Y
axis. Furthermore, at least one commercial driver (nVidia) permits Y
flips, and L4D2 relies on them being permitted. So it seems prudent
for Mesa to permit them.
This patch changes Mesa to allow both X and Y flips, since there is no
language in the spec to indicate that X and Y flips should be treated
differently.
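Under the "magnitude only" interpretation, the size check could look like the sketch below. The function name is hypothetical and this is not the actual Mesa validation code, which also checks formats and other state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Compare source and destination rectangle sizes by magnitude only, so
 * that a flip in X and/or Y does not count as a mismatch. */
bool blit_sizes_match(int srcX0, int srcY0, int srcX1, int srcY1,
                      int dstX0, int dstY0, int dstX1, int dstY1)
{
    return abs(srcX1 - srcX0) == abs(dstX1 - dstX0) &&
           abs(srcY1 - srcY0) == abs(dstY1 - dstY0);
}
```

A Y-flipped blit such as (0, 0, 64, 64) to (0, 64, 64, 0) passes this check, while a blit that scales the image still fails it.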
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 5d5f0f3491)
According to the GLSL 4.30 specification, this is a compile time error.
Earlier specifications don't specify a behavior, but since 0 and 1 are
the only valid indices for dual source blending, it makes sense to
generate the error.
Fixes (the fixed version of) piglit's layout-12.frag.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit 354f2cb5c7)
This fixes the blue zombies bug in l4d2.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 39aca5076f)
This fixes an issue where the local 'table' variable was hiding the
function parameter name in glGetColorTable(..., void *table).
This should be OK as long as there's never a GL entrypoint that uses
'disp_table' as a parameter name.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 043f66204b)
This is a long-standing omission in Mesa's texture image size checking.
We need to take the mipmap level into consideration when checking if the
width, height and depth are too large.
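One way to picture the level-dependent limit, assuming the maximum base size is 2^(max_levels - 1) and ignoring borders, is the sketch below. The function and parameter names are hypothetical, not Mesa's.

```c
#include <assert.h>
#include <stdbool.h>

/* Check one dimension of a texture image against the limit for its mipmap
 * level: each successive level may be at most half the size of the last. */
bool texture_dim_ok(unsigned size, unsigned level, unsigned max_levels)
{
    unsigned max_size;

    assert(max_levels >= 1 && max_levels <= 32);
    if (level >= max_levels)
        return false;
    max_size = 1u << (max_levels - 1 - level);
    return size <= max_size;
}
```

With 12 levels (a 2048 base limit), a 2048-wide image is accepted at level 0 but rejected at level 1.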
Fixes the new piglit max-texture-size-level test.
Thanks to Stéphane Marchesin for finding this problem.
Note: This is a candidate for the stable branches.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 771e7b6d88)
The CALLOC() macro only takes one argument so this was being treated
as a comma expression. Simply use calloc() instead.
A follow-on patch will replace all CALLOC() calls with calloc().
NOTE: This is a candidate for the 8.0 and 9.0 branches.
(cherry picked from commit 43ed822a50)
glGetStringi(GL_EXTENSIONS) failed to respect the context's API, and so
returned all internally enabled GLES extensions from a GL context.
Likewise, glGetIntegerv(GL_NUM_EXTENSIONS) also failed to respect the
context's API.
Note: This is a candidate for the 8.0 and 9.0 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit f29a4b0157)
On Android we want to add only double buffered configs for visuals.
Earlier implementation set the SurfaceType as 0 for single buffered
configs but driver still exposed these configs that were not compatible
with any egl surface type. This caused Khronos conformance test runs to
fail on Android. This patch fixes the issue by skipping single buffered
configs earlier and not exposing them.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit d58ca43b80)
I wonder if the better solution is to have _mesa_meta_GenerateMipmap not
use MESA_META_ALL for the GLSL path. Even on compatibility profiles
there is no reason to save and restore fog on this path.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54295
(cherry picked from commit 51b069e7aa)
Reading brw->fragment_program is nonsensical in compiler code: it
contains the currently active program (if any), not the one currently
being compiled. Attempting to access it may either lead to crashes
(null pointer dereference if no program is active) or wrong results.
Fixes piglit regressions since 9ef710575b
on pre-Sandybridge hardware. The actual bug was created in commit
7b1fbc6889.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54183
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit 4d9abd96cc)
As discussed with Kristian on #wayland. Pushes the decision about components
into the DRI driver, giving it greater freedom to implement YUV samplers in
hardware and to choose which mode to use.
This interface will also allow drivers like SVGA to implement YUV surfaces
without the need to sub-allocate, and instead send 3 separate buffers, one for
each channel; this is currently not implemented.
I have tested these changes on Gallium Svga. Scott tested them on both intel
and Gallium Radeon. Kristian and Pekka tested them on intel.
v2: Fix typo in dri2_from_planar.
v3: Merge in intel changes.
(cherry picked from commit 6a7dea93fa)
Tested-by: Scott Moreau <oreaus@gmail.com>
Tested-by: Pekka Paalanen <ppaalanen@gmail.com>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Now that OpenGL 3.1 is supported by at least one driver, follow
tradition and bump the major version number.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The blend state is different and the resolve single-sample buffer must have
FMASK and CMASK enabled. I decided to have one CMASK and one FMASK
per context instead of per resource.
There are new FMASK and CMASK allocation helpers and a new buffer_create
helper for that.
The color resolve on r6xx needs PT_RECTLIST. Using conventional primitive
types (triangles and quads) produces an ugly line between two diagonally
opposite corners. I guess a rectangular point sprite would work too.
This partially reverts d638da23d2.
With gallium the meta code is not always built so the call to
_meta_in_progress() was unresolved. Simply special-case the
GL_MULTISAMPLE case in the meta code. There might be other special
cases in the future given all the differences between legacy GL,
core GL, GLES, etc.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54234
and https://bugs.freedesktop.org/show_bug.cgi?id=54239
v2 (Paul Berry <stereotype441@gmail.com>): keep _meta_in_progress
function, since it's needed by the i965 driver, but don't call it from
core mesa.
Signed-off-by: Brian Paul <brianp@vmware.com>
Prior to commit 2f1869822, emit_fb_writes() looped from 0 to 3, writing
all four components of a vec4 color output. However, that broke for
smaller output types (float, vec2, or vec3). To fix that, I introduced
a new variable (output_components[]) containing the size of the output
type for each render target.
Unfortunately, I forgot to actually initialize it in the constructor,
which meant that unless a shader wrote to gl_FragColor, or the specific
output for each render target, output_components would contain a garbage
value, and we'd loop for a completely non-deterministic amount of time.
Not actually emitting any color writes seems like the right approach.
We may still need to emit a render target write (to terminate the
thread), but don't have to put in any sensible values (the shader didn't
write anything, after all).
Fixes a regression since 2f18698220.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54193
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Ian Romanick <idr@freedesktop.org>
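A minimal C sketch of the failure mode described in the message above. The
struct, the MAX_DRAW_BUFFERS constant, and both function names are
hypothetical stand-ins for illustration (the driver code is a C++ visitor
class). The point is only that zeroing the per-target component counts up
front gives unwritten targets a defined size of zero, so the write loop
emits nothing for them instead of iterating over garbage.

```c
#include <assert.h>
#include <string.h>

#define MAX_DRAW_BUFFERS 8 /* hypothetical limit for this sketch */

/* Hypothetical stand-in for the compiler state; not the driver's type. */
struct fb_write_state {
    unsigned output_components[MAX_DRAW_BUFFERS];
};

/* Zero every count up front so unwritten targets have a defined size. */
static void fb_write_state_init(struct fb_write_state *s)
{
    assert(s != NULL);
    memset(s->output_components, 0, sizeof(s->output_components));
}

/* Returns how many color components would be written for one target. */
static unsigned count_color_writes(const struct fb_write_state *s,
                                   unsigned target)
{
    assert(target < MAX_DRAW_BUFFERS);
    unsigned writes = 0;
    for (unsigned i = 0; i < s->output_components[target]; i++)
        writes++;
    return writes;
}
```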
It is possible to force S3TC extensions to be enabled. This is
generally done to support applications that will only supply
pre-compressed textures. This accounts for the vast majority of
applications.
However, there is still the possibility of an application asking for
on-line compression. In that case, generate a warning and substitute a
generic compressed format. The driver will either pick an uncompressed
format or a compressed format that Mesa can handle on-line (e.g., FXT1).
This should only cause problems for applications that request on-line
compression and read the compressed texture back. This is likely an
infinitesimal subset of an already infinitesimal subset.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix API_OPENGL_CORE handling when TEXTURE_FLOAT_ENABLED is not
defined. Based on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a purely software extension. The drivers don't need to do any
work to support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Page 407 (page 423 of the PDF) of the OpenGL 3.0 spec says (in the list
of deprecated functionality):
"Separate polygon draw mode - PolygonMode face values of FRONT and
BACK; polygons are always drawn in the same mode, no matter which
face is being rasterized."
Also modify meta to not use FRONT or BACK in a core context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We were calling through a dispatch table entry that was NULL, since the apple
variant is only on legacy desktop. Just call the function we mean instead of
indirecting through the dispatch.
v2: Use API_OPENGL_CORE.
v3: Only require desktop GL. If a driver can't support TexBOs in a non-core
context, it should not enable them.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix completely broken condition around ClearColorIiEXT and
ClearColorIuiEXT.
v3: Add special VertexAttrib handling for ES2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The comment in the code even says this is the right thing to do.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All drivers in Mesa do. This allows a lot of extension checking code to be
gutted from the function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a bug that glGetMaterial[fx]v in ES1 contexts would (try to) allow
queries of GL_AMBIENT_AND_DIFFUSE. This enum can only be used in glMaterial,
not in the get.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Also handle glDisable, glIsEnabled, glEnableClientState, and
glDisableClientState.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow glGetVertexAttribfv(0, GL_CURRENT_VERTEX_ATTRIB_ARB, param) in
OpenGL 3.1, just like OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile filtering.
v3: Allow GL_SRC_ALPHA_SATURATE as a destination factor in GLES3. Based
on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow GL_RGB10_A2UI in GLES3 based on review feedback from Eric
Anholt.
v4: Arg. Reject unsized RED and RG enums on GLES. More feedback from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile, GLES1, and GLES3 filtering.
v3: Fix the GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME query when the
attachment type is GL_NONE on GLES3. Other cleanups. Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Fix a typo in GL_TEXTURE_2D_ARRAY checking.
v4: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The Common Subexpression Elimination pass will not operate on
instructions with physical register defs, so we end up with
several redundant copies to M0 when using interpolation.
Adding a register class that only contains the M0 register allows
us to use a virtual register to represent M0, and makes it possible
for the Common Subexpression Elimination pass to remove the extra
copies.
This reduces the overhead of using the fixed function internally
in the driver.
V2: Use setup_glsl_generate_mipmap() and setup_ff_generate_mipmap()
functions to avoid code duplication.
Use glsl version when ARB_{vertex, fragment}_shader are present.
Remove redundant code.
V3: Remove redundant border related code leaving the assertion.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
the progs/util directory is now in mesa demos
replace glean with piglit
add ApiTrace
markup: replace the unordered list <ul> with a definition list <dl>
Signed-off-by: Brian Paul <brianp@vmware.com>
I've reviewed the code, and the swrast callsites remaining are all in
drawpixels/copypixels/bitmap/accum, or _swrast_BlitFramebuffer that shouldn't
be hit. A piglit run with the context setup disabled on legacy GL and GLES2
showed regressions only in the copypixels and drawpixels tests.
If the context type is forced, this reduces the shader_runner maximum heap
size for glsl-algebraic-add-add-1.shader_test from 15,137,496b to 4,165,376b.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The Fallback field of the context struct doesn't work that way on i965, and
it's the only caller of FALLBACK() in the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This code has been in the driver since the first commit. I think it was
trying to stop rendering from happening with a disabled position array. Core
mesa has since had changes to deal with disabled position arrays correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
But cap the size in bytes, to avoid depleting the whole system memory
with humongous textures.
Tested with max-texture-size piglit test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We want to check whether there are bits set outside of the valid flags.
Fixes piglit test egl-create-context-invalid-flag-gl
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
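The check described in the message above can be sketched in C as follows.
The flag names and values here are hypothetical and chosen only for
illustration. Masking with the complement of the valid set is what detects
bits outside it; testing against the valid set itself would not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the real EGL tokens. */
#define CTX_FLAG_DEBUG              0x1u
#define CTX_FLAG_FORWARD_COMPATIBLE 0x2u
#define CTX_FLAG_ROBUST_ACCESS      0x4u

#define CTX_VALID_FLAGS \
    (CTX_FLAG_DEBUG | CTX_FLAG_FORWARD_COMPATIBLE | CTX_FLAG_ROBUST_ACCESS)

/* True only when no bit outside the valid mask is set. */
static bool context_flags_are_valid(uint32_t flags)
{
    return (flags & ~CTX_VALID_FLAGS) == 0;
}
```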
Now that it's on by default, we may as well make it obey the flag,
for consistency's sake if nothing else.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Precompiling the shader at link time often allows us to avoid compiling
it at the first use. This moves the expensive compilation and
optimization process to game or level load time, rather than at draw
time, where we really can't avoid any cycles and don't want to risk
stalling the GPU.
The downside is that we have to guess the non-orthogonal state the
program will have set when it draws with the shader. Previously, we
guessed wrong for nearly every shader, so it wasn't useful. With the
recent SamplerUnits rework and this series, we've either eliminated
state or made smarter guesses, and usually get it right now.
In the L4D2 time demo, I now have 39 fragment shader recompiles and no
vertex shader recompiles. Before this series and the SamplerUnits
rework, I had 206 fragment shader recompiles and 192 vertex shader
recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression since 76d1301e8e:
I began setting SWIZZLE_XYZW for unused sampler units in the actual
program keys, since this matched the FS precompile behavior. However,
the VS precompile was expecting zero, so that commit made essentially
every vertex shader (even those not using texturing) mismatch and need
to be recompiled.
Setting them in the VS precompile key solves the issue. It also is an
improvement over our old behavior: previously we guessed that vertex
shaders didn't use any textures at all. Now we actually look to see if
the VS had any sampler uniforms and guess based on that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric added support for WM key debugging. This adds it for the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Our previous assumption, SWIZZLE_XYZW, was completely bogus for depth
textures. There are no Y, Z, or W components.
DEPTH_TEXTURE_MODE has three options:
- GL_LUMINANCE: <X, X, X, 1>
- GL_INTENSITY: <X, X, X, X>
- GL_ALPHA: <0, 0, 0, X>
The default value is GL_LUMINANCE, and most applications don't seem to
alter DEPTH_TEXTURE_MODE. Make that our precompile guess.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
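As an illustration of the three DEPTH_TEXTURE_MODE options listed in the
message above, here is a small C sketch. The enums, struct, and function
names are hypothetical; the driver itself works with GL and Mesa tokens.
The sketch only encodes the table from the message and picks the luminance
row as the precompile guess.

```c
/* Hypothetical enums for this sketch; the driver uses GL and Mesa tokens. */
enum depth_mode { DEPTH_MODE_LUMINANCE, DEPTH_MODE_INTENSITY, DEPTH_MODE_ALPHA };
enum swz { SWZ_X, SWZ_ZERO, SWZ_ONE };

struct swizzle4 { enum swz r, g, b, a; };

/* Map a depth texture mode to the per-channel source it implies. */
static struct swizzle4 depth_mode_swizzle(enum depth_mode mode)
{
    switch (mode) {
    case DEPTH_MODE_INTENSITY:
        return (struct swizzle4){ SWZ_X, SWZ_X, SWZ_X, SWZ_X };
    case DEPTH_MODE_ALPHA:
        return (struct swizzle4){ SWZ_ZERO, SWZ_ZERO, SWZ_ZERO, SWZ_X };
    case DEPTH_MODE_LUMINANCE:
    default:
        return (struct swizzle4){ SWZ_X, SWZ_X, SWZ_X, SWZ_ONE };
    }
}

/* Precompile guess: assume the application left the default mode alone. */
static struct swizzle4 precompile_depth_swizzle(void)
{
    return depth_mode_swizzle(DEPTH_MODE_LUMINANCE);
}
```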
Now that most things are based on the linker-assigned index, it makes
sense to convert the arrays in the VS/WM program key as well. It seems
silly to leave them indexed by texture unit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_wm_prog_key's proj_attrib_mask field is designed to enable an
optimization for fixed-function programs, letting us avoid projecting
attributes where the divisor is 1.0.
However, for shaders, this is not useful, and is pretty much impossible
to guess when building the FS precompile key. Turning it off for
shaders should allow the precompile to work and not lose much.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
We probably want to do something more sophisticated here, but this at
least makes it through L4D2 without dumping the program cache.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Do all pre-draw hiz resolves *after* the renderbuffers are resized by
intel_prepare_render. Otherwise, we may resolve buffers that are
immediately discarded afterwards.
Fixes the assertion failure below when resizing windows in KDE and under
some unknown circumstance in Chrome OS:
intel_resolve_map.c:46: intel_resolve_map_set: Assertion
`(*tail)->need == need' failed.
Also, remove the comment that "resolves must occur [...] before setting up
any hardware state". That was true when resolves were implemented with
meta-ops, but no longer with blorp.
v2:
- Keep brw_predraw_resolve_buffers in its current position, which is
before any brw_context bits are modified. Instead, move the call to
intel_prepare_render.
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52252
Reported-by: Lu Hua <huax.lu@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
intel_renderbuffer_resolve_hiz checks if rb->mt is null, so there is no
need for the caller to do so.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This adds the FMASK and CMASK buffers. They share the same resource
with color data.
COMPRESSION and FAST_CLEAR are always enabled if both FMASK and CMASK are
allocated. We initialize the CMASK to a "compressed" state (not "fast cleared"),
so that we can keep FAST_CLEAR enabled all the time.
Both FMASK and CMASK must be present at the moment. If either one is missing,
the other one is not used.
v2: add cayman regs in the list
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The original sample positions took samples outside of the pixel boundary,
leading to dark pixels on the edge of the colorbuffer, among other things.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Drivers need to be able to communicate their actual number of bits populated
in the field in order for applications to be able to properly handle rollover.
There's a small behavior change here: Instead of reporting the
GL_SAMPLES_PASSED bits for GL_ANY_SAMPLES_PASSED (which would also be valid),
just return 1, because more bits don't make any sense.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
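To show why the reported bit count matters for rollover, here is a hedged C
sketch of what an application might do with it. The function name is
hypothetical, and it assumes the counter wrapped at most once between the
two samples. With the true width, the masked subtraction recovers the
correct difference even across a wrap; with a wrong width it would not.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two samples of a counter that is only `bits` wide.
 * Correct as long as the counter wrapped at most once between samples. */
static uint64_t counter_delta(uint64_t start, uint64_t end, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return (end - start) & mask;
}
```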
When faced with this sequence:
MOV R1, c[1];
MAD R0, R2, R1.x, R1.y;
we were concluding that the MOV of R1 set up our accumulator and so we could
just use the previous result. Only, it's got R1.xyzw in it instead of the
R1.y we're looking for.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46784
NOTE: This is a candidate for the 8.0 branch.
Support version 3 as well as 2, since that is only the new format query,
which Jesse added support for to st/dri when he added it to dri_interface.h.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Since it's not used by anything anymore and no release has gone out
where it was being used.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Uses libkms instead of dri image cursor. Since this is the only user of the
DRI cursor and write interface, we can remove cursor surfaces entirely from
the DRI interface and, as a consequence, from the Gallium interface as
well. To make everybody happy with this we should probably add a
kms_bo_write function, but that is probably wise anyway.
The only downside is that it adds a dependency on libkms; this could, however,
be replaced with the dumb_bo drm ioctl interface.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
We already changed the actual program key builder to only set these bits
on gen < 6; this patch just brings the precompile state back in line so
it doesn't mismatch every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When dumping differences in program keys, it printed messages of the
format:
[Name of thing that changed] [new]->[old]
This was terribly confusing: the right arrow implies "the value changed
from this to that", when in fact the message conveyed the opposite.
Except that some of the time, it didn't, since we accidentally swapped
the arguments to brw_debug_recompile_sampler_key. With two swaps, it
would often come out in the expected format.
This patch fixes it to properly print:
[Name of thing that changed] [old]->[new]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
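A small C sketch of the corrected message format described above. The helper
name and signature are hypothetical and are not the driver's debug function.
The old value is passed and printed first, so the arrow reads in the natural
"from this to that" direction.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "name old->new" into buf when the values
 * differ. Returns the formatted length, or 0 when nothing changed. */
static int format_key_change(char *buf, size_t len, const char *name,
                             int old_value, int new_value)
{
    assert(buf != NULL && name != NULL);
    if (old_value == new_value)
        return 0;
    return snprintf(buf, len, "%s %d->%d", name, old_value, new_value);
}
```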
Gallium drivers and i965 don't require special notification when
sampler uniforms change. They simply see the _NEW_TEXTURE and adjust
their indirection tables. These drivers don't want ProgramStringNotify:
it simply causes pointless recompiles.
Unfortunately, i915 still requires shader recompiles and needs
ProgramStringNotify. Rather than trying to fix that, simply change the
hook to a new, more specific one: ShaderUniformChange. On i915, this
translates to ProgramStringNotify; others simply ignore it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When assigning uniform locations, the linker assigns each sampler
uniform a sequential numerical ID. gl_shader_program::SamplerUnits maps
these sampler variable IDs to the actual texture units they reference
(specified via glUniform1i).
Previously, we encoded this mapping in the SEND instruction encoding:
the "sampler" was the texture unit number, and the binding table index
was SURF_INDEX_TEXTURE(the texture unit number). This unfortunately
meant that whenever the application changed the value of a sampler
uniform, we had to recompile the shader to change the SEND instructions.
This was horrible for the game Cogs, which repeatedly switches between
using texture unit 0 and 1. It also made fragment shader precompiles
useless: we'd do the precompile at glLinkProgram() time, before the
application called glUniform1i to set the sampler values. As soon as
it did that, we'd have to recompile, wasting time and space in the
program cache.
This patch encodes the SamplerUnits indirection in the binding table,
sampler state, and sampler default color tables. Instead of baking the
texture unit number into the shader, we bake in the sampler variable ID
assigned by the linker. Since those never change, we don't need to
recompile programs on uniform changes.
This does mean that the tables now depend on the linked shader program
being used for rendering, rather than simply representing all available
texture units. This could cause an increase in state emission.
Another plus is that the sampler state and sampler default color tables
are now compact: we only emit as many entries as there are sampler
uniforms, with no holes in the table since the new sampler IDs are
sequential. Previously we had to emit a full 16 entries every time,
since the tables tracked the state of all active texture units.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
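The indirection described in the message above can be sketched in C roughly
as follows. All names here (the struct, MAX_SAMPLERS, the function, and the
idea of a surface handle per texture unit) are hypothetical simplifications.
The shader would refer only to sampler IDs, so when the application points a
sampler uniform at a different texture unit, only the table contents change
and the compiled code stays valid.

```c
#include <assert.h>

#define MAX_SAMPLERS 16 /* hypothetical table size for this sketch */

/* Hypothetical stand-in for the linked program's sampler mapping. */
struct linked_program {
    unsigned num_samplers;                 /* sampler uniforms, IDs 0..n-1 */
    unsigned sampler_units[MAX_SAMPLERS];  /* sampler ID -> texture unit */
};

/* Fill table[id] with the surface bound to the texture unit that sampler
 * `id` currently references. Returns the number of entries written. */
static unsigned build_binding_table(const struct linked_program *prog,
                                    const unsigned *unit_surfaces,
                                    unsigned *table)
{
    assert(prog->num_samplers <= MAX_SAMPLERS);
    for (unsigned id = 0; id < prog->num_samplers; id++)
        table[id] = unit_surfaces[prog->sampler_units[id]];
    return prog->num_samplers;
}
```

The table is compact because it has one entry per sampler uniform rather
than one per texture unit, which matches the "no holes" property the message
mentions.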
This represents the index into the sampler state table or sampler
default color table (the two are identical).
Right now, this is still the texture unit, but that will change shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we mirror the VS and WM binding tables' texture entries.
That may not continue to be true, so in preparation, pass in the binding
table and surface index as arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value of our sampler uniforms. Calling it
"texunit" clarifies this slightly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value of our sampler uniforms. Calling it
"texunit" clarifies this slightly.
Don't bother renaming fs_instruction::sampler. Although it's currently
the texture unit, this series will change that. No need for the churn.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we left the swizzle key field as zero for unused texture
units. The precompile sets all of them to SWIZZLE_NOOP, which meant
that we mismatched almost every time.
Since either works equally well, change it to SWIZZLE_NOOP to match
the precompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I can't actually understand what these mean, and they seem to
essentially say "we should simplify things", which is a nice goal but
not very specific.
Presumably things got cleaned up at some point.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes brw_shader.cpp:101:9: warning: converting to non-pointer type
'GLboolean {aka unsigned char}' from NULL [-Wconversion-null]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-with-great-enthusiasm-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering.
v3: *Really* add proper core-profile and GLES3 filtering based on review
feedback from Eric Anholt. It looks like previously there was some
rebase / merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering based on review feedback
from Eric Anholt. It looks like previously there was some rebase /
merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix handling of GL_INT and GL_UNSIGNED_INT types pre-ES3.0, and fix
handling of GL_INT_2_10_10_10_REV and GL_UNSIGNED_INT_2_10_10_10_REV in
ES3.0. Based on review comments by Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This consolidates the tests and makes the emitted error message
consistent.
v2: Rename _mesa_valid_element_type to valid_elements_type. Log the
enum string instead of the hex value in error messages. Based on review
comments from Brian Paul and Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_generic_compressed_format_to_uncompressed_format() probably wins the
prize for longest function name in Mesa.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
See comments in the code for details.
Note: we only need to special-case the generic compressed formats since
specific texture formats are error-checked earlier to see if the compression
format is compatible with the texture type.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will let us choose the actual hardware format depending on the
type of texture.
v2: fixup radeon, nouveau, intel and swrast drivers too
Reviewed-by: Eric Anholt <eric@anholt.net>
'target' was used both as a parameter of type st_texture_type and then
re-used for GL_TEXTURE_x targets. Rename the function parameter and
add a new local 'GLenum target'.
And remove an extraneous break statement.
This patch changes Mesa to use 'HAVE_DLOPEN' defined by configure and
Android.mk instead of _GNU_SOURCE for detecting dlopen capability. This makes
dlopen work on Android as well, where _GNU_SOURCE is not defined.
[mattst88] v2: HAVE_DLOPEN is sufficient for including dlfcn.h, remove
mingw/blrts checks around dlfcn.h inclusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Previously, when performing a fast depth clear, we would also clear
the miptree's resolve map. This destroyed important information,
since the resolve map contains information about needed resolves for
all levels and layers of the miptree, whereas a depth clear only
applies to a single level/layer combination at a time. As a result,
resolves would sometimes fail to occur, leading to incorrect
rendering.
Fixes rendering artifacts with shadow maps in Unigine Heaven and
Unigine Sanctuary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50270
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
There are three possible resolve map states for each (level, layer) of
a depth miptree: "needs HiZ resolve", "needs depth resolve", and
"needs neither". When HiZ was first implemented on i965, any attempt
to directly transition between "needs HiZ resolve" and "needs depth
resolve" without passing through the "needs neither" state would have
been a bug indicating that a necessary resolve hadn't been performed.
Accordingly, intel_resolve_map_set() contained an assertion to verify
that no such direct transition happened.
However, now that we support fast depth clears, there is a valid
transition from the "needs HiZ resolve" to the "needs depth resolve"
state. When doing a fast depth clear, the old state of the buffer is
irrelevant, since we are completely replacing it with the clear value,
so it is not necessary to do any resolves before clearing--we can
transition, if necessary, directly from the "needs HiZ resolve" state
to the "needs depth resolve" state.
To avoid spurious assertions in this valid case, this patch just
removes the assertion.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
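A simplified C sketch of the three-state resolve map discussed in the two
messages above. The flat array, MAX_SLICES, and the function name are
hypothetical simplifications for illustration. The setter touches exactly
one (level, layer) slice and accepts any prior state, including the direct
HiZ-resolve to depth-resolve transition that a fast depth clear produces.

```c
#include <assert.h>
#include <stddef.h>

enum resolve_need { NEED_NONE, NEED_HIZ_RESOLVE, NEED_DEPTH_RESOLVE };

#define MAX_SLICES 64 /* hypothetical bound for this sketch */

/* Hypothetical flat map keyed by slice index. */
struct resolve_map {
    enum resolve_need need[MAX_SLICES];
};

/* Record what a slice needs. Any previous state may be overwritten,
 * including a direct HiZ-resolve to depth-resolve transition. Other
 * slices are left untouched. */
static void resolve_map_set(struct resolve_map *map, size_t slice,
                            enum resolve_need need)
{
    assert(map != NULL && slice < MAX_SLICES);
    map->need[slice] = need;
}
```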
Just use the functionality provided by the surface manager instead.
This fixes just another bunch of piglit tests.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Previously you could always glGetProgramiv one of the transform feedback
or geometry shader enums even if the extension wasn't supported.
In addition, this reverts part of bda6ad27. I think the hunks involving
GL_PROGRAM_BINARY_LENGTH_OES were spurious. Mesa has no support for any
other part of GL_OES_get_program_binary.
v2: Remove redundant return in get_programiv based on review feedback
from Matt Turner.
v3: Correctly handle UBO related enums.
v4: Emit the bad enum in the _mesa_error call based on review feedback
from Brian Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix API functions for memory objects to accept CL_MEM_READ_WRITE flag.
Signed-off-by: Blaž Tomažič <blaz.tomazic@gmail.com>
[ Francisco Jerez: Drop incorrect change in clCreateSubBuffer. ]
Fix-up the texel fetch functions so that they handle 3D coords (as used for
array textures) and remove the "f_2d" part from their names.
Helps fix swrast crashes in piglit's copyteximage test. More to come.
There was a lot of similar or duplicated code before.
To minimize this patch's size, use a forward declaration for
compressed_texture_error_check(). Move the function in the next patch.
If a proxy texture call generates a regular GL error, we should not
clear the proxy image's width/height/depth/format fields. Use a new
PROXY_ERROR token to distinguish proxy errors from regular GL errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When calling glTexImage() with a proxy target most error conditions should
generate a GL error. We were erroneously doing the proxy-error behaviour
(where we zeroed-out the image's width/height/depth/format fields) in too
many places.
There's another issue with proxy textures, but that'll be fixed in the
next patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
draw->samplers(_views) now has PIPE_SHADER_TYPES elements, instead of
PIPE_MAX_SAMPLERS as before.
Also, shader_stage must be less than PIPE_SHADER_TYPES to prevent buffer
overflow.
Trivial.
Render Target Write message should include source zero alpha value when
sample-alpha-to-coverage is enabled for an FBO with multiple render targets.
Source zero alpha value is used as fragment coverage for all the render
targets.
This patch makes the piglit tests draw-buffers-alpha-to-coverage and
alpha-to-coverage-no-draw-buffer-zero pass on Sandybridge. No
regressions are observed with piglit all.tests.
V2: Revert all the changes made in emit_color_write() function to
include src0 alpha for targets > 0. Now handling this case in an if
block.
V3: Correctly calculate the instruction length for buffer zero.
Properly handle the case of dual_src_blend when alpha-to-coverage
is enabled.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When too many uniforms are used, the error will be caught in
check_resources (src/glsl/linker.cpp).
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Benoit Jacob <bjacob@mozilla.com>
Also validate glCopyTexImage border. This fixes a bug in the APIspec.
Previously glTexImage3DOES could be passed a non-zero border without error.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile, GLES1, and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixed the piglit test arb_texture_buffer_object-negative-unsupported.
NOTE: This is a candidate for stable release branches.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This should take care of all the TexImage, TexSubImage, CopyTexImage,
CompressedTexImage3DOES, and CopyTexSubImage type paths.
v2: Add proper core-profile and GLES3 filtering.
v3: Squash the CompressedTexImage3DOES patch per review comment from
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is a bit of a hack. _mesa_meta_GenerateMipmap shouldn't even be
used in contexts where GL_GENERATE_MIPMAP doesn't exist (i.e., core
profile and ES2) because it uses fixed-function, and fixed-function
doesn't exist there either!
A GLSL-based _mesa_meta_GenerateMipmap should be available soon. When
that is available, this patch will be irrelevant and should be reverted.
v2: Change (ctx->API != API_OPENGLES2 && ctx->API != API_OPENGL_CORE) to
(ctx->API == API_OPENGL || ctx->API == API_OPENGLES) based on review
comment from Brian Paul.
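The v2 rewrite can be checked by enumerating the API values. A minimal
sketch, assuming a hypothetical four-value enum that only borrows the API_*
names from the message above (the real Mesa enum may be laid out
differently):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the context API enum; only the names are
 * taken from the commit message. */
enum sketch_api {
   SKETCH_API_OPENGL,
   SKETCH_API_OPENGLES,
   SKETCH_API_OPENGLES2,
   SKETCH_API_OPENGL_CORE
};

/* v1 form: exclude the APIs that lack fixed-function. */
static bool
has_generate_mipmap_v1(enum sketch_api api)
{
   return api != SKETCH_API_OPENGLES2 && api != SKETCH_API_OPENGL_CORE;
}

/* v2 form: name the APIs that do have fixed-function. */
static bool
has_generate_mipmap_v2(enum sketch_api api)
{
   return api == SKETCH_API_OPENGL || api == SKETCH_API_OPENGLES;
}
```

Both forms agree on the four values listed here. The v2 form has the
property that a newly added API value is rejected by default rather than
accepted.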
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 77a3efc6b9 broke the Android build, which
sets its own value for GLSL_SRCDIR before including Makefile.sources.
This patch moves the override after the include. This works because the
GLSL_SRCDIR variable is expanded only later.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The name is taken from the driver_descriptor, so it will be the same as
expected by driconf utility.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The segmentation fault occurs when DRI2 is not loaded up and
dri2_setup_screen() function dereferences dri2_dpy->dri2 (since it's NULL
at this point).
This patch fixes the segmentation fault by checking if dri2 pointer is
not NULL before dereferencing it.
Signed-off-by: Paulo Alcantara <pcacjr@profusion.mobi>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This new operand replaces the MachineOperand flags in LLVM, which
will be deprecated soon. Eventually all instructions should have a flag
operand, but for now this operand has only been added to instructions
that need it.
SRC_DIRS was overwritten (visible in the second hunk).
Also don't require mapi/shared-glapi to be built for GLES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to enable at least one interpolation mode,
otherwise the GPU will hang.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Disable blending when dual_src_blend is enabled and number of color exports
in the current fragment shader is less than 2.
Fixes lockups with the
ext_framebuffer_multisample-alpha-to-coverage-dual-src-blend piglit test.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The generic texture formats should be accepted by the <internalformat>
parameter of TexImage1D, TexImage2D, TexImage3D, CopyTexImage1D, and
CopyTexImage2D functions. When the application specifies a generic
format, the driver is free to pick an uncompressed format.
This patch reverts the changes due to following commit:
commit a36581ccc0
mesa: do more teximage error checking for generic compressed formats
This patch fixes compressed texture format failures in intel oglconform
pxconv-gettex test case:
https://bugs.freedesktop.org/show_bug.cgi?id=47220
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't dereference NULL pointers, and if all views are NULL, don't generate an
invalid PM4 packet which locks up the GPU.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Mesa doesn't check the parameter passed to glMultiTexCoord*. It does,
however, mask the texture value to prevent out-of-bounds writes. This
patch will promote this non-conformant behavior to OpenGL ES 1. I don't
think anyone will care, and this gets some silly code out of a hot path.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is required to make some of LLVM's API calls
thread safe. In particular the PassRegistry, which is
implicitly accessed while compiling shader programs.
The PassRegistry uses a mutex that is only active if
llvm_is_multithreaded() returns true.
Calling llvm_start_multithreading() makes this happen,
and by calling this function we try to make sure that
we can safely compile shaders in parallel.
Since there is also a call llvm_stop_multithreading()
in the LLVM API, we cannot guarantee that this does
not get switched off while we are relying on this being
set, but for the easier use cases this fixes a race with
the radeon LLVM compiler we have as of today.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
In the past, when we called pipe::set_sampler_views(n) the drivers set
samplers [n..MAX] to NULL. We no longer do that. The state tracker
code was already trying to set unused sampler views to NULL to cover
that case, but the logic was broken and unnoticed until now. This patch
fixes it.
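The idea of clearing the no-longer-used slots can be sketched as follows.
This is only an illustration: sketch_state, sketch_set_views and
SKETCH_MAX_VIEWS are hypothetical names, not the state tracker's actual
code.

```c
#include <stddef.h>

#define SKETCH_MAX_VIEWS 16   /* hypothetical limit */

/* Hypothetical record of what is currently bound. */
struct sketch_state {
   const void *views[SKETCH_MAX_VIEWS];
   unsigned num_views;
};

/* Bind `count` views and explicitly clear every slot that was bound by
 * the previous call but is not used by this one. */
static void
sketch_set_views(struct sketch_state *st,
                 const void *const *views, unsigned count)
{
   unsigned i;

   if (count > SKETCH_MAX_VIEWS)
      count = SKETCH_MAX_VIEWS;

   for (i = 0; i < count; i++)
      st->views[i] = views[i];

   /* Slots [count, old num_views) would otherwise keep stale pointers. */
   for (; i < st->num_views; i++)
      st->views[i] = NULL;

   st->num_views = count;
}
```

Binding three views and then one view leaves slots 1 and 2 set to NULL
instead of pointing at the previous views.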
Strictly speaking, this patch shouldn't be necessary. Drivers should simply
ignore unused samplers and sampler views. But some drivers like llvmpipe (and
others?) count those things and they figure into state validation. That could
be fixed in the future.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53617
Reviewed-by: Marek Olšák <maraeo@gmail.com>
GL_INVALID_OPERATION is to be raised when querying a non-compressed
image/buffer. Since a buffer object can't have a compressed format this
query always generates an error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are gradually going to get whittled away and eventually folded into the
source files with the native type functions.
v2: Add (speculative) SConscript changes. These may be broken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the old backend, we looked at any FS attribute's proj_attrib_mask bits, not
just texcoords. Now that we have _mesa_vert_result_to_frag_attrib(), we can
fill in the other FS inputs with correct proj_attrib_mask info.
NOTE: This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46644
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL 3.1 specification explicitly allows this. Oddly, the
ARB_texture_buffer_object spec's issues section claims this isn't
allowed, but proceeds to explain that the extension simply doesn't edit
the underlying spec to allow it, and thus it didn't appear in the list
of legal texture targets.
Thus, this patch legalizes it only in 3.1+ contexts, but still returns
INVALID_ENUM in earlier contexts that expose ARB_texture_buffer_object.
Unfortunately, the behavior of the call is horrendously undefined.
Fixes oglconform's tbo/negative.textureParams test.
v2: Require desktop OpenGL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Move the _mesa_GetTexLevelParameter[iv] functions below the helper
function so the prototype is available.
This will be useful in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For cube maps, _mesa_generate_mipmap() calls this with
GL_TEXTURE_CUBE_MAP (the gl_texture_object's Target) rather than one
of the faces. This caused _mesa_max_texture_levels() to return 0, which
resulted in maxLevels == -1 and the next line's assertion to fail.
This function is called from seven places:
- fbobject.c: framebuffer_texture()
- mipmap.c: _mesa_generate_mipmap()
- texgetimage.c:
- getteximage_error_check()
- getcompressedteximage_error_check()
- texparam.c: _mesa_GetTexLevelParameteriv()
- texstorage.c: tex_storage_error_check()
All of these (or their callers) now explicitly check for invalid targets
already, so this shouldn't cause invalid targets to slip through.
(Technically _mesa_generate_mipmap() doesn't check for invalid targets,
but the API-facing _mesa_GenerateMipmapEXT() function does.)
+2 oglconforms (float-texture/mipmap.automatic and mipmap.manual)
In addition to fixing the mipmap bug, it should also cause glTexStorage
to accept GL_TEXTURE_CUBE_MAP, which is explicitly allowed by the spec.
v2: Drop alterations to callers; this is now in a patch series that adds
explicit checking to API functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven different places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels(), so
this patch should not introduce any change in behavior. Future patches
will cause the two to diverge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, they relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven different places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels() but
removed the proxy targets, as both functions explicitly rejected those
targets. This changes the order in which we check errors, which could
change whether we return INVALID_VALUE or INVALID_ENUM. However, it
shouldn't change the list of accepted targets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's possible for us to have an unused sampler bound when the fragment
shader itself doesn't use any samplers. So the assertion isn't valid.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53616
We aligned the dimensions to the blocksize, then divided by it
(in r600_blit.c), then minified, which was wrong.
The minification must be done first, not last.
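A small worked example of why the order matters. The helper names below
are hypothetical and the block size of 4 is an assumption chosen to match
S3TC-style 4x4 blocks; this is a sketch, not the r600_blit.c code.

```c
/* Hypothetical helper: size of a mip level, never smaller than 1. */
static unsigned
sketch_minify(unsigned size, unsigned level)
{
   unsigned s = size >> level;
   return s ? s : 1;
}

/* Hypothetical helper: round value up to a multiple of alignment. */
static unsigned
sketch_align(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) / alignment * alignment;
}

/* Old order: align to the block size, divide by it, then minify. */
static unsigned
sketch_blocks_old(unsigned width, unsigned level, unsigned block)
{
   return sketch_minify(sketch_align(width, block) / block, level);
}

/* Fixed order: minify first, then align and divide. */
static unsigned
sketch_blocks_fixed(unsigned width, unsigned level, unsigned block)
{
   return sketch_align(sketch_minify(width, level), block) / block;
}
```

For a 20-pixel-wide texture at level 2, the level is 5 pixels wide and
needs 2 blocks of 4. The old order computes 20 / 4 = 5 blocks and minifies
that to 1, which is one block short.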
This fixes piglit/fbo-generatemipmap-formats with S3TC and maybe
a bunch of other tests too. Tested on RV730.
This seems to be expected by the WebGL texture-mips test. The error makes
sense, but I haven't found (yet) any OpenGL documentation specifying this
error condition.
See http://bugs.freedesktop.org/show_bug.cgi?id=44912
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As with other recent changes, put the vertex and fragment sampler state
into arrays indexed by the shader type. This will let us easily add
support for other types of shaders in the future.
PIPE_MAX_SAMPLERS, PIPE_MAX_VERTEX_SAMPLERS and PIPE_MAX_GEOMETRY_SAMPLERS
were all defined to the same value (16).
In various places we're creating arrays such as
sampler_views[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS] so we were assuming
the same number of max samplers for all shader stages anyway.
Of course, drivers are still free to advertise different numbers of max
samplers for different shaders.
The previous test for result != NULL was kind of bogus since we dereferenced
the pointer earlier in the code. Now, check for result != NULL first, then
get the result->key info.
Also, remove the useless "offset +=" code at the end.
We'd end up re-using the old one and throwing away the new one anyway, but only
after a roundtrip to the kernel.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If a hole exactly matches the allocated size plus alignment, we would fail to
preserve the alignment as a hole. This would result in never being able to use
the alignment area for an allocation again.
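The exact-fit case can be sketched like this. The structure and function
names are hypothetical, and the alignment is assumed to be a power of two;
the real allocator keeps its holes in a list rather than returning them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical free range of virtual address space. */
struct sketch_hole {
   uint64_t offset;
   uint64_t size;
};

/* Carve `size` bytes, aligned to `alignment` (assumed power of two), out
 * of `hole`. On success *front receives the skipped alignment padding and
 * *back the unused tail; either may have size 0. */
static bool
sketch_alloc(struct sketch_hole hole, uint64_t size, uint64_t alignment,
             uint64_t *out_offset,
             struct sketch_hole *front, struct sketch_hole *back)
{
   uint64_t aligned = (hole.offset + alignment - 1) & ~(alignment - 1);
   uint64_t waste = aligned - hole.offset;

   if (size + waste > hole.size)
      return false;

   *out_offset = aligned;

   /* Keep the padding as a hole even when the tail is empty. */
   front->offset = hole.offset;
   front->size = waste;

   back->offset = aligned + size;
   back->size = hole.size - waste - size;
   return true;
}
```

When the hole is exactly size plus padding, back ends up empty but front
still describes the padding, so that area stays available for later
allocations.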
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'll likely end up with an ever increasing amount of ever smaller
holes.
Requires keeping the list ordered wrt offsets.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'd wrap around after 32 bits. The kernel currently limits GPU
virtual address space to 4GB anyway, but that will probably change sooner or
later, and this would result in confusing error messages when running out of
virtual address space even now.
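A minimal illustration of the wraparound, using hypothetical helper names.
It only shows the arithmetic, not the winsys code.

```c
#include <stdint.h>

/* End of a range computed with 32-bit arithmetic: wraps past 4 GiB. */
static uint32_t
sketch_range_end_32(uint32_t offset, uint32_t size)
{
   return offset + size;
}

/* The same computation with 64-bit arithmetic does not wrap here. */
static uint64_t
sketch_range_end_64(uint64_t offset, uint64_t size)
{
   return offset + size;
}
```

With offset 0xfff00000 and size 0x00200000, the 32-bit version yields
0x00100000, which looks like a valid low address, while the 64-bit version
yields 0x100100000 and can be compared against the address space limit.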
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This adds support for having libGL pick a different driver for prime support.
DRI_PRIME env var is set to the value retrieved from the server randr
provider calls, by the calling process. (generally DRI_PRIME=1 will be
the right answer).
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this we can embed data for the shaders (like resource
descriptors) into the PM4 stream.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I was seeing some GPU hangs that seemed to be caused by ALU instructions
writing to the same register used as the source for VTX_READ. Adding
this constraint to the VTX_READ instructions avoids this situation.
The only allowed instructions are TXQ_LZ and TXF.
TXQ_LZ is like TXQ, but without the LOD parameter (which is always zero
with MSAA textures)
The 3rd or the 4th texcoord component in TXF should contain the sample index
for a 2D_MSAA or 2D_ARRAY_MSAA texture, respectively.
The problem was that the string matching succeeded e.g. for "2D" when there
was actually "2D_MSAA" and then failed parsing "_MSAA".
To prevent similar failures in the future, let's fix this kind of error
everywhere.
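One way to avoid this class of failure is to accept a match only when it
ends at a token boundary. The sketch below illustrates that idea with
hypothetical helper names; it is not necessarily how the parser was
changed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Prefix-only comparison: also accepts "2D" at the start of "2D_MSAA". */
static bool
sketch_match_prefix(const char *text, const char *token)
{
   return strncmp(text, token, strlen(token)) == 0;
}

/* Accept the match only if the next character cannot continue an
 * identifier (letter, digit or underscore). */
static bool
sketch_match_token(const char *text, const char *token)
{
   size_t len = strlen(token);
   unsigned char next;

   if (strncmp(text, token, len) != 0)
      return false;

   next = (unsigned char) text[len];
   return !(isalnum(next) || next == '_');
}
```

With this check "2D" no longer matches the input "2D_MSAA", while
"2D_MSAA" itself and "2D" followed by a separator still match.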
Rename _mesa_pack_rgba_span_int to _mesa_pack_rgba_span_from_uints.
Add _mesa_pack_rgba_span_from_ints.
These separate routines allow the integer clamping to be handled
properly for signed versus unsigned integers.
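The reason two routines are needed can be sketched with a clamp to an
8-bit unsigned destination. The function names are hypothetical and the
destination type is just one example.

```c
#include <stdint.h>

/* Unsigned source: only the upper bound can be exceeded. */
static uint8_t
sketch_clamp_uint_to_ubyte(uint32_t v)
{
   return (uint8_t) (v > 255u ? 255u : v);
}

/* Signed source: negative values must clamp to 0, not to the maximum. */
static uint8_t
sketch_clamp_int_to_ubyte(int32_t v)
{
   if (v < 0)
      return 0;
   if (v > 255)
      return 255;
   return (uint8_t) v;
}
```

Feeding the signed value -1 through the unsigned path reinterprets it as
0xffffffff and clamps it to 255, whereas the signed path clamps it to 0.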
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to downsample before flushing BUFFER_FAKE_FRONT_LEFT to
BUFFER_FRONT_LEFT in intel_flush_front.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Stop repeating ourselves. Replace the 4 instances of
`driContext->driDrawablePriv` with `driDrawable`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move it from intel_screen.c to intel_context.c. Redeclare as non-static.
A future commit will use it in multiple files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unlike 1.x to 2.0, OpenGL ES 3.0 is backwards compatible with 2.0. Use the
same API flag for both. Applications that specifically want 3.0 will specify
this using the major / minor version attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just like in GLX, EGL_KHR_create_context requires DRI2 version >= 3, and
EGL_EXT_create_context_robustness requires both DRI2 version >= 3 and the
__DRI2_ROBUSTNESS extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extra block in dri2_create_context is to prevent extra white space noise
in the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add GL_ARB_invalidate_subdata to release notes at Brian's
suggestion.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Add comment explaining why minimum dimensions are set to 1 for some
texture targets. Add default case to switch statement to silence
compiler warnings and detect new texture targets. Both changes
suggested by Brian. Also use _mesa_is_desktop_gl as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Use _mesa_bufferobj_mapped instead of testing
gl_buffer_object::Pointer as suggested by Brian. Also use
_mesa_is_desktop_gl as suggested by Ken.
v3: Add a comment by the map subrange / discard range overlap test and
fix an off-by-one error noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this change _mesa_init_bufferobj_dispatch won't set function
pointers that don't exist in OpenGL ES.
v2: Use _mesa_is_desktop_gl and _mesa_is_gles3 as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata and OpenGL ES 3.0.
v2: Reject aux buffers in core context, and use _mesa_is_desktop_gl and
_mesa_is_gles3. Both suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is basically cut-and-paste from the swrast implementation, and it
could probably be (slightly) more optimal.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No driver supports this extension, and it seems unlikely that any driver
ever will. I think r300c may have supported it at one time, but that
driver has already been removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The final step of _mesa_unpack_depth_span is to take the temporary
GLfloat depth values and convert them to the desired format. When
converting to GL_UNSIGNED_INTEGER with depthMax > 0xffffff, we use
double-precision math to avoid overflow and precision problems.
Or at least that's the idea. Unfortunately
GLdouble z = depthValues[i] * (GLfloat) depthMax;
actually causes single-precision multiplication, since both operands are
GLfloats. Casting depthMax to GLdouble causes the scaling to be done
with double-precision math.
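The difference between the two casts can be sketched as follows. The
function names are hypothetical, and depth_max is assumed to be a 32-bit
unsigned value; the single-precision variant stores the product in a float
to model the float-by-float multiplication described above.

```c
#include <stdint.h>

/* Models the old expression: depth_max is rounded to float, and the
 * product is computed and rounded in single precision. */
static double
sketch_scale_single(float depth, uint32_t depth_max)
{
   float max_f = (float) depth_max;
   float product = depth * max_f;
   return (double) product;
}

/* Models the fix: casting depth_max to double promotes the whole
 * multiplication to double precision. */
static double
sketch_scale_double(float depth, uint32_t depth_max)
{
   return depth * (double) depth_max;
}
```

For depth 1.0 and depth_max 0xffffffff, the single-precision path returns
4294967296.0 because 0xffffffff is not representable as a float, while the
double-precision path returns 4294967295.0.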
Fixes a regression in oglconform's depth-stencil basic.read.ds test
since c60ac7b179, where the expected and
actual values differed slightly. For example, 0xcfa7a6 vs. 0xcfa7a4.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49772
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This forces the drivers to do at least some validation of context API
and version before creating the context. In r100 and r200 drivers, this
means that they don't do any post-hoc validation.
v2: Actually reject compatibility profile 3.2+ contexts. Thanks Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It may be possible to trim the list of extensions further. These are
just the obvious extensions that add functionality that the core context
explicitly forbids. Apple's core-context extension list is *just* the
extensions on top of the core GL version. I'm not sure we want to go
that far, but removing some things that have been in core since 2.1 may
be okay.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add both top_srcdir and top_builddir to mesa asm include dirs.
These require both in-tree and build-time-generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Like in src/mesa, use GLSL_BUILDDIR/GLSL_SRCDIR to unambiguously
distinguish between in-tree and generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Also fix include paths for the generated headers.
v2: Switch to using self-explanatory BUILDDIR/SRCDIR defined from
top_builddir/top_srcdir rather than the ambiguous TOP.
v3: Add both top_builddir and top_srcdir to include flags for mesa asm.
These rely on both in-tree and build-time-generated includes.
v4: Rebased on top of 948c8f502a.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
After realizing that brw_finish_batch emitted some final PIPE_CONTROLs
to record occlusion queries, Chris noted that we probably hadn't
reserved enough space to actually emit them.
Reserving a full 60 bytes seems a bit harsh, since we only need that
much if occlusion queries are actually active. Plus, 28 bytes would be
sufficient for Gen7, and 24 for Gen4-5.
We could optimize this in the future, but it doesn't seem too critical.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen4+, brw_finish_batch() calls brw_emit_query_end(), which emits
some extra PIPE_CONTROLs to capture the current occlusion query data.
Unfortunately, it was being called *after* _intel_batchbuffer_flush
added the MI_BATCH_BUFFER_END, meaning those PIPE_CONTROLs didn't get
inside the batch.
Not only does this likely cause bogus occlusion query values, it can
also cause crashes: with the recent change to use 64-bit depth count
writes on Gen6+, we started emitting an odd-length PIPE_CONTROL, which
happened after the MI_NOOP padding. This resulted in an odd-length
batch buffer, which resulted in execbuf2 returning -EINVAL and the
application dying with an intel_do_flush_locked failure.
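The required ordering can be sketched with a toy batch buffer. All names
and command values below are hypothetical placeholders, not the i965
definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NOOP      0x0u   /* hypothetical padding dword */
#define SKETCH_BATCH_END 0x1u   /* hypothetical end-of-batch marker */
#define SKETCH_BATCH_MAX 64

struct sketch_batch {
   uint32_t dwords[SKETCH_BATCH_MAX];
   size_t used;
};

static void
sketch_emit(struct sketch_batch *b, uint32_t dw)
{
   if (b->used < SKETCH_BATCH_MAX)
      b->dwords[b->used++] = dw;
}

/* Close a batch: emit the final commands first, then the end marker,
 * then pad so the total length is an even number of dwords. */
static void
sketch_flush(struct sketch_batch *b,
             const uint32_t *final_cmds, size_t count)
{
   size_t i;

   for (i = 0; i < count; i++)
      sketch_emit(b, final_cmds[i]);

   sketch_emit(b, SKETCH_BATCH_END);

   if (b->used & 1)
      sketch_emit(b, SKETCH_NOOP);
}
```

Because the final commands go in before the end marker and the padding,
they are part of the executed batch, and an odd number of them can no
longer leave the batch with an odd length.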
On older generations, finish_batch() doesn't emit any state, so this
change shouldn't have any effect.
Huge thanks to Chris Wilson for helping me figure this out.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
I want to introduce some more debug output for performance surprises, which
include fallbacks but aren't necessarily software rasterization. Leave
INTEL_DEBUG=fall in place for those that have used that flag before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Avoid an INVALID_OPERATION error when decompressing a rectangle texture.
Setting mipmap level limits for those textures is an error, and meta code
must not trigger it because that would mislead the user.
[v3/Kayden]: Resolve conflicts due to Eric picking a subset of Pauli's
original changes.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampler objects are perfect for meta operations. A sampler object
is a separate state object that shadows the sampling state in the texture
object. With a sampler object, mipmap generation can maintain the same
sampling state for all subsequent generation requests.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampler queries are so far made only for enabled texture units. But if
any code queried the sampler before checking the texture unit state, that
would result in a NULL dereference.
Making the inline helper easier to use with a NULL check makes a lot of sense
because the compiler is likely to combine the checks for the current texture.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In tune with previous patches. Again there is duplication of information
in function parameters that is good to remove.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Size and format information is always stored in the gl_texture_image
structure. That makes it preferable to remove the duplicate information
from the parameters to make the interface easier to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The gl_texture_image structure always holds the size and internal format
before the TexImage driver hook is called. Passing the same information in
function parameters only duplicates it, making the interface
harder to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 6882381a2e added a dependency on a
newer version of xcb, but the version check wasn't added in all the
necessary places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 9f5a5d541d.
Fixes the following build error on GCC 4.2.3:
cc1plus: error: unrecognized command line option "-Wno-narrowing"
The GCC Manual incorrectly stated that commit 9f5a5d54 would be safe for
old versions of GCC.
Reported-by: Andy Furniss <andyqos@ukfsn.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The var!=softpipe->fs_variant assertion was failing because we weren't
nulling the softpipe->fs_variant pointer when binding a new shader.
Since softpipe->fs_variant depends on the current fs, it's of no use
when a new FS is bound.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53318
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
After we attach a new renderbuffer in this function we need to make
sure Mesa's update_framebuffer() gets called.
Fixes crash in WebGL conformance/textures/texture-attachment-formats.html,
but the test still fails for other reasons.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53316
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Add -Wno-narrowing to CXXFLAGS for gcc.
It is safe to add this flag even for versions of gcc that don't recognize
it. From the GCC Manual [1]: "[GCC] allows the use of new -Wno- options
with old compilers".
This removes warnings of the form
warning: narrowing conversion of X from 'int' to 'float' inside { } is
ill-formed in C++11 [-Wnarrowing]
in ff_fragment_shader.cpp and gen6_blorp.cpp. When building
i965, I observed no other difference in the build output.
[1] http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes WebGL conformance/uniforms/uniform-default-values.html crash.
We need to check for the null view pointer before accessing view->texture.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53317
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Always downsample before mapping, even if the map mode contains
GL_MAP_INVALIDATE_RANGE_BIT. If we neglect to downsample when only
a subrect is mapped then the upsample in intel_miptree_unmap_multisample
may write garbage to the region outside the subrect.
(Eric gave my patch e88cfbb a conditional reviewed-by with the condition
that it always downsample before mapping. I forgot to make that change
before pushing the patch.)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes the glsl skinning demo regression since changing to the new GLSL
compiler, and is part of fixing piglit gl-2.0-edgeflag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50079
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there was an edge flag or a two-side-color pair present, we'd end up
mismatched and read values from earlier in the VUE for later FS inputs.
v2: Fix regression in gles2conform shaders generating point size. (change by
anholt)
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 branch.
If the application has requested reset notification, then
dri2_convert_glx_attribs will initialize this to the correct value.
Otherwise, it's supposed to initialize this to NO_NOTIFICATION, but
doesn't when num_attribs == 0. (The consensus seems to be that we
should make it do so, but that's more invasive, so I'm pushing this for
now.)
Fixes a regression since a8724d85f8
where trying to run OilRush_x86 or apitrace heaven_x64 would result in:
dri_util.c:221: dri2CreateContextAttribs: Assertion `!"Should not get
here."' failed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53076
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch changes the i915 and i965 drivers to use the fixed-function
version of meta clear when running on ES 1.1. This fixes rendering errors
seen with Google Maps, Angry Birds and Gallery3D on the Android platform.
Change 88128516d4 exposes all extensions
internally regardless of GL flavour, so the check
against ARB_fragment_shader does not work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50333
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This removes the CS stall on Ivybridge.
On Sandybridge, the depth stall needs to be preceded by a non-zero
post-sync op, which requires a CS stall, which needs a stall at
scoreboard. Emit the full workaround.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
I don't know if it was possible to trigger this bug -- we don't merge
saturates into the math instruction because we're bad at coalescing currently,
and there's nothing generating these with predicates. Still, let's avoid
future bugs when we do smarter codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was ridiculous. We were ignoring the inst->header.saturate flag in the
case of math and only math. On gen4, we would leave inst->header.saturate in
place if it happened to be set, which would end up being applied to the
implicit mov and thus trash the first argument. On gen6, we would overwrite
inst->header.saturate with the saturate flag from the argument, which was not
set appropriately in brw_vec4_emit.cpp, and was only not a bug due to our
incompetence at coalescing saturate moves.
By ripping the argument out and making saturate work just like all the other
brw_eu_emit.c code generation, we can avoid both these classes of bugs.
Fixes piglit fog-modes, and the new specific fs-saturate-exp2 case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48628
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a chance for brw_wm_emit.c to screw up and pass (1 << 4) instead of
1, which would get converted to 0 when stored. Instead, use stdbool which
converts nonzero to true/1 like we want.
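A minimal sketch of the truncation described above, assuming the flag ends up in a 1-bit field. The struct and function names are hypothetical and are not taken from brw_wm_emit.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical header layout; the field names are illustrative only. */
struct example_header {
   unsigned saturate_bits:1;  /* 1-bit unsigned field keeps only bit 0 */
   bool saturate_flag;        /* bool turns any nonzero value into 1 */
};

static struct example_header
store_saturate(unsigned flag)
{
   struct example_header h;

   h.saturate_bits = flag;    /* (1 << 4) is truncated to 0 here */
   h.saturate_flag = flag;    /* (1 << 4) becomes true here */
   assert(h.saturate_flag == (flag != 0));
   return h;
}
```

Passing `1 << 4` leaves `saturate_bits` at 0 while `saturate_flag` is true, which is the difference the change relies on.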
Otherwise, conditional rendering always takes the fallthrough "render it
anyway" case unless the application had itself done a check or wait on the
query.
Fixes intel oglconform's conditional_render advanced.nofbo.readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
I happened to notice this while looking at a blit pass in l4d2, which had an
optional push/pop around framebuffer srgb setting. It didn't matter in the
end, but the fix is sitting in my tree now.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
You can't practically have desktop OpenGL and OpenGL ES on the same system
without this. The benefits of not having it (e.g., a more compact dispatch
table) are irrelevant.
v2: Don't mark shared-glapi as experimental. Review suggestion by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These are largely based on the src/mapi/glapi/tests. However,
shared-glapi provides less external visibility into the dispatch table,
so there is less to test. Also, shared-glapi does not implement
_glapi_get_proc_name, so that test was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. Avoiding the use of glapitable.h avoids the lies. The only
entries used in this code are entries that are ABI. For these, the ABI
offset can be used directly.
Since this code is in src/glx, it can't use src/mesa/main/dispatch.h to
get the pretty names for these offsets.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. There are two completely separate code generation paths used to
assign dispatch offset. Neither has any clue about the other.
Unsurprisingly, they can't agree on what offsets to assign.
This adds a bunch of overhead to __glXNewIndirectAPI, but this function
is called at most once.
The test ExtensionNopDispatch was removed. There was just no way to
make this test work with the information provided in shared-glapi.
Since indirect_glx.c uses _glapi_get_proc_offset now, it was also
impossible to make the tests work without shared-glapi. So much pain.
This fixes indirect rendering with shared-glapi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes 'make check' with --enable-shared-glapi. This test cannot work
in that environment.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates the complexity in one place, which is important
because it's about to get even more complicated.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
PIPE_CONTROL has variable length, depending upon generation and whether
we want to do 32-bit or 64-bit data writes. Make it explicit, rather
than hiding a length of 4 in the #define for _3DSTATE_PIPE_CONTROL.
Generated by s/3DSTATE_PIPE_CONTROL/3DSTATE_PIPE_CONTROL | (4 - 2)/g.
This is equivalent since the #define used to have | 2 in it. A grep
through the sources shows that all instances have been converted, so
it's safe to remove the | 2 from the #define.
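The length convention above can be sketched as follows. The opcode value and the 8-bit length mask are placeholders assumed for illustration; only the "total dwords minus 2" encoding comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder opcode bits; the real value lives in the driver's defines. */
#define EXAMPLE_PIPE_CONTROL_OPCODE 0x7a000000u

/* The length field holds (total dwords - 2), so the caller states the
 * packet size explicitly instead of relying on a value baked into the
 * opcode define. */
static uint32_t
pipe_control_header(unsigned total_dwords)
{
   assert(total_dwords >= 2);
   return EXAMPLE_PIPE_CONTROL_OPCODE | (total_dwords - 2);
}

/* Recover the packet size from a header (assumes an 8-bit length field). */
static unsigned
pipe_control_dwords(uint32_t header)
{
   return (header & 0xffu) + 2;
}
```

With this shape, a packet that needs a different length only changes the argument, for example `pipe_control_header(5)` instead of `pipe_control_header(4)`.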
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike the FS side in the previous commit, this does variable indexing just
fine, using the same code as we used for other variable-indexed pull
constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Variable array indexing isn't finished, because the lowering pass
turns it all into conditional moves of constant index accesses so I
can't test it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I wanted to add the surface index as a variable value for UBO support,
and a reg seemed like the obvious way to go. This exposes more of the
information to CSE, which we'll probably want to apply to pull
constant loads for UBOs eventually (you might access 4 floats in a
row, each of which would produce an oword block read of the same
block).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit GL_ARB_uniform_buffer_object/dlist.
v2: Use the .ui fields instead of .i for type consistency (review by Brian
Paul)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ARB spec lets you get away with the default block counting against the
blocks for combined size limits. The core spec says you need to be able to
support the maximum size of default block *and* the maximum size of each
uniform block. I see no reason that any driver would have a problem with
that.
Fixes gl 3.1/minmax (with an associated fix to the test)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only propagating it to the API when the variable was a matrix type,
but we were still tripping over it in lower_ubo_reference when it was set on a
vector.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were getting the base offset of a vec2, not of a vec2[2] like the quoted
spec text says we should.
v2: Fix swapped then/else cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were returning the index into the UniformBlocks of one of the
linked shaders, when it's supposed to be the program global index.
Fixes piglit getactiveuniformsiv-uniform_block_index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In between glGenBuffers() and glBindBuffer(), the buffer object points to this
dummy buffer with a name of 0, and a glBindBufferBase() would point to that.
It seems pretty clear, given that glBindBufferBase() only cares about the
current size of the buffer at render time, that it should bind up the buffer
that you passed in instead of pointing it at this useless dummy buffer.
However, what should glBindBufferRange() do? As of this patch, it will
promote the genned buffer to a proper buffer like it had been
glBindBuffer()ed, and then detect that the size is greater than the buffer's
current size of 0 and throw INVALID_VALUE. It seems like the most reasonable
answer here.
Note that this also changes the behavior of these two on non-glGenBuffers() bo
names. We haven't yet set up the error throwing for glBindBuffer() on gl
3.1+, and my assumption is that these two functions should inherit their
behavior on un-genned names from glBindBuffer().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping
the ir_variable as the variable part of the offset from handle_rvalue(),
and track the constant offsets from that with a plain old integer value,
avoiding a bunch of temporary variables in the array and struct handling.
Also, fix file description doxygen.
v3: Fix a row vs col typo, and fix spelling in a comment.
Reviewed-by: Eric Anholt <eric@anholt.net>
For the UBO lowering pass, I want to see the whole dereference chain for
replacing, not the innermost ir_dereference_variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers will probably want to be able to take UBO references in a
shader like:
    uniform ubo1 {
        float a;
        float b;
        float c;
        float d;
    };
    void main() {
        gl_FragColor = vec4(a, b, c, d);
    }
and generate a single aligned vec4 load out of the UBO. For intel,
this involves recognizing the shared offset of the aligned loads and
CSEing them out. Obviously that involves breaking things down to
loads from an offset from a particular UBO first. Thus, the driver
doesn't want to see
variable_ref(ir_variable("a")),
and even more so does it not want to see
array_ref(record_ref(variable_ref(ir_variable("a")),
"field1"), variable_ref(ir_variable("i"))).
where a.field1[i] is a row_major matrix.
Instead, we're going to make a lowering pass to break UBO references
down to expressions that are obvious to codegen, and amenable to
merging through CSE.
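A rough sketch of what "loads from an offset" buys, assuming the four floats sit at byte offsets 0, 4, 8 and 12 of the block. The struct and function names are hypothetical and do not describe the actual lowering pass.

```c
#include <assert.h>

/* A lowered UBO reference: which aligned 16-byte block to load and which
 * 32-bit component of that block holds the value. */
struct example_ubo_load {
   unsigned block_offset;  /* byte offset of the aligned vec4, multiple of 16 */
   unsigned component;     /* 0..3 within that vec4 */
};

static struct example_ubo_load
lower_float_ref(unsigned byte_offset)
{
   struct example_ubo_load load;

   assert(byte_offset % 4 == 0);              /* floats are 4-byte aligned */
   load.block_offset = byte_offset & ~15u;    /* round down to 16 bytes */
   load.component = (byte_offset & 15u) / 4;  /* which float in the vec4 */
   return load;
}
```

All four references then share `block_offset == 0` and differ only in `component`, so a CSE pass can see that a single aligned load serves a, b, c and d.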
v2: Fix some partial thoughts in the ir_binop comment (review by Kenneth)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting var->location from pointing at the program's UniformBlocks to
pointing at the linked shader's UniformBlocks, I missed this change. It
usually worked out in the end because the two lists happen to be the same in
many testcases.
Fixes a valgrind complaint on
oglconform ubo-compile.cpp advanced.std140.2stage
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As we get into supporting GL 3.x core, we come across more and more features
of the API that depend on the version number as opposed to just the extension
list. This will let us more sanely do version checks than "(VersionMajor == 3
&& VersionMinor >= 2) || VersionMajor >= 4".
v2: Fix a bad <= 30 check.
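As an illustration of the simpler check, here is a sketch that folds major and minor into one integer (3.2 becomes 32). The encoding and the helper names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: fold major.minor into one integer, e.g. 3.2 -> 32. */
static int
combined_version(int major, int minor)
{
   assert(minor >= 0 && minor < 10);
   return major * 10 + minor;
}

/* The two-field form of the check quoted in the message. */
static bool
at_least_gl32_old(int major, int minor)
{
   return (major == 3 && minor >= 2) || major >= 4;
}

/* The same check against the combined value. */
static bool
at_least_gl32_new(int major, int minor)
{
   return combined_version(major, minor) >= 32;
}
```

Both forms agree for every major/minor pair, but the second is a single comparison that is harder to get wrong.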
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This turns on window system MSAA.
This patch changes the id of many GLX visuals and configs, but that
couldn't be prevented. I attempted to preserve the id's of extant configs
by appending the multisample configs to the end of the extant ones. But
somewhere, perhaps in the X server, the configs are reordered with
multisample configs interspersed among the singlesample ones.
Test results:
Tested with xonotic and `glxgears -samples 1` on Ivybridge.
No piglit regressions on Ivybridge.
On Sandybridge, passes 68/70 of oglconform's
winsys multisample tests. The two failing tests are:
multisample(advanced.pixelmap.depth)
multisample(advanced.pixelmap.depthCopyPixels)
These tests hang the gpu (on kernel 3.4.6) due to
a glDrawPixels/glReadPixels pair on an MSAA depth buffer. I don't expect
real-world apps to do that, so I'm not too concerned about the hang.
On Ivybridge, passes 69/70. The failing case is
multisample(advanced.line.changeWidth).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This function felt sloppy, so this patch cleans it up a little bit.
- Rename `color` to `i`. It is not a color value, only an iterator int.
- Move `depth_bits[0] = 0` into the non-accum loop because that is where
it is used. The accum loop later overwrites depth_bits[0].
- Rename `depth_factor` to `num_depth_stencil_bits`.
- Redefine `msaa_samples_array` as static const because it is never
modified. Rename to `singlesample_samples`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If either argument to driConcatConfigs(a, b) is null or the empty list,
then simply return the other argument as the resultant list.
All callers were accomplishing that same behavior anyway, and each caller
accomplished it with the same pattern. So this patch moves that external
pattern into the function.
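The behavior described above might look roughly like this for NULL-terminated pointer arrays. `concat_lists` is a hypothetical stand-in, not the real driConcatConfigs, and in this sketch the caller keeps ownership of both inputs.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the entries of a NULL-terminated pointer array. */
static size_t
list_length(void **list)
{
   size_t n = 0;

   assert(list != NULL);
   while (list[n] != NULL)
      n++;
   return n;
}

/* Return a new NULL-terminated array holding a's entries followed by b's.
 * If either input is NULL or empty, the other input is returned as-is. */
static void **
concat_lists(void **a, void **b)
{
   size_t na, nb, i;
   void **all;

   if (a == NULL || a[0] == NULL)
      return b;
   if (b == NULL || b[0] == NULL)
      return a;

   na = list_length(a);
   nb = list_length(b);
   all = malloc((na + nb + 1) * sizeof *all);
   if (all == NULL)
      return NULL;
   for (i = 0; i < na; i++)
      all[i] = a[i];
   for (i = 0; i < nb; i++)
      all[na + i] = b[i];
   all[na + nb] = NULL;
   return all;
}
```

Callers can then chain concatenations without checking for an empty first list themselves.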
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
DRI2 configs were constructed in intelInitScreen2. That function already
does too much, so move verbatim the code for creating configs to a new
function, intel_screen_make_configs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add two new functions: intel_miptree_{map,unmap}_multisample, to which
intel_miptree_{map,unmap} dispatch. Only mapping of flat, renderbuffer-like
miptrees is supported.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the opencoded construction and destruction of intel_miptree_map into
new functions, intel_miptree_attach_map and intel_miptree_release_map.
This patch prevents code duplication in a future commit that adds support
for mapping multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of intel_miptree_map into a new function,
intel_miptree_map_singlesample. Now intel_miptree_map dispatches to the
new function. A future commit adds a multisample variant.
Ditto for intel_miptree_unmap.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function intel_renderbuffer_set_needs_downsample. It is a no-op
except on multisample winsys buffers shared with DRI2.
Mark the needed downsamples with the new function at two locations:
- Immediately after drawing is complete.
- After blitting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Define a function, brw_blorp_blit_miptrees, that simply wraps
brw_blorp_blit_params + brw_blorp_exec with C calling conventions. This
enables intel_miptree.c, in a following commit, to perform blits with
blorp for the purpose of downsampling multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Immediately after obtaining, with DRI2GetBuffersWithFormat, the DRM buffer
handle for a DRI2 buffer, we wrap that DRM buffer handle with a region and
a miptree. This patch additionally allocates an accompanying multisample
miptree if the DRI2 buffer is multisampled.
Since we do not yet advertise multisample GL configs, the code for
allocating the multisample miptree is currently inactive.
This patch adds the following fields to intel_mipmap_tree:
singlesample_mt
needs_downsample
and the following function stubs:
intel_miptree_downsample
intel_miptree_upsample
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the logic for creating the ancillary hiz and mcs miptrees for winsys
and non-texture renderbuffers from intel_alloc_renderbuffer_storage to
intel_miptree_create_for_renderbuffer. Let's try to isolate complex
miptree logic to intel_mipmap_tree.c.
Without this refactor, code duplication would be required along the
intel_process_dri2_buffer codepath in order to create the mcs miptree.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new param, num_samples, to intel_create_renderbuffer and
intel_create_private_renderbuffer.
No multisample GL config is yet advertised, so the value of num_samples is
currently 0. For server-owned winsys buffers, gl_renderbuffer::NumSamples
is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rename quantize_num_samples to intel_quantize_num_samples and change the
first param from struct intel_context* to struct intel_screen*. The
function will later be used by intelCreateBuffer, which is not bound to
any context but is bound to a screen.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The comment referred to intel_tex_image_map/unmap, but should more
accurately refer to intel_miptree_map/unmap.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
v2: Note that GLSL 4.3 has not been started, and that
ARB_compute_shader has been started in Gallium drivers.
Signed-off-by: Jason Wood <sandain@hotmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
KHR extension name is reserved for Khronos ratified extensions, and there is
no such thing as EGL_KHR_surfaceless_{gles1,gles2,opengl}. Replace these
three extensions with EGL_KHR_surfaceless_context since that extension
actually exists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Since support for swrast version 2 was added (f55d027a), it has also been
required. In swrast_driver_extensions, version 2 is set for __DRI_SWRAST
extension. Remove the spurious version checks sprinkled through the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously an error would be generated if any attributes were specified when
creating a non-desktop OpenGL context. This was a mistake, and it will
prevent old drivers from working with new EGL libraries that add support for
the createContextAttribs interface. Instead, match the behavior of
EGL_KHR_create_context: allow versions that make sense, reject non-zero flags.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit f0cecd43d6 moved the VUE map computation to be only once, at
VS compile time. However, it did so in slightly the wrong place: it
made the one call to brw_vue_compute_map happen right before the
allocation of dummy slots for replaced point sprite coordinates, causing
a different VUE map to be generated (at least on Ironlake).
Fixes a regression in Piglit's point-sprite test on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46489
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider a texture call such as:
textureLod(s, coordinate, log2(...))
First, we begin setting up the sampler message by loading the texture
coordinates into MRFs, starting with m2. Then, we realize we need the
LOD, and go to compute it with:
ir->lod_info.lod->accept(this);
On Gen4-5, this will generate a SEND instruction to compute log2(),
loading the operand into m2, and clobbering our texcoord.
Similar issues exist on Gen6+. For example, nested texture calls:
textureLod(s1, c1, texture(s2, c2).x)
Any texturing call where evaluating the subexpression trees for LOD or
shadow comparator would generate SEND instructions could potentially
break. In some cases (like register spilling), we get lucky and avoid
the issue by using non-overlapping MRF regions. But we shouldn't count
on that.
Fixes four Piglit test regressions on Gen4-5:
- glsl-fs-shadow2DGradARB-{01,04,07,cumulative}
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52129
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the textureRect support and GL_CLAMP workarounds, it's grown
sufficiently that it deserves its own function. Separating it out
makes the original function much more readable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Setting the texture offset bits in the message header involves very
specific hardware register descriptions. As such, I feel it's better
suited for the lower level "generate" layer that has direct access to
the weird register layouts, rather than at the fs_inst abstraction layer.
This also parallels the approach I took in the VS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use atom for sampler state. Does not provide new functionality
or fix any bug. Just a step toward a fully atom-based r600g.
v2: Split seamless on r6xx/r7xx into its own atom. Make sure it's
emitted after the sampler and with a pipeline flush before it, otherwise
it does not take effect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
...to look like update_fragment_samplers() code, as with the previous
commit. The next step would be to merge the two functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Less code. And as with softpipe, if/when we consolidate the pipe_context
functions for binding sampler state, this will make the llvmpipe changes
trivial.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The functions for setting samplers and sampler views for vertex,
fragment and geometry shaders were nearly identical. Now they
use shared code.
In the future, if the pipe_context functions for setting samplers
and sampler views for vert/frag/geom/compute are combined, this
will make updating the softpipe driver a snap.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Combine separate arrays for vertex/fragment/geometry samplers, etc into
one array indexed by PIPE_SHADER_x.
This allows us to collapse separate code for vertex/fragment/geometry
state into loops over the shader stage. More to come.
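A small sketch of the array-per-stage layout, using hypothetical names modelled on the PIPE_SHADER_x idea. The enum, the limit of 16 and the struct are assumptions, not the driver's real definitions.

```c
#include <assert.h>
#include <stddef.h>

enum example_stage {
   EXAMPLE_STAGE_VERTEX,
   EXAMPLE_STAGE_FRAGMENT,
   EXAMPLE_STAGE_GEOMETRY,
   EXAMPLE_NUM_STAGES
};

#define EXAMPLE_MAX_SAMPLERS 16

struct example_context {
   /* One array indexed by stage replaces three per-stage arrays. */
   const void *samplers[EXAMPLE_NUM_STAGES][EXAMPLE_MAX_SAMPLERS];
   unsigned num_samplers[EXAMPLE_NUM_STAGES];
};

/* One bind function serves every stage. */
static void
bind_samplers(struct example_context *ctx, enum example_stage stage,
              unsigned count, const void **samplers)
{
   unsigned i;

   assert(stage < EXAMPLE_NUM_STAGES && count <= EXAMPLE_MAX_SAMPLERS);
   for (i = 0; i < count; i++)
      ctx->samplers[stage][i] = samplers[i];
   for (; i < ctx->num_samplers[stage]; i++)
      ctx->samplers[stage][i] = NULL;   /* clear slots no longer bound */
   ctx->num_samplers[stage] = count;
}

/* Work that used to need three copies of the same code is now one loop. */
static unsigned
total_bound_samplers(const struct example_context *ctx)
{
   unsigned stage, total = 0;

   for (stage = 0; stage < EXAMPLE_NUM_STAGES; stage++)
      total += ctx->num_samplers[stage];
   return total;
}
```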
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes uninitialized pointer read defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Merge the vertex/fragment versions of the cso_set/save/restore_samplers()
functions. Now we pass the shader stage (PIPE_SHADER_x) to the function
to indicate vertex/fragment/geometry samplers. For example:
cso_single_sampler(cso, PIPE_SHADER_FRAGMENT, unit, sampler);
This results in quite a bit of code reduction, fewer CSO functions and
support for geometry shaders.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Fixes uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_OES_mapbuffer extension is supported by OpenGL ES 1 and ES 2 so return
GL_MAP_WRITE_BIT for both ES versions, not just ES 1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before, the GLSL parser was getting rebuilt every time that scons was
run. The problem was scons was expecting a glsl_parser.hpp file but
we were generating a glsl_parser.h file.
Signed-off-by: Brian Paul <brianp@vmware.com>
Windowed speed is of course way too slow, but fullscreen
works like a charm now.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Using the writemask in the sampler results in packed
VGPRs. For now just sample all components and let
llvm choose the right one.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The backend is multiplying the offset by the number of
elements anyway, so doing it twice just makes everything
crash.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The patch makes the SCons build with Intel Compiler successful.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Framebuffer blit needs to set up texture sampling with no reference to the
user's texturing state, and a sampler object lets us avoid a bunch of changes
to the user's state setup.
We don't bother caching the sampler object since we're changing parameters in
it based on the filtering option to glBlitFramebuffer().
Fixes piglit GL_ARB_sampler_objects/framebufferblit and rendering in l4d2 (our
setting of srgb decode wasn't being respected due to the user's sampler object
being active).
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Sampler objects can be used to shadow texture object state without
modifying the original application state. The decompression path feels a bit
like a path where caching shouldn't happen, but since everything else is
already cached I decided to cache the sampler state too.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To allow the meta module to use sampler objects, the Mesa GL functions need
to be visible and linkable for the meta module.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast needs to pass the sampler object into all texture fetching functions
to use the correct sampling state when a sampler object is bound to the unit.
The changes were made using a half-manual regular expression replace.
v2: Fix NULL deref in _swrast_choose_triangle(), because the _Current
values aren't set yet, so we need to look at our texObj2D. (anholt)
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To allow meta acceleration operations to use sampler objects the
ARB_sampler_objects extension needs to be mandatory for all drivers.
Because the extension doesn't have any hardware dependencies it is
trivial to implement.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
CompareFailValue is part of Sampler state that needs to be read from
bound sampler object if present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The fixed function fragment shader generator was incorrectly reading texture
sampling state directly from the texture object. To make sure that
ARB_sampler_objects works correctly, the shader generator has to use the
bound sampler if one exists.
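The selection rule can be sketched as below. The structs are simplified, hypothetical stand-ins and not the real Mesa types.

```c
#include <assert.h>
#include <stddef.h>

struct example_sampler_state {
   int compare_mode;
};

struct example_texture {
   struct example_sampler_state sampler;   /* the texture's own state */
};

struct example_texture_unit {
   const struct example_texture *current;        /* bound texture */
   const struct example_sampler_state *sampler;  /* sampler object or NULL */
};

/* Return the state that sampling should use: the bound sampler object if
 * there is one, otherwise the state stored in the texture object. */
static const struct example_sampler_state *
effective_sampler(const struct example_texture_unit *unit)
{
   assert(unit->current != NULL);
   if (unit->sampler != NULL)
      return unit->sampler;
   return &unit->current->sampler;
}
```

Any code that reads sampling state through a helper like this picks up a bound sampler object without further changes.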
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Preparation for the mandatory support of ARB_sampler_objects. I have tested
this patch with rv280 only.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When I build-tested the radeon changes I noticed two warnings about a format
size mismatch on 64-bit. I decided to clean them up to make relevant
compiler warnings easier to spot.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_sampler_objects is a very simple, software-only extension to support. I want
to make it a mandatory extension for Mesa drivers to allow the meta module to
use it.
This patch adds support for the extension to nouveau. It is a completely untested
search-and-replace patch, except for flagging the texture state as needing to
be recomputed when a sampler object is present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
sRGBDecode state is part of sampler object state but mesa was missing
handlers to access the state. This patch adds the support for required
state changes and queries.
GL_EXT_texture_sRGB_decode issue 4:
"4) Should we add forward-looking support for ARB_sampler_objects?
RESOLVED: YES
If ARB_sampler_objects exists in the implementation, the sampler
objects should also include this parameter per sampler."
Fixes piglit GL_ARB_sampler_objects/GL_EXT_texture_sRGB_decode.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GL_DEPTH_TEXTURE_MODE isn't meant to be part of sampler state based on
compatibility profile specifications.
OpenGL specification 4.1 compatibility 20100725 3.9.2:
"... The values accepted in the pname parameter
are TEXTURE_WRAP_S, TEXTURE_WRAP_T, TEXTURE_WRAP_R, TEXTURE_MIN_FILTER,
TEXTURE_MAG_FILTER, TEXTURE_BORDER_COLOR, TEXTURE_MIN_LOD, TEXTURE_MAX_LOD,
TEXTURE_LOD_BIAS, TEXTURE_COMPARE_MODE, and
TEXTURE_COMPARE_FUNC. Texture state listed in table 6.25 but not listed here and
in the sampler state in table 6.26 is not part of the sampler state, and remains in the
texture object."
The list of states is in Table 6.24 "Textures (state per texture
object)" instead of 6.25 mentioned in the specification text.
The same can be found in the 3.3 compatibility specification.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch allows GL_SAMPLES to be set to either 0 or 1 on i965
platforms that don't support MSAA (those prior to Gen6). Setting
GL_SAMPLES=1 has the same effect as setting it to 0 on these platforms
(because MSAA is unsupported), but is distinguishable via the GL API.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50165
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
EXT_framebuffer_multisample is a required subpart of
ARB_framebuffer_object, which means that we must support it even on
platforms that don't support MSAA. Fortunately
EXT_framebuffer_multisample allows for this by allowing GL_MAX_SAMPLES
to be set to 1.
This leads to a tricky quirk in the GL spec: since
glRenderbufferStorageMultisample() accepts any value for its
"samples" parameter up to and including GL_MAX_SAMPLES, that means
that on platforms that don't support MSAA, GL_SAMPLES is allowed to be
set to either 0 or 1. On platforms that do support MSAA, GL_SAMPLES=1
is not used; 0 means no MSAA, and 2 or higher means MSAA.
In other words, GL_SAMPLES needs to be interpreted as follows:
=0 no MSAA (possible on all platforms)
=1 no MSAA (only possible on platforms where MSAA unsupported)
>1 MSAA (only possible on platforms where MSAA supported)
This patch modifies all MSAA-related code to choose between
multisampling and single-sampling based on the condition (GL_SAMPLES >
1) instead of (GL_SAMPLES > 0) so that GL_SAMPLES=1 will be treated as
"no MSAA".
Note that since GL_SAMPLES=1 implies GL_SAMPLE_BUFFERS=1, we can no
longer use GL_SAMPLE_BUFFERS to distinguish between MSAA and non-MSAA
rendering.
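The interpretation above can be sketched in a few lines. The function names are hypothetical, and the assertion encodes the table's "only possible on" notes.

```c
#include <assert.h>
#include <stdbool.h>

/* True only for real multisampling: GL_SAMPLES > 1. */
static bool
uses_msaa(unsigned num_samples, bool platform_has_msaa)
{
   /* 1 is only valid without MSAA support; >1 only with it. */
   assert(platform_has_msaa ? num_samples != 1 : num_samples <= 1);
   return num_samples > 1;
}

/* GL_SAMPLE_BUFFERS as implied by GL_SAMPLES: nonzero samples means 1. */
static unsigned
sample_buffers(unsigned num_samples)
{
   return num_samples > 0 ? 1 : 0;
}
```

With GL_SAMPLES=1, `sample_buffers` reports 1 while `uses_msaa` is false, which is why GL_SAMPLE_BUFFERS can no longer serve as the MSAA test.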
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly the whole function body was contained in the 'else' branch. The
'if' branch did one thing: return early with an error. Clean things up by
moving all the code out of the 'else' branch. Decreases max nesting level
from 4 to 3.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After commit "intel: Convert to using private depth/stencil buffers", we
request from DRI2GetBuffersWithFormat only the front left and back left
buffers. We no longer request depth and stencil buffers.
Assert that in intelAllocateBuffer and remove the related dead code.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These assignments caused CFLAGS specified on the configure line to
appear twice in the final CFLAGS. Removing them makes the behavior
reasonable -- USER_CFLAGS are appended at the end of CFLAGS, allowing
the builder to override flags added by configure.ac like
-fno-strict-aliasing.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Even on s390{,x} where there's no video card, you still want this so GLX
protocol works.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This reverts commit 5d5af7d359.
It turns out the issue this was supposed to fix merely counter-acted
a bug in the hardware driver that I wasn't aware of.
The resource_resolve is not supposed to do sRGB conversion, period.
(This would violate the requirement that source and destination must
be of the same format).
There is no point in emitting aux scissor values if we
a) never enable them
b) never set the actual values
Plus, it is enough to have that aux scissor enable reg (which we never set to
enable) in one place, not two.
There were several problems with these functions (which are a remnant
of dri1 hyperz mostly - should bring it back somehow someday).
First, it would always do a swrast clear if the buffer to clear was a fbo.
Second, for buffers whose clear we wouldn't handle (I guess aux/accum?), we
would still have tried to clear them later even when we had already
cleared them with swrast.
This addresses one issue raised in bug #51658 discovered by Eugene St Leger.
The assert is bogus since there's no problem with texture width/height being
2048 (the width/height programmed is width/height minus one).
OTOH though the programmed size for scissor rect should be width/height
minus one too otherwise bad things may happen (as it is inclusive, and there's
not enough bits for more than a value of 2047).
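The arithmetic can be sketched as follows. The 11-bit field width is an assumption inferred from the 2047 limit mentioned above, and both function names are hypothetical:

```c
#include <stdint.h>

/* Assumed field width: 11 bits hold values 0..2047. */
#define SIZE_FIELD_BITS 11u

/* Value to program for a dimension of `size` pixels (size >= 1).
 * The hardware field stores size minus one. */
uint32_t programmed_extent(uint32_t size)
{
    return size - 1u;
}

/* Nonzero if `value` fits in the assumed field. */
int fits_in_field(uint32_t value)
{
    return value <= ((1u << SIZE_FIELD_BITS) - 1u);
}
```

A 2048-pixel dimension is programmed as 2047 and fits, whereas programming 2048 directly would overflow the field.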
SI does not support 64-bit immediates natively, but llvm will generate
i64 immediates when indexing loads and stores (since SI has 64-bit
pointers). The i64 indices will always be small enough to fit into
32 bits (i.e. the high 32 bits will always be all zeros), so we can
treat these index values as 32-bit values.
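A small C illustration of why the truncation is lossless under that assumption (the function names are hypothetical; the real change lives in the LLVM backend patterns, not in C):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the high 32 bits of the index are all zero. */
bool index_fits_in_32_bits(uint64_t index)
{
    return (index >> 32) == 0;
}

/* Keep only the low 32 bits. This loses nothing when
 * index_fits_in_32_bits() holds. */
uint32_t index_low_32(uint64_t index)
{
    return (uint32_t)index;
}
```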
In tablegen, if two patterns match, the one that comes first in the file
is given preference. We want the SMRD IMM pattern to be given
preference, because it encodes the pointer offset in its immediate
field, which saves us an add instruction.
I ended up having to add rallocing of the ast_type_qualifier in order
to avoid pulling in ast.h for glsl_parser_extras.h, because I wanted
to track an ast_type_qualifier in the state.
Fixes piglit ARB_uniform_buffer_object/row-major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Yes, you get to say things like "layout(row_major, column_major)" and
get column major.
Part of fixing piglit ARB_uniform_buffer_object/row_major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like a stripped-down version of glGetActiveUniform that just
returns the name, since the other return values (type and size) of
that function are now meant to be handled with
glGetActiveUniformsiv().
Fixes piglit ARB_uniform_buffer_object/getactiveuniformname
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous implementation required a flag in _mesa_glsl_parse_state
and line of code to initialize it for every version of the shading
language we intend to support. As we look to add 150, 330, 400, 410,
420, and beyond, this gets rather unwieldy.
This patch retains the switch statement (to reject, say, #version 111),
but removes all the bits. Code to check for ctx->API == API_OPENGL_CORE
could easily be added to the 110 and 120 cases to reject those.
v2: Use _mesa_is_desktop_gl to preserve the existing behavior in the
presence of the new API_OPENGL_CORE enumeration.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Fixes some failures in getteximage-formats.
v2: Remove stray include, and drop extra test for encoding == GL_SRGB --
_mesa_get_srgb_format_linear() returns the same format if it wasn't SRGB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48120
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 8.0 branch.
It was using state->Const.GLSL_100ES, which is set if the driver
supports ARB_ES2_compatibility or we're in ES2 mode. Instead, it should
use state->language_version, as that represents the actual GLSL version
of the shader being compiled.
Since the correct logic is < 120 && !100, just make it == 110.
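A quick way to convince yourself of the simplification, assuming the only accepted versions below 120 are 100 and 110 (the function names here are hypothetical):

```c
#include <stdbool.h>

/* The condition as described: older than 1.20 and not ES 1.00. */
bool long_form(unsigned version)
{
    return version < 120 && version != 100;
}

/* The simplified condition used instead. */
bool short_form(unsigned version)
{
    return version == 110;
}
```

The two agree for every version the parser accepts; they would differ only for values such as 111, which are rejected earlier.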
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will need to get refactored when we add support for core profiles
or forward-compatible contexts, but we may as well have it in the
meantime. This allows us to override the GLSL version and experiment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move installing osmesa.pc to drivers/osmesa, where it belongs.
This also restores the installation of gl.pc if we are building osmesa at the
same time as libGL, which was broken in commit 39785488 when the .pc
installation was converted to automake
v2:
Remove HAVE_OSMESA_DRIVER automake conditional, it's now pointless as we
will only be building in the drivers/osmesa directory if the condition it
checked was true.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this build failure with Intel Compiler.
src/gallium/auxiliary/util/u_format_tests.c(903): error: floating-point operation result is out of range
{PIPE_FORMAT_R16_FLOAT, PACKED_1x16(0xffff), PACKED_1x16(0x7c01), UNPACKED_1x1( NAN, 0.0, 0.0, 1.0)},
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that ir_quadop_vector exists, ir_last_binop and ir_last_opcode are
no longer the same. Only one place currently uses this enumeration, and
already handles ir_quadop_vector correctly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
It's more convenient to use shortcuts like glsl_type::bvec2_type than
the longwinded glsl_type::get_instance(GLSL_TYPE_BOOL, 2, 1).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
The hardware supports this format with no known quirks, so we may as
well enable it.
Alpha blending is not supported until Sandybridge, but as far as I can
tell, OpenGL doesn't require alpha blending on SNORM formats. Plus, we
already expose R8G8B8A8_SNORM which has a similar restriction.
Fixes 6 piglit texwrap-2D-*SNORM* cases,
gl-3.1/required-sized-texture-formats, and 10 oglconform snorm-textures
subcases
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: fix tiling for small pitches, that finally makes
glxgears and readPixSanity work
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The format member of pipe_surface may differ from that of the
pipe_resource, which is used to communicate, for instance, whether
sRGB encode should be enabled in the resolve operation or not.
Fixes resolve to sRGB surfaces in mesa/st when GL_FRAMEBUFFER_SRGB
is disabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
sRGBEnabled should affect both textures and renderbuffers, so we need
to check/update the pipe_surface format for both.
Fixes, for instance, rendering appearing too bright in wine applications
using sRGB multisample renderbuffers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the check for pixel transfer ops. If any RGB/depth scale/bias
is in effect, it'll be applied in the glTexImage step.
If drawing stencil pixels we need to disable pixel transfer so that
alpha scale/bias are not applied to the stencil data.
These issues were spotted by Roland.
Fixes Blender performance issues reported in
http://bugs.freedesktop.org/show_bug.cgi?id=47375
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Barto <mister.freeman@laposte.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No functional change. This patch modifies intel_miptree_alloc_mcs to
allocate the 4x MCS buffer using MESA_FORMAT_R8 instead of
MESA_FORMAT_A8. In principle it doesn't matter, since we only access
the buffer using MCS-specific hardware mechanisms, so all that's
important is to use a format with the correct size. However,
MESA_FORMAT_A8 has enough unusual behaviours that it seems prudent to
avoid it.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It seems reset is not required for setting the max_wm_threads to 80
on gen6 GT2.
Increases performance in the Counter-Strike: Source video stress test
by 7.18% (n=5).
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
The VCC register is tricky because the SALU views it as 64-bit, but the
VALU views it as 1-bit. In order to deal with this we've added some
special bitcast and binary operations to help convert from the 64-bit
SALU view to the 1-bit VALU view and vice versa.
If you want to change your compiler arguments, just set CFLAGS/CXXFLAGS.
Having Mesa have this separate variable is a great way to have your arguments
not thoroughly propagated to all compiler invocations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In all current uses, it was appended to CFLAGS, which already had -m32. If
you want some other flag supplied to compiler invocations, there's
CFLAGS/CXXFLAGS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change. This patch modifies brw_blorp_blit.cpp to use
the ROUND_DOWN_TO macro instead of open-coded bit manipulations, for
clarity.
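For readers unfamiliar with the idiom, here is a stand-in for such a macro. The definition of ROUND_DOWN_TO is not shown in this message, so the sketch below is an assumption and only valid for power-of-two alignments:

```c
/* Stand-in macro, assuming `alignment` is a power of two.
 * Clears the low bits of `value` so the result is a multiple
 * of `alignment` that does not exceed `value`. */
#define ROUND_DOWN_TO_SKETCH(value, alignment) \
    ((value) & ~((alignment) - 1))

/* The open-coded equivalent for an alignment of 8. */
unsigned round_down_to_8_open_coded(unsigned x)
{
    return x & ~7u;
}
```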
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The emit->key.fkey info is only valid if we're generating a fragment shader.
We should not look at it if we're generating a vertex shader.
When generating a vertex shader, the value of emit->key.fkey.num_textures was
garbage and the loop over num_textures would read invalid data. At best
this would cause us to emit an unused constant. At worst, we could segfault.
Just by dumb luck, fkey.num_textures was usually a smallish integer.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Recently more files were removed from control to be auto-generated
in the dricore library. Android build was not able to locate the
new files if they were not created beforehand.
LOCAL_SRC_FILES includes some of those files and Android.gen.mk
re-defines this variable by filtering out the auto-generated files.
Unfortunately for this variable it is not the same to have the SRCDIR
variable defined as the current directory.
By re-defining SRCDIR for the autotools build the Android build system
is happy again and the new files were actually removed from the sources
to use the auto generated versions.
Also patch d5c1801a01 was partially reverted as the files
can not be compiled to the LOCAL_PATH, instead they should live on the
intermediates folder so that a clean can wipe them out.
v3: [chad] Fix the definition of SRCDIR in libdricore/Makefile.am.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Daniel Charles <daniel.charles@intel.com>
XGetImage() will generate a BadMatch error if the source window isn't
visible. When that happens, create a new XImage. Fixes piglit 'select'
test failures with swrast/xlib driver.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Always allocate space for the inverse matrix in _math_matrix_ctr()
since we were always calling _math_matrix_alloc_inv() anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When computing a matrix inverse, if the determinant is too small we could hit
a divide by zero. There's a check to prevent this (we basically give up on
computing the inverse and return the identity matrix.) This patch loosens
this test to fix a lighting bug reported by Lars Henning Wendt.
v2: use abs(det) to handle negative values
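The shape of the check can be sketched on a 2x2 matrix. The threshold DET_EPSILON and the function name are hypothetical and are not the values used in the patch:

```c
#include <stdbool.h>

/* Hypothetical threshold, for illustration only. */
#define DET_EPSILON 1e-12f

/* Inverts a row-major 2x2 matrix. On a near-zero determinant it
 * gives up, writes the identity, and returns false. */
bool invert_2x2(const float m[4], float out[4])
{
    float det = m[0] * m[3] - m[1] * m[2];
    /* Compare the magnitude, so negative determinants are not rejected. */
    float mag = det < 0.0f ? -det : det;

    if (mag < DET_EPSILON) {
        out[0] = 1.0f; out[1] = 0.0f;
        out[2] = 0.0f; out[3] = 1.0f;
        return false;
    }

    float inv = 1.0f / det;
    out[0] =  m[3] * inv;
    out[1] = -m[1] * inv;
    out[2] = -m[2] * inv;
    out[3] =  m[0] * inv;
    return true;
}
```

Without the magnitude test, a matrix with determinant -4 would compare as "too small" and wrongly fall back to the identity.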
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Lars Henning Wendt <lars.henning.wendt@gris.tu-darmstadt.de>
The sendc instruction causes the fragment shader thread to wait for
any dependent threads (i.e. threads rendering to overlapping pixels)
to complete before sending the message. We need to use sendc on the
first render target write in order to guarantee that fragment shader
outputs are written to the render target in the correct order.
Previously, we only used the "sendc" instruction when writing to
binding table index 0. This did the right thing for fragment shaders,
because our fragment shader back-ends always issue their first render
target write to binding table index 0. However, it did the wrong
thing for blorp, which performs its render target writes to binding
table index 1.
A more robust solution is to use sendc for all render target writes.
This should not produce any performance penalty, since after the first
sendc, all of the dependent threads will have completed.
For more information about sendc, see the Ivy Bridge PRM, Vol4 Part3
p218 (sendc - Conditional Send Message), and p54 (TDR Registers).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A lot of code was still differentiating between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in the i915 and 965 drivers over to use _mesa_is_user_fbo()
and _mesa_is_winsys_fbo().
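The idea behind such helpers can be sketched with a stand-in struct. The type and function names below are hypothetical, and the sketch assumes a name of zero identifies a window-system framebuffer:

```c
#include <stdbool.h>

/* Stand-in for the real framebuffer type; only the name matters here. */
struct framebuffer_sketch {
    unsigned name;
};

/* User-created FBOs have a nonzero name. */
bool is_user_fbo_sketch(const struct framebuffer_sketch *fb)
{
    return fb->name != 0;
}

/* Window-system framebuffers use name zero (assumed). */
bool is_winsys_fbo_sketch(const struct framebuffer_sketch *fb)
{
    return fb->name == 0;
}
```

Callers then state their intent by name instead of repeating a comparison against zero at each site.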
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A lot of code was still differentiating between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in core mesa, the state tracker, and src/mesa/program over
to use _mesa_is_user_fbo() and _mesa_is_winsys_fbo().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL(R) ES Shading Language
Version 1.00 Revision 17 (12 May, 2009)
> 4.6.1 The Invariant Qualifier
> ... To force all output variables to be invariant, use the pragma
> #pragma STDGL invariant(all)
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already provided these files on 'make install', but only created a
'libglapi.so' in the top-level lib/ convenience folder. We used to
create all three, but at some point in the build system churn, it broke.
Various applications (like the ES2 conformance suite) seem to link
against libglapi.so.0, so without these links, setting LD_LIBRARY_PATH
and LIBGL_DRIVERS_PATH can lead to using /usr/lib/libglapi.so.0 with
/home/whatever/libGL.so, which leads to API calls getting routed
incorrectly (i.e. glCompileShader -> _mesa_LinkProgramARB), which leads
to rage problems.
Preserve developer sanity...install links.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since osmesa now has been converted to Makefile.am, an appropriate install: rule
is generated to install the shared library, so we no longer need to do that in
src/mesa/Makefile.old
This leaves nothing in src/mesa/Makefile.old but the tags: rule, so move that to
Makefile.am and remove Makefile.old
Also, nothing now uses OSMESA_LIB_GLOB anymore, so remove it
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 6c6803f28d removed xm_image.[ch], and removed
xm_image.c, but not xm_image.h from the Makefile, this was subsequently carried over
into Makefile.am
Remove xm_image.h from Makefile.am. This allows 'make dist' to succeed, even if it
doesn't do anything useful
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MCS buffers use 32 bits per pixel in 8x MSAA, and 8 bits per pixel in
4x MSAA. This patch adjusts the format we use to allocate the buffer
so that enough memory is set aside for 8x MSAA.
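The size relationship can be written down as a small helper. The function names are hypothetical, and sample counts other than 4 and 8 are not covered by the text:

```c
/* Bits of MCS data per pixel for a given sample count.
 * Returns 0 for sample counts this message does not describe. */
unsigned mcs_bits_per_pixel(unsigned num_samples)
{
    switch (num_samples) {
    case 4:
        return 8;
    case 8:
        return 32;
    default:
        return 0;
    }
}

/* Bytes needed for a tightly packed width x height MCS buffer,
 * ignoring any tiling or alignment padding. */
unsigned long mcs_bytes(unsigned width, unsigned height,
                        unsigned num_samples)
{
    return (unsigned long)width * height *
           mcs_bits_per_pixel(num_samples) / 8;
}
```

An 8x buffer therefore needs four times the MCS storage of a 4x buffer of the same dimensions.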
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code to emit 3DSTATE_SAMPLE_MASK was already correct for 8x
MSAA--this patch just removes an assertion that would have prevented
it from being used for 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch updates the blorp functions encode_msaa() and decode_msaa()
to properly handle the encoding of IMS MSAA buffers when
num_samples=8.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When operating in persample dispatch mode, the blorp engine would
previously assume that subspan N always represented sample N (this is
correct assuming 4x MSAA and a 16-wide dispatch). In order to support
8x MSAA, we must compute which sample is associated with each subspan,
using the "Starting Sample Pair Index" field in the thread payload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to an IMS MSAA surface on Gen7, blorp sets up the
rendering pipeline as though it were rendering to a single-sampled
surface; accordingly it must adjust the size of the primitive it sends
down the pipeline to account for the interleaving of samples in an IMS
surface.
This patch modifies the size adjustment code to properly handle 8x
MSAA, which makes room for the extra samples by using an interleaving
pattern that is twice as wide as 4x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a num_samples argument to the blorp function
manual_blend(), allowing it to be told how many samples need to be
blended together. Previously it assumed 4x MSAA, since that was all
we supported.
We also bump up LOG2_MAX_BLEND_SAMPLES from 2 to 3, so that
manual_blend() will be able to handle 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the client program uses glDrawBuffer() or glDrawBuffers() to
select more than one color buffer for drawing into, and then performs
a blit, we need to blit into every single enabled draw buffer.
+2 oglconforms.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50407
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch rearranges the order of steps performed by a blorp blit
from this:
- Sync up state of window system buffers.
- Find buffers.
- Find miptrees.
- Make sure buffer formats match.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
To this:
- Sync up state of window system buffers.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Find buffers.
- Make sure buffer formats match.
- Find miptrees.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
The steps are the same, but they are now performed in an order that
will make it possible to implement correct DrawBuffers support. Note
that the last four steps are now in a separate function
(do_blorp_blit), since they will need to be executed repeatedly when
DrawBuffers support is added.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, the blorp engine would fall back to swrast if the source
or destination of a blit had no associated miptree. This was
unnecessary, since _mesa_BlitFramebufferEXT() already takes care of
making the blit silently succeed if there are no buffers bound, so the
fallback paths could never actually happen in practice.
Removing these fallback paths will simplify the implementation of
correct DrawBuffers support in blorp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch modifies the order of operations in the blorp engine so
that clipping and scissoring are performed before adjusting the
coordinates to account for the difference in origin convention between
window system buffers and framebuffer objects. Previously, we would
do clipping and scissoring after adjusting for origin conventions, so
we would get scissoring wrong in window system buffers.
Fixes Piglit test "fbo-scissor-blit window".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When checking that the source and destination dimensions match, we
don't need to store the width and height in variables; doing so just
risks confusion since right after the check, we do clipping and
scissoring, which may alter the width and height.
No functional change.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
On Gen6, multisampled null render targets don't seem to work
properly--they cause the GPU to hang. So, as a workaround, we render
into a dummy color buffer.
Fortunately this situation (multisampled rendering without a color
buffer) is rare, and we don't have to waste too much memory, because
we can give the workaround buffer a very small pitch.
Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4}
depth-computed *" on Gen6.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The HW docs say that the width and height of null render targets need
to match the width and height of the corresponding depth and/or
stencil buffers, and that they need to be marked as Y-tiled. Although
leaving these values at 0 doesn't seem to cause any ill effects, it
seems wise to follow the documented requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we used the number of samples in draw buffer 0 to
determine whether to set up the 3D pipeline for multisampling. Using
the visual is cleaner, and has the benefit of working properly when
there is no color buffer.
Fixes all piglit tests "EXT_framebuffer_multisample/no-color" on Gen7.
On Gen6, the "depth-computed" variants of these tests still fail; this
will be addressed in a later patch.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch ensures that Visual.samples and Visual.sampleBuffers are
set correctly even in the case where there is no color buffer.
Previously, these values would retain their default value of 0 in this
circumstance, even if the depth or stencil buffer was multisampled.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Mesa misses a few checks when compiling on a uclibc system
which cause it to fall back on glibc-ism. This patch
addresses those issues.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
The kernel streamout support was supposed to get into 3.3 along with
the tiling change and thus use the same kernel version bump of
2.13 to report to userspace that streamout registers were supported.
This is not what happened. So, as streamout kernel support did not
bump the kernel driver version, rely on the kernel 2.14 version bump
to know whether streamout is enabled or not. This means you need at
least a 3.4 kernel.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The error was being set on the non-error path, rather
than the error path.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For 'non-legacy' contexts we will want to generate an error
if an uninstalled function is called.
The effect of this change will be that we can avoid installing
legacy functions, and they will then generate an error as
needed for deprecated functions in GL >= 3.1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 2d4b77c7 (automake: Convert src/mesa/drivers/x11/Makefile to
automake, 2012-06-12) dropped the old Makefile, which used GL_LIB, and
replaced it with a Makefile.am hard-coding the name "GL". This broke
handling of --enable-mangling and --with-gl-lib-name options which
depend on GL_LIB to specify the GL library name.
Use "@GL_LIB@" in src/mesa/drivers/x11/Makefile.am to configure the
library name. Also use this approach to simplify src/glx/Makefile.am
and drop the HAVE_MANGLED_GL conditional. While at it, fix the
compatibility link we create in "lib" for the software-only driver to
use version GL_MAJOR instead of hard-coding "1".
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
This fixes the piglit EXT_framebuffer_multisample/bitmap tests.
Note that we must not rely on ctx->DrawBuffer when flushing the cache, because
that's already updated with a new framebuffer. We want to draw into the old
framebuffer where glBitmap was called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Testing shows that the standard JIT engine retrofitted with AVX support is quite
stable and as capable of handling AVX instructions as MC-JIT is.
And the old JIT is much more memory efficient, as we don't need to
allocate one engine instance per shader, as we do for MC-JIT due to its
incompleteness.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When X is running, it is necessary for pipe_loader to authenticate with
DRM, in order to be able to use the device.
This makes it possible to run OpenCL programs while X is running.
v2:
- Fix C++ style comments
- Drop Xlib-xcb dependency
- Close the X connection when done
- Split auth code into separate function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Calling glDeleteShader() should mark shaders as pending for deletion,
but shouldn't decrement the refcount every time. Otherwise, repeated
glDeleteShader() is not safe.
This is particularly bad since glDeleteProgram() frees shaders: if you
first call glDeleteShader() on the shaders attached to the program (thus
decrementing the refcount), then called glDeleteProgram(), it would try
to free them again (decrementing the refcount another time), causing
a refcount > 0 assertion to fail.
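A minimal sketch of the intended behaviour, using a stand-in struct (the type, field, and function names are hypothetical, not the real Mesa ones):

```c
#include <stdbool.h>

/* Stand-in for a shader object with a reference count. */
struct shader_sketch {
    int ref_count;
    bool delete_pending;
};

/* Marks the shader for deletion. The user's reference is dropped
 * only on the first call, so repeated calls are harmless. */
void delete_shader_sketch(struct shader_sketch *sh)
{
    if (!sh->delete_pending) {
        sh->delete_pending = true;
        sh->ref_count--;
    }
}
```

A shader that starts with two references (one from the user, one from the program it is attached to) still holds the program's reference after any number of delete calls.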
Similar to commit d950a778.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the pack type is not supported, use _mesa_problem
rather than asserting.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_is_integer_format is moved to formats.c and renamed
as _mesa_is_enum_format_integer.
_mesa_is_format_unsigned, _mesa_is_type_integer,
_mesa_is_type_unsigned, and _mesa_is_enum_format_or_type_integer
are added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.2svn r160587 moved createBoundsCheckingPass from
lib/Transforms/Scalar to lib/Transforms/Instrumentation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Except for a couple of explicit uses, _mesa_inv_sqrtf was disabled since
its addition in 2003 (see f9b1e524).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Temporarily disabled since 2003 (see 386578c5b).
This saves us from calling sqrt() 128 times to generate the sqrttab in
one_time_init().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by compiler warning:
i830_texstate.c:131:28: warning: argument to 'sizeof' in 'memset' call
is the same expression as the destination; did you mean to
dereference it? [-Wsizeof-pointer-memaccess]
memset(state, 0, sizeof(state));
~~~~~ ^~~~~
On 64-bit systems, memset here would write an extra 4 bytes.
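The usual fix for this warning is to take the size of the pointed-to object rather than of the pointer. A sketch with a stand-in struct (the type and function names are hypothetical, not the driver's real ones):

```c
#include <string.h>

/* Stand-in for the texture state block being cleared. */
struct tex_state_sketch {
    unsigned words[6];
};

/* Clears the whole structure. sizeof(*state) is the size of the
 * struct, whereas sizeof(state) would be the size of a pointer. */
void clear_state_sketch(struct tex_state_sketch *state)
{
    memset(state, 0, sizeof(*state));
}
```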
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This can potentially cut shader program size by a factor of 4 for 4-wide
execution and 2 for 8-wide execution, and while these ratios aren't
quite reached for more complex shaders, it can be close.
Could not really measure a performance difference so far except for trivial
shaders (glxgears).
There seems to be a fair amount of unnecessary moves generated, especially
at the beginning; it might be possible to optimize those away somehow.
Things aren't quite as clean, some additional stuff needs to be done for
keeping both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance, looking at the generated
shaders this is actually less than I'd think it would be - both 4 and 8-wide
shaders, despite containing a loop actually have about 10% more instructions
in total, and will have roughly 50% more executed instructions (though mostly
cheap ones). Need to figure out how to reduce overhead...
v2: keep complex interpolation for 4-wide mode, adapt to interface changes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This thread count is only supposed to be enabled when "WIZ Hashing Disable in
GT_MODE register enabled." I've always been confused whether that means the
bit in the register should be 1 or 0. For my IVB GT2's register 0x7008 value
of 0x0, this appears to work fine.
Improves l4d2 performance at 640x480 by 0.88 +/- 0.11% (n=88). Improves
performance with rasterization at 1280x1024 by 1.45% +/- 0.36% (n=6).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we finally have a list of uniform blocks in the linked shader
program, we can tell what their indices are.
Fixes piglit GL_ARB_uniform_buffer_object/getuniformblockindex.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At this point in the linking, we've totally lost track of the struct
gl_uniform_buffer that this pointed to in the original unlinked
shader, so we do a nasty n^2 walk to find the new one based on the
variable name.
Note that these point into the shader's list of gl_uniform_buffers,
not the linked program's.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to propagate the UBO fields to the uniform storage records
before we can handle the other pnames.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a single entrypoint that maps from a series of names to the
indices of those names within the active uniforms list. Each index is
like glGetUniformLocation()'s return value, except that it doesn't
encode an array offset.
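
A minimal sketch of the name-to-index lookup described above, assuming the
active uniforms are available as a plain list of names. The helper name
get_uniform_indices and the sample uniform names are hypothetical; names that
are not active uniforms map to GL_INVALID_INDEX.

```python
GL_INVALID_INDEX = 0xFFFFFFFF


def get_uniform_indices(active_uniforms, names):
    """Map each requested name to its position in the active uniforms list."""
    index_of = {name: i for i, name in enumerate(active_uniforms)}
    # Unknown names get GL_INVALID_INDEX instead of raising an error.
    return [index_of.get(name, GL_INVALID_INDEX) for name in names]


active = ["mvp", "color", "lights"]  # hypothetical active uniforms list
indices = get_uniform_indices(active, ["color", "missing", "mvp"])
```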
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the upcoming GL_ARB_uniform_buffer_object changes, the only
other caller that will want the cooked value is state_tracker.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We're going to need this structure to cross-validate the uniform
blocks between shader stages, since unused ir_variables might get
dropped. It's also the place we store the RowMajor qualifier, which
is not part of the GLSL type (since that would cause a bunch of type
equality checks to fail).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Someone tried to be clever and "optimized" add_vertex_data2() to just use
two points for the texture coordinates and then reuse individual
components. Sadly this is not how matrix multiplication works.
Fixes rendercheck -t tmcoords
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Previously, on Gen7, when texturing from a depth or stencil surface,
the blorp engine would configure the 3D pipeline as though the input
surface was non-multisampled, and perform the necessary coordinate
transformations in the fragment shader to account for the IMS layout.
This meant outputting a lot of extra fragment shader code, and it
raised some uncertainty about how to deal with very large surfaces.
This patch modifies blorp to configure the 3D pipeline properly for
IMS layout when reading from depth and stencil surfaces.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, on Gen7, compute_msaa_layout_for_pipeline() would verify
that IMS layout is not used. However, now that we configure
SURFACE_STATE correctly for IMS surfaces, IMS layout is available.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch modifies gen7_set_surface_num_multisamples() to set up the
SURFACE_STATE appropriately for texturing from IMS format MSAA
surfaces (which are only used on Gen7 for depth and stencil buffers).
Since the function now sets more than just the number of multisamples,
it's been renamed to gen7_set_surface_msaa().
This will make it possible to remove some kludginess from the blorp
engine.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling a compressed multisampled surface, we can take a
shortcut to downsample any pixels that were completely covered by a
single primitive. In this case, the first color value we fetch is the
correct final color for the downsampled pixel, so we can skip the rest
of the blending operation.
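
A rough sketch of the shortcut, not the blorp shader itself. The
fully_covered flag is a hypothetical stand-in for whatever the compression
metadata reports, and fetch_sample is an assumed callback that returns one
sample's color.

```python
def downsample_pixel(fetch_sample, num_samples, fully_covered):
    """Resolve one pixel, skipping the blend when one primitive covers it."""
    first = fetch_sample(0)
    if fully_covered:
        # All samples hold the same color, so the first fetch is the answer.
        return first
    total = first
    for i in range(1, num_samples):
        total += fetch_sample(i)
    return total / num_samples


fetched = []


def fetch(i):
    fetched.append(i)  # record which samples were actually read
    return 0.25


result = downsample_pixel(fetch, 4, fully_covered=True)
```

In the covered case only sample 0 is read; the remaining fetches and the
blend are skipped.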
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling an integer-format buffer on Gen7, we need to use the
"avg" instruction rather than the "add" instruction, to ensure that we
don't overflow the range of 32-bit integers. Also, we need to use the
proper register type (BRW_REGISTER_TYPE_D or BRW_REGISTER_TYPE_UD) for
intermediate color data and for writing to the render target.
Note: this patch causes blorp to use the proper register type for all
operations (downsampling, upsampling, and ordinary blits). Strictly
speaking, this is only necessary for downsampling, because the other
operations exclusively use MOV instructions on the color data. But
it's simpler to use the proper register type in all cases.
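
To illustrate the overflow concern, here is a small model of both approaches
for four samples. It assumes that the avg operation itself never overflows
internally, and it simulates 32-bit wraparound with the hypothetical helper
to_int32; none of the function names come from the driver.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def avg(a, b):
    """Average with upward rounding (assumed free of internal overflow)."""
    return (a + b + 1) >> 1


def downsample_add(samples):
    """Sum in a 32-bit register, then divide: the running sum can wrap."""
    total = 0
    for s in samples:
        total = to_int32(total + s)
    return total // len(samples)


def downsample_avg(samples):
    """Pairwise averages: every intermediate stays within the input range."""
    s0, s1, s2, s3 = samples
    return avg(avg(s0, s1), avg(s2, s3))


samples = [2_000_000_000] * 4
wrapped = downsample_add(samples)   # sum exceeds the 32-bit range
averaged = downsample_avg(samples)  # stays at the sample value
```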
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling from an MSAA image to a single-sampled image, it is
inevitable that some loss of numerical precision will occur, since we
have to use 32-bit floating point registers to hold the intermediate
results while blending. However, it seems reasonable to expect that
when all samples corresponding to a given pixel have the exact same
color value, there will be no loss of precision.
Previously, we averaged samples as follows:
blend = (((sample[0] + sample[1]) + sample[2]) + sample[3]) / 4
This had the potential to lose numerical precision when all samples
have the same color value, since ((sample[0] + sample[1]) + sample[2])
may not be precisely representable as a 32-bit float, even if the
individual samples are.
This patch changes the formula to:
blend = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
This avoids any loss of precision in the event that all samples are
the same, by ensuring that each addition operation adds two equal
values.
As a side benefit, this puts the formula in the form we will need in
order to implement correct blending of integer formats.
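
A small check of the representability argument, assuming IEEE 754 32-bit
floats and using Python's struct module to round each intermediate. The
sample value is arbitrary, chosen so that three times it needs more than 24
significant bits. The sketch only looks at the intermediate sums; it does not
model the hardware's rounding mode or the final result.

```python
import struct


def f32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


color = 16777215.0  # 2**24 - 1, exactly representable as a 32-bit float

two = f32(color + color)    # s0 + s1: doubling is exact
three = f32(two + color)    # (s0 + s1) + s2: 50331645 is not representable
four = f32(two + two)       # (s0 + s1) + (s2 + s3): doubling again is exact

sequential_intermediate_exact = three == 3 * color
pairwise_intermediates_exact = two == 2 * color and four == 4 * color
```

The old grouping produces an inexact intermediate here, while the new
grouping only ever adds two equal values.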
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From the Ivy Bridge PRM, Vol4 Part3 p152:
"The avg instruction performs component-wise integer average of
src0 and src1 and stores the results in dst. An integer average
uses integer upward rounding. It is equivalent to increment one to
the addition of src0 and src1 and then apply an arithmetic right
shift to this intermediate value."
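
Read literally, the quoted description corresponds to the following sketch.
Python integers stand in for the hardware registers and >> is an arithmetic
shift, so negative inputs also round upward.

```python
def avg(src0, src1):
    """Integer average with upward rounding: add, add one, shift right."""
    return (src0 + src1 + 1) >> 1


examples = [avg(1, 2), avg(2, 2), avg(-3, -2)]
```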
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kill_emitted variable was duplicating the functionality of
gl_fragment_program::UsesKill. There's no need for both.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the code for setting this flag for GLSL programs was
duplicated in three places: brw_link_shader(), glsl_to_tgsi_visitor,
and ir_to_mesa_visitor. In addition to the unnecessary duplication,
there was a performance problem on i965: brw_link_shader() set the
flag before doing its final round of optimizations, which meant that
if the optimizations managed to eliminate all the discard operations,
the flag would still be set, resulting (at least in theory) in slower
performance.
This patch consolidates all of the code that sets UsesKill for GLSL
programs into do_set_program_inouts(), which already is doing a
similar job for UsesDFdy, and which occurs after i965's final round of
optimizations.
Non-GLSL programs (ARB programs and the state tracker's glBitmap
program) are unaffected.
Reviewed-by: Eric Anholt <eric@anholt.net>
Move it to native_wayland_drm_bufmgr_helper.c which only gets compiled when
wayland is enabled and which already includes the right headers.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The cube sampler generates two-dimensional texture coordinates and
hence passes NULL for the array for the third one. The actual 2D
sampler, lower in the pipe, knew not to use that array since it
didn't need it. But the samplers have become single-texel and the
coordinate array dereference has been moved up one step, to a level
where the code does not know only two coordinates are used. Hence the
segfault.
The simplest fix by far is to add a third dummy coordinate array in
the call to the next pipe step, which will be dereferenced to a
harmless 0 which then will be happily ignored by the sampler.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=52250
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We also reuse EGL_TEXTURE_RGBA and EGL_TEXTURE_RGB, adding only the new
planar YUV texture formats: EGL_TEXTURE_Y_U_V_WL, EGL_TEXTURE_Y_UV_WL and
EGL_TEXTURE_Y_XUXV_WL.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
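
The effect on the program cache can be sketched as follows. Shader,
wm_prog_key and get_program are hypothetical stand-ins, not the i965
structures; the point is only that render_to_fbo enters the key when the
shader uses dFdy().

```python
from collections import namedtuple

Shader = namedtuple("Shader", ["name", "uses_dfdy"])  # hypothetical stand-in


def wm_prog_key(shader, render_to_fbo):
    """Build a cache key; render_to_fbo only matters with dFdy()."""
    return (shader.name, render_to_fbo if shader.uses_dfdy else False)


compiled = {}


def get_program(shader, render_to_fbo):
    key = wm_prog_key(shader, render_to_fbo)
    if key not in compiled:
        compiled[key] = f"compiled {key}"  # stands in for a real compile
    return compiled[key]


plain = Shader("plain", uses_dfdy=False)
deriv = Shader("deriv", uses_dfdy=True)
for to_fbo in (False, True):
    get_program(plain, to_fbo)
    get_program(deriv, to_fbo)
```

The shader without dFdy() is compiled once for both framebuffer types; the
one with dFdy() is compiled twice.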
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch updates the ir_set_program_inouts_visitor so that it also
sets gl_fragment_program::UsesDFdy.
This is a bit of a hack (since dFdy() isn't an input or an output),
but there's no other obvious visitor to squeeze this functionality
into, and it would be silly to create a brand new visitor just for
this purpose.
v2: use local 'fprog' var to avoid repeated casting.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This boolean will allow it to avoid unnecessarily recompiling shaders
that don't use dFdy().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unigine Heaven (at least) has a bug where it incorrectly uses the
GL_ARB_blend_func_extended extension.
Dual source blending allows two color outputs per render target;
individual shader outputs can be assigned to be either the first or
second blending input by setting the 'index' via one of two methods:
- An API call: glBindFragDataLocationIndexed()
- The GLSL 'layout' qualifier provided by GL_ARB_explicit_attrib_location
Both of these only work on user defined fragment shader outputs; it's an
error to use either on built-in outputs like gl_FragData.
Unigine uses gl_FragData and gl_FragColor exclusively, and doesn't even
attempt to use either method to set index == 1. However, it does set
the blending function to SRC1 enums, which requires a fragment shader
output with index == 1 or else rendering is undefined.
In other words, enabling ARB_blend_func_extended causes Unigine to
render incorrectly, resulting in an apparent regression, even though our
driver code (as far as I can tell) is perfectly fine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if we were spilling the result of a texture call, we would store
all 4 regs, then for each use of one of those regs as the source of an
instruction, we would unspill all 4 regs even though only one was needed.
In both lightsmark and l4d2 with my current graphics config, the shaders that
produce spilling do so on split GRFs, so this doesn't help them out. However,
in a capture of the l4d2 shaders with a different snapshot and playing the
game instead of using a demo, it reduced one shader from 2817 instructions to
2179, due to choosing a now-cheaper texture result to spill instead of piles
of texcoords.
v2: Fix comment noted by Ken, and fix the if condition associated with it for
the current state of what constitutes a partial write of the destination.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There's one instance of a potential behavior change: propagate_constants may
now propagate into a part of a vgrf after a different part of it was
overwritten by a send that returns multiple registers. I don't think we ever
generate IR that meets that condition, but it's something to note if we bisect
a behavior change to this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In these places, we care about any sort of send that hits more than one reg,
not just textures. We don't yet have anything else returning more than one
reg, so there's no change.
v2: Use mlen instead of is_tex() for the is-it-a-send check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"count" is a more useful name, since most of the time we're using it for
looping over the variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
OpenGL specification 3.3 (page 196), section 4.1.3 says:
"If drawbuffer zero is not NONE and the buffer it references has an
integer format, the SAMPLE_ALPHA_TO_COVERAGE and SAMPLE_ALPHA_TO_ONE
operations are skipped."
This should work properly even if there are other draw buffers that
are not in integer format.
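
The rule can be sketched as a predicate that looks only at draw buffer zero.
The list-of-dicts representation and the function name are assumptions made
for illustration.

```python
def sample_alpha_ops_enabled(draw_buffers):
    """Return False when draw buffer zero is bound and has an integer format.

    draw_buffers is a list whose entries are None (for NONE) or a dict with
    an 'is_integer' flag.
    """
    buffer_zero = draw_buffers[0] if draw_buffers else None
    if buffer_zero is not None and buffer_zero["is_integer"]:
        return False
    return True


int_buf = {"is_integer": True}
float_buf = {"is_integer": False}
```

Only draw buffer zero decides: [int_buf, float_buf] skips the operations,
while [float_buf, int_buf] does not.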
This patch makes the following piglit tests pass on mesa:
int-draw-buffers-alpha-to-coverage
int-draw-buffers-alpha-to-one
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch churns a lot because it needs to change 4-wide filters into
single pixel filters, since each fragment may use a different filter.
The only case not entirely supported is the anisotropic filtering.
Not sure what we want to do there, since a full quad is required by
that filter.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 3.0 spec, section 4.3.3, in the documentation for
CopyPixels():
"An INVALID_OPERATION error will be generated if the object bound
to READ_FRAMEBUFFER_BINDING is framebuffer complete and the value
of SAMPLE_BUFFERS is greater than zero."
The same applies to CopyTexImage...() and CopyTexSubImage...()
functions, since they are defined in terms of CopyPixels().
Previously we were generating an INVALID_FRAMEBUFFER_OPERATION error
in these cases.
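
A sketch of the resulting error selection, assuming the usual rule that
reading from an incomplete framebuffer yields INVALID_FRAMEBUFFER_OPERATION.
The function name and its two parameters are hypothetical.

```python
GL_NO_ERROR = 0
GL_INVALID_OPERATION = 0x0502
GL_INVALID_FRAMEBUFFER_OPERATION = 0x0506


def copy_pixels_error(read_fb_complete, sample_buffers):
    """Pick the error for CopyPixels-style reads from the read framebuffer."""
    if not read_fb_complete:
        return GL_INVALID_FRAMEBUFFER_OPERATION
    if sample_buffers > 0:
        # Complete but multisampled: the spec asks for INVALID_OPERATION.
        return GL_INVALID_OPERATION
    return GL_NO_ERROR
```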
Fixes piglit tests
"EXT_framebuffer_multisample/negative-{copypixels,copyteximage}".
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Issues fixed:
- set_vs_sampler_views for evergreen is now properly implemented.
- Added the missing inval_texture_cache call for evergreen.
- have_depth_texture was sometimes incorrectly set to false on evergreen even
if there were depth textures in other shader stages. To fix this, set it
to true once and never set it to false again. It's stupid, but it matches
the r600 code. The proper fix is left to another patch.
- Optimization: The sampler views which aren't changed aren't updated.
This is a leftover from:
commit fe1fd67556
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Jul 8 03:10:37 2012 +0200
r600g: don't flush depth textures set as colorbuffers
If only some buffers are changed, the other ones don't have to be re-emitted.
This uses bitmasks of enabled and dirty buffers just like
emit_constant_buffers does.
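
The bitmask approach can be sketched like this. It is a simplified model,
not the r600g code: the function name, the emit callback, and the choice to
leave dirty bits of disabled buffers set are all assumptions.

```python
def emit_dirty_buffers(enabled_mask, dirty_mask, emit):
    """Emit buffers that are both enabled and dirty; return the new dirty mask."""
    pending = enabled_mask & dirty_mask
    while pending:
        index = (pending & -pending).bit_length() - 1  # lowest set bit
        emit(index)
        pending &= pending - 1                         # clear that bit
    # Assumption: dirty bits of disabled buffers stay set for later.
    return dirty_mask & ~enabled_mask


emitted = []
remaining = emit_dirty_buffers(0b1011, 0b0110, emitted.append)
```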
* Also add mcjit in the non-OpenCL case.
* Replace hardcoded llvm-config with $LLVM_CONFIG everywhere.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Helps spotting and removing the obsolete generated files, which otherwise break
the build.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is necessary for linking the llvmpipe tests. It appears this
dependency was introduced by the "wider native register" changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It's been broken (using NULL getBuffersWithFormat() instead of
getBuffers()) due to a copy and paste error for a year now.
GetBuffersWithFormat has been around since 2009, so I don't feel any
guilt in not supporting it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This means that GLX buffer sharing of these no longer works. On the
other hand, just *look* at this code reduction.
v2:
- [chad] Fix intelCreateBuffer for gen < 6. When the branch for
!screen->hw_has_separate_stencil was taken,
intel_create_private_renderbuffer was incorrectly not used.
- [chad] Remove all code in intel_process_dri2_buffer for processing
depth, stencil, and hiz buffers. That code is now dead.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
commit '7250cd506baa0bd4649b30d87509cdd0cbc06a57'
changes struct gbm_bo, renaming its 'pitch' to 'stride'.
This applies to Gallium.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Previously, if you ran make followed by make check it would work, but
if you just ran make check the test program would fail to compile.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Squashed commit of the following:
commit 7acb7b4f60dc505af3dd00dcff744f80315d5b0e
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:46:31 2012 +0100
draw: Don't use dynamically sized arrays.
Not supported by MSVC.
commit 5810c28c83647612cb372d1e763fd9d7780df3cb
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:44:16 2012 +0100
gallivm,llvmpipe: Don't use expressions with PIPE_ALIGN_VAR().
MSVC doesn't accept expressions in _declspec(align(...)). Use a
define instead.
commit 8aafd1457ba572a02b289b3f3411e99a3c056072
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:41:56 2012 +0100
gallium/util: Make u_cpu_detect.h header C++ safe.
commit 5795248350771f899cfbfc1a3a58f1835eb2671d
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 2 12:08:01 2012 +0100
gallium/util: Add ULL suffix to large constants.
As suggested by Andy Furniss: it looks like some old gcc versions
require it.
commit 4c66c22727eff92226544c7d43c4eb94de359e10
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Truly disable INF/NAN tests on MSVC.
Thanks to Brian for spotting this.
commit 8bce274c7fad578d7eb656d9a1413f5c0844c94e
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Disable INF/NAN tests on MSVC.
Somehow they are not recognized as constants.
commit 6868649cff8d7fd2e2579c28d0b74ef6dd4f9716
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 15:05:24 2012 +0200
gallivm: Cleanup the 2 x 8 float -> 16 ub special path in lp_build_conv.
No behaviour change intended, like 7b98455fb40c2df84cfd3cdb1eb7650f67c8a751.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 5147a0949c4407e8bce9e41d9859314b4a9ccf77
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 14:28:19 2012 +0200
gallivm: (trivial) fix issues with multiple-of-4 texture fetch
Some formats can't handle non-multiple of 4 fetches I believe, but
everything must support length 1 and multiples of 4.
So avoid going to scalar fetch (which is very costly) just because length
isn't 4.
Also extend the hack to not use shift with variable count for yuv formats to
arbitrary length (larger than 1) - doesn't matter how many elements we
have we always want to avoid it unless we have variable shift count
instruction (which we should get with avx2).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 87ebcb1bd71fa4c739451ec8ca89a7f29b168c08
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jul 4 02:09:55 2012 +0200
gallivm: (trivial) fix typo for wrap repeat mode in linear filtering aos code
This would lead to bogus coordinates at the edges.
(undetected by piglit because this path is only taken for block-based
formats).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 3a42717101b1619874c8932a580c0b9e6896b557
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue Jul 3 19:42:49 2012 +0100
gallivm: Fix TGSI integer translation with AVX.
commit d71ff104085c196b16426081098fb0bde128ce4f
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 15:17:41 2012 +0100
llvmpipe: Fix LLVM JIT linear path.
It was not working properly because it was looking at the JIT function
before it was actually compiled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit a94df0386213e1f5f9a6ed470c535f9688ec0a1b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Jun 28 18:07:10 2012 +0100
gallivm: Refactor lp_build_broadcast(_scalar) to share code.
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 66712ba2731fc029fa246d4fc477d61ab785edb5
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 17:30:13 2012 +0100
gallivm: Make LLVMContextRef a singleton.
There are many places inside LLVM that depend on it. Too many to attempt
to fix.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit ff5fb7897495ac263f0b069370fab701b70dccef
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 28 18:15:27 2012 +0200
gallivm: don't use 8-wide texture fetch in aos path
This appears to be a slight loss usually.
There are probably several reasons for that:
- fetching itself is scalar
- filtering is pure int code hence needs splitting anyway, same
for the final texel offset calculations
- texture wrap related code, which can be done 8-wide, is slightly more
complex with floats (with clamp_to_edge) and float operations generally
more costly hence probably not much faster overall
- the code needed to split when encountering different mip levels for the
quads, adding complexity
So, just split always for aos path (but leave it 8-wide for soa, since we
do 8-wide filtering there when possible).
This should certainly be revisited if we'd have avx2 support.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ce8032b43dcd8e8d816cbab6428f54b0798f945d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:41:19 2012 +0200
gallivm: (trivial) don't extract fparts variable if not needed
Did not have any consequences but unnecessary.
commit aaa9aaed8f80dc282492f62aa583a7ee23a4c6d5
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:09:06 2012 +0200
gallivm: fix precision issue in aos linear int wrap code
now not just passes at a quick glance but also with piglit...
If we do the wrapping with floats, we also need to set the
weights accordingly. We can potentially end up with different
(integer) coordinates than what the integer calculations would
have chosen, which means the integer weights calculated previously
in this case are completely wrong. Well at least that's what I think
happens, at least recalculating the weights helps.
(Some day really should refactor all the wrapping, so we do whatever is
fastest independent of 16bit int aos or 32bit float soa filtering.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit fd6f18588ced7ac8e081892f3bab2916623ad7a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 11:15:53 2012 +0100
gallium/util: Fix parsing of options with underscore.
For example
GALLIVM_DEBUG=no_brilinear
which was being parsed as two options, "no" and "brilinear".
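
A minimal tokenizer showing the intended behaviour, assuming option names
consist of letters, digits and underscores and that anything else separates
them. This is a sketch, not the gallium/util parser; the second option name
in the usage line is made up.

```python
def parse_options(value):
    """Split an option string into names; '_' is part of a name."""
    names, current = [], ""
    for ch in value:
        if ch.isalnum() or ch == "_":
            current += ch
        elif current:
            names.append(current)
            current = ""
    if current:
        names.append(current)
    return names


single = parse_options("no_brilinear")
several = parse_options("no_brilinear,perf")  # "perf" is a made-up name
```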
commit 09a8f809088178a03e49e409fa18f1ac89561837
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 26 15:00:14 2012 +0100
gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit e59bdcc2c075931bfba2a84967a5ecd1dedd6eb0
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed May 16 15:00:23 2012 +0100
draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
Cherry-picked from 9af1ba565d
commit df6070f618a203c7a876d984c847cde4cbc26bdb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 14:42:53 2012 +0200
gallivm: (trivial) fix typo in faster aos linear int wrap code
no longer crashes, now REALLY tested.
commit d8f98dce452c867214e6782e86dc08562643c862
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 18:20:58 2012 +0200
llvmpipe: (trivial) remove bogus optimization for float aos repeat wrap
This optimization for nearest filtering on the linear path generated
likely bogus results, and the int path didn't have any optimizations
there since the only shader using force_nearest apparently uses
clamp_to_edge not repeat wrap anyway.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit c4e271a0631087c795e756a5bb6b046043b5099d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 23:01:52 2012 +0200
gallivm: faster repeat wrap for linear aos path too
Even if we already have scaled integer coords, it's way faster to use
the original float coord (plus some conversions) rather than use URem.
The choice of what to do for texture wrapping is not really tied to int
aos or float soa filtering though for some modes there can be some gains
(because of easier weight calculations).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 1174a75b1806e92aee4264ffe0ffe7e70abbbfa3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 14:39:22 2012 +0200
gallivm: improve npot tex wrap repeat in linear soa path
URem gets translated into a series of scalar divisions so
just about anything else is faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit f849ffaa499ed96fa0efd3594fce255c7f22891b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 00:40:35 2012 +0100
gallivm: (trivial) fix near-invisible shift-space typo
I blame the keyboard.
commit 5298a0b19fe672aebeb70964c0797d5921b51cf0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:24:28 2012 +0200
gallivm: add new intrinsic helper to deal with arbitrary vector length
This helper will split vectors which are too large for the hw, or expand
them if they are too small, so a caller of a function using intrinsics which
uses such sizes need not split (or expand) the vectors manually and the
function will still use the intrinsic instead of dropping back to generic
llvm code. It can also accept scalars for use with pseudo-vector intrinsics
(only useful for float arguments, all x86 scalar simd float intrinsics use
4vf32).
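
The split/expand idea can be modelled with plain lists. This is only a sketch:
build_binary_intrinsic and max4 are hypothetical names, and max4 stands in
for a fixed-width hardware intrinsic that accepts exactly four lanes.

```python
def build_binary_intrinsic(op, a, b, hw_length):
    """Apply a fixed-width binary operation to vectors of arbitrary length."""
    result = []
    for start in range(0, len(a), hw_length):
        chunk_a = a[start:start + hw_length]
        chunk_b = b[start:start + hw_length]
        used = len(chunk_a)
        # Expand a short chunk to the hardware width; padding lanes are unused.
        chunk_a = chunk_a + [0.0] * (hw_length - used)
        chunk_b = chunk_b + [0.0] * (hw_length - used)
        result.extend(op(chunk_a, chunk_b)[:used])
    return result


def max4(x, y):
    """Stand-in for a 4-wide max intrinsic."""
    assert len(x) == len(y) == 4
    return [max(p, q) for p, q in zip(x, y)]


wide = build_binary_intrinsic(max4, [1.0, 5.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0],
                              [4.0] * 8, 4)
scalar = build_binary_intrinsic(max4, [2.0], [3.0], 4)
```

The 8-wide input is split into two 4-wide calls, and the scalar is expanded
to four lanes before the call and narrowed again afterwards.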
Only used for lp_build_min/max() for now (also added the scalar float case
for these while there). (Other basic binary functions could use it easily,
whereas functions with a different interface would need different helpers.)
Expanding vectors isn't widely used, because we always try to use
build contexts with native hw vector sizes. But it might (or not) be nicer
if this wouldn't need to be done, the generated code should in theory stay
the same (it does get hit by lp_build_rho though already since we
didn't have an intrinsic for the scalar lp_build_max case before).
v2: incorporated Brian's feedback, and also made the scalar min/max case work
instead of crash (all scalar simd float intrinsics take 4vf32 as argument,
probably the reason why it wasn't used before).
Moved to lp_bld_intr based on José's request, and passing intrinsic size
instead of length.
Ideally we'd derive the source type info from the passed in llvm value refs
and process some llvmtype return type so we could handle intrinsics where
the source and destination type isn't the same (like float/int conversions,
packing instructions) but that's a bit too complicated for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 01aa760b99ec0b2dc8ce57a43650e83f8c1becdf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:19:18 2012 +0200
gallivm: (trivial) increase max code size for shader disassembly
64kB was just short of what I needed (which caused a crash) hence
increase to 96kB (should probably be smarter about that).
commit 74aa739138d981311ce13076388382b5e89c6562
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:53:29 2012 +0100
gallivm: simplify aos float tex wrap repeat nearest
just handle pot and npot the same. The previous pot handling
ended up with exactly the same instructions plus 2 more (leave it
in the soa path though since it is probably still cheaper there).
While here also fix an issue which would cause a crash after an assert.
commit 0e1e755645e9e49cfaa2025191e3245ccd723564
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:29:24 2012 +0100
gallivm: (trivial) skip floor rounding in ifloor when not signed
This was only done for the non-sse41 case before, but even with
sse41 this is obviously unnecessary (some callers already call
itrunc in this case anyway but some might not).
commit 7f01a62f27dcb1d52597b24825931e88bae76f33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:23:12 2012 +0100
gallivm: (trivial) fix bogus comments
commit 5c85be25fd82e28490274c468ce7f3e6e8c1d416
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 20 11:51:57 2012 +0100
translate: Free elt8_func/elt16_func too.
These were leaking.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 0ad498f36fb6f7458c7cffa73b6598adceee0a6c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 15:55:34 2012 +0200
gallivm: fix bug for tex wrap repeat with linear sampling in aos float path
The comparison needs to be against length not length_minus_one, otherwise
the max texel is never chosen (for the second coordinate).
Fixes piglit texwrap-1D-npot-proj (and 2D/3D versions).
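
A simplified integer model of the second-coordinate wrap shows the effect of
the comparison. The function and its limit parameter are hypothetical; the
real code operates on vectors in the float path.

```python
def second_coord(coord0, length, limit):
    """Return the neighbouring texel for linear filtering with repeat wrap."""
    coord1 = coord0 + 1
    # Wrap back to texel 0 once the neighbour reaches the limit.
    return coord1 if coord1 < limit else 0


length = 4
fixed = [second_coord(c, length, length) for c in range(length)]
buggy = [second_coord(c, length, length - 1) for c in range(length)]
```

With the comparison against length - 1, the last texel (index 3 here) never
appears as the second coordinate.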
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit d1ad65937c5b76407dc2499b7b774ab59341209e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 16:13:43 2012 +0200
gallivm: simplify soa tex wrap repeat with npot textures and no mip filtering
Similar to what is already done in aos sampling for the float path (but not
the int path since we don't get normalized float coordinates there).
URem is expensive and the calculation is done trivially with
normalized floats instead (at least with sse41-capable cpus).
(Some day should probably do the same for the mip filter path but it's much
more complicated there hence the gain is smaller.)
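
The two ways of computing a repeat-wrapped texel index can be compared in a
scalar sketch. Both function names are made up, and the float variant ignores
the edge case where the fractional part rounds up to 1.0.

```python
import math


def wrap_repeat_urem(s, size):
    """Scale to texel space first, then wrap with an integer remainder."""
    return math.floor(s * size) % size


def wrap_repeat_float(s, size):
    """Wrap the normalized coordinate via its fractional part, then scale."""
    frac = s - math.floor(s)
    return int(frac * size)


cases = [(1.25, 5), (-0.25, 8), (0.5, 3)]
urem_results = [wrap_repeat_urem(s, n) for s, n in cases]
float_results = [wrap_repeat_float(s, n) for s, n in cases]
```

The float variant needs no division by a non-power-of-two size, which is what
makes it attractive for npot textures.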
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit e1e23f57ba9b910295c306d148f15643acc3fc83
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:38:56 2012 +0200
llvmpipe: (trivial) remove duplicated function declaration
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 07ca57eb09e04c48a157733255427ef5de620861
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:37:34 2012 +0200
llvmpipe: destroy setup variants on context destruction
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ed0003c633859a45f9963a479f4c15ae0ef1dca3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 16:25:29 2012 +0100
gallivm: handle different ilod parts for multiple quad sampling
This fixes filtering when the integer part of the lod is not the same
for all quads. I'm not fully convinced of that solution yet as it just
splits the vector if the levels to be sampled from are different.
But otherwise we'd need to do things like some minify steps, and getting
mip level base address separately anyway hence it wouldn't really look
like much of a win (and making the code even more complex).
This should now give identical results to single quad sampling.
commit 8580ac4cfc43a64df55e84ac71ce1a774d33c0d2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:14:47 2012 +0200
gallivm: de-duplicate sample code common to soa and aos sampling
There doesn't seem to be any reason why this code dealing with cube face
selection, lod and mip level calculation is separate in aos and
soa sampling, and I am sick of having to change it in both places.
commit fb541e5f957408ce305b272100196f1e12e5b1e8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:15:41 2012 +0200
gallivm: do mip filtering with per quad lod_fpart
This gives better results for mip filtering, though the generated code might
not be optimal. For now it also creates some artifacts if the lod_ipart isn't
the same for all quads, since instead of using the same mip weight for all
quads as previously (which just caused non-smooth gradients) this now will
use the right weights but with the wrong mip level in this case (can easily
be seen with things like texfilt, mipmap_tunnel).
v2: use logic helper suggested by José, and fix issue with negative lod_fpart
values
commit f1cc84eef7d826a20fab6cd8ccef9a275ff78967
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 13 18:35:25 2012 +0200
gallivm: (trivial) fix bogus assert in lp_build_unpack_broadcast_aos_scalars
commit 7c17dbae8ae290df9ce0f50781a09e8ed640c044
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:11:14 2012 +0100
util: Reimplement half <-> float conversions.
Removed u_half.py used to generate the table for previous method.
The previous implementation of float to half conversion was faulty for
denormals and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
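As a rough sketch of a table-free conversion (one direction only, half to float), assuming the IEEE 754 binary16 layout of 1 sign bit, 5 exponent bits and 10 mantissa bits. The function name half_to_float is hypothetical and this is not the code from the commit; it only shows how denormals, Inf and NaN can be handled explicitly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: convert a binary16 bit pattern to a float. */
static float half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000) << 16;
   uint32_t exp = (h >> 10) & 0x1f;
   uint32_t man = h & 0x3ff;
   uint32_t bits;
   float f;

   if (exp == 0x1f) {
      /* Inf or NaN: all-ones exponent, mantissa bits carried over */
      bits = sign | 0x7f800000u | (man << 13);
   } else if (exp != 0) {
      /* Normal number: rebias exponent from 15 to 127 */
      bits = sign | ((exp + 112) << 23) | (man << 13);
   } else if (man != 0) {
      /* Denormal: shift until the implicit leading one appears */
      int shift = 0;
      while (!(man & 0x400)) {
         man <<= 1;
         shift++;
      }
      man &= 0x3ff;
      bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
   } else {
      /* Signed zero */
      bits = sign;
   }

   memcpy(&f, &bits, sizeof f);
   return f;
}
```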
commit 7762f59274070e1dd4b546f5cb431c2eb71ae5c3
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:12:16 2012 +0100
tests: Updated tests to properly handle NaN for half floats.
commit fa94c135aea5911fd93d5dfb6e6f157fb40dce5e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:33:10 2012 +0200
gallivm: do mip level calculations per quad
This is the final piece which shouldn't change the rendering output yet.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 23cbeaddfe03c09ca18c45d28955515317ffcf4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 9 00:54:21 2012 +0200
gallivm: do per-quad cube face selection
Doesn't quite fix the piglit cubemap test (not sure why actually)
but doing per-quad face selection is doing the right thing and
definitely an improvement.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit abfb372b3702ac97ac8b5aa80ad1b94a2cc39d33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:22:59 2012 +0200
gallivm: do all lod calculations per quad
Still no functional change but lod is now converted to scalar after
lod calculations.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 519368632747ae03feb5bca9c655eccbc5b751b4
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:46:10 2012 +0100
gallivm: Added support for half-float to float conversion in lp_build_conv.
Updated various utility functions to support this change.
commit 135b4d683a4c95f7577ba27b9bffa4a6fbd2c2e7
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:02:46 2012 +0100
gallivm: Added function for half-float to float conversion.
Updated lp_build_format_aos_array to support half-float source.
commit 37d648827406a20c5007abeb177698723ed86673
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:55:18 2012 +0100
util: Updated u_format_tests to rigidly test half-float boundary values.
commit 2ad18165d96e578aa9046df7c93cb1c3284d8c6b
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:54:16 2012 +0100
llvmpipe: Updated lp_test_format to properly handle Inf/NaN results.
commit 78740acf25aeba8a7d146493dd5c966e22c27b73
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:53:30 2012 +0100
util: Added functions for checking NaN / Inf for double and half-floats.
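A minimal sketch of what such checks can look like for half-floats, assuming the IEEE 754 binary16 bit layout. The names half_is_nan and half_is_inf are hypothetical and are not necessarily the names added by this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* binary16: bit 15 sign, bits 14..10 exponent, bits 9..0 mantissa. */

/* NaN: exponent all ones and a non-zero mantissa. */
static bool half_is_nan(uint16_t h)
{
   return (h & 0x7c00) == 0x7c00 && (h & 0x03ff) != 0;
}

/* Inf: exponent all ones and a zero mantissa, either sign. */
static bool half_is_inf(uint16_t h)
{
   return (h & 0x7fff) == 0x7c00;
}
```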
commit 35e9f640ae01241f9e0d67fe893bbbf564c05809
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:05:13 2012 +0200
gallivm: Fix calculating rho for 3d textures for the single-quad case
Discovered by accident, this looks like a very old typo bug.
commit fc1220c636326536fd0541913154e62afa7cd1d8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:04:59 2012 +0200
gallivm: do calcs per-quad in lp_build_rho
Still convert to scalar at the end of the function.
commit 50a887ffc550bf310a6988fa2cea5c24d38c1a41
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 21 23:21:50 2012 +0200
gallivm: (trivial) return scalar in lp_build_extract_range for length 1 vectors
Our type system on top of llvm's one doesn't generally support vectors of
length 1, instead using scalars. So we should return a scalar from this
function instead of having to bitcast the vector with length 1 later elsewhere.
commit 80c71c621f9391f0f9230460198d861643324876
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 17:49:15 2012 +0100
draw: Fixed bad merge error
commit c47401cfad0c9167de20ff560654f533579f452c
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:29:30 2012 +0100
draw: Updated store_clip to store whole vectors instead of individual elements.
commit 2d9c1ad74b0b0b41861fffcecde39f09cc27f1cf
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:28:32 2012 +0100
gallivm: Added lp_build_fetch_rgba_aos_array.
A version of lp_build_fetch_rgba_aos which is targeted at simple array formats.
Reads the whole vector from memory in one, instead of reading each element
individually.
Tested with mesa tests and demos.
commit ff7805dc2b6ef6d8b11ec4e54aab1633aef29ac8
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:27:40 2012 +0100
gallivm: Added lp_build_pad_vector.
This function pads a vector with undef to a desired length.
commit 701f50acef24a2791dabf4730e5b5687d6eb875d
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 17:27:19 2012 +0100
util: Added util_format_is_array.
This function checks whether a format description is in a simple array format.
commit 5e0a7fa543dcd009de26f34a7926674190fa6246
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:13:47 2012 +0100
draw: Removed draw_llvm_translate_from and draw/draw_llvm_translate.c.
This is "replaced" by adding an optimised path in lp_build_fetch_rgba_aos
in an upcoming patch.
commit 8c886d6a7dd3fb464ecf031de6f747cb33e5361d
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 15:02:31 2012 +0100
draw: Modified store_aos to write the vector as one, not individual elements.
commit 37337f3d657e21dfd662c7b26d61cb0f8cfa6f17
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 14:16:23 2012 +0100
draw: Changed aos_to_soa to use lp_build_transpose_aos.
commit bd2b69ce5d5c94b067944d1dcd5df9f8e84548f1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:14:27 2012 +0100
draw: Changed soa_to_aos to use lp_build_transpose_aos.
commit 0b98a950d29a116e82ce31dfe7b82cdadb632f2b
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:45 2012 +0100
gallivm: Added lp_build_transpose_aos which converts between aos and soa.
commit 69ea84531ad46fd145eb619ed1cedbe97dde7cb5
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:01 2012 +0100
gallivm: Added lp_build_interleave2_half aimed at AVX unpack instructions.
commit 7a4cb1349dd35c18144ad5934525cfb9436792f9
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 22 11:54:14 2012 +0100
gallivm: Fix build on Windows.
MC-JIT not yet supported there.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit afd105fc16bb75d874e418046b80d9cc578818a1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:17:26 2012 +0100
llvmpipe: Added an error counter to lp_test_conv.
Useful for keeping track of progress when fixing errors!
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit b644907d08c10a805657841330fc23db3963d59c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:16:46 2012 +0100
llvmpipe: Changed known failures in lp_test_conv.
To comply with the recent fixes to lp_bld_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit d7061507bd94f6468581e218e61261b79c760d4f
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:14:38 2012 +0100
llvmpipe: Added fixed point types tests to lp_test_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 146b3ea39b4726dbe125ac666bd8902ea3d6ca8c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:26:35 2012 +0100
llvmpipe: Changed lp_test_conv src/dst alignment to be correct.
Now based on the define rather than a fixed number.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit f3b57441f834833a4b142a951eb98df0aa874536
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:06:44 2012 +0100
gallivm: Fixed erroneous optimisation in lp_build_min/max.
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit a0613382e5a215cd146bb277646a6b394d376ae4
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:04:49 2012 +0100
gallivm: Compensate for lp_const_offset in lp_build_conv.
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit a3d2bf15ea345bc8a0664f8f441276fd566566f3
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:01:25 2012 +0100
gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit e7b1e76fe237613731fa6003b5e1601a2e506207
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 21 20:07:51 2012 +0100
gallivm: Fix build with LLVM 2.6
Trivial, and useful.
commit d3c6bbe5c7f5ba1976710831281ab1b6a631082d
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 15 17:15:59 2012 +0100
gallivm: Enable MCJIT/AVX with vanilla LLVM 3.1.
Add the necessary C++ glue, so that we don't need any modifications
to the soon to be released LLVM 3.1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 724a019a14d40fdbed21759a204a2bec8a315636
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 22:04:06 2012 +0100
gallivm: Use HAVE_LLVM 0x0301 consistently.
commit af6991e2a3868e40ad599b46278551b794839748
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 21:49:06 2012 +0100
gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.
Trivial.
commit 6f8a1d75458daae2503a86c6b030ecc4bb494e23
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Apr 2 22:14:15 2012 -0700
gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 62555b6ed8760545794f83064e27cddcb3ce5284
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 27 21:51:17 2012 -0700
gallivm: Fix method overriding in raw_debug_ostream.
Use matching type qualifiers to avoid method hiding.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 6a9bd784f4ac68ad0a731dcd39e5a3c39989f2be
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 13 22:40:52 2012 -0700
gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit b674955d39adae272a779be85aa1bd665de24e3e
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Mar 5 22:00:40 2012 -0800
gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 11ab69971a8a31c62f6de74905dbf8c02884599f
Author: Vinson Lee <vlee@freedesktop.org>
Date: Wed Feb 29 21:20:53 2012 -0800
Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 339960c82d2a9f5c928ee9035ed31dadb7f45537
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 14 16:19:56 2012 +0200
gallivm: (trivial) fix assertion failure for mipmapped 1d textures
In lp_build_rho, we may end up with a 1-element vector (for mipmapped 1d
textures), but in this case we require the type to be a non-vector type,
so need a cast.
commit 9d73edb727bd6d196030dc3026b7bf0c574b3e19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:12:07 2012 +0200
gallivm: prepare for per-quad lod calculations for large vectors
to be able to handle multiple quads at once in texture sampling and still
do lod calculations per quad, it is necessary to get the per-quad derivatives
into the lp_build_rho function.
Until now these derivative values were just scalars, which isn't going to work.
So we now use vectors, and since the interface needs to change we also do some
different (slightly more efficient) packing of the values.
For 8-wide vectors the packed derivative values for 3 coords would look like
this, which scales to an arbitrary (multiple of 4) vector size:
ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____
The second vector will be unused for 1d and 2d textures.
To facilitate future changes the derivative values are put into a struct, since
quite some functions just pass these values through.
The generated code seems to be very slightly better for 2d textures (with
4-wide vectors) than before with sse2 (if you have a cpu with physical 128bit
simd units - otherwise it's probably not a win).
v2: suggestions from José, rename variables, add comments, use swizzle helper
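The packed layout described above can be sketched in plain C as follows. This is only an illustration with plain arrays standing in for the LLVM vectors; the struct quad_derivs and the function pack_derivs are hypothetical names, and the unused slots are zeroed here although the real code may leave them undefined.

```c
#include <stddef.h>

/* Hypothetical per-quad derivative values for up to 3 coords. */
struct quad_derivs {
   float dsdx, dsdy;
   float dtdx, dtdy;
   float drdx, drdy;
};

/* Pack num_quads quads into two arrays of 4 * num_quads floats each:
 *   v0: ds_dx ds_dy dt_dx dt_dy   per quad
 *   v1: dr_dx dr_dy  ___   ___    per quad (only needed for 3d textures)
 */
static void pack_derivs(const struct quad_derivs *q, size_t num_quads,
                        float *v0, float *v1)
{
   for (size_t i = 0; i < num_quads; i++) {
      v0[4 * i + 0] = q[i].dsdx;
      v0[4 * i + 1] = q[i].dsdy;
      v0[4 * i + 2] = q[i].dtdx;
      v0[4 * i + 3] = q[i].dtdy;
      v1[4 * i + 0] = q[i].drdx;
      v1[4 * i + 1] = q[i].drdy;
      v1[4 * i + 2] = 0.0f;   /* unused slot */
      v1[4 * i + 3] = 0.0f;   /* unused slot */
   }
}
```

With num_quads = 2 this produces the 8-wide layout shown in the message.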
commit 0aa21de0d31466dac77b05c97005722e902517b8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:10:31 2012 +0200
gallivm: add undefined swizzle handling to lp_build_swizzle_aos
This is useful for vectors with "holes", it lets llvm choose the most
efficient shuffle instructions if some elements aren't needed without having to
worry what elements to manually pick otherwise.
commit 00faf3f370e7ce92f5ef51002b0ea42ef856e181
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri May 4 17:25:16 2012 +0100
gallivm: Get the LLVM IR optimization passes before JIT compilation.
MC-JIT engine compiles the module immediately on creation, so the optimization
passes were being run too late.
So now we create a target data layout from a string, that matches the
ABI parameters reported by the compiler.
The backend optimization passes were always being run, so the performance
improvement is modest (3% on multiarb mesa demo).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 40a43f4e2ce3074b5ce9027179d657ebba68800a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed May 2 16:03:54 2012 +0200
gallivm: (trivial) fix wrong define used in lp_build_pack2
should fix stack-smashing crashes.
commit e6371d0f4dffad4eb3b7a9d906c23f1c88a2ab9e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 30 21:25:29 2012 +0200
gallivm: add perf warnings when not using intrinsics with 256bit vectors
Helper functions using integer sse2 intrinsics could split the vectors with AVX
instead of using generic fallback (which should be faster).
We don't actually expect to hit these paths (hence don't fix them up to actually
do the vector splitting) so just emit warnings (for those functions where it's
obvious doing split/intrinsic is faster than using generic path).
Only emit warnings for 256bit vectors since we _really_ don't expect to hit
arbitrarily large vectors which would affect a lot more functions.
The warnings do not actually depend on avx since the same logic applies to
plain sse2 too (but of course again there's _really_ no reason we should hit
these functions with 256bit vectors without avx).
commit 8a9ea701ea7295181e846c6383bf66a5f5e47637
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:37:07 2012 +0200
gallivm: split vectors manually for avx in lp_build_pack2 (v2)
There's 2 reasons for this:
First, there's a llvm bug (fixed in 3.1) which generates tons of byte
inserts/extracts otherwise, and second, more importantly, we want to use
pack intrinsics instead of shuffles.
We do this in lp_build_pack2 and not the calling code (aos sample path)
because potentially other callers might find that useful too, even if
for larger sequences of code using non-native vector sizes it might be
better to manually split vectors.
This should boost texture performance in the aos path considerably.
v2: fix issues with intrinsics types with old llvm
commit 27ac5b48fa1f2ea3efeb5248e2ce32264aba466e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:26:22 2012 +0200
llvmpipe: refactor lp_build_pack2 (v2)
prettify, and it's unnecessary to assert when there's no intrinsic due to
unsupported bit width - the shuffle path will work regardless.
In contrast, lp_build_packs2 should only rely on lp_build_pack2 doing the
clamping for element sizes for which there is a sse2 intrinsic.
v2: fix bug spotted by Jose regarding the intrinsic type for packusdw
on old llvm versions.
commit ddf279031f0111de4b18eaf783bdc0a1e47813c8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:13:59 2012 +0200
gallivm: add src width check in lp_build_packs2()
not doing so would skip clamping even if no sse2 pack instruction is
available, which is incorrect (in theory only, such widths would also always
hit an (unnecessary) assertion in lp_build_pack2()).
commit e7f0ad7fe079975eae7712a6e0c54be4fae0114b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 27 15:57:00 2012 +0200
gallivm: (trivial) fix crash-causing typo for npot textures with avx
commit 28a9d7f6f655b6ec508c8a3aa6ffefc1e79793a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Apr 25 19:38:45 2012 +0200
gallivm: (trivial) remove code mistakenly added twice.
commit d5926537316f8ff67ad0a52e7242f7c5478d919b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Apr 24 21:16:15 2012 +0200
gallivm: add a new avx aos sample path (v2)
Try to avoid mixing float and int address calculations. This does texture wrap
modes with floats, and then the offset calculations still with ints (because
of lack of precision with floats, though we could do some effort to make it work
with not too large (16MB) textures).
This also handles wrap repeat mode with npot-sized textures differently than
either the old soa or aos int path (likely way faster but untested).
Otherwise the actual address wrap code is largely similar to the soa path (not
quite the same as this one also has some int code), it should get used by avx
soa sampling later as well but doesn't handle more complex address modes yet
(this will also have the benefit that we can use aos sampling path for all
texture address modes).
Generated code for that looks reasonable, but still does not split vectors
explicitly for fetch/filter which means we still get hit by the llvm bug (fixed upstream)
which generates hundreds of pinsrb/pextrb instead of two shuffles.
It is not obvious though if it's much of a win over just doing address calcs
4-wide but with ints, even if it is definitely far fewer instructions on avx.
piglit's texwrap seems to look exactly the same but tests
neither the non-normalized nor the npot cases.
v2: fix comments, prettify based on Brian's and Jose's feedback.
commit bffecd22dea66fb416ecff8cffd10dd4bdb73fce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 19 01:58:29 2012 +0200
gallivm: refactor aos lp_build_sample_image_nearest/linear
split them up to separate address calculations and fetching/filtering.
Need this for being able to do 8-wide float address calcs and 4-wide
fetch/filter later (for avx). Plus the functions were very big scary monsters
anyway (in particular lp_build_sample_image_linear).
commit a80b325c57529adddcfa367f96f03557725c4773
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 16 17:17:18 2012 +0200
gallivm: fix lp_build_resize when truncating width but expanding vector size
Missed this case which I thought was impossible - the assertion for it was
right after the division by zero...
(AoS) texture sampling may ask us to do this, for things like 8 4x32int
vectors to 1 32x8int vector conversion (eventually, we probably don't want
this to happen).
commit f9c8337caa3eb185830d18bce8b95676a065b1d7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Apr 14 18:00:59 2012 +0200
gallivm: fix cube maps with larger vectors
This makes the branchless cube face selection code work with larger vectors.
Because the complexity is quite high (cannot really be improved it seems,
per-face selection would reduce complexity a lot but this leads to errors
unless the derivatives are calculated all from the same face which almost
doubles the work to be done) it is still slower than the branching version,
hence only enable this with large vectors.
It doesn't actually do per-quad face selection yet (only makes sense with
matching lod selection, in fact it will select the same face for all pixels
based on the average of the first four pixels for now) but only different
shuffles are required to make it work (the branching version actually should
work with larger vectors too now thanks to the improved horizontal add but of
course it cannot be extended to really select the face per-quad unless doing
branching per quad).
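To illustrate the idea of picking one face from the first four pixels, here is a scalar sketch that selects the major axis of the summed direction (equivalent to using the average). The enum, the function name quad_cube_face and the tie-breaking order (x before y before z) are assumptions for this example, and the real code is branchless vector IR rather than branching C.

```c
/* Hypothetical face numbering for this sketch. */
enum cube_face {
   FACE_POS_X, FACE_NEG_X,
   FACE_POS_Y, FACE_NEG_Y,
   FACE_POS_Z, FACE_NEG_Z
};

static float absf(float v)
{
   return v < 0.0f ? -v : v;
}

/* Choose a single face for a quad from its four (s, t, r) directions. */
static enum cube_face quad_cube_face(const float s[4], const float t[4],
                                     const float r[4])
{
   float ss = 0.0f, st = 0.0f, sr = 0.0f;

   /* The sum has the same sign and relative size as the average. */
   for (int i = 0; i < 4; i++) {
      ss += s[i];
      st += t[i];
      sr += r[i];
   }

   if (absf(ss) >= absf(st) && absf(ss) >= absf(sr))
      return ss >= 0.0f ? FACE_POS_X : FACE_NEG_X;
   if (absf(st) >= absf(sr))
      return st >= 0.0f ? FACE_POS_Y : FACE_NEG_Y;
   return sr >= 0.0f ? FACE_POS_Z : FACE_NEG_Z;
}
```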
commit 7780c58869fc9a00af4f23209902db7e058e8a66
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 21:11:12 2012 +0100
llvmpipe: (trivial) fix compiler warning
and also clarify comment regarding availability of popcnt instruction.
commit a266dccf477df6d29a611154e988e8895892277e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 14:21:07 2012 +0100
gallivm: remove unneeded members in lp_build_sample_context
Minor cleanup, the texture width, height, depth aren't accessed in their
scalar form anywhere. Makes it more obvious those values should probably be
fetched already vectorized (but this requires more invasive changes)...
commit b678c57fb474e14f05e25658c829fc04d2792fff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 29 15:53:55 2012 +0100
gallivm: add a helper for concatenating vectors
Similar to the extract_range helper intended to get around slow code generated
by llvm for 128bit insertelements.
Concatenating two 128bit vectors this way will result in a single vinsertf128
operation rather than two 64bit stores plus one 128bit load, though it might be
mildly useful for other purposes as well.
commit 415ff228bcd0cf5e44a4c15350a661f0f5520029
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 19:41:15 2012 +0100
gallivm: add a custom 2x8f->1x16ub avx conversion path
Similar to the existing 4x4f->1x16ub sse2 path, shaves off a couple
instructions (min/max mostly) because it relies on pack intrinsics clamping.
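The clamping that makes explicit min/max redundant can be sketched with scalar stand-ins for the saturating pack steps. This assumes the path goes through a signed 32 to 16 bit saturating pack followed by a signed 16 to unsigned 8 bit saturating pack (the behaviour of the SSE2 packssdw and packuswb instructions); the function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane of a signed-saturating 32 -> 16 bit pack. */
static int16_t sat_s32_to_s16(int32_t v)
{
   if (v > 32767)
      return 32767;
   if (v < -32768)
      return -32768;
   return (int16_t)v;
}

/* Scalar model of one lane of an unsigned-saturating 16 -> 8 bit pack. */
static uint8_t sat_s16_to_u8(int16_t v)
{
   if (v > 255)
      return 255;
   if (v < 0)
      return 0;
   return (uint8_t)v;
}

/* Chaining both packs clamps any int32 to [0, 255] without min/max. */
static uint8_t pack_to_ubyte(int32_t v)
{
   return sat_s16_to_u8(sat_s32_to_s16(v));
}
```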
commit 78c08fc89f8fbcc6dba09779981b1e873e2a0299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 18:44:07 2012 +0100
gallivm: add avx arithmetic intrinsics
Add all avx intrinsics for arithmetic functions (with the exception
of the horizontal add function which needs another look).
Seems to pass basic tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit a586caa2800aa5ce54c173f7c0d4fc48153dbc4e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 15:31:35 2012 +0100
gallivm: add avx logic intrinsics
Add the blend intrinsics for 8-wide float and 4-wide double vectors.
Since we lack 256bit int instructions these are used for int vectors as well,
though obviously not for byte or word element values.
The comparison intrinsics aren't extended for avx since these are only used
for pre-2.7 llvm versions.
commit 70275e4c13c89315fc2560a4c488c0e6935d5caf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 00:40:53 2012 +0100
gallivm: new helper function for extract shuffles.
Based on José's idea as we need that in a couple of places.
Note that such shuffles should not be used lightly, since data layout
of <4 x i8> is different to <16 x i8> for instance, hence might cause
data rearrangement.
commit 4d586dbae1b0c55915dda1759d2faea631c0a1c2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 18:27:25 2012 +0100
gallivm: (trivial) don't overallocate shuffle variable
using wrong define meant huge array...
commit 06b0ec1f6d665d98c135f9573ddf4ba04b2121ad
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 17:54:20 2012 +0100
gallivm: don't do per-element extract/insert for vector element resize
Instead of doing per-element extract/insert if the src vectors
and dst vector differ in total size (which generates atrocious code)
first change the src vectors size by using shuffles to destination
vector size.
We can still do better than that on AVX for packing to color buffer
(by exploiting pack intrinsics characteristics hence eliminating the
need for some clamps) but this already generates much better code.
v2: incorporate feedback from José, Keith and use shuffle instead of
bitcasts/extracts. Due to llvm deficiencies the latter cause all data
to get moved to GPRs and back in pieces (even though the data in the
regs actually stays the same...).
commit c9970d70e05f95d3f52fe7d2cd794176a52693aa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 19:33:19 2012 +0000
gallivm: fix bug in simple position interpolation
Accidental use of position attribute instead of just pixel coordinates.
Caused failures in piglit glsl-fs-ceil and glsl-fs-floor.
commit d0b6fcdb008d04d7f73d3d725615321544da5a7e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 15:31:14 2012 +0000
gallivm: fix emission of ceil opcode
lp_build_ceil seems more appropriate than lp_build_trunc.
This never seems to be hit though, as someone performs some ceil
to floor magic.
commit d97fafed7e62ffa6bf76560a92ea246a1a26d256
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 22 11:46:52 2012 +0000
gallivm: new vectorized path for cubemap calculations
should be faster when adapted to multiple quads as only selection masks need to be different.
The code is more or less a per-pixel version adapted to only do it per quad.
A per pixel version would be much simpler (could drop 2 selects, 6 broadcasts and the messy
horizontal add of 3 vectors at the expense of only 2 more absolute value instructions -
would also just work for arbitrarily large vectors).
This version doesn't yet work with larger vectors because the horizontal add isn't adjusted
to be able to work with 2x4 vectors (and also because face selection wouldn't be done per
quad just per block though that would be only a correctness issue just as with lod selection).
The downside is this code is quite a bit slower. On a Core2 it can be sped up by disabling the
hw blend instructions for selection and using logicop fallbacks instead, but it is still slower
than the old code, hence leave that in for now. Probably will choose one or the other version
based on vector length in the end.
commit b375fbb18a3fd46859b7fdd42f3e9908ea4ff9a3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 21 14:42:29 2012 +0000
gallivm: fix optimized occlusion query intrinsic name
commit a9ba0a3b611e48efbb0e79eb09caa85033dbe9a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Mar 21 16:19:43 2012 +0000
draw,gallivm,llvmpipe: Call gallivm_verify_function everywhere.
commit f94c2238d2bc7383e088b8845b7410439a602071
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 18:54:10 2012 +0000
gallivm: optimize calculations for cube maps a bit
this does some more vectorized calculations and uses horizontal adds if possible.
A definite win with sse3 otherwise it doesn't seem to make much of a difference.
In any case this is arithmetically identical, cannot handle larger vectors.
Should be useful as a reference point against larger vector version later...
commit 21a2c1cf3c8e1ac648ff49e59fdc0e3be77e2ebb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 15:16:27 2012 +0000
llvmpipe: slight optimization of occlusion queries
using movmskps when available.
While this is slightly better for cpus without popcnt we should
really sum the vectors ourselves (it is also possible to cast to i4 before
doing the popcnt but that doesn't help that much either since llvm
is using some optimized popcnt version for i32)
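A scalar sketch of the movmskps idea, assuming a 4-wide pixel mask whose lanes are all-ones for live pixels and zero otherwise: gather the sign bit of each lane into a small bitmask, then count the set bits. The helper names are hypothetical and the loops stand in for the movmskps and popcnt instructions.

```c
#include <stdint.h>

/* Model of movmskps: collect the sign bit of each 32-bit lane. */
static unsigned movmsk4(const int32_t lanes[4])
{
   unsigned bits = 0;
   for (int i = 0; i < 4; i++)
      bits |= ((uint32_t)lanes[i] >> 31) << i;
   return bits;
}

/* Portable bit count, standing in for a popcnt instruction. */
static unsigned popcount_u32(uint32_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Number of pixels in the quad that pass, added to the query counter. */
static unsigned count_live_pixels(const int32_t lanes[4])
{
   return popcount_u32(movmsk4(lanes));
}
```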
commit 5ab5a35f216619bcdf55eed52b0db275c4a06c1b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 13:32:11 2012 +0000
llvmpipe: fix occlusion queries with larger vectors
need to adjust casts etc.
commit ff95e6fdf5f16d4ef999ffcf05ea6e8c7160b0d5
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Mar 19 20:15:25 2012 +0000
gallivm: Restore optimization passes.
commit 57b05b4b36451e351659e98946dae27be0959832
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:34:22 2012 +0000
llvmpipe: use existing min2 macro
commit bc9a20e19b4f600a439f45679451f2e87cd4b299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:07:27 2012 +0000
llvmpipe: add some safeguards against really large vectors
As per José's suggestion, prevent things from blowing up if some cpu
would have 1024bit or larger vectors.
commit 0e2b525e5ca1c5bbaa63158bde52ad1c1564a3a9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:31:08 2012 +0000
llvmpipe: fix mask generation for uberwide vectors
this was the only piece preventing 16-wide vectors from working
(apart from the LP_MAX_VECTOR_WIDTH define that is), which is the maximum
as we don't get more pixels in the fragment shader at once.
Hence adjust that so things could be tested properly with that size
even though there seems to be no practical value.
commit 3c8334162211c97f3a11c7f64e9e5a2a91ad9656
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:19:41 2012 +0000
llvmpipe: fix the simple interpolation method with larger vectors
so both methods actually _really_ work now. Makes textures look
nice with larger vectors...
commit 1cb0464ef8871be1778d43b0c56adf9c06843e2d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 17:26:35 2012 +0000
llvmpipe: fix mask generation and position interpolation with 8-wide vectors
trivial bugs, with these things start to look somewhat reasonable.
Textures though have some swizzling issues it seems.
commit 168277a63ef5b72542cf063c337f2d701053ff4b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 16:04:03 2012 +0000
llvmpipe: don't overallocate variables
we never have more than 16 (stamp size) / 4 (minimum possible vector size).
(With larger vectors those variables are still overallocated a bit.)
commit 409b54b30f81ed0aa9ed0b01affe15c72de9abd2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:56:48 2012 +0000
llvmpipe: add some 32f8 formats to lp_test_conv
Also add the ability to handle different sized vectors.
commit 55dcd3af8366ebdac0af3cdb22c2588f24aa18ce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:47:27 2012 +0000
gallivm: handle different sized vectors in conversion / pack
only fully generic path for now (extract/insert per element).
commit 9c040f78c54575fcd94a8808216cf415fe8868f6
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sun Mar 18 00:58:28 2012 +0100
llvmpipe: fix harmless use of uninitialized values
commit 551e9d5468b92fc7d5aa2265db9a52bb1e368a36
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:31:21 2012 +0100
gallivm: drop special path in extract_broadcast with different sized vectors
Not needed, llvm can handle shuffles with different sized result vector just
fine. Should hopefully generate the same code in the end, but simpler IR.
commit 44da531119ffa07a421eaa041f63607cec88f6f8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:28:49 2012 +0100
llvmpipe: adapt interpolation for handling multiple quads at once
this is still WIP there are actually two methods possible not quite
sure what makes the most sense, so there's code for both for now:
1) the iterative method as used before (compute attrib values at upper left
corner of stamp and upper left corner of each quad initially).
It is improved to handle more than one quad at once, and also do some more vectorized
calculations initially for slightly better code - newer cpus have full throughput with
4 wide float vectors, hence don't try to code up a path which might be faster if there's
just one channel active per attribute.
2) just do straight interpolation for each pixel.
Method 2) is more work per quad, but less initially - if all quads are executed
significantly more overall though. But this might change with larger vector lengths.
This method would also be needed if we'd do some kind of active quad merging when
operating on multiple quads at once.
This path contains some hack to force llvm to generate better code, it is still far
from ideal though, still generates far too many unnecessary register spills/reloads.
Both methods should work with different sized vectors.
Not very well tested yet, still seems to work with four-wide vectors, need changes
elsewhere to be able to test with wider vectors.
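A minimal scalar sketch of the two methods, assuming each attribute channel is a plane equation a0 + dadx*x + dady*y. The struct and function names are hypothetical and are not the llvmpipe code, which generates vectorized LLVM IR instead:

```c
#include <assert.h>

/* Hypothetical plane-equation coefficients for one attribute channel:
 * value(x, y) = a0 + dadx * x + dady * y. */
struct attrib_coef {
   float a0;
   float dadx;
   float dady;
};

/* Method 2: straight interpolation, evaluated independently per pixel. */
float
interp_direct(const struct attrib_coef *c, int x, int y)
{
   assert(c);
   return c->a0 + c->dadx * (float)x + c->dady * (float)y;
}

/* Method 1: evaluate once at the upper-left corner of a 2x2 quad, then
 * step by dadx/dady to reach the other three pixels.
 * out[] order: upper-left, upper-right, lower-left, lower-right. */
void
interp_quad_iterative(const struct attrib_coef *c, int x0, int y0,
                      float out[4])
{
   float ul = interp_direct(c, x0, y0);

   out[0] = ul;
   out[1] = ul + c->dadx;
   out[2] = ul + c->dady;
   out[3] = ul + c->dadx + c->dady;
}
```

Both functions should agree for every pixel of a quad (up to floating-point rounding); the difference is only in where the work is done.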
commit be5d3e82e2fe14ad0a46529ab79f65bf2276cd28
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:59:37 2012 +0000
draw: Cleanup.
commit f85bc12c7fbacb3de2a94e88c6cd2d5ee0ec0e8d
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:43:30 2012 +0000
gallivm: More module compilation refactoring.
commit d76f093198f2a06a93b2204857e6fea5fd0b3ece
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Mar 15 21:29:11 2012 +0000
llvmpipe: Use gallivm_compile/free_function() in linear code.
Should have been done before.
commit 122e1adb613ce083ad739b153ced1cde61dfc8c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 13 14:47:10 2012 +0100
llvmpipe: generate partial pixel mask for multiple quads
still works with one quad, cannot be tested yet with more
At least for now always fixed order with multiple quads.
commit 4c4f15081d75ed585a01392cd2dcce0ad10e0ea8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 22:09:24 2012 +0100
llvmpipe: refactor state setup a bit
Refactor to make it easier to emit (and potentially later fetch in fs)
coefficients for multiple attributes at once.
Need to think more about how to make this actually happen however, the
problem is different attributes can have different interpolation modes,
requiring different handling in both setup and fs (though linear and
perspective handling is close).
commit 9363e49722ff47094d688a4be6f015a03fba9c79
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 19:23:23 2012 +0100
llvmpipe: vectorize tri offset calc
cuts number of instructions in quad-offset-factor from 107 to 75.
This code actually duplicated the (scalar) code calculating the determinant
except it used different vertex order (leading to different sign but it doesn't
matter) hence llvm could not have figured out it's the same (of course with
determinant vectorized in the other place that wouldn't have worked any longer
either).
Note this particular piece doesn't actually vectorize well, not many arithmetic
instructions left but tons of shuffle instructions...
Probably would need to work on n tris at a time for better vectorization.
commit 63169dcb9dd445c94605625bf86d85306e2b4297
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 03:11:37 2012 +0100
llvmpipe: vectorize some scalar code in setup
reduces number of arithmetic instructions, and avoids loading
vector x,y values twice (once as scalars once as vectors).
Results in a reduction of instructions from 76 to 64 in fs setup for glxgears
(16%) on a cpu with sse41.
Since this code uses vec2 disguised as vec4, on old cpus which had physical
64bit sse units (pre-Core2) it probably is less of a win in practice (and if
you have no vectors you can only hope llvm eliminates the arithmetic for
unneeded elements).
commit 732ecb877f951ab89bf503ac5e35ab8d838b58a1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 7 00:32:24 2012 +0100
draw: fix clipping
bug introduced by 4822fea3f0440b5205e957cd303838c3b128419c broke
clipping pretty badly (verified with lineclip test)
commit ef5d90b86d624c152d200c7c4056f47c3c6d2688
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 23:38:59 2012 +0100
draw: don't store vertex header per attribute
storing the vertex header once per attribute is totally unnecessary.
Some quick look at the generated assembly says llvm in fact cannot optimize
away the additional stores (maybe due to potentially aliasing pointers
somewhere).
Plus, this makes the code cleaner and also allows using a vector "or"
instead of scalar ones.
commit 6b3a5a57b0b9850854cfbd7b586e4e50102dda71
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 19:11:01 2012 +0100
draw: do the per-vertex "boolean" clipmask "or" with vectors
no point extracting the values and doing it per component.
Doesn't help that much since we still extract the values elsewhere anyway.
commit 36519caf1af40e4480251cc79a2d527350b7c61f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 2 22:27:01 2012 +0100
gallivm: fix lp_build_extract_broadcast with different sized vectors
Fix the obviously wrong argument, so it doesn't blow up.
commit 76d0ac3ad85066d6058486638013afd02b069c58
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 2 12:16:23 2012 +0000
draw: Compile per module and not per function (WIP).
Enough to get gears w/ LLVM draw + softpipe to work on AVX doing:
GALLIUM_DRIVER=softpipe SOFTPIPE_USE_LLVM=yes glxgears
But still hackish -- will need to rethink and refactor this.
commit 78e32b247d2a7a771be9a1a07eb000d1e54ea8bd
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 12:01:05 2012 +0000
llvmpipe: Remove lp_state_setup_fallback.
Never used.
commit 6895d5e40d19b4972c361e8b83fdb7eecda3c225
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Feb 27 19:14:27 2012 +0000
llvmpipe: Don't emit EMMS on x86
We already take precautions to ensure that LLVM never emits MMX code.
commit 4822fea3f0440b5205e957cd303838c3b128419c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 29 15:58:19 2012 +0100
draw: modifications for larger vector sizes
We want to be able to use larger vectors especially for running the vertex
shader. With this patch we build soa vectors which might have a different
length than 4.
Note that aos structures really remain the same, only when aos structures
are converted to soa potentially different sized vectors are used.
Samplers probably don't work yet, didn't look at them.
Testing done:
glxgears works with both 128bit and 256bit vectors.
commit f4950fc1ea784680ab767d3dd0dce589f4e70603
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:57 2012 +0100
gallivm: override native vector width with LP_NATIVE_VECTOR_WIDTH env var for debug
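A rough sketch of such a debug override. Only the LP_NATIVE_VECTOR_WIDTH variable name comes from the commit title; the function names and the fallback behaviour for unparsable values are assumptions:

```c
#include <assert.h>
#include <stdlib.h>

/* Parse an override string; fall back to default_width when the string
 * is missing, empty, or does not parse to a non-zero number. */
unsigned
vector_width_from_string(const char *value, unsigned default_width)
{
   unsigned long parsed;

   assert(default_width != 0);
   if (value == NULL || *value == '\0')
      return default_width;

   parsed = strtoul(value, NULL, 10);
   return parsed ? (unsigned)parsed : default_width;
}

/* Debug hook: read the override from the environment. */
unsigned
debug_native_vector_width(unsigned default_width)
{
   return vector_width_from_string(getenv("LP_NATIVE_VECTOR_WIDTH"),
                                   default_width);
}
```

Splitting the parsing from the getenv() call keeps the parsing testable without touching the process environment.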
commit 6ad6dbf0c92f3bf68ae54e5f2aca035d19b76e53
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:24 2012 +0100
draw: allocate storage with alignment according to native vector width
commit 7bf0e3e7c9bd2469ae7279cabf4c5229ae9880c1
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Feb 24 19:06:08 2012 +0000
gallivm: Fix comment grammar.
Was missing several words. Spotted by Roland.
commit b20f1b28eb890b2fa2de44a0399b9b6a0d453c52
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 19:22:09 2012 +0000
gallivm: Use MC-JIT on LLVM 3.1+ (i.e., SVN)
MC-JIT
Note: MC-JIT is still WIP. For this to work correctly it requires
LLVM changes which are not yet upstream.
commit b1af4dfcadfc241fd4023f4c3f823a1286d452c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 20:03:15 2012 +0100
llvmpipe: use new lp_type_width() helper in lp_test_blend
commit 04e0a37e888237d4db2298f31973af459ef9c95f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 19:50:34 2012 +0100
llvmpipe: clean up lp_test_blend a little
Using variables just sized and aligned right makes it a bit more obvious
what's going on.
The test still only tests vector length 4.
For AoS anything else probably isn't going to work.
For SoA other lengths should work (at least with floats).
commit e61c393d3ec392ddee0a3da170e985fda885a823
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:48:30 2012 +0000
gallivm: Ensure vector width consistency.
Instead of assuming that everything is the max native size.
commit 330081ac7bc41c5754a92825e51456d231bf84dd
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:44:14 2012 +0000
draw: More simd vector width consistency fixes.
commit d90ca002753596269e37297e2e6c139b19f29f03
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:43:00 2012 +0000
gallivm: Remove unused lp_build_int32_vec4_type() helper.
commit cae23417824d75869c202aaf897808d73a2c1db0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 17:32:16 2012 +0100
gallivm: use global variable for native vector width instead of define
We do not know the simd extensions (and hence the simd width we should use)
available at compile time.
At least for now keep a define for maximum vector width, since a global
variable obviously can't be used to adjust alignment of automatic stack
variables.
Leave the runtime-determined value at 128 for now in all cases.
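The split between a compile-time maximum and a run-time width could look roughly like this. The names and the 256-bit maximum are hypothetical; the point is only that stack storage is sized and aligned by the define while loops use the global:

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time upper bound in bits (hypothetical value). A define is
 * needed because stack variables are sized and aligned at compile time. */
#define MAX_VECTOR_WIDTH 256

/* Run-time width in bits, left at 128 for now in all cases. */
unsigned native_vector_width = 128;

/* Number of 32-bit elements per vector at the current run-time width. */
unsigned
vector_length_32(void)
{
   assert(native_vector_width <= MAX_VECTOR_WIDTH);
   return native_vector_width / 32;
}

/* Copy one native vector into maximally aligned stack storage and sum it. */
float
sum_native_vector(const float *src)
{
   _Alignas(MAX_VECTOR_WIDTH / 8) float tmp[MAX_VECTOR_WIDTH / 32];
   unsigned n = vector_length_32();
   float sum = 0.0f;
   unsigned i;

   assert(((uintptr_t)tmp % (MAX_VECTOR_WIDTH / 8)) == 0);
   for (i = 0; i < n; i++) {
      tmp[i] = src[i];
      sum += tmp[i];
   }
   return sum;
}
```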
commit 51270ace6349acc2c294fc6f34c025c707be538a
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 15:41:02 2012 +0000
gallivm: Add a hunk inadvertently lost when rebasing.
commit bf256df9cfdd0236637a455cbaece949b1253e98
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:24:23 2012 +0000
llvmpipe: Use consistent vector width in depth/stencil test.
commit 5543b0901677146662c44be2cfba655fd55da94b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:19:59 2012 +0000
draw: Use a consistent vector register width.
Instead of 4x32 sometimes, LP_NATIVE_VECTOR_WIDTH other times.
commit eada8bbd22a3a61f549f32fe2a7e408222e5c824
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 12:08:04 2012 +0000
gallivm: Remove garbage collection.
MC-JIT will require one compilation per module (as opposed to one
compilation per function), therefore no state will be shared,
eliminating the need to do garbage collection.
commit 556697ea0ed72e0641851e4fbbbb862c470fd7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 10:33:41 2012 +0000
gallivm: Move all native target initialization to lp_set_target_options().
commit c518e8f3f2649d5dc265403511fab4bcbe2cc5c8
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:52:32 2012 +0000
llvmpipe: Create one gallivm instance for each test.
commit 90f10af8920ec6be6f2b1e7365cfc477a0cb111d
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:48:08 2012 +0000
gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().
Brittle, complex, and unnecessary. Just use function pointer constant.
commit 98fde550b33401e3fe006af59db4db628bcbf476
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:21:26 2012 +0000
gallivm: Add a lp_build_const_func_pointer() helper.
To be reused in all places where we want to call C code.
commit 6cfedadb62c2ce5af8d75969bc95a607f3ece118
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:44:41 2012 +0000
gallivm: Cleanup/simplify lp_build_const_string_variable.
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
commit db1d4018c0f1fa682a9da93c032977659adfb68c
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 11:52:17 2012 +0000
gallivm: Set NoFramePointerElimNonLeaf to true where supported.
commit 088614164aa915baaa5044fede728aa898483183
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 22 19:38:47 2012 +0100
llvmpipe: pass in/out pointers rather than scalar floats in lp_bld_arit
we don't want llvm to potentially optimize away the vectors (though it doesn't
seem to currently), plus we want to be able to handle in/out vectors of arbitrary
length.
commit 3f5c4e04af8a7592fdffa54938a277c34ae76b51
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:22:55 2012 +0100
gallivm: fix lp_build_sqrt() for vector length 1
since we optimize away vectors with length 1 need to emit intrinsic
without vector type.
commit 79d94e5f93ed8ba6757b97e2026722ea31d32c06
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 17:00:46 2012 +0000
llvmpipe: Remove lp_test_round.
commit 81f41b5aeb3f4126e06453cfc78990086b85b78d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:56:24 2012 +0100
llvmpipe: subsume lp_test_round into lp_test_arit
Much simpler, and since the arguments aren't passed as 128bit values can run
on any arch.
This also uses the float instead of the double versions of the c functions
(which probably was the intention anyway).
In contrast to lp_test_round the output is much less verbose however.
Tested vector width of 32 to 512 bits - all pass except 32 (length 1) which
crashes in lp_build_sqrt() due to wrong type.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 945b338b421defbd274481d8c4f7e0910fd0e7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 09:55:03 2012 +0000
gallivm: Centralize the function compilation logic.
This simplifies a lot of code.
Also doing this in a central place will make it easier to carry out the
changes necessary to use MC-JIT in the future.
gallivm: Fix typo in explicit derivative shuffle.
Trivial.
draw: make DEBUG_STORE work again
adapt to lp_build_printf() interface changes
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: get rid of vecnf_from_scalar()
just use lp_build_broadcast directly (cannot assign a name but don't really
need it, vecnf_from_scalar() was producing much uglier IR due to using
repeated insertelement instead of insertelement+shuffle).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
llvmpipe: fix typo in complex interpolation code
Fixes position interpolation when using complex mode
(piglit fp-fragment-position and similar)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: fix clipvertex/position storing again
This appears to be the result of a bad merge.
Fixes piglit tests relying on clipping, like a lot of the interpolation tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Fix explicit derivative manipulation.
Same counter variable was being used in two nested loops. Use more
meaningful variable names for the counter to fix and avoid this.
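As a generic illustration of this class of bug (not the gallivm code itself), reusing one counter for both loops makes the inner loop clobber the outer loop's progress. Giving each loop its own descriptively named counter avoids that:

```c
#include <assert.h>

/* Sum a rows x cols matrix stored in row-major order.
 *
 * Buggy shape, for comparison:
 *    for (i = 0; i < rows; i++)
 *       for (i = 0; i < cols; i++)   <- overwrites the outer counter
 *          ...
 */
int
sum_matrix(const int *m, int rows, int cols)
{
   int row, col;
   int sum = 0;

   assert(m);
   for (row = 0; row < rows; row++) {
      for (col = 0; col < cols; col++)
         sum += m[row * cols + col];
   }
   return sum;
}
```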
gallivm: Prevent buffer overflow in repeat wrap mode for NPOT.
Based on Roland's patch, discussion, and review.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: Fix dims for TGSI_TEXTURE_1D in emit_tex.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: Fix explicit volume texture derivatives.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: fix 1d shadow texture sampling
The r coordinate is always used, hence we need 3 coords, not two
(the second one is unused).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Enable AVX support without MCJIT, where available.
For now, this just enables AVX on Windows for testing. If the code is
stable then we might consider preferring the old JIT wherever possible.
No change elsewhere.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The vertex element state isn't in registers any more, so
remove that old code. That fixes a memory corruption with
the blend state and gets eglgears partially working.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Some applications call eglCreateWindowSurface with an
EGLNativeWindowType parameter of zero. This causes a SEGV
and disrupts error handling such as returning EGL_NO_SURFACE.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
a112ca5d rather crassly smashed all the compiler flags together into AM_CFLAGS.
Separate them out the way they were before, putting pre-processor flags into
AM_CPPFLAGS, so assembly source gets preprocessed with the correct pre-processor
flags as well.
Also, remove unneeded CFLAGS from AM_CFLAGS, and CXXFLAGS from AM_CXXFLAGS
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Brian Paul <brianp@vmware.com>
I suck at resolving merge conflicts and broke the build in a5a34b1.
This patch adds the missing field intel_mipmap_tree::wraps_etc1.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Enable it for all hardware.
No current hardware supports ETC1, so this patch implements it by
translating the ETC1 data to RGBX data during the call to
glCompressedTexImage2D(). For details, see the doxygen for
intel_mipmap_tree::wraps_etc1.
Passes the Piglit test spec/OES_compressed_ETC1_RGB8_texture/miptree and
the ETC1 test in the GLES2 conformance suite.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function _mesa_etc1_unpack_rgba8888. It is intended to be used by
glCompressedTexSubImage2D to decode ETC1 textures into RGBA.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of util_etc1_rgb8_unpack_rgba_unorm8 into a new function
that can be shared between gallium and dri drivers,
texcompress_etc_tmp.h:etc1_unpack_rgba8888.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When we don't intend to texture from or render to a __DRIimage we
use __DRI_IMAGE_FORMAT_NONE. In that case, we just create the __DRIimage
to reference the underlying buffer, and will create usable __DRIimages
from it using createSubImage later.
If we try to use _mesa_get_format_bytes() on MESA_FORMAT_NONE in
a debug build, we hit an assertion, so let's not do that.
Commit 68e04cc6 was tested using automake-1.11. Unfortunately, automake-1.12
made a "slightly backward-incompatible change" in the use of yacc with C++, and
for a .yy file, the generated header file is now named .hh, not .h
To work with both, write our own rule for running yacc, which generates a
header file named .h, rather than using automake's rule.
Also, remove things from BUILD_SOURCES which don't need to be there
Also, update EXCLUDE rules in doxygen/glsl.doxy, for change of generated files
from .cpp -> .cc, and glsl_lexer.h has never existed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Commit defadf2b1 erroneously tries to make gallium drivers link with libdricore
as a static library, not a shared library
Also, change uses of DRI_LIB_DEPS in gallium driver Makefiles to
GALLIUM_DRI_LIB_DEPS, so the libraries added are used when linking the gallium
driver
Also, fix the path to the libdricore.so symlink, it's made in LIB_DIR, not in
the libdricore directory
Also repair quoting of dricore settings of DRI_LIB_DEPS and GALLIUM_DRI_LIB_DEPS
variables so VERSION is interpolated in configure but TOP and LIB_DIR are
interpolated later (where they are known, but VERSION isn't)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
- Use LLVM limits when LLVM is being used, instead of TGSI limits
- Provide draw_get_shader_param_no_llvm for when llvm is never used (softpipe)
- Eliminate several of the hacks around draw shader caps in several drivers
Unfortunately the hack for PIPE_MAX_VERTEX_SAMPLERS is still necessary.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The libmesa convenience library is linked with the libglsl convenience
library. libOsmesa is linked with libmesa, and also directly with libglsl.
When using libtool, this gives rise to duplicate symbol errors.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake,
so remove the AC_SUBST'ed GLAPI_ASM_SOURCES and instead use some AM_CONDITIONALS
to choose which asm sources are used
* Change GLAPI_LIB to point to the .la file in other Makefile.am files, and make a link
to the .a file for the convenience of other Makefiles which have not yet been converted
to automake
v2:
- Use AM_CPPFLAGS for cleaner build output
- EXTRA_SOURCES is not needed
- Remove libglapi.a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Now mesa/drivers/dri is converted to automake, we want to update DRI_LIB_DEPS
so that we link with the libmesa or libdricore libtool library, as appropriate.
However, this is complicated by the fact that gallium/targets is not (yet)
converted, so we can't share the DRI_LIB_DEPS autoconf variable with that anymore.
Add an additional autoconf variable GALLIUM_DRI_LIB_DEPS, which is now used in
gallium/targets/Makefile.dri, to link with the libdricore or libmesa native library.
v2: libdricore$VERSION.a needs to be libdricore$(VERSION).a
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake, so instead of
MESA_ASM_FILES, use some AM_CONDITIONALS to choose which architecture's asm sources are used
in libmesa_la_SOURCES. (Can't remove MESA_ASM_FILES autoconf variable as it's still used in
sources.mak)
* Update to link with the .la file in other Makefile.am files, and make a link to the
.a file for the convenience of other Makefiles which have not yet been converted to automake
v2: Remove stray -static from LDFLAGS
v3: Remove .a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Automake can't handle having both clip.S and clip.c, even though they have different paths
"src/mesa/Makefile.am: object `clip.lo' created by `$(SRCDIR)/sparc/clip.S' and `$(SRCDIR)/main/clip.c'"
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
v2: Use AM_V_GEN to silence generated code rules. Add BUILT_SOURCES to CLEANFILES
v3:
- Fix an accidental // in a path
- Use automake make rules for lex/yacc rather than writing our own
- Update .gitignore appropriately
- Build a libglcpp convenience library rather than awkwardly including
the files in libglsl and delegating the generation
- Remove libglsl.a compatibility link on clean
v4:
- Automake's rules for lex/yacc make .cc if source is .ll or .yy, and apparently we
must use those extensions "because of scons", so update everywhere glsl_parser.cpp
-> glsl_parser.cc and glsl_lexer.cpp -> glsl_lexer.cc. This fixes 'make tarballs'
and building with dricore enabled.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This also currently fixes the installation of libOSmesa.
v2: Remove old Makefile, libOSmesa is now versioned, fix typos
v3: Keep config substitution alphabetized
v4: Update .gitignore
v5: Libraries will be in the builddir, not the srcdir.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This was not implemented, because the spec was changed just recently.
Everything has been in place already.
Gallium has PIPE_FORMAT_B5G6R5_UNORM, while Mesa has MESA_FORMAT_RGB565.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The whole reason I avoided this was because it might operate on a
brw_vertex_program or a brw_fragment_program. However, that isn't a
problem: all we need is the gl_program base type.
This avoids awkwardly passing the loop counter 'i' as a parameter,
simplifies both callers, and also plumbs prog in place for future use.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If alpha-testing is enabled, we need to send alpha down the pipeline
even if nr_color_buffers == 0. However, tracking whether alpha-testing
is enabled in the WM program key is expensive: it causes us to compile
multiple specializations of the same shader, using program cache space.
This patch removes the check for alpha-testing, and simply emits alpha
whenever nr_color_buffers == 0. We believe this will also be necessary
for alpha-to-coverage, and it should add minimal overhead to an uncommon
case. Saving the recompiles should more than make up the difference.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only did this pre-Gen6, and used pwrite on Gen6+.
In one workload, this cuts a significant amount of overhead.
v2: Simplify the function based on Eric's suggestions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We rely on proper IEEE 754 behavior in too many places for this.
See also commit 2fdbbeca43 with equivalent
change for autoconf.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without that, people with buggy apps that looked at just the server
string for GLX_ARB_create_context would call this function that just
threw an error when you tried to make a context. Google shows plenty
of complaints about this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function assumes that lp_build_context::type is a vector type,
which is not true for r600 or radeonsi.
This fixes an assertion failure using glamor 2D accel.
It had many problems:
- The shadow comparison was done post-filtering.
- It required state-dependent recompiles whenever the comparison
function changed.
- It didn't even work: many cases hit assertion failures.
- I never implemented it for the VS.
The new lowering pass which converts textureGrad to textureLod by
computing the LOD value works much better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Intel hardware doesn't natively support textureGrad with shadow
comparisons. So we need to generate code to handle it somehow.
Based on the equations on page 205 of the OpenGL 3.0 specification,
it's possible to compute the LOD value that would be selected given the
gradient values. Then, we can simply convert the TXD to a TXL.
Currently, this passes 34/46 of oglconform's shadow-grad subtests;
four cubemap tests are regressed. We should investigate this in the
future.
v2: Apply abs() to the scalar case (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This swizzles away unwanted components, while preserving the order of
the ones that remain.
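One way to picture the operation, as a hypothetical helper that is not the actual IR constructor: given a 4-bit mask of components to keep, collect the kept component indices in their original order.

```c
#include <assert.h>

/* keep_mask: bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w.
 * Writes the kept component indices to comps[] in ascending order and
 * returns how many were kept. */
unsigned
swizzle_for_mask(unsigned keep_mask, unsigned comps[4])
{
   unsigned count = 0;
   unsigned i;

   assert(keep_mask <= 0xf);
   for (i = 0; i < 4; i++) {
      if (keep_mask & (1u << i))
         comps[count++] = i;
   }
   return count;
}
```

For example, a mask selecting y and w yields the two-component swizzle .yw (indices 1 and 3).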
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I needed to compute logs and square roots in a patch I was working on,
and wanted to use the convenient interface. We already have a similar
constructor for binops; adding one for unops seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I ran into this while trying to create a TXS query, which doesn't have a
coordinate. Since it didn't get initialized to NULL, a bunch of
visitors tried to access it and crashed.
Most of the time, this won't be a problem, but it's just a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only case a depth buffer can be set as a color buffer is when flushing.
That wasn't always the case, but now this code isn't required anymore.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
- maintain a mask of which mipmap levels are dirty (instead of one big flag)
- only flush what was requested at a given point and not the whole resource
(most often only one level and one layer has to be flushed)
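A simplified sketch of per-level dirty tracking, with hypothetical struct and function names. It tracks levels only and leaves out layers and the actual flush work:

```c
#include <assert.h>

struct texture {
   unsigned dirty_level_mask;   /* bit N set: mipmap level N needs a flush */
};

void
mark_level_dirty(struct texture *tex, unsigned level)
{
   assert(level < 32);
   tex->dirty_level_mask |= 1u << level;
}

/* Flush only the dirty levels in [first_level, last_level] and return
 * how many were flushed. Other dirty levels stay dirty. */
unsigned
flush_levels(struct texture *tex, unsigned first_level, unsigned last_level)
{
   unsigned level;
   unsigned flushed = 0;

   assert(first_level <= last_level && last_level < 32);
   for (level = first_level; level <= last_level; level++) {
      if (tex->dirty_level_mask & (1u << level)) {
         /* A real driver would decompress or copy this level here. */
         tex->dirty_level_mask &= ~(1u << level);
         flushed++;
      }
   }
   return flushed;
}
```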
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
we can just update the state when decompressing, there's no need to add
additional info into the DSA state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
to remove some overhead from draw_vbo. This is a derived state.
BTW, I've got no idea how compute interacts with 3D here, but it should
use cb_misc_state, so that 3D and compute don't conflict.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Because u_blit couldn't sample a 1D, 3D, CUBE and ARRAY texture, we created
a 2D texture holding a copy of one slice of the source texture (even for 1D).
Let's just do it right.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch updates the blorp engine to properly handle the case where
the surface being textured from uses Gen7's CMS MSAA layout. The
following changes were necessary:
- Before reading color values from the surface, we need to read from
the MCS buffer using the ld_mcs sampler message. This is done by
the mcs_fetch() function, and the result is stored in the mcs_data
register. This only needs to be done once per pixel, since the MCS
value is shared between all samples belonging to a pixel.
- When reading color values from the surface, we need to use the
ld2dms sampler message instead of the ld2dss message, and we need to
provide the value read from the MCS buffer as an argument.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When a buffer using Gen7's CMS MSAA layout is bound to a texture or a
render target, the SURFACE_STATE structure needs to point to the MCS
buffer and to indicate its pitch. This patch updates the functions
that emit SURFACE_STATE to handle CMS layout properly.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously the DWORD used to control the CMS MSAA layout was just a
pad value, because we didn't use it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
To implement Gen7's CMS MSAA layout, we need an extra buffer, the MCS
(Multisample Control Surface) buffer. This patch introduces code for
allocating and deallocating the buffer, and storing a pointer to it in
the intel_mipmap_tree struct.
No functional change, since the CMS layout is not enabled yet.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 1 Part 1, p112:
There are three types of multisampled surface layouts designated
as follows:
- IMS Interleaved Multisampled Surface
- CMS Compressed Multisampled Surface
- UMS Uncompressed Multisampled Surface
Previously, the i965 driver only used IMS and UMS formats, and
distinguished between them using the boolean
intel_mipmap_tree::msaa_is_interleaved. To facilitate adding support
for the CMS format, this patch replaces that boolean (and other
booleans derived from it) with an enum
INTEL_MSAA_LAYOUT_{IMS,CMS,UMS}. It also updates the terminology used
in comments throughout the driver to match the IMS/CMS/UMS terminology
used in the PRM. CMS layout is not yet used.
The enum has a fourth possible value, INTEL_MSAA_LAYOUT_NONE, which is
used for non-multisampled surfaces.
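In outline, the change replaces a boolean with a four-valued enum. The enumerator names below are taken from this message; the enum tag, the ordering of the values, and the helper function are assumptions for illustration:

```c
#include <assert.h>

enum intel_msaa_layout {
   INTEL_MSAA_LAYOUT_NONE,   /* non-multisampled surface */
   INTEL_MSAA_LAYOUT_IMS,    /* Interleaved Multisampled Surface */
   INTEL_MSAA_LAYOUT_UMS,    /* Uncompressed Multisampled Surface */
   INTEL_MSAA_LAYOUT_CMS     /* Compressed Multisampled Surface */
};

/* Hypothetical helper recovering what the old boolean expressed. */
int
msaa_layout_is_interleaved(enum intel_msaa_layout layout)
{
   assert(layout <= INTEL_MSAA_LAYOUT_CMS);
   return layout == INTEL_MSAA_LAYOUT_IMS;
}
```

Unlike the boolean, the enum can distinguish a non-multisampled surface from UMS and leaves room for CMS.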
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
On Gen6, MSAA buffers always use an interleaved layout and non-MSAA
buffers always use a non-interleaved layout, so it is not strictly
necessary to keep track of the layout of the texture and render target
surfaces in the blorp program key. However, it is cleaner to do so,
since (a) it makes the blorp compiler less dependent on implicit
knowledge about how the GPU pipeline is configured, and (b) it paves
the way for implementing compressed multisampled surfaces in Gen7.
This patch won't cause any redundant compiles, because the layout of
the texture and render target surfaces depends on other parameters
that are already in the blorp program key.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't generate public entrypoints for GLES extensions, so move the
GL_NV_draw_buffers definition from ARB_draw_buffers.xml to es_EXT.xml.
When the extension is defined in ARB_draw_buffers.xml, we end up with a
public entry point for it, but no prototype, which gives an error when
compiled with --disable-asm and --disable-shared-glapi.
Instead, just move the GLES extension to es_EXT.xml so this doesn't happen.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This lets us specify an offset into the bo where the miptree starts,
which will let us set up a texture for a single plane in a planar buffer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The additions in version 5 enable creating EGLImages for different planes
of a YUV buffer. createImageFromName is still used to create the containing
__DRIimage, and createSubImage can then be used on that __DRIimage to create
__DRIimages that correspond to the y, u, and v planes (__DRI_IMAGE_FORMAT_R8)
or the uv planes (__DRI_IMAGE_FORMAT_RG88) for formats such as NV12 where
the u and v components are interleaved. Packed formats such as YUYV
don't require any special treatment; we just sample those as a regular
ARGB texture.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
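To make the per-plane view of an NV12 buffer concrete, here is a small
sketch of how the two planes could be described. It does not use the DRI
image interface; struct plane_desc, the PLANE_FORMAT_* names and
nv12_planes() are hypothetical, and the buffer is assumed to be tightly
packed (pitch equal to the width in bytes, UV plane directly after Y).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for one-channel and two-channel 8-bit formats. */
enum plane_format { PLANE_FORMAT_R8, PLANE_FORMAT_RG88 };

struct plane_desc {
   enum plane_format format;
   int width, height; /* size in texels of this plane */
   size_t offset;     /* byte offset from the start of the buffer */
   size_t pitch;      /* bytes per row */
};

/* Describe the Y plane and the interleaved UV plane of an NV12 buffer. */
static void
nv12_planes(int width, int height, struct plane_desc out[2])
{
   assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

   out[0].format = PLANE_FORMAT_R8;
   out[0].width = width;
   out[0].height = height;
   out[0].offset = 0;
   out[0].pitch = (size_t)width;

   /* Half resolution, two bytes per texel, so the row pitch is unchanged. */
   out[1].format = PLANE_FORMAT_RG88;
   out[1].width = width / 2;
   out[1].height = height / 2;
   out[1].offset = (size_t)width * (size_t)height;
   out[1].pitch = (size_t)width;
}
```

For a 640x480 buffer this places the UV plane at byte offset 307200 with a
size of 320x240 texels.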
The code for growing the memory pool (which is used for storing all of
the global buffers) wasn't working. There seem to be two separate issues
with the memory pool code. The first was the way it was growing the pool.
When the memory pool needed more space, it would:
1. Copy the data from the memory pool's backing texture to system memory.
2. Delete the memory pool's texture
3. Create a bigger backing texture for the memory pool.
4. Copy the data from system memory into the bigger texture.
The copy operations didn't seem to be working, and I suspect that, since
they were using fragment shaders to do the copy, there might have
been a problem with the mixing of compute and 3D state.
The other issue is that the size of 1D textures is limited, and I was
having trouble getting 2D textures to work.
I think these problems will be easier to solve once more code is shared
between 3D and compute, which is why I decided to disable it for now
rather than continue searching for a fix.
The original strategy for handling floating point loads, which was to
lower (f32 load) to (f32 bitcast (i32 load)) wasn't really working. The
main problem was that the DAG legalizer couldn't handle replacing a node
with two results (load) with a node with only one result (bitcast).
It didn't change performance on Lightsmark or Nexuiz, which both used
DYNAMIC_DRAW buffers, but it was killing performance (40% CPU wasted pwriting
buffers) on a closed-source app we're looking at.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the infrastructure required for this extension. There is no
xserver support and no driver support yet. Drivers can enable this by
advertising DRI2 version 4 and accepting the
__DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag and the
__DRI_CTX_ATTRIB_RESET_STRATEGY attribute in create context.
Some additional Mesa infrastructure is needed before drivers can do
this. The GL_ARB_robustness spec, which all Mesa drivers already
advertise, requires:
"If the behavior is LOSE_CONTEXT_ON_RESET_ARB, a graphics reset
will result in the loss of all context state, requiring the
recreation of all associated objects."
It is necessary to land this infrastructure now so that the related
infrastructure can land in the xserver. The xserver has very long
release schedules, and the remaining Mesa parts should land long, long
before the next xserver merge window opens.
v2: Expose robustness as a DRI2 extension rather than bumping
__DRI_DRI2_VERSION.
v3: Add a comment explaining why dri2->base.version >= 3 is also
required for GLX_ARB_create_context_robustness.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows revising the dri_interface.h separately from adding driver
support.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We neglected to list the deprecation model/forward compatible context
support.
inverse() has been done for a while.
None of us know what "highp change" means; GLSL 1.30 already added the
ability to recognize precision keywords, and it doesn't look like 1.40
has any new requirements there (precision keywords still have no meaning).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Use r600_resource_texture::flushed_depth_texture for GPU access, and
allocate it in the VRAM. For transfers we'll allocate a texture in the GTT
and store it in the r600_transfer::staging.
Improves performance when flushed depth texture is frequently used by the
GPU, e.g. in Lightsmark (~30%)
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
With fixes and updates from Ben Widawsky and comments from Paul Berry.
v2: Use drm_intel_gem_context_destroy to destroy hardware context;
remove useless initialization of hw_ctx, both suggested by Eric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Paul Berry <stereotype441@gmail.com>
This doesn't do anything with the uniform block declarations yet, so
usage of those uniforms finds them to be undeclared.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've been trying to derive from this for UBO support, and the slightly
obfuscated types were putting me over the edge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The got_one variable was set iff one of the bits in flags.i was set.
v2: Fix incorrect dropping of the ARB_conservative_depth warning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function is used when dispatching compute shader in order to avoid
mixing compute and 3D registers in the context's dirty list. This
allows the compute code to reuse 3D functions like evergreen_cb, which
return a struct r600_pipe_state and still have control over when and how
the register writes are emitted.
The start_compute_cs atom initializes some config and context registers
to the values needed for running compute shaders. When a compute shader
is dispatched, this atom is emitted after the start_cs_cmd atom, which
initializes registers that are common to both 3D and compute.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Some packets require the shader type bit (bit 1) to be set when
used for compute shaders. The pkt_flag will be initialized to
RADEON_CP_PACKET3_COMPUTE_MODE for any struct r600_command_buffer used
for dispatching compute shaders and it will be OR'd with the result of
the PKT3 macro when adding a new packet to a struct r600_command_buffer.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
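The idea of OR-ing a per-buffer flag into every packet header can be
sketched as follows. This is not the driver's code: SKETCH_PKT3,
SKETCH_COMPUTE_MODE, struct sketch_command_buffer and sketch_emit_pkt3() are
hypothetical names, and the header layout (type in bits 31:30, count in
29:16, opcode in 15:8) is an assumption. Only "bit 1 is the shader type
bit" is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed type-3 header layout: type 31:30, count 29:16, opcode 15:8. */
#define SKETCH_PKT3(op, count)                        \
   ((3u << 30) |                                      \
    (((uint32_t)(count) & 0x3fffu) << 16) |           \
    (((uint32_t)(op) & 0xffu) << 8))

/* Shader type bit (bit 1), set for packets used by compute. */
#define SKETCH_COMPUTE_MODE (1u << 1)

struct sketch_command_buffer {
   uint32_t buf[64];
   unsigned num_dw;
   uint32_t pkt_flag; /* 0 for 3D, SKETCH_COMPUTE_MODE for compute */
};

/* Append a packet header, OR-ing in the buffer's pkt_flag. */
static void
sketch_emit_pkt3(struct sketch_command_buffer *cb, unsigned op, unsigned count)
{
   assert(cb->num_dw < 64);
   cb->buf[cb->num_dw++] = SKETCH_PKT3(op, count) | cb->pkt_flag;
}
```

Keeping the flag in the command buffer means callers emit packets the same
way for 3D and compute; only the buffer's initialization differs.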
For copy propagation, we've dropped the use of a GRF in favor of a
(probably later) use of a different GRF. This definitely requires
invalidating intervals.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since live intervals are based on ip, removing an instruction trashes
the intervals unless we were to go do some surgery. These happen to
usually remove a use of a grf, so it's time to recalculate, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 release branch.
This has less impact than for the FS (4k savings), because it was partially
done already, but makes things more consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We factor out all the EGL book-keeping into dri2_create_image() and
simplify the wayland case by using dupImage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We have the same switch and allocation code in two places.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit cbffaf20e9.
Use the PRIx64 macro in the fprintf() call instead, as suggested
by Dylan Noblesmith.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
ROUND and TRUNC are implemented with one function to reduce code duplication.
Note: ROUND isn't actually used yet, but probably will be soon.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Converting CMP to SLT+LRP didn't work when src2 or src3 was Inf/NaN.
That's the case for GLSL sqrt(0). sqrt(0) actually happens in many
piglit auto-generated tests that use the distance() function.
v2: remove debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
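The Inf/NaN problem with the SLT+LRP lowering can be reproduced on the CPU.
The sketch below models CMP as a plain select and the lowering as a linear
blend; the function names are hypothetical and the arithmetic is ordinary
IEEE 754 float math, not the driver's shader code.

```c
#include <math.h> /* INFINITY, used in the usage note below */

/* Direct select, modelling CMP: src0 < 0 ? a : b. */
static float
cmp_select(float src0, float a, float b)
{
   return src0 < 0.0f ? a : b;
}

/* SLT + LRP lowering: t = (src0 < 0) ? 1 : 0; result = t*a + (1 - t)*b. */
static float
cmp_via_slt_lrp(float src0, float a, float b)
{
   float t = src0 < 0.0f ? 1.0f : 0.0f;
   return t * a + (1.0f - t) * b;
}
```

With src0 = -1, a = 2 and b = INFINITY, cmp_select() returns 2, while
cmp_via_slt_lrp() computes 2 + 0 * Inf, which is NaN. The operand that
should have been ignored still poisons the result.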
Was previously implemented with FLOOR.
Fixes quite a few piglit tests of float->int conversion, integer
division, etc.
v2: clean up left over debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
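Assuming the operation in question is a round-toward-zero conversion, the
difference from a FLOOR-based implementation only shows up for negative
non-integers. A minimal sketch with hypothetical helper names:

```c
/* Round toward negative infinity, as a FLOOR-based conversion would. */
static int
floor_to_int(float x)
{
   int i = (int)x; /* C casts truncate toward zero */
   return (x < (float)i) ? i - 1 : i;
}

/* Round toward zero, which is what float->int conversion expects. */
static int
trunc_to_int(float x)
{
   return (int)x;
}
```

For -1.5 the floor-based version gives -2 while truncation gives -1; for
positive inputs the two agree.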
If the 'dst' register is the same as the 'pass' register we'll generate
invalid code. Use a temporary register in that case.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Redo this commit, and remove the inclusion of gl2ext.h
from src/mapi/glapi/glapi_priv.h. The include was added in
8f3be33985 to fix a missing prototype for
glDrawBuffersNV and others, but it's not possible to include both
glext.h and gl2ext.h from the same file.
I don't see the missing prototype here (with or without shared glapi)
so I'm just removing the offending #include.
Also, since we're redoing this, update to the most recent gl2ext.h.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
That old bug was hidden by the clipper always interpolating in 3d space
no matter what it should have been doing. Now that the interpolation
has been fixed, the bug shows up.
Fixes fdo 51364.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Calling glGenerateMipmap could overwrite vertex buffer state, leading
to incorrect rendering or crashes depending on the Gallium driver.
This was happening on WebGL Conformance test texture-size.
Before 784dd51198 this was covered up
by redundant vertex buffer validation.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This reverts commit 8818b88748.
I get a lot of errors like this one:
In file included from ../../../src/mapi/glapi/glapi_priv.h:49:0,
from glapi_dispatch.c:40:
../../../include/GLES2/gl2ext.h:1074:28: error: redefinition of typedef ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’
../../../include/GL/glext.h:10237:25: note: previous declaration of ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’ was here
This is with a clean build (with git clean -fdX).
I don't get the errors on my other machine. I didn't investigate why;
a wild guess is that this depends on the version of gcc.
This is a big win for savage2, hon and yofrankie. 62 new programs for
savage2/hon get 16-wide mode, along with one for humus demos and two
for tropics. Even a few shaders from tropics see reductions of 15% or
more.
total instructions in shared programs: 216536 -> 207353 (-4.24%)
instructions in affected programs: 123941 -> 114758 (-7.41%)
In benchmarking Tropics, only a .040% +/- 034% performance improvement
was observed (n=90). Rather disappointing, but I was primarily
motivated to do this patch by a regression in the number of 16-wide
shaders compiled after a GRF texturing on IVB patch I'm working on.
Hopefully this helps avoid that regression.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shaves a few instructions off of a ton of programs. For 12
shaders from tropics and sanctuary, it's enough reduction in register
pressure to get 16-wide mode. 7 shaders from heroes of newerth and
savage2 are hurt by about 1.1%, where copy propagation of negates ends
up preventing coalescing, but we could regain that by doing dataflow
analysis in our copy propagation.
No significant performance difference in tropics (n=11)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The meta-ops _mesa_meta_Clear() and _mesa_meta_glsl_Clear() need to
ignore the state of GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
and GL_SAMPLE_COVERAGE_INVERT when clearing multisampled buffers. The
easiest way to accomplish this is to disable GL_MULTISAMPLE during the
clear meta-ops.
Note: this patch also causes GL_MULTISAMPLE to be disabled during
_mesa_meta_GenerateMipmap() and _mesa_meta_GetTexImage() (since those
two meta-ops use MESA_META_ALL). Arguably this isn't strictly
necessary, since those meta-ops use their own non-MSAA fbo's, but it
shouldn't do any harm.
Fixes Piglit tests "EXT_framebuffer_multisample/clear {2,4}
{color,stencil}" on i965.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):
"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.
Reviewed-by: Eric Anholt <eric@anholt.net>
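The workaround described above amounts to a per-pixel select over a 2x2
block. A CPU-side sketch, with a hypothetical function name and an assumed
mask encoding (bit i set means pixel i is lit):

```c
#include <assert.h>

/*
 * For every unlit pixel of a 2x2 block, replace the centroid-interpolated
 * value with the value interpolated at the pixel center.
 */
static void
fix_unlit_centroid(float centroid[4], const float center[4], unsigned lit_mask)
{
   assert(lit_mask <= 0xfu);
   for (int i = 0; i < 4; i++) {
      if (!(lit_mask & (1u << i)))
         centroid[i] = center[i];
   }
}
```

Lit pixels keep their centroid values, so the extra cost is one select per
interpolated value.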
In order to compute centroid varyings correctly, the fragment shader
needs to be able to load the current pixel/sample mask into a flag
register. This patch adds an opcode to the fragment shader back-end
to do this; the opcode gets translated into the instruction
mov(1) f0<1>UW g1.14<0,1,0>UW { align1 WE_all }
Since this instruction clobbers f0, instruction scheduling has to
treat it the same as instructions that have a conditional modifier.
Reviewed-by: Eric Anholt <eric@anholt.net>
When querying GL_PRIMITIVES_GENERATED, if primitive restart
is also used, then take the software primitive restart
path so GL_PRIMITIVES_GENERATED is returned correctly.
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is also updated
since it is also affected by the same issue.
As noted in brw_primitive_restart.c, with further work we
should be able to move this situation back to a hardware
handled path.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit d73f6375f5 fixed the cause of the Piglit failure with
ARB_color_buffer_float fragment clamp modes. Now that it's fixed,
there's no reason to leave snorm format rendering disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0c005bd7 intended to make ir_loop_jump::mode public, but also
accidentally added a new pointer to the enclosing loop. Furthermore, it
tried to initialize the new field by adding "this->loop = loop;" to the
constructor, but since there is no loop parameter, this only initialized
the field to itself---so it will likely be a garbage pointer.
A lot of code, such as lower_jumps, allocates new loop jumps without
setting this field appropriately, so any uses would probably just crash.
Thankfully, there were none, so we can just delete the field.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51574
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
DrawPixels uses the MESA_META_CLAMP_FRAGMENT_COLOR flag to save/restore
the fragment color clamp mode. This is unnecessary since it never
alters it. It's also harmful: when the clamp mode is GL_FIXED_ONLY,
setting this flag causes _mesa_meta_begin to force it to GL_FALSE,
breaking clamping on SNORM formats.
DrawPixels should use the user-specified clamp mode and not change it.
Fixes Piglit's spec/ARB_color_buffer_float/GL_RGBA8_SNORM-drawpixels
test on i965/Sandybridge (with SNORM render targets re-enabled).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add "-f $(srcdir)/gl_API.xml" to the arguments of all
the scripts that by default look for gl_API.xml in the
working directory when run with no arguments, and prepend
$(srcdir) to those scripts that are already using an
explicit -f argument.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
tgsi_ureg was recently enhanced to support local temporaries, and as a result
temps are declared individually.
This change avoids many TEMP register declarations on common shaders.
(And fixes performance regression due to mismatches against performance
sensitive shaders.)
Reviewed-by: Brian Paul <brianp@vmware.com>
The templated copy constructor doesn't prevent the compiler from
emitting a default copy constructor, which leads to inconsistent
memory handling and was reported to cause segfaults when doing event
manipulation.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
The function internalizer pass marks non-kernel functions as internal,
which enables optimizations like function inlining and global dead-code
elimination.
v2:
- Pass vector arguments by const reference
Removed u_half.py used to generate the table for previous method.
The previous implementation of float to half conversion was faulty for
denormals and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
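For reference, a table-free float to half conversion that handles denormals
and NaNs explicitly can be quite short. The sketch below is not the
implementation from this commit: float_to_half_sketch() is a hypothetical
name, and it truncates the mantissa (rounds toward zero) to keep the logic
easy to follow.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 single to half precision bits, rounding toward zero. */
static uint16_t
float_to_half_sketch(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);

   uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
   uint32_t exp = (bits >> 23) & 0xffu;
   uint32_t mant = bits & 0x7fffffu;

   if (exp == 0xffu) /* Inf stays Inf, NaN becomes a quiet NaN */
      return (uint16_t)(sign | 0x7c00u | (mant ? 0x0200u : 0u));

   int e = (int)exp - 127 + 15; /* rebias the exponent */
   if (e >= 31)
      return (uint16_t)(sign | 0x7c00u); /* too large: overflow to Inf */
   if (e <= 0) {
      if (e < -10)
         return sign; /* too small even for a half denormal: signed zero */
      mant |= 0x800000u; /* make the implicit leading 1 explicit */
      return (uint16_t)(sign | (mant >> (14 - e)));
   }
   return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}
```

The denormal branch shifts the full 24-bit significand into place, which is
the extra logic a pure table lookup on the exponent would have needed.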
Some parameters need to be checked only once.
check_valid_to_render needs to be called only once.
The validate function is based on the one for DrawElements.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a cleanup for ARB_transform_feedback3, where
GL_MAX_TRANSFORM_FEEDBACK_BUFFERS is introduced for interleaved attribs and
has the same meaning as GL_MAX_.._SEPARATE_ATTRIBS for separate attribs.
Also, the maximum number of TFB buffers is reduced from 32 to 4, which makes
this patch useful even without the extension.
I don't know of any hardware which can do more than 4.
Reviewed-by: Brian Paul <brianp@vmware.com>
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
For some reason regular gcc on Linux didn't catch these but the mingw
compiler did (generated errors, not warnings).
v2: include the changes in src/mapi/ too
Fixes the es2 build with gcc.
Note: in glext.h the prototypes for glShaderSource() and glShaderSourceARB()
disagree: only the former has the extra const qualifier.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Set the step_rate value when drawing to implement
ARB_instanced_arrays for gen >= 4.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we were counting gl_FrontFacing, gl_FragCoord and gl_PointCoord
against the limit of varying variables. This prevented some valid shaders
from linking.
The other potential solution to this is to have the driver advertise
more varying vars or set the GLSLSkipStrictMaxVaryingLimitCheck flag.
But the above-mentioned variables aren't conventional varying attributes
so it doesn't seem right to count them.
Reviewed-by: Eric Anholt <eric@anholt.net>
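The counting rule can be illustrated with a name-based filter. This is a
simplified sketch rather than the linker's code: the real check works on IR
variables, and is_builtin_fs_input() and count_varyings() are hypothetical
helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Built-in fragment shader inputs that should not count as varyings. */
static bool
is_builtin_fs_input(const char *name)
{
   return strcmp(name, "gl_FrontFacing") == 0 ||
          strcmp(name, "gl_FragCoord") == 0 ||
          strcmp(name, "gl_PointCoord") == 0;
}

/* Count the inputs that should be checked against the varying limit. */
static unsigned
count_varyings(const char *const *names, unsigned n)
{
   unsigned count = 0;
   for (unsigned i = 0; i < n; i++) {
      assert(names[i] != NULL);
      if (!is_builtin_fs_input(names[i]))
         count++;
   }
   return count;
}
```

A shader reading gl_FragCoord, gl_PointCoord and two user varyings is then
charged for two varyings instead of four.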
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we don't have them in hw we emulate them in the shader. Although not
recommended by the spec it is legit.
As a side effect we also get GL 2.1. I think this is as far as we can take
the i915.
The most recent commit adds support for comments and macro expansion
on #line directives. Add testing to verify the new features.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL specification requires that #line directives be interpreted
after macro expansion. Our existing implementation of #line macros in
the lexer prevents conformance on this point.
Moving the handling of #line from the lexer to the parser gives us the
macro expansion we need. An additional benefit is that the
preprocessor also now supports comments on the same line as #line
directives.
Finally, the preprocessor now emits the (fully-macro-expanded) #line
directives into the output. This allows the full GLSL compiler to also
see and interpret these directives so it can also generate correct
line numbers in error messages.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is currently used only in the expansion of #if lines,
but we will soon be using it more generally (for the expansion of
#line directives). Give it a more general name
(_glcpp_parser_expand_and_lex_from) and some more documentation.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b823b99ec0 switched from using
functions such as ralloc_asprintf and ralloc_strcat to
ralloc_asprintf_rewrite_tail. This change maintains the string's
length as a parameter that is updated by the ralloc functions (rather
than recomputing it with strlen over and over).
However, the change failed to update two locations (glcpp_error and
glcpp_warning), with the result that the string's length wasn't
updated by these calls. Then, subsequent calls to other
ralloc_asprintf_rewrite_tail would overwrite the text appended by
glcpp_error.
This commit fixes the two missing updates, and restores line numbers
to the output of glcpp error messages, (as noticed by a glcpp unit
test case that has been failing since the above-mentioned commit).
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
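The failure mode is easy to reproduce with any append routine that tracks
the string length in a caller-owned variable. The sketch below uses a fixed
buffer and vsnprintf rather than ralloc; append_tail() and
append_forgetting_length() are hypothetical stand-ins for the updated and
the not-yet-updated call sites.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Write formatted text at buf + *len and advance *len past it. */
static void
append_tail(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
   va_list ap;
   int n;

   assert(*len < cap);
   va_start(ap, fmt);
   n = vsnprintf(buf + *len, cap - *len, fmt, ap);
   va_end(ap);
   if (n > 0) {
      size_t room = cap - *len - 1;
      *len += ((size_t)n < room) ? (size_t)n : room;
   }
}

/* Buggy caller: appends text, but the caller's length is left stale. */
static void
append_forgetting_length(char *buf, size_t cap, size_t len, const char *text)
{
   append_tail(buf, cap, &len, "%s", text);
}
```

If "0:7: " is appended normally, then "error: x" through the buggy path, and
then "\n" normally, the final newline is written at the stale offset and the
buffer reads "0:7: \n": the error text has been overwritten.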
A strict reading of the GLSL specification would have this be an
error, but we've received reports from users who expect the
preprocessor to interpret undefined macros as 0. This is the standard
behavior of the preprocessor for C, and according to these user
reports is also the behavior of other OpenGL implementations.
So here's one of those cases where we can make our users happier by
ignoring the specification. And it's hard to imagine users who really,
really want to see an error for this case.
The two affected test cases are updated to reflect the new behavior.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
DUAL_EXPORT can be enabled on r6xx/r7xx when all CBs use 16-bit export
and there is no depth/stencil export.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit export
mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS
shouldn't export depth/stencil.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
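The evergreen condition above can be written as a small predicate. This is
only a sketch of the stated rule: can_use_dual_export() and its parameters
are hypothetical, and the per-CB boolean stands in for "this CB uses the
EXPORT_4C_16BPC export mode".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * DUAL_EXPORT may be enabled when there is at least one CB, every CB uses
 * 16-bit export, and the pixel shader exports neither depth nor stencil.
 */
static bool
can_use_dual_export(const bool *cb_is_16bit_export, unsigned nr_cbufs,
                    bool ps_exports_depth_or_stencil)
{
   assert(nr_cbufs == 0 || cb_is_16bit_export != NULL);

   if (nr_cbufs == 0 || ps_exports_depth_or_stencil)
      return false;

   for (unsigned i = 0; i < nr_cbufs; i++) {
      if (!cb_is_16bit_export[i])
         return false;
   }
   return true;
}
```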
In some cases TGSI shader has more color outputs than the number of CBs,
so it seems we need to limit the number of color exports. This requires
different shader variants depending on the nr_cbufs, but on the other hand
we are doing less exports, which are very costly.
v2: fix various piglit regressions
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Shader variants are stored in the list, the key for lookup is based on the
states that require different hw shaders - currently it's rctx->two_side (all
gpus) and rctx->nr_cbufs (evergreen/cayman, when writes_all property is set).
v2:
- use simple list instead of keymap as suggested by Marek on irc
- call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
(r600_shader_select isn't used for vertex shaders currently)
v3:
- fix call to r600_adjust_gprs - do it after updating current shader
Improves performance for some apps, e.g. FlightGear -
see https://bugs.freedesktop.org/show_bug.cgi?id=50360
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
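A simple list keyed on the state that forces a recompile could look roughly
like this. The key fields two_side and nr_cbufs come from the message above;
the struct names, select_variant() and the compile_id stand-in for a
compiled hardware shader are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

struct variant_key {
   bool two_side;
   unsigned nr_cbufs;
};

struct shader_variant {
   struct variant_key key;
   unsigned compile_id; /* stand-in for the compiled hw shader */
   struct shader_variant *next;
};

struct shader_selector {
   struct shader_variant *variants; /* singly linked list */
   unsigned num_compiles;
};

/* Return the variant matching key, "compiling" a new one on a miss. */
static struct shader_variant *
select_variant(struct shader_selector *sel, struct variant_key key)
{
   struct shader_variant *v;

   for (v = sel->variants; v; v = v->next) {
      if (v->key.two_side == key.two_side && v->key.nr_cbufs == key.nr_cbufs)
         return v;
   }

   v = malloc(sizeof *v);
   if (!v)
      return NULL;
   v->key = key;
   v->compile_id = sel->num_compiles++;
   v->next = sel->variants;
   sel->variants = v;
   return v;
}
```

With only a handful of variants per shader, a linear walk is cheap and
avoids the bookkeeping of a keymap.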
As with the previous commit for softpipe.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
These all return zero. Add a debug_printf() to catch the default case so
we don't accidentally mishandle something important in the future.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is actually required for GL_ARB_framebuffer_object, but the state
tracker doesn't currently check it.
Direct3D 9 allows mixed format color buffers with some restrictions.
Setting this allows Unigine Heaven 2.5 and 3.0 to run. Tested both on
GL and D3D hosts.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The type is the destination type (i.e. float vector) and not the
source type. Fixes piglit fs-{in,de}crement-uint.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
i965 hardware needs to be informed of situations in which it's
possible for pixels (or samples) to be discarded for reasons other
than depth/stencil testing (e.g. due to an explicit "discard" in the
fragment shader). One of these situations is when
GL_ALPHA_TO_COVERAGE is enabled, since that can cause samples to be
discarded by the color calculator when the pixel's alpha value is less
than 1.0.
Without this patch, GL_ALPHA_TO_COVERAGE does not take effect on depth
buffers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables the multisampling parameters
GL_SAMPLE_ALPHA_TO_COVERAGE and GL_SAMPLE_ALPHA_TO_ONE, which allow
the fragment shader's alpha output to be converted into a sample
coverage mask and ignored for blending. i965 supports these
parameters through the BLEND_STATE structure.
The GL spec allows, but does not require, the implementation to dither
the conversion from alpha to a sample coverage mask, so that alpha
values that aren't a multiple of 1/num_samples result in the correct
proportion of samples being lit. A bit exists in the BLEND_STATE
structure to enable this functionality, but according to the hardware
docs it must be disabled on Sandy Bridge (see the Sandy Bridge PRM,
Vol2, Part1, p379: AlphaToCoverage Dither Enable). So it is enabled
for Gen7 only.
Fixes piglit tests
"EXT_framebuffer_multisample/sample-alpha-to-{coverage,one} {2,4}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables glSampleCoverage() functionality, which allows the
client program to specify that only a portion of the samples be lit up
when performing multisampled rendering. i965 supports
glSampleCoverage() through the 3DSTATE_SAMPLE_MASK command packet,
which allows the driver to specify a bitfield indicating which samples
to light up.
Fixes piglit tests "EXT_framebuffer_multisample/sample-coverage {2,4}
{inverted,non-inverted}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
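One plausible way to turn a glSampleCoverage() value into such a bitfield is
sketched below. The rounding policy and the choice to light the lowest
samples first are assumptions, and sample_coverage_mask() is a hypothetical
helper rather than the driver's function.

```c
#include <assert.h>
#include <stdbool.h>

/* Build a per-sample bitmask from a coverage fraction in [0, 1]. */
static unsigned
sample_coverage_mask(float coverage, bool invert, unsigned num_samples)
{
   assert(num_samples >= 1 && num_samples <= 16);
   assert(coverage >= 0.0f && coverage <= 1.0f);

   unsigned all = (1u << num_samples) - 1u;
   /* Round to the nearest whole number of samples (assumed policy). */
   unsigned lit = (unsigned)(coverage * (float)num_samples + 0.5f);
   unsigned mask = (1u << lit) - 1u;

   return invert ? (mask ^ all) : mask;
}
```

With 4 samples and a coverage of 0.5 this yields 0x3, or 0xc when the
invert flag is set.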
Fixes gles2conform GL.equal.equal_bvec2_frag.
This fixes brw_fs_visitor's translation of ir_unop_f2b. It used CMP to
convert the float to one of 0 or ~0. However, the convention in the
compiler is that true is represented by 1, not ~0. This patch adds an AND
to convert ~0 to 1.
By inspection, a similar problem existed with ir_unop_i2b, with a similar
fix.
[v2 kayden]: eliminate extra temporary register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49621
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
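The fix can be modelled in plain C. Here cmp_nonzero() stands in for the CMP
result (0 or ~0) and f2b() for the corrected translation; both names are
hypothetical.

```c
#include <stdint.h>

/* Model of the CMP result: all bits set for true, zero for false. */
static uint32_t
cmp_nonzero(float f)
{
   return (f != 0.0f) ? UINT32_MAX : 0u;
}

/* Float to bool with the compiler's convention that true is 1, not ~0. */
static uint32_t
f2b(float f)
{
   return cmp_nonzero(f) & 1u;
}
```

Without the AND, comparing two "true" values produced by different paths
(one yielding 1, the other ~0) would report them as unequal.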
This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.
Reviewed-by: Eric Anholt <eric@anholt.net>
To save time, we only instruct the clip stage of the pipeline to
compute noperspective barycentric coordinates if those coordinates are
needed by the fragment shader. Previously, we would determine whether
the coordinates were needed by seeing whether the fragment shader used
the BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode.
However, with MSAA, it's possible that the fragment shader might use
BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future,
when we support ARB_sample_shading, it might use
BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC.
This patch modifies the upload_clip_state() functions to check for all
three possible noperspective interpolation modes.
Reviewed-by: Eric Anholt <eric@anholt.net>
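The check described above is a mask test over the set of interpolation modes
a shader uses. In this sketch the SKETCH_* enumerators mirror the
BRW_WM_*_BARYCENTRIC names, but their numeric values and the helper
uses_noperspective() are assumptions.

```c
#include <stdbool.h>

/* Hypothetical mirror of the barycentric interpolation modes. */
enum sketch_barycentric_mode {
   SKETCH_PERSPECTIVE_PIXEL,
   SKETCH_PERSPECTIVE_CENTROID,
   SKETCH_PERSPECTIVE_SAMPLE,
   SKETCH_NONPERSPECTIVE_PIXEL,
   SKETCH_NONPERSPECTIVE_CENTROID,
   SKETCH_NONPERSPECTIVE_SAMPLE
};

#define SKETCH_MODE_BIT(m) (1u << (m))

#define SKETCH_NONPERSPECTIVE_BITS                     \
   (SKETCH_MODE_BIT(SKETCH_NONPERSPECTIVE_PIXEL) |     \
    SKETCH_MODE_BIT(SKETCH_NONPERSPECTIVE_CENTROID) |  \
    SKETCH_MODE_BIT(SKETCH_NONPERSPECTIVE_SAMPLE))

/* True if the clip stage must compute noperspective barycentrics. */
static bool
uses_noperspective(unsigned barycentric_modes)
{
   return (barycentric_modes & SKETCH_NONPERSPECTIVE_BITS) != 0;
}
```

Testing only the PIXEL bit, as before, would miss a shader that uses
noperspective varyings with centroid interpolation.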
This bitfield tells the back-ends which of a fragment shader's inputs
require centroid interpolation. It is only set for GLSL fragment
shaders, since assembly fragment shaders don't support centroid
interpolation.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was only no-oping the clear() function, not actual triangle
rasterization. Move the no_rast field from lp_context down into
lp_rasterizer so it's accessible where it's needed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes this build failure on Solaris.
Compiling build/sunos-debug/glsl/glcpp/glcpp-lex.c ...
"src/glsl/glcpp/glcpp-lex.l", line 30: cannot find include file: "glcpp-parse.h"
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
$CLANG_RESOURCE_DIR is the directory that contains all resources
needed by clang to compile programs. When clover uses clang to
compile kernels it needs to specify a resource dir, so that clang
can find its internal headers (e.g. stddef.h).
clang defines $CLANG_RESOURCE_DIR as $CLANG_LIBDIR/clang/$CLANG_VERSION
This patch adds the --with-clang-libdir option in order to accommodate
clang installs to non-standard locations, and it also adds a check
to the configure script to verify that $CLANG_RESOURCE_DIR/include
contains the necessary header files.
On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.
However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.
This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.
Fixes Piglit test "fbo-deriv".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
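The sub-span derivative computation and the FBO negation can be sketched on
the CPU. The slot order (0 = upper left, 1 = upper right, 2 = lower left,
3 = lower right, in screen order) and the function names are assumptions
made for illustration.

```c
#include <stdbool.h>

/* dFdx: difference between the right and left halves of the sub-span. */
static float
subspan_dfdx(const float v[4])
{
   return v[1] - v[0];
}

/*
 * dFdy: difference between the top and bottom halves of the sub-span.
 * For a window system framebuffer the screen-space top row has the larger
 * GL y coordinate; for an FBO the rows are the other way around, so the
 * result is negated.
 */
static float
subspan_dfdy(const float v[4], bool render_to_fbo)
{
   float d = v[0] - v[2];
   return render_to_fbo ? -d : d;
}
```

Because render_to_fbo changes the generated code, it has to be part of the
program key, which is why switching between FBO and window system rendering
triggers a recompile.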
It's not optimal, but it's better than the register pressure scheduler
that was previously being used. The VLIW scheduler currently ignores
all the complicated instruction groups restrictions and just tries to
fill the instruction groups with as many instructions as possible.
Though, it does know enough not to put two trans only instructions in
the same group.
We are able to ignore the instruction group restrictions in the LLVM
backend, because the finalizer in r600_asm.c will fix any illegal
instruction groups the backend generates.
Enabling the VLIW scheduler improved the run time for a sha1 compute
shader by about 50%. I'm not sure what the impact will be for graphics
shaders. I tested Lightsmark with the VLIW scheduler enabled and the
framerate was about the same, but it might help apps that use really
big shaders.
The rest of the TFB implementation remains in transformfeedback.c, and
this will be shared with UBOs.
v2: Move the size/offset checks shared with UBOs to common code as
well. (Kenneth's review)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix a typo spotted by Eric Anholt.
v3: Fix missing "GL" on types, fix style, fix Studly_Caps extension name,
drop commented code duplicated with GL3x.xml [anholt]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our intention is still that it's not abi stable, so make the package
version number get included in the library name. Now you can parallel
install dricore-using drivers from multiple mesa versions. We can put
it into lib now that we're following library versioning rules
(assuming that ABIs don't change within a single Mesa point release).
LD_LIBRARY_PATH still doesn't work with a non-/, non-/usr prefix
because libtool uses rpath instead of runpath for nonstandard
prefixes.
The weird versioning of the libGL where the package version was sort
of expressed as a big integer is dropped. libtool didn't like the 0
prefix, and it didn't really make sense anyway -- if you interpret it
as an integer version number, old Mesa 071200 was bigger than current
Mesa 08100. Instead, just bump the minor version and drop the
patchlevel.
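The ordering problem can be checked directly. This is a small Python
sketch; the helper name is made up for the illustration:

```python
def as_integer_version(text):
    # Interpret the old-style version string as a plain base-10 integer,
    # which is what makes the leading zero meaningless.
    return int(text, 10)

old_mesa = as_integer_version("071200")
current_mesa = as_integer_version("08100")
# The older release compares as the larger number.
older_sorts_higher = old_mesa > current_mesa
```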
Except for the deleted linux-cell target, these were just the target
cc/cflags. The only usage was for gen_matypes, which wants the
target's structure packing, not the host, anyway.
Every place that uses ASM_FLAGS already uses DEFINES. Not including
it in DEFINES is just a way to screw up potential users, as I've done
several times while working on the build system.
Even pre-automake, we rely on gmake features for pattern
substitutions, and replacing those with reams more make code is not
interesting. This will let us turn the old Makefiles using pattern
substitutions into automake without spewing warnings.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
1) We need to insert a barrier between consecutive transform feedback calls.
2) VBO cache needs to be flushed when TFB output is used as VBO draw input.
Fixes Piglit test EXT_transform_feedback/immediate-reuse.
Thanks to Christoph Bumiller for pointing out bugs in previous versions
of this patch.
gl_ClipDistance needs special treatment in form of lowering pass
which transforms gl_ClipDistance representation from float[] to
vec4[]. There are 2 implementations - at glsl linker level (enabled
by LowerClipDistance option) and at glsl_to_tgsi level (enabled
unconditionally for gallium drivers). Second implementation is
incomplete - it does not take into account transform feedback (see
commit 642e5b413e "mesa: Fix transform
feedback of unsubscripted gl_ClipDistance array" for details).
There are 2 possible fixes:
- adding transform feedback support into glsl_to_tgsi version
- ripping gl_ClipDistance support from glsl_to_tgsi and enabling
gl_ClipDistance lowering on glsl linker side
This patch implements the 2nd option. All it does is:
- reverts most of the commit 59be691638
"st/mesa: add support for gl_ClipDistance"
- changes LowerClipDistance to true
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{2,3,4,5,6,7,8}]-no-subscript" at least on nv50
and evergreen cards.
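As a rough sketch of what the float[] to vec4[] lowering means for
indexing (Python for illustration; the function names are hypothetical
and a maximum of 8 clip distances is assumed):

```python
MAX_CLIP_DISTANCES = 8  # assumed limit for this sketch

def lowered_location(i):
    """Map gl_ClipDistance[i] to (vec4 index, component name)."""
    if not 0 <= i < MAX_CLIP_DISTANCES:
        raise IndexError("clip distance index out of range")
    return i // 4, "xyzw"[i % 4]

def lower_clip_distances(distances):
    """Pack up to 8 floats into two 4-component tuples, zero padded."""
    if len(distances) > MAX_CLIP_DISTANCES:
        raise ValueError("too many clip distances")
    padded = list(distances) + [0.0] * (MAX_CLIP_DISTANCES - len(distances))
    return [tuple(padded[0:4]), tuple(padded[4:8])]
```

Transform feedback of the unsubscripted array has to undo this mapping,
which is the part the glsl_to_tgsi version did not handle.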
From the GL 3.0 spec (p.116):
"Multisample rasterization is enabled or disabled by calling
Enable or Disable with the symbolic constant MULTISAMPLE."
Elsewhere in the spec, where multisample rasterization is described
(sections 3.4.3, 3.5.4, and 3.6.6), the following text is consistently
used:
"If MULTISAMPLE is enabled, and the value of SAMPLE_BUFFERS is
one, then..."
So, in other words, disabling GL_MULTISAMPLE should prevent
multisample rasterization from occurring, even if the draw framebuffer
is multisampled. This patch implements that behaviour by setting the
WM and SF stage's "multisample rasterization mode" to
MSRAST_ON_PATTERN only when the draw framebuffer is multisampled *and*
GL_MULTISAMPLE is enabled.
Fixes piglit test spec/EXT_framebuffer_multisample/enable-flag.
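The selection rule amounts to a two-input condition. This is a Python
sketch only; MSRAST_OFF_PIXEL is an assumed name for the non-multisampled
mode and the function name is hypothetical:

```python
MSRAST_ON_PATTERN = "MSRAST_ON_PATTERN"
MSRAST_OFF_PIXEL = "MSRAST_OFF_PIXEL"  # assumed name for the "off" mode

def multisample_rasterization_mode(draw_buffer_samples, multisample_enabled):
    # Both conditions must hold: the draw framebuffer is multisampled
    # *and* GL_MULTISAMPLE is enabled.
    if draw_buffer_samples > 1 and multisample_enabled:
        return MSRAST_ON_PATTERN
    return MSRAST_OFF_PIXEL
```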
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware limitations, MSAA is unsupported on Gen6 for formats
containing >64 bits of data per pixel. From the Sandy Bridge PRM,
vol4 part1, p72 ("Surface Format"):
If Number of Multisamples is set to a value other than
MULTISAMPLECOUNT_1, this field cannot be set to the following
formats:
- any format with greater than 64 bits per element
- any compressed texture format (BC*)
- any YCRCB* format
Gen7 has a similar, but less stringent limitation: formats with >64
bits of data per pixel only support 4x MSAA.
This patch causes the unsupported formats to report
GL_FRAMEBUFFER_UNSUPPORTED.
Fixes piglit "multisample-formats" tests on Gen6.
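A simplified Python sketch of the rule described above. The function
name is hypothetical, and the sketch only models the bits-per-pixel
restriction, not the compressed or YCRCB cases from the PRM:

```python
def msaa_format_supported(gen, bits_per_pixel, num_samples):
    """Return True if the format/sample-count combination is usable."""
    if num_samples <= 1 or bits_per_pixel <= 64:
        return True   # no restriction modelled for these cases
    if gen == 6:
        return False  # Gen6: no MSAA at all above 64 bits per pixel
    if gen == 7:
        return num_samples == 4  # Gen7: only 4x MSAA above 64 bits
    raise ValueError("only Gen6 and Gen7 are modelled in this sketch")
```

A combination for which this returns False is one the patch would
report as GL_FRAMEBUFFER_UNSUPPORTED.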
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sandy Bridge and later don't use this field, so there's no point in
setting it. It can only cause harmful state-based recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The system array values concept doesn't really work because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID. So this patch does two things:
- kill the array, have emit_fetch_system_value directly pick the
values it needs (only gl_InstanceID for now, as the previous code)
- correctly handle the expected type in emit_fetch_system_value
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This includes:
- picking up correctly which attributes are flatshaded and which are
noperspective
- copying the flatshaded attributes when needed, including the
non-built-in ones
- correctly interpolating the noperspective attributes in screen-space
instead of in a 3d-correct fashion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Z or stencil textures should not be created with the z/stencil
flags for surface creation, as they are intended to be bound
as textures.
v2: remove broken code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Solaris Studio C compiler does not support anonymous structs and
anonymous unions.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea here is to rewrite comparisons like 2 >= x as x <= 2; we want
to simply exchange arguments, not negate the condition. If equality was
part of the original comparison, it should remain part of the swapped
version.
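To make the distinction concrete, here is a small Python sketch (the
table and helper names are invented for the illustration) of swapping
operands as opposed to negating the condition:

```python
import operator

# Operand-swapped form of each comparison: only the direction flips,
# so equality stays part of the test whenever it was there originally.
SWAPPED = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "==": "==", "!=": "!="}

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def swap_operands(lhs, op, rhs):
    """Rewrite `lhs op rhs` as an equivalent `rhs op' lhs`."""
    return rhs, SWAPPED[op], lhs

def evaluate(lhs, op, rhs):
    return OPS[op](lhs, rhs)
```

With x = 2, the original 2 >= x is true and so is the swapped x <= 2,
whereas the negated form x < 2 would be false.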
This is the true cause of bug #50298. It didn't manifest itself on
Sandybridge because we embed the conditional modifier in the IF
instruction rather than emitting a CMP. All other platforms use CMP.
It also didn't manifest itself on the master branch because commit
be5f27a84d ("glsl: Refine the loop instruction counting.") papered over
the problem.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50298
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a bug where a sampler view was using stale texture/resource
data when the texture was modified through a surface (render to texture).
Bumping the texture and layer ages triggers sampler view revalidation.
Fixes piglit fbo-blit failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This lets us select the front buffer for reading under GLES2.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extra condition checks the API not the version of the API, so rename
to reflect that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is failing sometimes, probably because TargetData keeps a structure layout
cache, which can become bogus, ever since the InvalidateStructLayoutInfo API
was removed in LLVM r135245.
This change merely makes the problem easier to diagnose (an assertion
failure instead of a random crash).
instead of failing to allocate a renderbuffer.
This also fixes piglit/get-renderbuffer-internalformat with non-renderable
formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows drivers not to do any allocation in AllocStorage if the storage
cannot be allocated because of an unsupported internalformat + samples combo.
The little ugliness is that AllocStorage is expected to return TRUE in this
case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires the latest streamout kernel patches.
Streamout is disabled by default on r7xx, so this patch is safe for regular
users.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Note: for the moment TGSI_OPCODE_F2U is implemented using
lp_build_itrunc() (the same function used to implement
TGSI_OPCODE_F2I). In the long run, we should create an
lp_build_utrunc() function to do the proper conversion. But this
should allow us to limp along with mostly correct behaviour for now.
Previously, we performed conversions from float->uint by a two step
process: float->int->uint. However, on platforms that use saturating
conversions (e.g. i965), this didn't work, because if the source value
was larger than the maximum representable int (0x7fffffff), then
converting it to an int would clamp it to 0x7fffffff.
This patch just adds the new opcode; further patches will adapt
optimization passes and back-ends to use it, and then finally the
ast_to_hir logic will be modified to emit the new opcode.
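The clamping problem can be reproduced with a small Python model of
saturating conversions. The helper names are hypothetical, and NaN and
infinity handling is left out of the sketch:

```python
INT_MIN, INT_MAX = -0x80000000, 0x7FFFFFFF
UINT_MAX = 0xFFFFFFFF

def f2i_saturating(value):
    # Truncate toward zero, then clamp to the signed 32-bit range.
    return max(INT_MIN, min(INT_MAX, int(value)))

def i2u(value):
    # Reinterpret a signed 32-bit value as unsigned.
    return value & UINT_MAX

def f2u_saturating(value):
    # Direct conversion: clamp to the unsigned 32-bit range instead.
    return max(0, min(UINT_MAX, int(value)))

two_step = i2u(f2i_saturating(3.0e9))  # float -> int -> uint
direct = f2u_saturating(3.0e9)         # float -> uint
```

Here the two-step path yields 0x7fffffff while the direct conversion
keeps 3000000000.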
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies blorp blits (which are used for MSAA) to properly
account for clipping of source coordinates. Previously, if we
detected the possibility of source clipping, we would fall back to the
blit meta-op, which doesn't support MSAA and is very slow for depth
and stencil buffers.
Fixes piglit tests
"EXT_framebuffer_multisample/clip-and-scissor-blit" on i965/Gen6+.
Also substantially speeds up the Humble Bundle V game "Psychonauts" on
Gen6+ (without this patch, the game's depth buffer blits use the slow
blit meta-op).
Reviewed-by: Carl Worth <cworth@cworth.org>
This allows submitting things to the compute-only
rings on cayman+
v2: rebased on current master and actually make use
of the new flag in evergreen_compute.c
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When drawing a depth image the fragment shader also needs to emit the
current raster color.
The new piglit drawpix-z test exercises this.
NOTE: This is a candidate for the 8.0 branch.
This patch updates .gitignore files to account for the new build
artifacts introduced by the following commits:
ae376f0 glx/tests: Rename test as glx-test
8fecdcc mesa/tests: Add tests for _mesa_lookup_enum_by_{name,nr} functions
a29ad2b mesa/tests: Add tests for the generated dispatch table
Haiku targets the Pentium or higher processor.
To ensure compatibility we can do march 586 and
mtune 686. Mesa will still use sse however if
the cpu supports it (and the stack is properly
aligned). These flags only affect the internal
compiler optimizations.
Previously, rbug_*.c would fail to compile with incomplete prototype
errors when make was run from the command line on my machine. My IDE
always built fine, and still does after this patch (Netbeans 7.1.2).
Most of the includes from files in gallium/auxiliary/rbug/* were
assuming an rbug/ subdirectory, while the headers are actually in the
same directory as the .c files.
The build error was also previously a problem for me on Ubuntu 11.10
and Mint 12.
Fixes build for the following configuration: ./autogen.sh
--enable-debug --enable-texture-float --with-gallium-drivers=r600
--with-dri-drivers=radeon --enable-r600-llvm-compiler
Signed-off-by: Brian Paul <brianp@vmware.com>
In single precision, 1.5707963 becomes 1.5707962513 which is too
small. However, 1.5707964 becomes 1.5707963705 which is just right.
The value 1.5707964 is already used in asin.ir.
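The rounding can be checked by round-tripping each literal through an
IEEE single. This is a Python sketch; the helper name is made up:

```python
import math
import struct

def to_float32(value):
    # Round a Python float (double) to the nearest single-precision value.
    return struct.unpack("<f", struct.pack("<f", value))[0]

too_small = to_float32(1.5707963)
just_right = to_float32(1.5707964)
nearest_to_half_pi = to_float32(math.pi / 2)
```

The second literal lands on the same single-precision value as pi/2
itself, while the first lands one step below it.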
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is no GLX protocol for these functions. Open-source Linux
drivers have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for this function. Open-source Linux drivers
have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for this function in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.1 and ARB_uniform_buffer_object. I only added
them to 3.1 because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.3, ARB_texture_swizzle, and
EXT_texture_swizzle (with different names). I only added them to 3.3
because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Determines whether it's a basis vector, i.e., a vector with one element
equal to 1 and all other elements equal to 0.
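A Python sketch of the test being described (the function name is
hypothetical and not the name of the added method):

```python
def is_basis_vector(components):
    """True if exactly one element is 1 and every other element is 0."""
    values = list(components)
    return values.count(1) == 1 and all(v in (0, 1) for v in values)
```

For example, (0, 1, 0, 0) qualifies, while (1, 1, 0, 0), (0, 0, 0, 0)
and (0, 2, 0, 0) do not.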
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a value was replaced, the new key was strdup'd and leaked.
To fix this, we modify the hash table implementation to return
whether the value was replaced and free() the (now useless)
duplicate string.
When we have multiple shared contexts, and one of them is
long-running, this will lead to never freeing those resources
since they are shared. Instead, free them right away on context
destruction since we know the other context isn't using them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 8.0 branch.
From the GL_NV_primitive_restart spec:
"PrimitiveRestartIndexNV is not compiled into display lists, but is
executed immediately."
Prior to this patch, calls to glPrimitiveRestartIndex would hit the noop
dispatch stub.
+2 oglconforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the GL_ARB_copy_buffer spec:
"An INVALID_VALUE error is generated if any of readoffset, writeoffset,
or size are negative [...]"
Fixes oglconform's copybuffer/negative.CNNegativeValues test.
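In sketch form (Python, with a hypothetical function name; only the
negative-value rule quoted above is modelled, not the other
CopyBufferSubData checks):

```python
GL_NO_ERROR = 0
GL_INVALID_VALUE = 0x0501

def check_copy_buffer_args(readoffset, writeoffset, size):
    # Any negative offset or size is rejected before anything is copied.
    if readoffset < 0 or writeoffset < 0 or size < 0:
        return GL_INVALID_VALUE
    return GL_NO_ERROR
```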
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings appear to occur with newer automake (probably 1.12).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These allow one to mangle the library names, without also mangling the
symbol names, to make them distinct from other GL libraries on the
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Because these classes are used entirely from their own source files
and not from separate DSOs, the linker gets to produce massively less
code. This cuts about 13k of text in the libdricore case. In the
non-libdricore case, the additional linkage information allows the
compiler to inline some code, so libglsl.a size actually increases by
about 300 bytes.
For a dricore build, improves shader_runner runtime on
glsl-fs-copy-propagation-texcoords-1 by 0.21% +/- 0.03% (n=353574,
outliers removed). No statistically significant difference with n=322
on glslparsertest on a yofrankie shader intended to test compiler
performance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we have just one library of "all of Mesa core" instead of both
libdricore and libglsl that drivers link against.
I did this change in a sort of nonrecursive make fashion: the
generated files are still produced in the non-automake build, like the
rest of dricore, but the GLSL files are stuffed into libdricore
without building a convenience library in src/glsl (even though we
could now). This would make a bit more sense if glsl was just another
dir under src/mesa, because right now I had to contort the prefix
variable name to look another ../ level up.
This is part of a series to fix our build issues in the automake case
by hooking up the automatic Makefile regeneration support. The
extract_git_sha1 is moved into src/mesa/Makefile so that we get
correct dependency generation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tried to update all the old Makefiles that included the default
config to be sure they had a default target if they didn't previously
have one, since this new all target will always point at it. Almost
everything had one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some more of the files are now autogenerated, which caused build breakage.
This patch adds generation of the missing files. It also changes the existing
make rules so that the files are created as part of the local source
(not in the intermediate directory, which causes several problems).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This patch fixes a copy/paste error and masking of depth/stencil (stencil
is in the top 8 bits), and makes glean/readPixSanity happy.
Both the stencil and the depth buffer piglit test also pass if
glClear(DEPTH | STENCIL) is executed instead of
glClear(DEPTH)/glClear(STENCIL).
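A rough Python sketch of the masking involved, assuming a packed 32-bit
value with 24 depth bits below the 8 stencil bits. The names are
invented for the illustration and are not the driver's:

```python
DEPTH_MASK = 0x00FFFFFF
STENCIL_MASK = 0xFF000000  # stencil lives in the top 8 bits

def pack_z24_s8(depth24, stencil8):
    return ((stencil8 & 0xFF) << 24) | (depth24 & DEPTH_MASK)

def masked_clear(old, clear, clear_depth, clear_stencil):
    """Replace only the selected fields of a packed depth/stencil value."""
    mask = 0
    if clear_depth:
        mask |= DEPTH_MASK
    if clear_stencil:
        mask |= STENCIL_MASK
    return (old & ~mask & 0xFFFFFFFF) | (clear & mask)
```

Clearing depth and then stencil gives the same result as clearing both
at once, which matches the behaviour the piglit tests check.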
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Christopher Egert <cme3000@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
remove archaic .cvsignore
*.pyo is already in toplevel .gitignore
*.pyc is already in toplevel .gitignore
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, blits using the "blorp" mechanism only worked for 8-bit
RGBA color buffers, 24-bit depth buffers, and 8 bit stencil buffers.
This was not enough, because the blorp mechanism must be used for
blitting whenever MSAA is in use. This patch allows all formats to be
used, provided the source and destination formats match.
So far I have confirmed that the following formats work properly with
MSAA:
- GL_RGB
- GL_RGBA
- GL_ALPHA
- GL_ALPHA4
- GL_ALPHA8
- GL_R3_G3_B2
- GL_RGB4
- GL_RGB5
- GL_RGB8
- GL_RGB10
- GL_RGB12
- GL_RGB16
- GL_RGBA2
- GL_RGBA4
- GL_RGB5_A1
- GL_RGBA8
- GL_RGB10_A2
- GL_RGBA12
- GL_RGBA16
Fixes piglit tests "EXT_framebuffer_multisample/formats {2,4}" on
Sandy Bridge and Ivy Bridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the blorp engine only supported RGBA8 color buffers and
24-bit depth buffers. This patch adds support for any color buffer
format that is supported as a render target, and for 16-bit and 32-bit
depth buffers.
This required threading the brw_context struct through into
brw_blorp_surface_info::set() so that it can consult the
brw->render_target_format array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though brw_blorp_surface_info is derived from brw_blorp_mip_info,
this function doesn't need to be virtual, because it is never accessed
through a base class pointer. Making the function non-virtual will
allow it to take additional parameters in the brw_blorp_surface_info
case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves the responsibility for deciding on the format of the
source and destination surfaces from the
gen{6,7}_blorp_emit_surface_state() functions to
brw_blorp_surface_info::set(), which is shared between Gen6 and Gen7.
This will make it possible to add support for more surface formats
without code duplication.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGSI doesn't need an opcode, since registers are untyped (but beware
once doubles come into the scene). Mesa IR doesn't handle native
integers, so trying to handle them there is worthless, the case
entries are only added for warning reasons.
It was only tested with softpipe, since llvmpipe doesn't support glsl
1.3 yet.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That adds support for activating the extension. It doesn't actually
*do* anything yet, of course.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the issues section of the GL_ARB_texture_compression_rgtc extension:
15) What should glGetTexLevelParameter return for
GL_TEXTURE_GREEN_SIZE and GL_TEXTURE_BLUE_SIZE for the RGTC1
formats? What should glGetTexLevelParameter return for
GL_TEXTURE_BLUE_SIZE for the RGTC2 formats?
RESOLVED: Zero bits.
These formats always return 0.0 for these respective components
and have no bits devoted to these components.
Returning 8 bits for red size of RGTC1 and the red and green
sizes of RGTC2 makes sense because that's the maximum potential
precision for the uncompressed texels.
Thus, we need to return 8 bits for GL_TEXTURE_RED_SIZE on all RGTC formats
and 8 bits for GL_TEXTURE_GREEN_SIZE on RGTC2 formats. BLUE should be 0.
Fixes oglconform/rgtc/advanced.texture_fetch.tex_param.
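The resulting sizes can be summarised as a lookup table. This is a
Python sketch; "RGTC1" and "RGTC2" are shorthand labels for the format
families, not GL enums:

```python
RGTC_COMPONENT_BITS = {
    "RGTC1": {"red": 8, "green": 0, "blue": 0},
    "RGTC2": {"red": 8, "green": 8, "blue": 0},
}

def texture_component_size(family, component):
    # Bits reported for GL_TEXTURE_{RED,GREEN,BLUE}_SIZE in this sketch.
    return RGTC_COMPONENT_BITS[family][component]
```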
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
While ~loop_state() is already freeing the loop_variable_state objects
via ralloc_free(this->mem_ctx), the ~loop_variable_state() destructor
was never getting called, so the hash table inside loop_variable_state
was never getting destroyed.
Fixes a memory leak in any shader with loops.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The functions for handling 1D, 2D and 3D texture images were nearly
identical. This folds them all together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't remove this pass yet, because we need it to convert AMDIL
registers in BRANCH* instructions, but we don't need it for
instruction conversion any more.
OpenGL allows you to declare user-defined fragment shader outputs with
less than four components:
out ivec2 color;
This makes sense if you're rendering to an RG format render target.
Previously, we assumed that all color outputs had four components (like
the built-in gl_FragColor/gl_FragData variables). This caused us to
call emit_color_write for invalid indices, incrementing the output
virtual GRF's reg_offset beyond the size of the register.
This caused cascading failures: split_virtual_grfs would allocate new
size-1 registers based on the virtual GRF size, but then proceed to
rewrite the out-of-bounds accesses assuming that it had allocated enough
new (contiguously numbered) registers. This resulted in instructions
that accessed size-1 GRFs with register numbers beyond
virtual_grf_next (i.e. registers that were never allocated).
Finally, this manifested as live variable analysis and instruction
scheduling accessing their temporary array with an out of bounds index
(as they're all sized based on virtual_grf_next), and the program would
segfault.
It looks like the hardware's Render Target Write message requires you to
send four components, even for RT formats such as RG or RGB. This patch
continues to use all four MRFs, but doesn't bother to fill any data for
the last few, which should be unused.
+2 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 4650aea7a5 fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+18 piglits on Sandybridge.
NOTE: This and 4650aea7a5 are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit f41ecade7b fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+15 piglits on Sandybridge.
NOTE: This and f41ecade7b are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't saved/restored by _mesa_meta_begin, so we need to do it
manually (like we do for the read/draw framebuffers). Additionally,
we neglected to re-bind before the glRenderbufferStorage call.
+13 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
DeleteBuffer needs to unbind from these binding points as well, based on
the same rationale as the previous patch.
+51 oglconforms (together with the last patch).
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_lookup_bufferobj returns NULL for 0, which caused us to say
"there's no such buffer object" and raise an error, rather than
correctly binding the shared NullBufferObj.
Now you can unbind your buffers.
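The control flow can be modelled roughly as follows (a Python sketch;
the class and method names are hypothetical stand-ins, not Mesa's API):

```python
class BufferNamespace:
    def __init__(self):
        self.null_buffer_obj = object()  # stand-in for the shared NullBufferObj
        self.objects = {}

    def lookup(self, name):
        # Mirrors the behaviour described above: name 0 is never found.
        if name == 0:
            return None
        return self.objects.get(name)

    def bind(self, name):
        # Handle 0 before the lookup so that unbinding is not an error.
        if name == 0:
            return self.null_buffer_obj
        obj = self.lookup(name)
        if obj is None:
            raise LookupError("no such buffer object")
        return obj
```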
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the GL 3.1 spec, section 2.9 ("Buffer Objects"):
"If a buffer object is deleted while it is bound, all bindings to that
object in the current context (i.e. in the thread that called
DeleteBuffers) are reset to zero."
The code already checked for a number of cases, but neglected these
newer binding points.
+21 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were incorrectly assuming that the coordinate's dimensionality is
equal to the gradient's dimensionality. For array types, the coordinate
has one more component.
Fixes 12 subcases of oglconform's glsl-bif-tex-grad test.
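For reference, a Python sketch of the relationship for a few sampler
types (the table covers only these types and the helper name is
hypothetical):

```python
# Number of components in each gradient argument of textureGrad().
GRADIENT_COMPONENTS = {
    "sampler1D": 1,
    "sampler2D": 2,
    "sampler3D": 3,
    "samplerCube": 3,
    "sampler1DArray": 1,
    "sampler2DArray": 2,
}

def coordinate_components(sampler_type):
    # Array samplers carry the layer index as one extra coordinate.
    extra = 1 if sampler_type.endswith("Array") else 0
    return GRADIENT_COMPONENTS[sampler_type] + extra
```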
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, if you pass --with-egl-platforms=x11 but xcb-dri2 isn't available
we just silently fail and disable building the EGL DRI2 driver.
This commit cleans up the EGL platform checking and fails if a selected
platform can't find its required dependencies.
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit a07cf3397e added support for TBOs
on Gen7, but missed Gen6.
Passes piglit -t texture_buffer and oglconform's buffermapping
basic.read.texture tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to Table 6.17 in the GL 2.1 specification, DEPTH_TEXTURE_MODE,
TEXTURE_COMPARE_MODE, and TEXTURE_COMPARE_FUNC need to be restored on
glPopAttrib(GL_TEXTURE_BIT).
Makes a number of oglconform tests happier.
v2: Make restoration conditional on the ARB_shadow and ARB_depth_texture
extensions, as suggested by Brian. I'm not sure that any
implementations still remain that don't support those, but why not?
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the libdricore directory build with --enable-32-bit on an x86_64 system.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The VTX_READ instructions were using the ADDRParam ComplexPattern which
allows a load instruction's offset to be a register, but VTX_READ
instructions can only handle an immediate offset.
Also, the load_param pattern fragment had an erroneous return true;
statement that was causing it to match the wrong load instructions.
Tungsten Graphics has not existed for several years, and the majority of
ongoing development and support is done by Intel. I chose to include
"Open Source Technology Center" to distinguish it from, say, the closed
source Windows OpenGL driver.
The one downside to this patch is that applications that pattern match
against "Intel" may start applying workarounds meant for the Windows
driver. However, it does seem like the right thing to do.
This does change oglconform behavior.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These look like debug messages from the switch-statement development.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tom Stellard:
- Updated for gallium interface changes
- Fixed a few bugs:
+ Set the loop counter
+ Calculate the correct number of pipes
- Added hooks into the LLVM compiler
v2:
-Separate IR type and LLVM triple
-Do the OpenCL C->LLVM IR and linking steps for all PIPE_SHADER_IR
types.
v3:
- Coding style fixes
- Removed compatibility code for LLVM < 3.1
- Split build_module_llvm() into three functions:
compile(), link(), and build_module_llvm()
v4:
- Use struct pipe_compute_program
v5:
- Don't malloc memory for struct pipe_llvm_program
v6:
- Fix serialization of llvm bytecode
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This structure is used as a header that precedes LLVM bytecode programs
that are passed to the drivers.
v2:
- s/pipe_compute_program/pipe_llvm_program/
v3:
- Rename to struct pipe_llvm_program_header
- Drop the char * prog member
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is for the llvm code that can't use extended initializers.
v2:
- Use const references for vector arguments
- Move constructor defs before data members
- Initialize all values in the default constructors
v3:
- Fix typo
A device now has two functions for getting information about the IR
it needs to return.
ir_format() => returns the preferred IR
ir_target() => returns the triple for the target that is understood by
clang/llvm.
v2:
- renamed ir_target() to ir_format()
- renamed llvm_triple() to ir_target()
v3:
- Remove unnecessary include
- Do proper conversion from std::vector<char> to std::string
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: Tom Stellard
- Update CAP description
v3: Tom Stellard
- TGSI targets should pass an empty string for this CAP.
v4: Tom Stellard
- TGSI targets can ignore this CAP.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
TEX instructions can't do saturation. Do the TEX into a temp reg w/out
saturation, then do a MOV_SAT.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Some distributions (like Arch Linux) make /usr/bin/python Python 3,
rather than Python 2. Since compare_ir uses /usr/bin/env python,
such systems will fail to run optimization-test, causing 'make check' to
always fail.
Automake's TESTS_ENVIRONMENT variable provides a mechanism to run
programs or set environment variables in the test environment.
Ideally, I think we would want to use AM_TESTS_ENVIRONMENT, since
TESTS_ENVIRONMENT is supposed to be user-overridable. However, it isn't
supported using the default/serial test runner.
Fixes 'make check' on Arch Linux and Gentoo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
I started writing unit tests for a new piece of code, and discovered
they all failed due to a bug in ralloc. Clearly it needs a test suite.
v2: Rename to 'ralloc-test' and fix copyright date. (idr review)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If an object is allocated out of the NULL context, info->parent will be
NULL. Using the PTR_FROM_HEADER macro would be incorrect: it would say
that ralloc_parent(ralloc_context(NULL)) == sizeof(ralloc_header).
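A minimal sketch of the failure mode, assuming an allocator that stores
a header directly before each allocation. The names struct header,
sketch_alloc(), and sketch_parent() are hypothetical and do not
reproduce ralloc's code; the point is only that the header-to-pointer
conversion has to be skipped when the parent is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before each allocation. */
struct header {
    struct header *parent;   /* NULL when allocated out of the NULL context */
};

/* Convert between a header and the user pointer that follows it. */
static void *ptr_from_header(struct header *h)
{
    return (char *)h + sizeof(*h);
}

static struct header *header_from_ptr(const void *p)
{
    return (struct header *)((char *)p - sizeof(struct header));
}

/* Allocate `size` bytes as a child of `ctx`, which may be NULL. */
void *sketch_alloc(const void *ctx, size_t size)
{
    struct header *h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->parent = ctx ? header_from_ptr(ctx) : NULL;
    return ptr_from_header(h);
}

/* Return the parent context of `ptr`, or NULL if it has none. */
void *sketch_parent(const void *ptr)
{
    assert(ptr != NULL);
    struct header *h = header_from_ptr(ptr);
    /* Converting a NULL header unconditionally would yield
     * (void *)sizeof(struct header) instead of NULL. */
    return h->parent ? ptr_from_header(h->parent) : NULL;
}
```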
Fixes the new "null_parent" unit test.
NOTE: This is a candidate for the 7.9, 7.10, 7.11, and 8.0 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Discovered while running the Khronos conformance test suite and
receiving "implementation error: meta program compile failed."
This bug was recently introduced by the i965 clear patch set and would
only be detected while using the ES2 API and only on gen6+ hardware.
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is performed in a subdirectory to avoid needing to convert all of
src/mesa/Makefile in one go.
I can now cherry-pick a commit containing glapi XML changes, do "(cd
src/mapi/glapi/gen && make) && make", and get a working driver.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to do the minimal change for libdricore conversion to
automake, I need to put its Makefile.am in a subdirectory. Automake
gets whiny/broken if you use GNU make features like "addprefix" or
"$(FILES:%=../%)" to munge your *_SOURCES. So, use a plain old
variable to be able to substitute in that "../"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*_SOURCES is reserved for file lists for particular automake targets.
Also, "-" in the variable names is not allowed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This variable won't be set when called from non-automake makefiles,
but it cleans up shared-glapi's output.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa already always depends on python to build. The checked in
changes are not reviewed (because any trivial change rewrites the
world). We also have been pushing commits between xml change and
regen where at-build-time xml-generated code disagrees with committed
xml-generated code. And worst of all, sometimes we ("I") check in
*stale* xml-generated code.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 87f12bb2d9 tried to fix rb->mt
being NULL, but got this case wrong.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now model loading user sgpr values with LLVM IR load instructions that
use the USER_SGPR address space.
The definition of the sgpr parameter to the use_sgpr() helper function
in radeonsi_shader.c has changed so that you can pass raw sgpr values
rather than having to divide the sgpr value you want to use by the dword
width of the type you want to load.
This function was causing compile errors in the tablegen'd code for
some intrinsic definitions. I don't think we really need this function,
so I'm removing the function body just as a temporary solution. I'll
look into removing the entire AMDILIntrinsicInfo class later.
v2: use a define for the maximum sample count
v3: also test odd sample counts (r300 supports MS3)
While multisample renderbuffers are supported by mesa, MS visuals
are not, so we need a way to tell dri/st not to advertise them even
if the gallium driver does support multisampled surfaces.
Otherwise applications selecting these non-functional visuals would
run into trouble ...
Reviewed-by: Brian Paul <brianp@vmware.com>
The code which scans the index buffer for restart indexes wasn't adding
the index buffer offset so we were always starting at offset=0. The
offset is usually zero so it wasn't noticed before.
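A sketch of the corrected scan, assuming 16-bit indices and a byte
offset into a mapped buffer. count_restarts() is a hypothetical helper
written for illustration, not the function touched by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many of the `count` 16-bit indices that begin `offset`
 * bytes into `buffer` equal `restart_index`. */
size_t count_restarts(const uint8_t *buffer, size_t offset, size_t count,
                      uint16_t restart_index)
{
    assert(buffer != NULL);

    /* Start at the index data, not at the beginning of the buffer. */
    const uint8_t *start = buffer + offset;
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        uint16_t idx;
        memcpy(&idx, start + i * sizeof(idx), sizeof(idx));
        if (idx == restart_index)
            n++;
    }
    return n;
}
```

With offset 0 the scan would also look at whatever precedes the index
data in the buffer, which is the single-VBO case described above.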
Fixes a failure in the piglit primitive-restart test when testing
vertex data + index data in a single VBO.
NOTE: This is a candidate for the 8.0 branch.
Basic 4x MSAA support now works on Gen7. This patch enables it.
As with Gen6, MSAA support is still fairly preliminary. In
particular, the following are not yet supported:
- 8x oversampling (Gen7 has hardware support for this, but we do not
yet expose it).
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, the blending necessary to blit an MSAA surface to a non-MSAA
surface could be accomplished with a single texturing operation. On
Gen7, the WM program must fetch each sample and blend them together
manually. From the Bspec (Shared Functions/Messages/Initiating
Message/Message Types/sample):
[DevIVB+]:Number of Multisamples on the associated surface must be
MULTISAMPLECOUNT_1.
This patch implements the manual blend operation.
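For illustration, the manual blend can be pictured as below, assuming
the blend is a plain per-channel average of the fetched samples.
resolve_pixel() is a hypothetical CPU-side stand-in, not the WM program
this patch generates.

```c
#include <assert.h>

/* Average `num_samples` RGBA samples into a single output color. */
void resolve_pixel(const float samples[][4], unsigned num_samples,
                   float out[4])
{
    assert(num_samples > 0);

    for (unsigned c = 0; c < 4; c++) {
        float sum = 0.0f;
        for (unsigned s = 0; s < num_samples; s++)
            sum += samples[s][c];
        out[c] = sum / (float)num_samples;
    }
}
```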
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since blorp uses color textures and render targets to do all its work
(even when blitting stencil and depth data), it always has to
configure the Gen7 GPU to use the new "sliced" MSAA layout. However,
when blitting stencil or depth data, the actual MSAA layout is
interleaved (as in Gen6). Therefore, blorp has to do extra coordinate
transformation work to account for the interleaving manually.
This patch causes blorp to perform the necessary extra coordinate
transformations.
It also modifies the blorp SURFACE_STATE setup code for Gen7, so that
it does not try to correct the surface width and height to account for
MSAA, since "sliced" MSAA layout doesn't affect the surface width or
height.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constraints come in to play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Starting in Gen7, there are two possible layouts for MSAA surfaces:
- Interleaved, in which additional samples are accommodated by scaling
up the width and height of the surface. This is the only layout
available in Gen6. On Gen7 it is used for depth and stencil
surfaces only.
- Sliced, in which the surface is stored as a 2D array, with array
slice n containing all pixel data for sample n. On Gen7 this layout
is used for color surfaces.
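The difference between the two layouts can be sketched as follows for
the 4x case. The 2x2 scale factor for the interleaved layout is an
assumption made for this example, and struct msaa_dims and
msaa_physical_dims() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Physical dimensions of the storage backing an MSAA surface. */
struct msaa_dims {
    unsigned width;
    unsigned height;
    unsigned array_slices;
};

/* Compute the physical size of a 4x MSAA surface in either layout. */
struct msaa_dims msaa_physical_dims(unsigned width, unsigned height,
                                    unsigned num_samples, bool sliced)
{
    assert(num_samples == 4);   /* only 4x is sketched here */

    struct msaa_dims d;
    if (sliced) {
        /* One array slice per sample; width and height are unchanged. */
        d.width = width;
        d.height = height;
        d.array_slices = num_samples;
    } else {
        /* Interleaved: samples widen and heighten the surface
         * (assumed 2x in each direction for 4 samples). */
        d.width = width * 2;
        d.height = height * 2;
        d.array_slices = 1;
    }
    return d;
}
```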
The "Sliced" layout has an additional requirement: it must be used in
ARYSPC_LOD0 mode, which means that the surface doesn't leave any extra
room between array slices for miplevels other than 0.
This patch modifies the surface allocation functions to use the
correct layout when allocating MSAA surfaces in Gen7, and to set the
array offsets properly when using ARYSPC_LOD0 mode. It also modifies
the code that populates SURFACE_STATE structures to ensure that
ARYSPC_LOD0 mode is selected in the appropriate circumstances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 support for blorp (blits using the render path) now works for
non-MSAA purposes. This patch enables it.
Since blorp operations re-use the logic for HiZ ops, this required
adding a case to the switch statement in gen7_blorp_emit_wm_config(),
to allow for the case where no HiZ op is being performed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, texel fetch is always accomplished using the SAMPLE_LD
message, which accepts arguments (u, v, r, lod, si). On Gen7, there
are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
arguments (si, u, v).
*Technically, there are other texel fetch messages, but they are used
for "compressed" MSAA surfaces, which we don't yet support.
This patch adds the proper message types and argument orderings for
Gen7.
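The argument orderings listed above can be written out as a small
table-driven sketch. The enum fetch_arg values and texel_fetch_args()
are hypothetical names for illustration; only the orderings themselves
come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fetch_arg { ARG_U, ARG_V, ARG_R, ARG_LOD, ARG_SI };

/* Fill `out` with the message argument order for the given hardware
 * generation and surface type, and return the number of arguments. */
size_t texel_fetch_args(int gen, bool msaa, enum fetch_arg out[5])
{
    static const enum fetch_arg gen6_ld[] =
        { ARG_U, ARG_V, ARG_R, ARG_LOD, ARG_SI };
    static const enum fetch_arg gen7_ld[] = { ARG_U, ARG_LOD, ARG_V };
    static const enum fetch_arg gen7_ld2dss[] = { ARG_SI, ARG_U, ARG_V };
    const enum fetch_arg *src;
    size_t n;

    assert(gen == 6 || gen == 7);

    if (gen == 6) {
        src = gen6_ld;          /* SAMPLE_LD on Gen6 */
        n = 5;
    } else if (msaa) {
        src = gen7_ld2dss;      /* SAMPLE_LD2DSS for MSAA surfaces */
        n = 3;
    } else {
        src = gen7_ld;          /* SAMPLE_LD for non-MSAA surfaces */
        n = 3;
    }

    for (size_t i = 0; i < n; i++)
        out[i] = src[i];
    return n;
}
```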
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 hardware requires us to enable at least one WM dispatch mode,
even if there is no program being dispatched to. When this code was
only used for HiZ operations (which don't use a WM program), we used
32-pixel dispatch, because it didn't matter. But blit programs are
compiled for 16-pixel dispatch. So just enable 16-wide dispatch
unconditionally.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Enable 16-wide dispatch unconditionally rather than add the
unnecessary complication of using 32-wide dispatch when there is no WM
program.
On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.
This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We know from previous bug fixes (commits
c25e5300cb and
b2ace06cbb) that texture border color
doesn't work if the dynamic state upper bound is set to 0. Although
the blorp engine doesn't make use of texture borders, it seems like we
ought to err on the safe side and set this value properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch separates out the portions of gen6_blorp_emit_batch_head()
that emit 3DSTATE_MULTISAMPLE, 3DSTATE_SAMPLE_MASK, and
STATE_BASE_ADDRESS. This paves the way for making the blorp code work
on Gen7, where additional command packets
(3DSTATE_PUSH_CONSTANT_ALLOC_VS and 3DSTATE_PUSH_CONSTANT_ALLOC_PS)
need to be emitted before 3DSTATE_MULTISAMPLE.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies the "blorp" WM program so that it can be run in
MSDISPMODE_PERSAMPLE (which means that every single sample of a
multisampled render target is dispatched to the WM program, not just
every pixel).
Previously we were using the ugly hack of configuring multisampled
destination surfaces as single-sampled, and generating sample indices
other than zero by swizzling the pixel coordinates in the WM program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the function brw_blorp_blit_program::texel_fetch()
to emit the SI (sample index) argument to the SAMPLE_LD message when
reading from a sample index other than zero.
Previously we were using the ugly hack of configuring multisampled
source surfaces as single-sampled, and accessing sample indices other
than zero by swizzling the texture coordinates in the WM program.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).
This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).
It will also allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 MSAA buffers (and Gen7 MSAA depth/stencil buffers) interleave
MSAA samples in a complex pattern that repeats every 2x2 pixel block.
Therefore, when allocating an MSAA buffer, we need to make sure to
allocate an integer number of 2x2 blocks; if we don't, then some of
the samples in the last row and column will be cut off.
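The rounding itself is small; a sketch, assuming the width and height
are each rounded up to the next multiple of 2 before allocation.
align_to_2() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to the next multiple of 2 so that a whole number of
 * 2x2 pixel blocks fits in each direction. */
uint32_t align_to_2(uint32_t value)
{
    assert(value < UINT32_MAX);   /* value + 1 must not wrap around */
    return (value + 1u) & ~1u;
}
```

For example, a 5x3 buffer would be allocated as 6x4 pixels before the
MSAA scaling is applied.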
Fixes piglit tests "EXT_framebuffer_multisample/unaligned-blit {2,4}
color msaa" on i965/Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Without passing the -ldflags parameter before $(LDFLAGS), in some cases
flags that MKLIB does not understand will be passed to it.
These might be -m64, -m32 or similar.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Thomas Gstädtner <thomas@gstaedtner.net>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch gets the FreeBSD SCons build working again. The build still
fails though.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to return immediately after inserting instructions that require
S_WAITCNT so that the parent class' custom inserter won't try to insert
them again.
Fix uninitialized scalar variable defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We should just set the bits of functionality that we support; the
GL/ES1/ES2 flags in extensions.c will take care of advertising the
appropriate extensions for the current API.
This enables the GL_EXT_texture_compression_dxt1 extension on ES1/ES2
when libtxc_dxtn is installed or the force_s3tc driconf option is set.
The main extension code set this up properly, but the ES-specific code
failed to do so.
Otherwise, the extension strings reported by es1_info, es2_info, and
glxinfo all remain the same.
This patch manually disables the ARB_framebuffer_object bit on ES
to preserve the behavior of 1c0f5d8324.
v2: Rebase, fix the i915 Makefile, and unconditionally set the
OES_draw_texture bit as core Mesa will only apply it to ES1 now.
Tested-by: Daniel Charles <daniel.charles@intel.com> [v1]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.
The VBO module's software handling of primitive restart is
used as a fall back.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering which components of a variable were killed by an
assignment, constant propagation would previously just use the write
mask of the assignment. This worked if the LHS of the assignment was
simple, e.g.:
v.xy = ...; // (assign (xy) (var_ref v) ...)
But it did the wrong thing if the LHS of the assignment involved an
array indexing operator, since in this case the write mask is always
(x):
v[i] = ...; // (assign (x) (deref_array (var_ref v) (var_ref i)) ...)
In general, we can't predict which vector component will be selected
by array indexing, so the only safe thing to do in this case is to
kill the entire variable.
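The rule can be summarized in a few lines. This is a sketch using a
plain 4-bit channel mask; killed_channels() and ALL_CHANNELS are
hypothetical names, not the identifiers used in the GLSL compiler.

```c
#include <assert.h>
#include <stdbool.h>

#define ALL_CHANNELS 0xFu   /* x, y, z and w */

/* Return the set of channels whose known constant values must be
 * discarded after an assignment. */
unsigned killed_channels(unsigned write_mask, bool lhs_is_array_deref)
{
    assert((write_mask & ~ALL_CHANNELS) == 0);

    /* For v[i] = ..., the write mask is always (x) and says nothing
     * about which channel is written, so kill everything. */
    return lhs_is_array_deref ? ALL_CHANNELS : write_mask;
}
```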
Fixes piglit tests {fs,vs}-vector-indexing-kills-all-channels.shader_test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the linker handles initializers of samplers just like any
other uniform, a bunch of this annoying code is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The linker may have set initial values for uniforms. Propagate these
values to the driver's backing storage when it is first associated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.
v3: Minor comment change based on feedback from Ken.
Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add support for gen6, don't turn it on if blending is
disabled (fixes GPU hang), and note it in docs/GL3.txt.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver needed this as well for hardware setup, so instead of
duplicating the logic, just save it off.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While it doesn't have the same warning in the simulator as in gen7,
let's emit it out of paranoia. We wouldn't want our resolves of some
previous clear to get clamped to some current clamping value.
Suggested-by: pretty much everyone
When doing fast clears, a fulsim warning said that the batch was being
emitted without the viewport set up. While the fast clear pass I was
looking at doesn't use the clear value, the later resolves which also
didn't set up the viewport would trigger the same. It's not obvious
from the error message whether it meant "fast clear value gets clamped
to something you haven't defined" or "fast clear value doesn't get
clamped, and I saw it was out of the current (uninitialized) range,
and you probably wanted it clamped to that (uninitialized) range". Be
paranoid and assume the first case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having this enum separate caused us to need a bunch of helper
functions to translate to the op to be executed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL clear path doesn't need any buffer presence checks, since
those are already handled in the normal drawing path code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Our understanding is that the 3D engine is supposed to be faster
anyway. We used to have more overhead in our tri clear path than we
do today, which would have led to this choice. But given that we
almost always see a depth clear along with a color clear, the path was
hardly exercised anyway.
Also, the color mask logic was broken in the presence of
GL_EXT_draw_buffers2's per-buffer colormask.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generated .aub file would be unannotated, making further data analysis
difficult.
This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.
The new annotation mechanism requires DRM version 2.4.34.
Reviewed-by: Eric Anholt <eric@anholt.net>
When we are generating an AUB dump, we make a final call to
aub_dump_bmp() as the context is being destroyed, to ensure that any
rendering performed before the application exits can be seen during a
simulation run. However, we were doing this before flushing the batch
buffer; as a result simulation runs would not always see the effect of
all rendering commands.
This patch flushes the batch buffer just before making the final call
to aub_dump_bmp(), to ensure that all rendering is properly captured
in the final bitmap.
This is a long-standing problem that recently surfaced with the change
to enable perspective correct color interpolation.
A fix for all possible formats is left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
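To illustrate the two ranges, here is a sketch of the usual
normalized-integer-to-float mappings. It is not the llvmpipe
conversion code; unorm_to_float() and snorm_to_float() are hypothetical
helpers, and clamping the most negative signed value to -1 is an
assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned normalized: [0, 2^bits - 1] maps to [0, 1]. */
float unorm_to_float(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t max = (1u << bits) - 1u;
    assert(v <= max);
    return (float)v / (float)max;
}

/* Signed normalized: [-(2^(bits-1) - 1), 2^(bits-1) - 1] maps to
 * [-1, 1]; the one extra negative value is clamped to -1. */
float snorm_to_float(int32_t v, unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    int32_t max = (1 << (bits - 1)) - 1;
    float f = (float)v / (float)max;
    return f < -1.0f ? -1.0f : f;
}
```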
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This patch removes two Clang warnings in GLU:
The first one seems to be an actual bug in mapdesc.cc: Clang complains
that sizeof(dest) will return the size of REAL*[MAXCOORDS], instead of
the intended REAL[MAXCOORDS][MAXCOORDS]. The second one is just
cosmetic because Clang doesn't like extra parentheses.
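The first warning comes from a general C and C++ rule: a parameter
declared as a two-dimensional array is really a pointer to its rows, so
sizeof on it yields a pointer size. A sketch of the safe pattern
follows; copy_matrix() is a hypothetical function, and the REAL typedef
and MAXCOORDS value here are assumptions for the example.

```c
#include <assert.h>
#include <string.h>

#define MAXCOORDS 5   /* assumed value, for illustration only */
typedef float REAL;

/* `dest` and `src` look like arrays but are pointers to rows of
 * MAXCOORDS REALs, so sizeof(dest) would be the size of a pointer.
 * Spell out the full matrix size instead. */
void copy_matrix(REAL dest[MAXCOORDS][MAXCOORDS],
                 REAL src[MAXCOORDS][MAXCOORDS])
{
    assert(dest != NULL && src != NULL);
    memcpy(dest, src, sizeof(REAL) * MAXCOORDS * MAXCOORDS);
}
```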
NOTE: This is a candidate for the 8.0 branch
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes another case of sampler views being created by one context,
shared by another, then deleted by the first, leaving a dangling
pipe context pointer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Use it where performance matters more and the exact method of float->int
conversion/rounding isn't terribly important. There should be no net change
here since F_TO_I() is the new name of the old IROUND() function.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The different implementations of IROUND() behaved differently and in
the case of fistp, depended on the current x86 FPU rounding mode.
This caused some tests like piglit roundmode-pixelstore and
roundmode-getintegerv to fail on 32-bit x86 but pass on 64-bit x86.
Now IROUND() always rounds to the nearest integer (away from zero).
The new F_TO_I function converts a float to an int by whatever means
is fastest. We'll use this where we're more concerned with performance
and not too worried about how the conversion is done.
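A sketch of rounding that does not depend on the FPU rounding mode,
assuming "nearest, away from zero" means that halves round away from
zero. iround_sketch() is a hypothetical stand-in and is not claimed to
match the macro in Mesa.

```c
#include <assert.h>
#include <limits.h>

/* Round to the nearest integer, with halves going away from zero.
 * The float-to-int cast truncates toward zero regardless of the
 * current FPU rounding mode. */
int iround_sketch(float f)
{
    assert(f > (float)INT_MIN && f < (float)INT_MAX);
    return (int)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}
```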
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The IROUND converted all arguments to 0 or 1. That's not what we wanted.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For zero-stride vertex arrays, the svga driver copies the value into
the constant buffer and uses that value in the shader. The recent
gallium-userbuf changes caused a regression in this. An example
symptom was per-primitive glColor3f() calls getting ignored.
Where we copied the vertex value from the vertex buffer to the
constant buffer we neglected to take into account the
pipe_vertex_buffer::buffer_offset field. Adding that value to the
source offset fixes the problem. Actually, it looks like we should
have been doing this all along, but it never was an issue before for
some reason.
If the MESA_GLSL env var contains "errors", GLSL compilation and
link errors will be reported to stderr.
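A sketch of how such a check might look, assuming a plain substring
match on the variable's value. glsl_flag_set() and
glsl_errors_requested() are hypothetical helpers, not Mesa functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if `flags` (possibly NULL) contains the word `name`. */
bool glsl_flag_set(const char *flags, const char *name)
{
    assert(name != NULL && name[0] != '\0');
    return flags != NULL && strstr(flags, name) != NULL;
}

/* Check the environment; getenv() returns NULL if MESA_GLSL is unset. */
bool glsl_errors_requested(void)
{
    return glsl_flag_set(getenv("MESA_GLSL"), "errors");
}
```

Under this assumption, running an application with MESA_GLSL=errors in
its environment would make glsl_errors_requested() return true.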
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Piglit tests for fragment shaders pass; vertex shaders fail. The
actual failure seems to be in the interpolators, and not the
textureSize query.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jose.r.fonseca@gmail.com>
Fixes a bunch of piglit tests related to flat interpolation of floats.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>