Choose MESA_FORMAT_ARGB2101010 when storing
GL_RGBA + GL_UNSIGNED_INT_2_10_10_10_REV or
GL_RGB + GL_UNSIGNED_INT_2_10_10_10_REV.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Mesa core's copyteximage calls the driver with format/type==GL_NONE
to "Allocate texture memory". In this case, we shouldn't call
_mesa_store_teximage.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
In ES or GL+GL_ARB_ES2_compatibility, the usage of
format = IMPLEMENTATION_COLOR_READ_FORMAT +
type = IMPLEMENTATION_COLOR_READ_TYPE
can function, even if the src/dst int vs. non-int types
differ.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
If the source read buffer is integer based, and the read
pixels type is RGBA/UNSIGNED_BYTE, then use the integer pixel
conversion path.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This function checks for ES3 compatible
format/type/internalFormat/dimension combinations.
[jordan.l.justen@intel.com: additional tweaks for gles3-gtf]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
gles3conform expects that when converting from a signed
int to an unsigned byte, the output will be clamped at a
max of 0x7f. This impacts conversion from
int16_t => uint8_t and int32_t => uint8_t.
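
A minimal sketch of the clamping described above. The helper name is
hypothetical and this is not the Mesa conversion code. Mapping negative
inputs to 0 is an assumption; the text only states the upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function. An int16_t argument is
 * promoted to int32_t, so one function covers both conversions. */
uint8_t clamp_signed_to_ubyte(int32_t value)
{
   if (value < 0)
      return 0;       /* assumption: negative inputs clamp to 0 */
   if (value > 0x7f)
      return 0x7f;    /* the maximum gles3conform expects */
   return (uint8_t)value;
}
```
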
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This should be squashed into the earlier patch when mailing it out for
review or merging it to master.
The error path was missing a "return" like all the other error paths.
Also, we may as well call it glDrawBuffers in the error message since
the ARB suffix doesn't exist in ES 3.
Fixes error EGL_BAD_ATTRIBUTE in the tests below on Intel Sandybridge:
* piglit egl-create-context-verify-gl-flavor, testcase OpenGL ES 3.0
* gles3conform, revision 19700, when running GL3Tests with -fbo
This plumbing is added in order to comply with the EGL_KHR_create_context
spec. According to the EGL_KHR_create_context spec, it is illegal to call
eglCreateContext(EGL_CONTEXT_MAJOR_VERSION_KHR=3) with a config whose
EGL_RENDERABLE_TYPE does not contain the EGL_OPENGL_ES3_BIT_KHR. The
pertinent portion of the spec is quoted below; the key word is
"respectively".
    * If <config> is not a valid EGLConfig, or does not support the
      requested client API, then an EGL_BAD_CONFIG error is generated
      (this includes requesting creation of an OpenGL ES 1.x, 2.0, or
      3.0 context when the EGL_RENDERABLE_TYPE attribute of <config>
      does not contain EGL_OPENGL_ES_BIT, EGL_OPENGL_ES2_BIT, or
      EGL_OPENGL_ES3_BIT_KHR respectively).
To create this patch, I searched for all the ES2 bit plumbing by calling
`git grep "ES2_BIT\|DRI_API_GLES2" src/egl`, and then at each location
added a case for ES3.
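
The "respectively" rule in the quoted spec text can be sketched as below.
The helper name is hypothetical and is not part of the EGL plumbing in
src/egl; the bit values are copied from the EGL headers so the sketch
stands alone. A false result corresponds to the EGL_BAD_CONFIG case.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the EGL headers, repeated so the sketch is standalone. */
#define EGL_OPENGL_ES_BIT       0x0001
#define EGL_OPENGL_ES2_BIT      0x0004
#define EGL_OPENGL_ES3_BIT_KHR  0x0040

/* Hypothetical helper: may a config with this EGL_RENDERABLE_TYPE be
 * used to create an OpenGL ES context of the given major version? */
bool config_supports_es_version(unsigned renderable_type, int major)
{
   switch (major) {
   case 1:
      return (renderable_type & EGL_OPENGL_ES_BIT) != 0;
   case 2:
      return (renderable_type & EGL_OPENGL_ES2_BIT) != 0;
   case 3:
      return (renderable_type & EGL_OPENGL_ES3_BIT_KHR) != 0;
   default:
      return false;
   }
}
```
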
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the hardware/driver combo supports GLES3, then set the GLES3 bit in
intel_screen's bitmask of supported DRI APIs. Neither the EGL nor GLX
layer uses the bit yet.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This enum corresponds to EGL_OPENGL_ES3_BIT_KHR.
Neither the GLX nor EGL layer use the enum yet.
I don't like the GLES bits. I'd prefer that all GLES APIs be exposed
through a single API bit, as is done in GLX_EXT_create_context_es_profile.
But, we need this GLES3 enum in order to do the plumbing necessary to
correctly support EGL_OPENGL_ES3_BIT_KHR as required by the
EGL_KHR_create_context spec.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Each driver (i830, i915, i965) used independent but similar code to
validate the requested context version. With the recent arrival of GLES3,
that logic has needed an update. Rather than apply identical updates to
each driver's validation code, let's just move the validation into the
shared routine intelInitContext.
This refactor required some incidental changes to functions
i830CreateContext and intelInitContext. For each function, this patch:
- Adds context version parameters to the signature.
- Adds a DRI_CTX_ERROR out param to the signature.
- Sets the DRI_CTX_ERROR at each early return.
Tested against gen6 with piglit egl-create-context-verify-gl-flavor.
Verified that this patch does not change the set of exposed EGL context
flavors.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before this patch, intelInitScreen2 set DRIScreen::api_mask with the hacky
heuristic below:
   if (gen >= 3)
      api_mask = GL | GLES1 | GLES2;
   else
      api_mask = 0;
This hack was likely broken on gen2 (i830), but I don't care enough to
properly investigate. It appears that every EGLConfig on i830 has
EGL_RENDERABLE_TYPE=0, and thus eglCreateContext will never succeed.
Anyway, moving on to living drivers...
With the arrival of EGL_OPENGL_ES3_BIT_KHR, this heuristic is now
insufficient. We must enable the GLES3 bit if and only if the driver is
capable of creating a GLES3 context. This requires us to determine the
maximum context version supported by the hardware/driver for
each api *during initialization of intel_screen*.
Therefore, this patch adds four new fields to intel_screen which indicate
the maximum supported context version for each api:
   max_gl_core_version
   max_gl_compat_version
   max_gl_es1_version
   max_gl_es2_version
The api mask is now correctly set as:
   api_mask = GL;
   if (max_gl_es1_version > 0)
      api_mask |= GLES1;
   if (max_gl_es2_version > 0)
      api_mask |= GLES2;
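
As a compilable sketch of the same logic: the struct below is a cut-down
stand-in for intel_screen, and the API_* bit values are hypothetical
placeholders rather than the real DRI API enums.

```c
#include <assert.h>

/* Hypothetical bit values; the driver uses the DRI API enums instead. */
enum { API_GL = 1 << 0, API_GLES1 = 1 << 1, API_GLES2 = 1 << 2 };

/* Cut-down stand-in for the four fields added to intel_screen. */
struct screen_versions {
   int max_gl_core_version;
   int max_gl_compat_version;
   int max_gl_es1_version;
   int max_gl_es2_version;
};

unsigned compute_api_mask(const struct screen_versions *s)
{
   unsigned api_mask = API_GL;   /* desktop GL is always advertised */

   if (s->max_gl_es1_version > 0)
      api_mask |= API_GLES1;
   if (s->max_gl_es2_version > 0)
      api_mask |= API_GLES2;
   return api_mask;
}
```
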
Tested against gen6 with piglit egl-create-context-verify-gl-flavor.
Verified that this patch does not change the set of exposed EGL context
flavors.
v2:
- Replace the if-tree on gen with a switch, for Ian.
- Unconditionally enable the DRI_API_OPENGL bit, for Ian.
v3:
- Drop max gl version to 1.4 on gen3 if !has_occlusion_query,
because occlusion queries entered core in 1.5. For Ian.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since patch "i965: Validate requested GLES context version in
brwCreateContext", we have been able to create ES 3.0 contexts due to the
max version check. So...bump the max version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL ES 3.0 spec (Section 12.17) says:
"GLSL ES 1.00 removed token pasting and other functionality."
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Simply emit a nicely-formatted error message if any undefined macro is
encountered in a parser context expecting an expression.
With this commit, the following piglit test now passes:
spec/glsl-es-3.00/compiler/undefined-macro.vert
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This can be triggered either by creation of a GLES context (with
api == API_OPENGLES2) or else by a #version directive with version
value 100 or with a string of "es" following the version value.
There's no behavioral change with this commit—just preparation for ES-specific
behavior in the preprocessor in the future.
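
The trigger conditions above, as a rough sketch. The function and
parameter names are hypothetical; this is not the glcpp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper. version_suffix is the identifier that follows
 * the number in a #version directive, or NULL if there is none. */
bool preprocessor_is_es(bool api_is_gles, int version,
                        const char *version_suffix)
{
   if (api_is_gles)
      return true;               /* context created as GLES */
   if (version == 100)
      return true;               /* "#version 100" */
   return version_suffix != NULL &&
          strcmp(version_suffix, "es") == 0;   /* "#version 300 es" */
}
```
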
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I'm not sure if this is the correct fix. The
_mesa_es_error_check_format_and_type function (used above in the ES 1
and 2 cases) was originally added for glTexImage checking and allows
GL_DEPTH_STENCIL/GL_UNSIGNED_INT_24_8 combinations. Using it in ES 3
causes other tests to regress.
Fixes es3conform's packed_depth_stencil_error test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
INVALID_ENUM is for when the type is simply not known.
Fixes part of es3conform's packed_depth_stencil_error test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ES 3 specifies some formats as texture-only (i.e., not available for
renderbuffers).
See the "Required Texture Formats" section (pg 126) of the ES 3 spec.
Fixes es3conform's color_buffer_unsupported_format test.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
According to both the GL 3.0 and ES 3.0 specifications (table 2.7 for GL
and table 2.8 for ES), the default value of BUFFER_ACCESS_FLAGS is
supposed to be zero.
Note that there are two related quantities: the obsolete BUFFER_ACCESS
enum and the new BUFFER_ACCESS_FLAGS bitfield.
BUFFER_ACCESS can only be GL_READ_ONLY, GL_WRITE_ONLY, or GL_READ_WRITE;
BUFFER_ACCESS_FLAGS can easily represent all three via GL_MAP_WRITE_BIT,
GL_MAP_READ_BIT, and their logical or. It also supports more flags.
Thus, Mesa only stores the bitfield, and simply computes the old enum
when queried, via simplified_access_mode(bufObj->AccessFlags).
The tricky part is that, while BUFFER_ACCESS_FLAGS defaults to 0,
BUFFER_ACCESS defaults to GL_READ_WRITE for desktop [GL 3.0, table 2.8]
and GL_WRITE_ONLY_OES for ES [the GL_EXT_map_buffer_range extension].
Mesa tried to implement this by setting the default AccessFlags to
GL_MAP_READ_BIT | GL_MAP_WRITE_BIT on desktop, and GL_MAP_WRITE_BIT on
ES. But in all specifications, it needs to be 0.
This patch moves that logic into simplified_access_mode(): when
AccessFlags == 0, it now returns GL_READ_WRITE for desktop and
GL_WRITE_ONLY for ES 1/2. (BUFFER_ACCESS doesn't exist on ES 3.0,
so it's irrelevant there.)
With that in place, it changes the AccessFlags default to 0.
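
A sketch of the resulting lookup, assuming the caller passes in whether
the context is desktop GL. The real simplified_access_mode() signature
may differ; the enum values are the standard GL ones, repeated here so
the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard GL values, repeated so the sketch is standalone. */
#define GL_READ_ONLY      0x88B8
#define GL_WRITE_ONLY     0x88B9
#define GL_READ_WRITE     0x88BA
#define GL_MAP_READ_BIT   0x0001
#define GL_MAP_WRITE_BIT  0x0002

/* Hypothetical stand-in: derive BUFFER_ACCESS from BUFFER_ACCESS_FLAGS. */
unsigned simplified_access_mode_sketch(unsigned access_flags,
                                       bool is_desktop_gl)
{
   const unsigned rw = GL_MAP_READ_BIT | GL_MAP_WRITE_BIT;

   if ((access_flags & rw) == rw)
      return GL_READ_WRITE;
   if (access_flags & GL_MAP_READ_BIT)
      return GL_READ_ONLY;
   if (access_flags & GL_MAP_WRITE_BIT)
      return GL_WRITE_ONLY;

   /* Neither bit set (the new default of 0): the answer depends on the API. */
   return is_desktop_gl ? GL_READ_WRITE : GL_WRITE_ONLY;
}
```
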
Fixes three es3conform tests:
- copy_buffer_defaults
- map_buffer_range_modify_indices
- pixel_buffer_object_default_parameters
Perhaps most importantly, this patch adds comments quoting the relevant
spec paragraphs above each error condition.
It also makes three changes:
- For FBOs, GL_COLOR_ATTACHMENTm where m >= MaxDrawBuffers is supposed
to generate INVALID_OPERATION (not INVALID_ENUM).
- Constants that refer to multiple buffers (such as FRONT, BACK, LEFT,
RIGHT, and FRONT_AND_BACK) are supposed to generate INVALID_OPERATION,
not INVALID_ENUM.
- In ES 3.0, for FBOs, buffers[i] must be NONE or GL_COLOR_ATTACHMENTi
or else INVALID_OPERATION occurs. (This is a new restriction.)
Fixes es3conform's draw-buffers-api test.
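
The new ES 3.0 restriction for FBOs could look roughly like this. The
function name is hypothetical and the sketch covers only that one rule,
not the other error conditions listed above.

```c
#include <assert.h>

/* Standard GL values, repeated so the sketch is standalone. */
#define GL_NO_ERROR           0
#define GL_NONE               0
#define GL_INVALID_OPERATION  0x0502
#define GL_COLOR_ATTACHMENT0  0x8CE0

/* Hypothetical check: for a user FBO in ES 3.0, buffers[i] must be
 * GL_NONE or GL_COLOR_ATTACHMENTi. */
unsigned check_es3_fbo_draw_buffers(int n, const unsigned *buffers)
{
   int i;

   for (i = 0; i < n; i++) {
      if (buffers[i] != GL_NONE &&
          buffers[i] != GL_COLOR_ATTACHMENT0 + (unsigned)i)
         return GL_INVALID_OPERATION;
   }
   return GL_NO_ERROR;
}
```
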
This requires some derived state. The cut vertex used is either the
value specified by glPrimitiveRestartIndex or it's hard-coded to ~0.
The derived state gl_array_attrib::_RestartIndex captures this value.
In addition, the derived state gl_array_attrib::_PrimitiveRestart is set
whenever either gl_array_attrib::PrimitiveRestart or
gl_array_attrib::PrimitiveRestartFixedIndex is set.
v2: Use _mesa_is_gles3.
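
A sketch of how the derived state might be computed. The struct is a
cut-down stand-in for gl_array_attrib, with the derived fields renamed
(derived_enabled for _PrimitiveRestart, derived_index for _RestartIndex).
Letting the fixed index win when both enables are set is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for the relevant gl_array_attrib fields. */
struct array_attrib_sketch {
   bool PrimitiveRestart;
   bool PrimitiveRestartFixedIndex;
   unsigned RestartIndex;     /* value from glPrimitiveRestartIndex */
   bool derived_enabled;      /* stands in for _PrimitiveRestart */
   unsigned derived_index;    /* stands in for _RestartIndex */
};

void update_restart_state(struct array_attrib_sketch *a)
{
   a->derived_enabled = a->PrimitiveRestart || a->PrimitiveRestartFixedIndex;

   /* Assumption: the fixed index takes precedence if both are enabled. */
   a->derived_index = a->PrimitiveRestartFixedIndex ? ~0u : a->RestartIndex;
}
```
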
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named badly and wasn't in the dispatch table,
making it hard to find.
Fixes transform_feedback2_states and gets a few other transform
feedback tests closer to working in es3conform.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For glGetIntegerv, add support for the following in an OpenGL ES 3.0
context:
GL_MAJOR_VERSION
GL_MINOR_VERSION
GL_NUM_EXTENSIONS
See Table 6.29 of the OpenGL ES 3.0 spec.
Fixes error GL_INVALID_ENUM in piglit egl-create-context-verify-gl-flavor,
testcase for OpenGL ES 3.0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The ES 3 spec says that the minimum allowable value is 2^24-1, but the
GL 4.3 and ARB_ES3_compatibility specs require 2^32-1, so return 2^32-1.
Fixes es3conform's element_index_uint_constants test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From GL/GLES/GL_CORE and GLES2 -> GL/GL_CORE/GLES2.
Yes, we really were exposing ES2_compatibility queries on ES 1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the transform_feedback2_init_defaults test from es3conform.
The ES 3 spec lists these as TRANSFORM_FEEDBACK_PAUSED and
TRANSFORM_FEEDBACK_ACTIVE.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This code now lives in an external tree.
For the next Mesa release fetch the code from the master branch
of this LLVM repo:
http://cgit.freedesktop.org/~tstellar/llvm/
For all subsequent Mesa releases, fetch the code from the official LLVM
project:
www.llvm.org
- skip the vertex buffer reallocation in flush and just use
the unsynchronized flag to get new memory.
- remove the cruft needed to get around the issues with the vertex buffer
reallocation in flush
- use pb_buffer instead of pipe_resource
This patch fixes intel_miptree_unmap_etc() (which decompresses ETC
textures to linear) to pay attention to map->x and map->y when writing
to the destination image. Previously these values were ignored,
causing the xoffset and yoffset parameters passed to
glCompressedTexSubImage2D() to be ignored.
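
The destination address computation this implies, as a sketch. The
helper and its parameters are hypothetical, not the real
intel_miptree_unmap_etc() internals; x and y play the role of map->x and
map->y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of pixel (x, y) in a linear image. */
uint8_t *linear_pixel_address(uint8_t *base, size_t stride,
                              size_t bytes_per_pixel,
                              unsigned x, unsigned y)
{
   return base + (size_t)y * stride + (size_t)x * bytes_per_pixel;
}
```

Writing at base alone, as before the fix, is the x == 0, y == 0 case.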
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- We should use a 3D transfer of size Width x 1 x NumLayers.
- We should use layer_stride instead of stride.
(even though they are likely to be equal with 1D array textures)
Reviewed-by: Brian Paul <brianp@vmware.com>
There was a fast path based on _mesa_format_matches_format_and_type
for GetTexImage, but it never worked, because the Mesa format we were testing
there was always compressed. Further testing showed that the fast path
had been completely broken.
In this commit, the somewhat limited helper util_create_rgba_texture is
no longer used and instead, custom code for the texture creation is added,
which tries to find the best matching RGBA8 format, so that we can hit
the fast path *always* if the read format is a variant of RGBA8 and supported
by the driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Usage with pipe_context:
pipe->flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
Usage with st_context_iface:
st->flush(st, ST_FLUSH_END_OF_FRAME, NULL);
The flag is only a hint for drivers. Radeon will use it for buffer eviction
heuristics in the kernel (e.g. for queries like how many frames have passed
since a buffer was used).
The flag is currently only generated by st/dri on SwapBuffers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Automake 1.13 creates a bunch of new build artefacts:
- bin/test-driver, a script for running tests.
- *.trs files for every "make check" test result.
- *.log files containing the output of every test run by "make check".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Every generation except Gen7 creates SURFACE_STATE entries via a
uint32_t array. Only Gen7 uses the older bitfield structure, which we
moved away from because it was less efficient. Convert it for
consistency.
This reduces the compiled size of gen7_wm_surface_state.o by 2.86% in a
release build.
v2: Fix accidental use of BRW_SURFACE_WIDTH/HEIGHT in brw_state_dump.c;
switch back to gen7_set_surface_mcs_info setting surf[6] directly
(both per Eric's review comments).
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since wl_display_dispatch_queue() returns the number of processed events
or -1 on error, only cancel the roundtrip if a -1 is returned.
This also fixes a potential memory corruption bug happening when the
roundtrip does an early return and the callback later writes to the then
out of scope stack allocated `done' parameter.
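
A sketch of the corrected loop condition. The dispatch callback stands in
for wl_display_dispatch_queue() and fake_dispatch() is a made-up test
double, so nothing here is the real Wayland or EGL code. The lifetime
issue with the stack-allocated flag is not modelled.

```c
#include <assert.h>

/* Stand-in for wl_display_dispatch_queue(): returns the number of
 * events processed, or -1 on error. */
typedef int (*dispatch_fn)(void *state);

/* Returns 0 once *done is set, -1 if dispatch reported an error. */
int roundtrip_sketch(dispatch_fn dispatch, void *state, int *done)
{
   int ret = 0;

   /* A positive return only means events were processed: keep going. */
   while (ret != -1 && !*done)
      ret = dispatch(state);
   return ret == -1 ? -1 : 0;
}

/* Made-up display that sets done after a few dispatched events. */
struct fake_display {
   int events_before_done;
   int fail;
   int done;
};

int fake_dispatch(void *state)
{
   struct fake_display *d = state;

   if (d->fail)
      return -1;
   if (--d->events_before_done <= 0)
      d->done = 1;
   return 1;   /* one event processed */
}
```
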
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In Jelly Bean, the interface to ANativeWindow changed. The change included
adding a new parameter to the queueBuffer and dequeueBuffer methods,
removing the lockBuffer method, and requiring libsync.
v2:
- s/fence_fd == -1/fence_fd != -1/
- Fix leak. Close the fence_fd.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Define the following Make variables:
MESA_ANDROID_MAJOR_VERSION
MESA_ANDROID_MINOR_VERSION
MESA_ANDROID_VERSION
These variables will allow us to make version-dependent decisions on
library dependencies. In particular, building Mesa against JellyBean will
require libsync.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Commit f22d49de added the SamplerParameter* functions but only used
ASSERT_OUTSIDE_BEGIN_END inside the -f and -fv versions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds functionality to Mesa to upload compressed
2-dimensional array textures, using the glCompressedTexImage3D and
glCompressedTexSubImage3D calls.
Fixes piglit tests "EXT_texture_array/compressed *" and "!OpenGL ES
3.0/ext_texture_array-compressed_gles3 *". Also partially fixes GLES3
conformance test "CoverageES30.test".
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The old error reporting was completely bogus, passing _mesa_error() a
format string that didn't even match the remaining arguments. Also,
in many cases the number of dimensions in the TexImage call was not
preserved in the error message (e.g. an error in glTexImage2D was
reported simply as an error in glTexImage).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If the call fails, we should return NULL from XMesaCreateVisual().
This was found when Waffle tried to create a visual with depth/stencil
bits = -1. That's an illegal value for glXChooseFBConfig() and we should
return NULL in that situation.
Note: This is a candidate for the stable branches.
Dungeon Defenders hits TexImage()'s try_pbo_upload() path where
image->Width == 2, which doesn't meet intelEmitCopyBlit's requirement
that the pitch needs to be a multiple of 4.
Since intelEmitCopyBlit can already fail for a myriad of other reasons,
and it's not clear that other callers are immune to this failure mode,
simply make it return false rather than assert.
Fixes Dungeon Defenders on i965/Ivybridge. Now playable (aside from
having to work around the EXT_bindable_uniform issue).
NOTE: This is probably a candidate for the 9.0 branch.
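
The shape of the change, as a sketch. The helper is hypothetical and does
not match the real intelEmitCopyBlit signature; it only shows reporting
the alignment failure to the caller instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pitch requirement check. */
bool copy_blit_pitch_ok(int src_pitch, int dst_pitch)
{
   /* Previously an assert; now the caller can fall back to another path. */
   if (src_pitch % 4 != 0 || dst_pitch % 4 != 0)
      return false;
   return true;
}
```
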
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't need them now that our set of parameter pointers points at the
GL core storage for them. This should save memory/bandwidth/overhead in
uniform updates.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NumParameters used to be an upper bound on the number of vec4s to be
uploaded, which was basically safe (unless your buffer was bound near
the top of address space *and* you array indexed outside the buffer, in
which case I think you might GPU hang). As I migrate the driver away
from ParameterValues[], this is no longer true.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Like in the FS, there's no reason to use an external copy if the
ParameterValues[] relayout of it isn't the layout we need.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If adding scale parameters during program compile caused a realloc of
ParameterValues, then the driver uniform storage set up by
_mesa_associate_uniform_storage() would point to potentially freed
memory.
Note that this uses TexturesUsed, which may change at runtime for GLSL
when sampler uniforms change. This is a flaw in our handling of texrect
in general, and not one I'm fixing currently.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have native hardware support for these, so they get promoted to
RGBA, in which case we don't have hardware dealing with the channel
swizzling for us.
Fixes piglit EXT_texture_snorm/texwrap formats bordercolor (-swizzled).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I had left this out for a long time because it regressed some
depthstencil-render-miplevels cases when it was enabled. Now that the
bugs causing those are fixed, there's nothing stopping us.
Improves glbenchmark 2.1 offscreen performance by 7.3% +/- 2.8% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This worked out before because the parent was always 4 bytes so it
didn't affect the layout, but now we want to support Z16 too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixing these rendering bugs has been implicated in performance
regressions (which may be unfixable), but at least knowing that it's
happening should help diagnose those regressions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The ETC1 changes failed at this, so let's make sure it will be caught in
testing next time.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was caught by the assertion in the next commit. It fixes the
remaining piglit depthstencil-render-miplevels cases, probably by
avoiding broken stencil copies in the validation path.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Relayout is expensive, so it's something developers (both us and others)
should know about when it happens.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rename existing _Used flag to EverBound.
The GL 4.3 and ES 3.0 specs say
These names are marked as used, for the purposes of GenVertexArrays
only, but they do not acquire array state until they are first bound.
This also affects Apple VAOs, which is fine since the
APPLE_vertex_array_object spec says
A vertex array object is created by binding an unused name. This
binding is accomplished by calling BindVertexArrayAPPLE with id set
to the name of the new vertex array object.
Fixes arb_vertex_array_object_isvertexarray.
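
A toy model of the EverBound flag. The struct and function names are
hypothetical; only the flag name comes from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vertex array object: a name plus the EverBound flag. */
struct vao_sketch {
   unsigned name;
   bool EverBound;
};

/* GenVertexArrays: the name is reserved, but there is no array state yet. */
void gen_vertex_array(struct vao_sketch *obj, unsigned name)
{
   obj->name = name;
   obj->EverBound = false;
}

/* BindVertexArray: the first bind is what creates the object. */
void bind_vertex_array(struct vao_sketch *obj)
{
   obj->EverBound = true;
}

/* IsVertexArray: true only for names that have been bound at least once. */
bool is_vertex_array(const struct vao_sketch *obj)
{
   return obj != NULL && obj->EverBound;
}
```
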
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL 4.3 and ES 3.0 specs say
A transform feedback object is created by binding a name returned by
GenTransformFeedbacks with the command
void BindTransformFeedback( enum target, uint id );
Fixes arb_transform_feedback2-istransformfeedback and part of
es3conform's CoverageES30.test.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i.e. we have to allocate a temporary tiled resource if dst isn't tiled.
This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.
NOTE: This is a candidate for the stable branches.
No piglit regressions and now passes glsl-uniform-out-of-bounds-2.
validate_uniform_parameters now checks that the array index is
valid. This means if an index is out of bounds, glGetUniform* now
fails with GL_INVALID_OPERATION, as it should.
_mesa_uniform and _mesa_uniform_matrix also call
validate_uniform_parameters so the bounds checks there became
redundant and were removed.
The test in glGetUniformLocation is modified to check array bounds
so it now returns GL_INVALID_INDEX (-1) if you ask for the location
of a non-existent array element, as it should.
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
This is a build-time option: set R600_TRACE_CS to 1 and every CS is
printed to stderr, along with the CS trace point value, which gives the
last offset into the CS that was processed by the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
htile is used for HiZ and HiS support and fast Z/S clears.
This commit just adds the htile setup and Fast Z clear.
We don't take full advantage of HiS with that patch.
v2 really use fast clear, still random issue with some tiles
need to try more flush combination, fix depth/stencil
texture decompression
v3 fix random issue on r6xx/r7xx
v4 rebase on top of latest mesa, disable CB export when clearing
htile surface to avoid wasting bandwidth
v5 resummarize htile surface when uploading z value. Fix z/stencil
decompression, the custom blitter with custom dsa is no longer
needed.
v6 Reorganize render control/override update mechanism, fixing more
issues in the process.
v7 Add nop after depth surface base update to work around some htile
flushing issue. Force htile to 8x8 on r6xx/r7xx as other combinations
have issues. Do not enable hyperz when flushing/uncompressing
depth buffer.
v8 Fix htile surface, preload and prefetch setup. Only set preload
and prefetch on htile surface clear like fglrx. Record depth
clear value per level. Support several level for the htile
surface. First depth clear can't be a fast clear.
v9 Fix comments, properly account new register in emit function,
disable fast zclear if clearing different layer of texture
array to different value
v10 Disable hyperz for texture array making test simpler. Force
db_misc_state update when no depth buffer is bound. Remove
unused variable, rename depth_clearstencil to depth_clear.
Don't allocate htile surface for flushed depth. Something
broke the cliprect change; this needs to be investigated.
v11 Rebase on top of newer mesa
v12 Rebase on top of newer mesa
v13 Rebase on top of newer mesa, htile surface needs to be initialized
to zero, somehow special casing first clear to not use fast clear
and thus initialize the htile surface with proper value does not
work in all cases.
v14 Use resource not texture for htile buffer; this makes the htile
buffer size computation easier and simpler. Disable preload on
evergreen as it's still troublesome in some cases
v15 Clean up some comments and remove some leftovers
v16 Define name for bit 20 of CP_COHER_CNTL
Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This brings r600g almost in line with the closed source driver when
it comes to flushing and synchronization patterns.
v2-v4: history lost somewhere in outer space
v5: Fix compute size of flushing, use define for flags, update
worst case cs size requirement for flush, treat rs780 and
newer as r7xx when it comes to streamout.
v6: Fix num dw computation for framebuffer state, remove dead
code, use define instead of hardcoded value.
v7: Remove dead code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Previously, Mesa code assumed that glReadBuffer(GL_NONE) was only
valid for user-created framebuffer objects. However, the spec is
quite clear that it should also be valid for the default framebuffer.
From section 18.2.1 ("Obtaining Pixels from the Framebuffer") of the
GL 4.3 spec:
"When READ_FRAMEBUFFER_BINDING is zero, i.e. the default
framebuffer, src must be one of the values listed in table 17.4,
including NONE."
Similar language exists in the GLES 3.0 spec, and in desktop GL all
the way back to ARB_framebuffer_object.
Partially fixes GLES3 conformance test "CoverageES30.test".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was slightly wrong: we were computing the longest duration of
the query among all the rasterizer tasks.
Regardless, for tile-based implementations such as llvmpipe, time differences
will never be very useful, because rendering before/during/after the query
is all interleaved. And this is expected, see ARB_timer_query spec, issue 10.
In particular, piglit ext_timer_query-time-elapsed still fails, because
it makes assumptions that don't hold true in tiled architectures. Not
sure how to fix that though.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB/EXT_timer_query's definition of GL_TIME_ELAPSED matches precisely the
subtraction of two GL_TIMESTAMP queries.
And for a lot of drivers, that's precisely how they have to implement it
internally -- by emitting two hardware timestamp queries.
So, to simplify driver implementation, simply allow doing so in the state
tracker.
Eventually if no driver implements PIPE_QUERY_TIME_ELAPSED then we could
retire it.
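
Conceptually the emulation is just the following. The names are
hypothetical and this is not the state tracker code; the timestamps
stand in for the results of two timestamp queries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical query object holding the two timestamp results. */
struct time_elapsed_sketch {
   uint64_t begin_ts;
   uint64_t end_ts;
};

/* BeginQuery(GL_TIME_ELAPSED): record a timestamp. */
void time_elapsed_begin(struct time_elapsed_sketch *q, uint64_t now)
{
   q->begin_ts = now;
}

/* EndQuery(GL_TIME_ELAPSED): record a second timestamp. */
void time_elapsed_end(struct time_elapsed_sketch *q, uint64_t now)
{
   q->end_ts = now;
}

/* The query result is the difference of the two timestamps. */
uint64_t time_elapsed_result(const struct time_elapsed_sketch *q)
{
   return q->end_ts - q->begin_ts;
}
```
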
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The burst was incorrectly used, because ELEM_SIZE was always 0.
I don't know if the burst works, because I don't know of any test
which uses it.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The old call to tgsi_exec_machine_bind_shader() in
softpipe_delete_fs_state() was never called since the shader's original
tokens are never passed to the tgsi interpreter (only shader _variant_
tokens are). Now, unbind the variant's tokens from the tgsi interpreter
when we free the variant.
This doesn't fix any known bugs but it's the right thing to do.
Note: This is a candidate for the stable branches.
In exec_prepare() we were comparing pointers to see if the fragment
shader variant had changed before calling tgsi_exec_machine_bind_shader().
This didn't work reliably when there was a lot of shader token malloc/
freeing going on because the memory might get reused.
Instead, bind the shader variant during regular state validation.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=40404
(fixes a couple of piglit's glsl-max-varyings test)
Note: This is a candidate for the stable branches.
This forces surfaces allocated from the DDX to be considered as having
their height aligned to 8, and fixes the 1D->2D tiling transition that
results from this.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
brw_emit_vertices contains special case logic to handle the case where
a vertex shader doesn't read any inputs. This special case logic was
incorrectly activating in the case where the only vertex input is
gl_VertexID. As a result, if a shader used gl_VertexID but used no
other inputs, then all vertices got a gl_VertexID of zero.
Fixes oglconform test "ubo-usage advanced.transform_feedback".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The rather unwieldy logic for determining this condition was repeated
in a large number of places. This patch consolidates it to a single
inline function.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch implements the following behaviours, which are mandated by
the GL 4.3 and GLES3 specs.
1. Regarding the GL_TRANSFORM_FEEDBACK_BUFFER_SIZE query: "If the
... size was not specified when the buffer object was bound
(e.g. if it was bound with BindBufferBase), ... zero is returned."
(GL 4.3 section 6.7.1 "Indexed Buffer Object Limits and Binding
Queries").
2. "BindBufferBase binds the entire buffer, even when the size of the
buffer is changed after the binding is established. It is
equivalent to calling BindBufferRange with offset zero, while size
is determined by the size of the bound buffer at the time the
binding is used." (GL 4.3 section 6.1.1 "Binding Buffer Objects to
Indexed Targets"). I interpret "at the time the binding is used"
to mean "at the time of the call to glBeginTransformFeedback".
3. "Regardless of the size specified with BindBufferRange, or
indirectly with BindBufferBase, the GL will never read or write
beyond the end of a bound buffer. In some cases this constraint may
result in visibly different behavior when a buffer overflow would
otherwise result, such as described for transform feedback
operations in section 13.2.2." (GL 4.3 section 6.1.1 "Binding
Buffer Objects to Indexed Targets").
Item 1 has been part of the spec all the way back to the inception of
the EXT_transform_feedback extension. Items 2 and 3 were added in GL
4.2 and GLES 3.
Prior to GL 4.2, in place of items 2 and 3, the spec simply said
"BindBufferBase is equivalent to calling BindBufferRange with offset
zero and size equal to the size of buffer." For transform feedback,
Mesa behaved as though this meant "...equal to the size of buffer at
the time of the call to BindBufferBase". However, this was
problematic because it left it ambiguous what to do if the buffer is
shrunk between the call to BindBuffer{Base,Range} and the call to
BeginTransformFeedback. Prior to this patch, Mesa's behaviour was to
try to write beyond the end of the buffer, likely resulting in memory
corruption. In light of this, I'm interpreting the spec change as a
clarification, not an intended behavioural change, so I'm making the
change apply regardless of API version.
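The size rules above can be sketched as follows. This is a minimal
illustration, not the actual Mesa code: the struct and function names
are hypothetical, and a requested_size of 0 is assumed to mark a
binding made with BindBufferBase.

```c
#include <stddef.h>

/* Hypothetical stand-in for an indexed binding point (not Mesa's struct). */
struct tfb_binding {
   size_t buffer_size;     /* current size of the bound buffer object */
   size_t offset;          /* 0 for BindBufferBase */
   size_t requested_size;  /* 0 when bound with BindBufferBase (assumed) */
};

/* Value reported by the GL_TRANSFORM_FEEDBACK_BUFFER_SIZE query (item 1). */
size_t
tfb_query_size(const struct tfb_binding *b)
{
   return b->requested_size;
}

/* Usable size, evaluated when transform feedback begins (items 2 and 3). */
size_t
tfb_effective_size(const struct tfb_binding *b)
{
   size_t available;

   if (b->offset >= b->buffer_size)
      return 0;   /* buffer was shrunk below the binding offset */
   available = b->buffer_size - b->offset;

   if (b->requested_size == 0)
      return available;   /* BindBufferBase: the whole buffer */

   /* Never allow writes beyond the end of the buffer. */
   return b->requested_size < available ? b->requested_size : available;
}
```

Evaluating the size lazily is what makes a buffer that was shrunk after
binding safe: the clamp always sees the buffer's current size.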
Fixes GLES3 conformance test transform_feedback2_pause_resume.test.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In desktop GL, if a draw call would cause transform feedback buffers
to overflow, the draw call should succeed, and the extra primitives
should simply not be recorded in the transform feedback buffers.
In GLES3, however, if a draw call would cause transform feedback
buffers to overflow, the draw call is supposed to produce an
INVALID_OPERATION error and no drawing should occur.
This patch implements the GLES3-required behaviour.
Fixes GLES3 conformance test "transform_feedback_overflow.test".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In GLES3, only glDrawArrays() and glDrawArraysInstanced() calls are
allowed when transform feedback is active.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, the i965 driver contained code to compute the maximum
number of vertices that could be written without overflowing any
transform feedback buffers. This code wasn't driver-specific, and for
GLES3 support we're going to need to use it in core mesa. So this
patch moves the code into a core mesa function,
_mesa_compute_max_transform_feedback_vertices().
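For illustration, the computation amounts to taking the minimum, over
all bound buffers, of how many vertices fit. The sketch below assumes a
simplified buffer description (struct tfb_buffer and max_tfb_vertices
are hypothetical names, not the real Mesa interface).

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical description of one bound transform feedback buffer. */
struct tfb_buffer {
   size_t size_bytes;     /* space available in the buffer */
   size_t stride_bytes;   /* bytes written per vertex; 0 if unused */
};

/* Largest vertex count that overflows none of the buffers. */
unsigned
max_tfb_vertices(const struct tfb_buffer *bufs, size_t num_bufs)
{
   unsigned max_vertices = UINT_MAX;
   size_t i;

   for (i = 0; i < num_bufs; i++) {
      size_t fit;

      if (bufs[i].stride_bytes == 0)
         continue;   /* nothing is written to this buffer */

      fit = bufs[i].size_bytes / bufs[i].stride_bytes;
      if (fit < max_vertices)
         max_vertices = (unsigned) fit;
   }
   return max_vertices;
}
```

A GLES3-style error check would then compare the number of vertices a
draw call is about to emit against this limit before drawing anything.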
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Eliminate C++-style variable declarations, since these won't work
with MSVC.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
No functional change--this simply paves the way to allow future
patches to call vbo_count_tessellated_primitives() during error
checking, before the _mesa_prim struct has been constructed.
This will be needed for GLES3, which requires draw calls to fail if
there is not enough space available in transform feedback buffers to
accommodate the primitives to be drawn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the idea is to just expand or shrink the bit width but not otherwise do
conversion we also need to adjust the sign bit according to src, otherwise
the conversion code will incorrectly clamp the values. (Since this only works
for casting to ordinary floats the norm and fixed bits should always be fine.)
This fixes the remaining piglit attribs GL3 failures.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
a460aea3f1 wasn't entirely correct: all coords are already ints, so
the iround needs to be skipped.
Passes piglit texelFetch with sampler1DArray/sampler2DArray.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Make sure drivers initialize the version before:
* _mesa_initialize_exec_table is called
* _mesa_initialize_exec_table_vbo is called
* A context is made current
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver should call _mesa_initialize_vbo_vtxfmt after
computing the context version.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers must compute the context version, and then call
_mesa_initialize_exec_table themselves.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a future patch the exec functions will no longer be set up
by _mesa_initialize_context and _vbo_CreateContext.
Therefore we must call _mesa_initialize_exec_table and
_mesa_initialize_exec_table_vbo.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This change forces the context version to be computed before
initializing the exec dispatch tables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In glapi/gl_genexec.py:
* Remove _mesa_alloc_dispatch_table call
In glapi/gl_genexec.py and api_exec.h:
* Rename _mesa_create_exec_table to _mesa_initialize_exec_table
In context.c:
* Call _mesa_alloc_dispatch_table instead of _mesa_create_exec_table
* Call _mesa_initialize_exec_table (this is temporary)
Once all drivers have been modified to call
_mesa_initialize_exec_table, the temporary call to
_mesa_initialize_exec_table can be removed from context.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is used by st_BlitFramebuffer() / r600_blit(), and ARB_fbo allows
overlapped blits, even though the result is undefined. No piglit regressions
on r600g / CYPRESS.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
struct brw_instruction and the related instruction emitting code won't
be useful on Gen8+, as the instruction encoding changed. However, the
struct brw_reg code is still extremely valuable.
While we're at it, fix up some style points:
- s/GLuint/unsigned/g
- s/GLint/int/g
- s/GLshort/int16_t/g
- s/GLushort/uint16_t/g
- s/INLINE/inline/g
- Replace tabs with spaces
- Put return types on a separate line from the function name/parameters
- Remove trailing whitespace
- Remove extraneous whitespace around function parameters
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the extensions + the tex buffer support for checking
the formats.
There is a piglit test enhancement sent to that list.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not sure what was going on here, but running piglit with debug builds
might be a good plan :-)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No statistically significant performance difference on glbenchmark 2.7
(n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8%
(n=5), but that's only about .3% of all cycles spent according to the
fixed shader_time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The way our visitor works, scalar expression/swizzle results that get
stored in channels other than .x will have an intermediate MOV from
their result in the .x channel to the real .y (or whatever) channel, and
similarly for vec2/vec3 results.
By knowing how to adjust DP4-type instructions for optimizing out a
swizzled MOV, we can reduce instructions in common matrix multiplication
cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compute-to-mrf code is really twitchy, and it's hard to construct
GLSL testcases for it. This unit test is also really hard to work with
(for example, if your instruction is removed by dead code elimination,
you end up inspecting something irrelevant), but I did use it for
debugging some of the commits to follow.
I called it test_vec4_register_coalesce because the compute-to-mrf code
is about to morph into that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The final halt of the fragment shader turns off the remaining channels,
then jumps such that everything is turned back on. So, we can have our
last ENDIF of the shader point at that directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the Ivybridge PRM, Volume 4, Part 3, section 6.24 (page 172):
"The endif instruction is also used to hop out of nested conditionals by
jumping to the end of the next outer conditional block when all
channels are disabled."
Also:
"Pseudocode:
    Evaluate(WrEn);
    if ( WrEn == 0 ) { // all channels false
        Jump(IP + JIP);
    }"
First, ENDIF re-enables any channels that were disabled because they
didn't match the conditional. If any channels are active, it proceeds
to the next instruction (IP + 16). However, if they're all disabled,
there's no point in walking through all of the instructions that have no
effect---it can jump to the next instruction that might re-enable some
channels (an ELSE, ENDIF, or WHILE).
Previously, we always set JIP on ENDIF instructions to 2 (which is
measured in 8-byte units). This made it do Jump(IP + 16), which just
meant it would go to the next instruction even if all channels were off.
It turns out that walking over instructions while all the channels are
disabled like this is worse than just instruction dispatch overhead: if
there are texturing messages, it still costs a couple hundred cycles to
not-actually-read from the texture results.
This patch finds the next instruction that could re-enable channels and
sets JIP accordingly.
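As a rough sketch of the search (not the actual i965 code), assume the
program is a flat array of opcodes, each 16 bytes long. The enum, the
function name and the fallback for a trailing ENDIF are all
assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical opcode tags; only the control-flow ones matter here. */
enum opcode { OP_ALU, OP_SEND, OP_IF, OP_ELSE, OP_ENDIF, OP_DO, OP_WHILE };

/* Each instruction is 16 bytes and JIP counts 8-byte units. */
#define JIP_UNITS_PER_INST 2

/* JIP for the ENDIF at endif_idx: distance to the next instruction
 * that might re-enable channels (ELSE, ENDIF or WHILE). */
int
endif_jip(const enum opcode *insts, size_t num_insts, size_t endif_idx)
{
   size_t i;

   for (i = endif_idx + 1; i < num_insts; i++) {
      if (insts[i] == OP_ELSE || insts[i] == OP_ENDIF ||
          insts[i] == OP_WHILE)
         return (int) (i - endif_idx) * JIP_UNITS_PER_INST;
   }

   /* Assumed fallback: nothing later re-enables channels, so just step
    * to the next instruction, as the old fixed JIP of 2 did. */
   return JIP_UNITS_PER_INST;
}
```

With this, an inner ENDIF followed by two texturing SENDs and an outer
ENDIF gets a JIP of 6 and skips both SENDs when no channel is live.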
Reviewed-by: Eric Anholt <eric@anholt.net>
V3: Put enable in an existing block rather than making a new
one for no good reason.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V2: Moved up into emit(ir_texture *) to avoid duplication and fix
ordering for Gen7; Gen6 math quirks moved into previous patches.
Tested on Gen6 only; passes all the cube_map_array piglits.
V3: Fixed weird whitespace
V4: Use sampler->type; otherwise broken on arrays of samplers.
v5: Minor style fixes (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V4: Fix various style nits as pointed out by Eric, and expand IMM
operands on both Gen6 and Gen7.
v5: minor style nits (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V3: Fixed weird whitespace
V4: Use sampler's type rather than variable's type; otherwise broken
with arrays of samplers. (Thanks Eric)
v5: Fix a couple more style nits (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This causes immediate values to get moved to a temp on gen7, which is needed
for an upcoming change but hadn't happened in the visitor until then.
v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by
anholt).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Actually switch on the other math instructions mentioned in the
comment.
v3: Add timing data for textureSize(), and clean up some long comment
lines.
Testing shader_time of fs16 shaders on a few frames of various apps:
nexuiz improved by 2.9% +/- 1.5% (n=10)
no difference on GLB2.5 (n=36, outliers removed)
no difference on GLB2.7 (n=25)
etqw improved by 2.6% +/- 2.2% (n=25)
no difference on lightsmark (n=25)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
I've tested this to be true with various ALU ops on gen7 (with the
exception of MADs, which go at either 3 or 4 cycles per dispatch).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This gives the instruction scheduler a chance to schedule between the
loads, whereas before it was restricted due to the dependencies between
the MRFs for setting them up.
For one shader in gles3conform, it goes from getting stuck in register
allocation for as long as anybody's bothered to leave it running down
to 23 seconds, thanks to the LIFO scheduling.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This came from an idea by Ben Segovia. 16-wide pixel shaders are very
important for latency hiding on i965, so we want to try really hard to
get them. If scheduling an instruction makes some set of instructions
available, those are probably the ones that make the instruction's
result dead. By choosing those first, we'll have a tendency to reduce
the amount of live data as opposed to creating more.
Previously, we were sometimes getting this behavior out of the
scheduler, which was what produced the scheduler's original performance
wins on lightsmark. Unfortunately, that was mostly an accident of the
lame instruction latency information that I had, which made it
impossible to fix the actual scheduling for performance. Now that we've
fixed the scheduling for setup for register allocation, we can safely
update the latency parameters for the final schedule.
In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4
shaders that were spilling change how many registers spill, for a
reduction of 70/3899 instructions.
v2: Simplify the new loop.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes I've got a patch for a performance optimization that's not
showing a statistically significant performance difference on reported
FPS, but still seems like a good idea because it ought to reduce time
spent in the shader. If I can see the total number of cycles spent in
the shader stage being optimized, it may show that the patch is still
worthwhile (or point out that it's actually broken in some way).
Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.
Note that this will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.
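One way to read "linearly scaling" is sketched below. It assumes that
"written" counts the invocations that recorded a valid time and
"resets" counts the ones whose measurement was discarded; the function
name and parameters are hypothetical.

```c
#include <stdint.h>

/* Extrapolate the measured cycles to cover the invocations that were
 * lost to resets, assuming they cost the same on average. */
uint64_t
scale_for_resets(uint64_t cycles, uint64_t written, uint64_t resets)
{
   if (written == 0)
      return 0;   /* no valid samples to extrapolate from */

   return cycles * (written + resets) / written;
}
```

For example, 900 cycles measured over 9 valid invocations with 1 reset
is reported as 1000 cycles.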
I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.
This patch implements varying packing between varyings.
Previously, each varying occupied components 0 through N-1 of its
assigned varying slot, so there was no way to pack two varyings into
the same slot. For example, if the varyings were a float, a vec2, a
vec3, and another vec2, they would be stored as follows:
    <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
     *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
    flt  x   x   x  <vec2->  x   x  <--vec3--->  x  <vec2->  x   x  varyings
(Each * represents a varying component, and the "x"s represent wasted
space).
This change packs the varyings together to eliminate wasted space
between varyings, like so:
    <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
     *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
    <vec2-> <vec2-> flt <--vec3--->  x   x   x   x   x   x   x   x  varyings
Note that we take advantage of the sort order introduced in previous
patches (vec4's first, then vec2's, then scalars, then vec3's) to
minimize how often a varying is "double parked" (split across varying
slots).
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.
This patch implements varying packing within varyings that are
composed of multiple vectors of size less than 4 (e.g. arrays of
vec2's, or matrices with height less than 4).
Previously, such varyings used up a full 4-wide varying slot for each
constituent vector, meaning that some of the components of each
varying slot went unused. For example, a mat4x3 would be stored as
follows:
    <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
     *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
    <-column1->  x  <-column2->  x  <-column3->  x  <-column4->  x  matrix
(Each * represents a varying component, and the "x"s represent wasted
space). In addition to wasting precious varying components, this
layout complicated transform feedback, since the constituents of the
varying are expected to be output to the transform feedback buffer
contiguously (e.g. without gaps between the columns, in the case of a
matrix).
This change packs the constituents of each varying together so that
all wasted space is at the end. For the mat4x3 example, this looks
like so:
    <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
     *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
    <-column1-> <-column2-> <-column3-> <-column4->  x   x   x   x  matrix
Note that matrix columns 2 and 3 now cross a boundary between varying
slots (a characteristic I call "double parking" of a varying).
We don't bother trying to eliminate the wasted space at the end of the
varying, since the patch that follows will take care of that.
Since compiler back-ends don't (yet) support this packed layout, the
lower_packed_varyings function is used to rewrite the shader into a
form where each varying occupies a full varying slot. Later, if we
add native back-end support for varying packing, we can make this
lowering pass optional.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.
On hardware that supports a limited number of texture indirections,
varying packing will consume an extra texture indirection, since ALU
operations are needed in the fragment shader to unpack the varyings
before any texturing can be done.
This patch introduces a new driver option,
ctx->Const.DisableVaryingPacking, which can be used by a driver to opt
out of varying packing if the extra texture indirection is costly
enough to outweigh the advantages of packing varyings.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.
No functional change--the lowering pass is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly. Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.
This patch paves the way for varying packing by adding a sorting step
before varying assignment, which sorts the varyings into an order that
increases the likelihood of being able to find an efficient packing.
First, varyings are sorted into "packing classes" by considering
attributes that can't be mixed during varying packing--at the moment
this includes base type (float/int/uint/bool) and interpolation mode
(smooth/noperspective/flat/centroid), though later we will hopefully
be able to relax some of these restrictions. The number of packing
classes places an upper limit on the amount of space that must be
wasted by varying packing, since in theory a shader might have 4n+1
components worth of varyings in each of m packing classes, resulting
in 3m components worth of wasted space.
Then, within each packing class, varyings are sorted by vector size,
with vec4's coming first, then vec2's, then scalars, and then finally
vec3's. The motivation for this order is that it ensures that the
only vectors that might be "double parked" (with part of the vector in
one varying slot and the remainder in another) are vec3's.
Note that the varyings aren't actually packed yet, merely placed in an
order that will facilitate packing.
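The ordering can be expressed as a qsort() comparator. This is only a
sketch: struct varying and its packing_class field are hypothetical
simplifications of whatever the linker actually tracks.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a varying for sorting purposes. */
struct varying {
   int packing_class;    /* assumed id combining base type and interpolation */
   int vector_elements;  /* 1 to 4 */
};

/* Rank within a packing class: vec4, then vec2, then scalar, then vec3. */
static int
size_rank(int vector_elements)
{
   switch (vector_elements) {
   case 4:  return 0;
   case 2:  return 1;
   case 1:  return 2;
   default: return 3;   /* vec3 */
   }
}

/* qsort() comparator: group by packing class, then order by size rank. */
int
compare_varyings(const void *a, const void *b)
{
   const struct varying *x = a;
   const struct varying *y = b;

   if (x->packing_class != y->packing_class)
      return x->packing_class - y->packing_class;
   return size_rank(x->vector_elements) - size_rank(y->vector_elements);
}
```

Calling qsort(v, n, sizeof v[0], compare_varyings) on a float, a vec2,
a vec3 and another vec2 of the same class yields vec2, vec2, float,
vec3, so only the vec3 can end up straddling two slots.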
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch further subdivides the loop that assigns varying locations
into two phases: one phase to match up the varyings between shader
stages, and one phase to assign them varying locations.
In between the two phases the matched varyings are stored in a new
data structure called varying_matches. This will free us to be able
to assign varying locations in any order, which will pave the way for
packing varyings.
Note that the new varying_matches::assign_locations() function returns
the number of varying slots that were used; this return value will be
used in a future patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch subdivides the loop that assigns varying locations into two
phases: one phase to match up varyings between shader stages (and
assign them varying locations), and a second phase to record the
varying assignments for use by transform feedback.
This paves the way for varying packing, which will require us to
further subdivide the first phase.
In addition, it lets us avoid a clumsy O(n^2) algorithm, since we can
now record the locations of all transform feedback varyings in a
single pass through the tfeedback_decls array, rather than have to
iterate through the array after assigning each varying.
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, the location of each varying is recorded in ir_variable as
a multiple of the size of a vec4. In order to pack varyings, we need
to be able to record, e.g. that a vec2 is stored in the second half of
a varying slot rather than the first half.
This patch introduces a field ir_variable::location_frac, which
represents the offset within a vec4 where a varying's value is stored.
Varyings that are not subject to packing will always have a
location_frac value of zero.
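Conceptually the two fields split a flat component index into a slot
and an offset within it. The helper below is a hypothetical
illustration that ignores any base offset the real location field may
carry.

```c
/* Hypothetical pair mirroring location and location_frac. */
struct packed_location {
   unsigned location;       /* index of the vec4 slot */
   unsigned location_frac;  /* component offset within the slot, 0 to 3 */
};

/* Split a flat component index into a slot and a fractional offset. */
struct packed_location
split_component_index(unsigned component)
{
   struct packed_location loc;

   loc.location = component / 4;
   loc.location_frac = component % 4;
   return loc;
}
```

A vec2 starting at component 2 therefore lands in slot 0 with a
location_frac of 2, i.e. the second half of the slot.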
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the linker used a value of -1 in ir_variable::location to
denote a generic input or output of the shader that had not yet been
matched up to a variable in another pipeline stage.
This patch introduces a new ir_variable field,
is_unmatched_generic_inout, for that purpose.
In future patches, this will allow us to separate the process of
matching varyings between shader stages from the processes of
assigning locations to those varyings. That will in turn pave the way
for packing varyings.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, link_invalidate_variable_locations() was only called
during assign_attribute_or_color_locations() and
assign_varying_locations(). This meant that in the corner case when
there was only a vertex shader, and varyings were being captured by
transform feedback, link_invalidate_variable_locations() wasn't being
called for the varyings.
This patch migrates the calls to link_invalidate_variable_locations()
to link_shaders(), so that they will be called in all circumstances.
In addition, it modifies the call semantics so that
link_invalidate_variable_locations() need only be called once per
shader stage (rather than once for inputs and once for outputs).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the clip distance lowering pass so that the new
symbol it generates (glClipDistanceMESA) is added to the shader's
symbol table.
This will allow a later patch to modify the linker so that it finds
transform feedback varyings using the symbol table rather than having
to iterate through all the declarations in the shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This builds on the previous draw/softpipe patch.
llvmpipe does streamout calls after the clip/viewport stages, but we
have the pre-clip position stored for later use. So when we are doing
transform feedback and it's the position, grab the vertex from the
stored pre-clip position.
The perfect fix is probably to add a codegen transform feedback
stage in between the shader and clip stages, but this is good enough
for now.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to draw for the new features of transform feedback.
a) fix count_from_stream_output, using max_index+1 for now but it looks
like it should be valid as it's derived from the vertex elements/vbo.
b) fix striding and dst offsets in output buffers - was just wrong before.
c) fix crash if tfb is suspended (so.num_targets == 0)
This also enables the new features on softpipe. It should be possible
to enable them on llvmpipe as well after this commit, but would need
to schedule piglit runs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Every call to _cl_program::build() was erasing the binaries and logs for
every device associated with the program. This is incorrect because
it is possible to build a program for only a subset of devices and so
any device not being built should not have this information erased.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Override the cross_compiling and ac_tool_prefix variables by reassigning
to them instead of redefining the macros. Redefining them will actually
cause the variable names to be replaced instead of their content.
Furthermore push the definition of CPPFLAGS before running the checks
for the build tools to avoid the host CPPFLAGS from leaking into the
build CPPFLAGS.
While at it drop the redefinition of AC_TRY_COMPILER which hasn't been
used since autoconf 2.50 and make sure that all definitions are properly
popped when done (LDFLAGS, ac_cv_prog_CPP, ac_cv_prog_CXXCPP).
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Since we don't call lp_build_sample_common() in the texel fetch path we missed
the layer fixup code. If someone had tried to do texelFetch with array
textures, it would certainly have crashed.
Not really tested (llvmpipe can't yet run the piglit test that uses
texelFetch with array samplers).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, if the client program didn't specify a stride when setting
up a vertex attribute, we used _mesa_sizeof_type() to compute the size
of the type, and multiplied it by the number of components.
This didn't work for the 2_10_10_10 formats, since _mesa_sizeof_type()
returns -1 for those types, resulting in all kinds of havoc, since it
was causing the hardware to be programmed with a negative stride
value.
This patch adds a new function _mesa_bytes_per_vertex_attrib(), which
is similar to the existing function _mesa_bytes_per_pixel(), but which
computes the size of a vertex attribute based on the type and the
number of components. For packed formats (currently only the 2_10_10_10
formats), it verifies that the number of components is correct and
returns the size of the packed format. For unpacked formats, it
returns the size of the type times the number of components.
In addition, this patch adds an assertion so that if we ever forget to
update _mesa_bytes_per_vertex_attrib() when adding a new vertex
format, we'll see the problem quickly rather than having to debug a
subtle conformance test failure.
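The shape of such a function might look like the sketch below. The
enum is a hypothetical stand-in for the GL type enums, the list of
types is deliberately incomplete, and the packed case only accepts
exactly four components, which is a simplification.

```c
#include <assert.h>

/* Hypothetical type tags standing in for the GL type enums. */
enum attrib_type {
   ATTRIB_BYTE,
   ATTRIB_SHORT,
   ATTRIB_INT,
   ATTRIB_FLOAT,
   ATTRIB_INT_2_10_10_10_REV,
   ATTRIB_UNSIGNED_INT_2_10_10_10_REV
};

/* Size of one attribute in bytes, or -1 for an invalid combination. */
int
bytes_per_vertex_attrib(int comps, enum attrib_type type)
{
   switch (type) {
   case ATTRIB_BYTE:
      return comps * 1;
   case ATTRIB_SHORT:
      return comps * 2;
   case ATTRIB_INT:
   case ATTRIB_FLOAT:
      return comps * 4;
   case ATTRIB_INT_2_10_10_10_REV:
   case ATTRIB_UNSIGNED_INT_2_10_10_10_REV:
      /* Packed: all four components share one 32-bit word. */
      return comps == 4 ? 4 : -1;
   default:
      return -1;
   }
}

/* Stride used when the client passes a stride of zero. */
int
default_stride(int comps, enum attrib_type type)
{
   int bytes = bytes_per_vertex_attrib(comps, type);

   assert(bytes != -1);   /* fires if a format was never added above */
   return bytes;
}
```

The assertion is what turns a forgotten format into an immediate
failure instead of a negative stride reaching the hardware.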
Fixes GLES3 conformance tests
vertex_type_2_10_10_10_rev_{conversion,divisor,stride_pointer}.test.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL 3.1 and ES 3.0 specs say of glGetActiveUniformsiv:
"If an error occurs, nothing will be written to params."
So, make a pass through the indices and check that they're valid before
the pass that actually writes to params. Checking pname happens on the
first iteration of the second loop.
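The validate-then-write pattern looks roughly like this. The function
and its parameters are hypothetical, and the pname check is left out
to keep the sketch short.

```c
#include <stddef.h>

/* Returns 0 on success and -1 on error; params is untouched on error. */
int
get_uniform_sizes(const int *uniform_sizes, size_t num_uniforms,
                  const unsigned *indices, size_t count, int *params)
{
   size_t i;

   /* Pass 1: validate every index before writing anything. */
   for (i = 0; i < count; i++) {
      if (indices[i] >= num_uniforms)
         return -1;
   }

   /* Pass 2: all indices are valid, so no partial result is possible. */
   for (i = 0; i < count; i++)
      params[i] = uniform_sizes[indices[i]];

   return 0;
}
```

A single combined loop would have written params[0] before noticing
that a later index was out of range, which is what the spec forbids.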
Fixes es3conform's getactiveuniformsiv_for_nonexistent_uniform_indices
test.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise messages say silly things like
glGetActiveUniformBlockiv(block index -1 >= 0)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
try_rewrite_rhs_to_dst is a quick optimization to avoid generating new
temporaries (and MOVs from those temporaries to the dest) for every
expression tree we visit. By generating better code in simple cases, we
reduce the burden on later optimization passes like register coalescing.
Previously, we compared inst->regs_written() to lhs->vector_elements
to make sure the instruction generating our value wrote the same number
of components as our destination register.
However, this fails in some cases. One example is texturing (which
produces a vec4) into gl_FragData[i]. Technically, gl_FragData[i] is
also a vec4. However, the destination VGRF actually has size 4n (where
n is the size of the array).
split_virtual_grfs() can't split VGRFs that are used by SEND messages
which require contiguous destination registers (like texturing), and
register allocation needs all VGRFs to have sizes between 1 and 4.
Amnesia: The Dark Descent hits this case: a texturing instruction
(4 components) gets rewritten to the gl_FragData output register
(which was 4*3 = 12 components), causing the register allocator to
hit the "we rely on split_virtual_grfs" assertion.
This makes it possible to play Amnesia.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is redundant since we're calling draw_bind_fragment_shader()
which already does a flush.
v2: the redundant flush in llvmpipe_set_constant_buffer() has
already been removed by commit 3427466e6d
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fetch shaders are usually destroyed at the context destruction by the state
tracker, so we can put them all in a large buffer without wasting memory.
This reduces the number of relocations sent to the kernel a little bit.
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Instead of having a 4-byte buffer for each streamout target, we suballocate
each dword from a 4K buffer.
This further reduces the overall number of relocations.
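A bump-style suballocator captures the idea. The names below are
hypothetical and this is not the r600g code; in particular, what
happens once the 4K buffer is full is left to the caller here.

```c
#include <stdint.h>

#define POOL_SIZE 4096u   /* one 4K backing buffer */

/* Hypothetical pool state: a single cursor into the backing buffer. */
struct dword_pool {
   uint32_t next_offset;   /* byte offset of the next free dword */
};

/* Returns the byte offset of a fresh dword, or -1 when the pool is full. */
int
dword_pool_alloc(struct dword_pool *pool)
{
   uint32_t offset = pool->next_offset;

   if (offset + 4 > POOL_SIZE)
      return -1;

   pool->next_offset = offset + 4;
   return (int) offset;
}
```

Up to 1024 streamout targets can then share one buffer, so they all
refer to the same buffer instead of each needing its own.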
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
u_upload_mgr suballocates memory from a large buffer and maps the allocated
range (unsynchronized), which is perfect for short-lived staging buffers.
This reduces the number of relocations sent to the kernel.
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
There are 2 ways. I prefer the former:
GALLIUM_MSAA=n
__GL_FSAA_MODE=n
Tested with ETQW, which doesn't support MSAA on Linux. This is
the only way to get MSAA there.
Reviewed-by: Brian Paul <brianp@vmware.com>
There are only 2 possible usages: render target and depth stencil.
Both can be derived from the surface format, so the flag is redundant.
And it's going away...
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds seamless sampling for cubemap boundaries if requested.
The corner case averaging is messy but seems like it should be spec
compliant.
The face direction stuff is also a bit messy, I've no idea if that could
or should be simpler, or even if all my directions are fully correct!
v1.1: update comments, drop unneeded seamless calls for nearest, fix
if statement layout.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This follows the code from the i965 driver, and emits the structs
and arrays recursively.
This fixes an assert in the two UBO tests
fs-struct-copy-complicated and
vs-struct-copy-complicated
These tests now pass on softpipe, with no regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes some use-after-free issues. I haven't measured any real
performance difference with a handful of Mesa demos.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before this we only supported user-based constant buffers.
First, we basically plumb pipe_constant_buffer objects through llvmpipe
rather than pipe_resource objects.
Second, update llvmpipe_set_constant_buffer() and try_update_scene_state()
so they understand both resource- and user-based constant buffers.
The problem with user constant buffers is the potential for use-after-free,
as seen in some WebGL tests. The next patch will flip the switch for
resource-based const buffers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I had tried this in the past, but ran into trouble with applications
that sample from undiscarded pixels in the same subspan. To fix that
issue, only jump to the end for an entire subspan at a time.
Improves GLbenchmark 2.7 (1024x768) performance by 7.9 +/- 1.5% (n=8).
v2: Drop the br variable in the jump instruction -- if I ever do jumps
pre-gen6, it'll be a different code block anyway since we don't have
HALT until gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes much more sense on gen6+, and will also prove useful for
early exit of shaders on discard.
v2: fix up a stale comment from before converting gen4-5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're going to redo discard handling to track discards in the other flag
subregister, saving instructions in the discard and allowing predicated
jumps out to the end of the shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes our output more consistent with other disasm tools, and
will be necessary when we start using f0.1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've been calling it a register number, it's actually the subregister,
and things will get confusing once we start using it if it isn't fixed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's a flag subreg nr field in bits2 next to src0.vertstride, but
there shouldn't be anything in bits3 next to src1.vertstride.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes compiler warning:
drm/native_drm.c: In function ‘native_create_display’:
drm/native_drm.c:180:21: warning: ‘device’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drm/native_drm.c:157:24: note: ‘device’ was declared here
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a number of crashes on r600g due to the fact that
lp_build_mul assumes vector types when optimizing mul to bit shifts.
This bug was uncovered by 0ad1fefd69
Noticed this would fail; we were doing two things wrong:
a) 1d arrays require the layers in height
b) minifying the layers field.
v2: don't change height code, fixup completely inside txq
as suggested by Roland.
v3: just add minify before texture array size
v1: Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
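A rough sketch of the size-query behaviour described in the commit above, assuming a 1D array texture reports its layer count in the second component and that only the width is minified. The names minify and query_1d_array_size are hypothetical and are not taken from the driver code.

```c
#include <assert.h>

/* Hypothetical helper: size of one dimension at a mip level, never below 1. */
static unsigned
minify(unsigned base, unsigned level)
{
   unsigned v = base >> level;
   return v ? v : 1;
}

struct tex_size {
   unsigned width, height, depth;
};

/* Hypothetical size query for a 1D array texture: the width shrinks with
 * the mip level, the layer count is returned unchanged in "height". */
static struct tex_size
query_1d_array_size(unsigned width, unsigned layers, unsigned level)
{
   struct tex_size s;

   assert(level < 32);          /* keep the shift well defined */
   s.width = minify(width, level);
   s.height = layers;           /* array size is never minified */
   s.depth = 1;
   return s;
}
```

Applying minify to the real dimensions first and filling in the array size afterwards keeps the layer count from being shifted by the mip level.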
command mistakenly used vector instead of scalar emit (the more or less
identical code in radeon is already correct).
Seems like it would be broken ever since kms probably.
Should fix bugs 22576, 26809.
I noticed the texelFetch offset test failed on 2D rect samplers
with GLSL 1.40. This is because I wrote the immediate->offset
translation wrong.
Fixed the translation to actually use the ureg info to set the
offsets up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports over from the dri2 code to the drisw bits. It means 3.1
core contexts now work for softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is needed to compute render_to_fbo. It even has the comment.
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch does two things:
1. Constant buffer state changes were broken (but happened to work by
dumb luck). The problem is we weren't calling draw_do_flush() in
draw_set_mapped_constant_buffer() when we changed that state. All the
other draw_set_foo() functions were calling draw_do_flush() already.
2. Use a simpler state validation step when we're changing light-weight
parameter state such as constant buffers, viewport dims or clip planes.
There's no need to revalidate the whole pipeline when changing state
like that. The new validation method is called bind_parameters()
and is called instead of the prepare() method. A new
DRAW_FLUSH_PARAMETER_CHANGE flag is used to signal these light-weight
state changes. This results in a modest but measurable increase in
FPS for many Mesa demos.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When one function is changed, also look at the other.
Presently, there are some differences with respect to geometry
shaders and instanced drawing...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds UBO support to the state tracker, it works with softpipe
as-is.
It uses UARL + CONST[x][ADDR[0].x] type constructs.
v2: don't disable UBOs if geom shaders don't exist (me)
rename upload to bind (calim)
fix 12 -> 13 comparison as comment (calim + brianp)
fix signed->unsigned (Brian)
remove assert (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the necessary changes to the st to allow texture buffer object
support if the driver advertises it.
v1.1: remove extra blank line and whitespace
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch enables support for ETC2 compressed textures on
all intel hardware. At present, ETC2 texture decoding is not
available on intel hardware. So, compressed ETC2 texture data
is decoded in software and stored in a suitable uncompressed
MESA_FORMAT at the time of glCompressedTexImage2D. Currently,
ETC2 formats are only exposed in OpenGL ES 3.0.
V2: Use single etc_wraps variable for both etc1 and etc2.
V3: Remove redundant code and use just one intel_miptree_map_etc()
and intel_miptree_unmap_etc() function.
Choose MESA_FORMAT_SIGNED_{R16, GR1616} for ETC2 signed-{r11, rg11}
formats
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Data in GL_COMPRESSED_SRGB8_PUNCHTHROUGH_ALPHA1_ETC2 format is decoded and stored
in MESA_FORMAT_SARGB.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGB8_PUNCHTHROUGH_ALPHA1_ETC2 format is decoded and stored
in MESA_FORMAT_RGBA8888_REV.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SIGNED_RG11_EAC format is decoded and stored in
MESA_FORMAT_SIGNED_GR1616.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SIGNED_R11_EAC format is decoded and stored in
MESA_FORMAT_SIGNED_R16.
v2:
16 bit signed data is converted to 16 bit unsigned data by
adding 2 ^ 15 and stored in an unsigned texture format.
v3:
1. Handle a corner case when base code word value is -128. As per
OpenGL ES 3.0 specification -128 is not an allowed value and should
be truncated to -127.
2. Converting a decoded 16 bit signed data to 16 bit unsigned data by
adding 2 ^ 15 gives us an output which matches the decompressed image
(.ppm) generated by Ericsson's etcpack tool. Ericsson is also doing this
conversion in their tool because .ppm image files don't support signed
data. But gles 3.0 specification doesn't suggest this conversion. We
need to keep the decoded data in signed format. Both signed format
tests in gles3 conformance pass with these changes.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
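A minimal sketch of the two numeric details mentioned in the v2/v3 notes above: the -128 base codeword corner case, and the 2^15 bias that v2 applied and v3 dropped. Both function names are hypothetical; the real decoder is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* -128 is not an allowed signed base codeword, so treat it as -127. */
static int8_t
clamp_signed_base_codeword(int8_t base)
{
   return (base == -128) ? -127 : base;
}

/* The v2 approach (what etcpack does for .ppm output): shift a signed
 * 16-bit value into the unsigned range by adding 2^15. v3 keeps the
 * data signed instead, so this is shown only for comparison. */
static uint16_t
bias_to_unsigned16(int16_t v)
{
   int32_t biased = (int32_t)v + 32768;

   assert(biased >= 0 && biased <= 65535);
   return (uint16_t)biased;
}
```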
Data in GL_COMPRESSED_RG11_EAC format is decoded and stored in
MESA_FORMAT_RG1616.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_R11_EAC format is decoded and stored in
MESA_FORMAT_R16.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC format is decoded and stored
in MESA_FORMAT_SARGB8.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGBA8_ETC2_EAC format is decoded and stored
in MESA_FORMAT_RGBA8888_REV.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SRGB8_ETC2 format is decoded and stored
in MESA_FORMAT_SARGB8.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGB8_ETC2 format is decoded and stored in
MESA_FORMAT_RGBX8888_REV.
v2: Use CLAMP macro and stdbool.h
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes nonlinear_to_linear() function to non static inline
and makes it available outside format_unpack.c. Also, removes the
duplicate copies in other files.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is required by OpenGL ES 3.0 to support ETC2 textures.
This patch adds new MESA_FORMATs for following etc2 texture
formats:
GL_COMPRESSED_RGB8_ETC2
GL_COMPRESSED_SRGB8_ETC2
GL_COMPRESSED_RGBA8_ETC2_EAC
GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC
GL_COMPRESSED_R11_EAC
GL_COMPRESSED_RG11_EAC
GL_COMPRESSED_SIGNED_R11_EAC
GL_COMPRESSED_SIGNED_RG11_EAC
MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1
MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1
Above formats are currently available in only gles 3.0.
v2: Add entries in texfetch_funcs[] array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v3 (Paul Berry <stereotype441@gmail.com>): comment out symbols that
are not implemented yet, so that this commit compiles on its own;
future commits will uncomment the symbols as they become available.
Removes a collision of the object file name for main/hash_table
and program/hash_table.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The ES 3 conformance suite unbinds buffers (by binding buffer 0) and
passes zero for the size and offset, which the spec explicitly
disallows. Otherwise, this seems like a reasonable thing to do.
Khronos will be changing the spec to allow this (bug 9765). Fixes
es3conform's transform_feedback_init_defaults test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just enough for draw module to work ok.
This improves "piglit attribs GL3", though something fishy is still
happening with certain unsigned integer values.
Reviewed-by: Brian Paul <brianp@vmware.com>
The ADDR file is cumbersome for native integer capable drivers. We
should consider deprecating it eventually, but this just adds support
for indirection from TEMP registers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Support 16 (defined in LP_MAX_TGSI_CONST_BUFFERS) as opposed to 32 (as
defined by PIPE_MAX_CONSTANT_BUFFERS) because that would make the jit
context become unnecessarily large.
v2: Bump limit from 4 to 16 to cover ARB_uniform_buffer_object needs,
per Dave Airlie.
Reviewed-by: Brian Paul <brianp@vmware.com>
All MSAA buffers are allocated privately and resolved into the DRI-provided
back and front buffers.
If an MSAA visual is chosen, the buffers st/mesa receives are all
multi-sample. st/mesa doesn't have access to the single-sample buffers
in that case.
This makes MSAA work in games like Nexuiz.
Reviewed-by: Brian Paul <brianp@vmware.com>
- We can use a single loop for adding new configs.
- The useless parameter depth_bits is removed.
- The maximum number of samples is bumped to 32.
- We can support Z16_UNORM and Z32_UNORM unconditionally since the zbuffers
are private.
Reviewed-by: Brian Paul <brianp@vmware.com>
This disables DRI2 sharing of zbuffers. The window zbuffer is allocated just
like any other texture - through resource_create.
The idea of allocating a zbuffer through DRI2 isn't very useful with MSAA,
where a single-sample zbuffer is useless.
IIRC, the Intel driver does the same thing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Just use pipe->blit, which can do resolve, flipping, and format conversions.
The util_blit_pixels codepath is still there for the cases where we have to
force alpha to 1.
This also turns on acceleration for copying GL_DEPTH_STENCIL.
This may not be strictly necessary, but every other rule in the grammar ends
with a semicolon. It also appears that this was supposed to be committed with
the original patch that changed this rule, but the wrong version of the patch
was accidentally pushed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that while 'packed' is a reserved word in GLSL ES, row_major is not.
This means that we have to use the string-based matching for that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Nearly all of the builtin functions in GLSL 3.00 ES are already
implemented in Mesa; this patch enables them.
A few functions are not implemented yet; those have been commented
out, with a FIXME comment to act as a reminder of what still needs to
be implemented. Here is the complete list: packSnorm2x16,
unpackSnorm2x16, packUnorm2x16, unpackUnorm2x16, packHalf2x16,
unpackHalf2x16.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
These functions are defined in GLSL 1.50 and GLSL 3.00 ES.
The formulas have been extracted from the existing implementation of
inverse().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch also adds assertions so that when we add new GLSL versions,
we'll notice that we need to update the builtin variables.
[v2, idr]: s/Frab/Frag/ Noticed by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch implements all of the built-in types for GLSL 3.00 ES.
This is almost exactly the same as the set of built-in types for GLSL
1.30, except that 1D samplers are skipped, and samplerCubeShadow is
added.
This patch also adds an assertion so that when we add new GLSL
versions, we'll notice that we need to update the types.
In review, Eric noted:
"This change looks correct. The overall interaction of profiles is
getting ugly, though. I'm imagining a restructure of the symbol
table population so that there's a big list of types, and each
#version has a nice list of strings of type names copy and pasted
out of its spec."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch updates the following linker checks to do the right thing
in GLSL 3.00 ES:
- Failing to write to gl_Position is allowed in GLSL 1.40+ as well as
GLSL 3.00 ES.
- It is an error to write to both gl_ClipVertex and gl_ClipDistance in
GLSL 1.30+. This does not apply to GLSL 3.00 ES.
- GLSL 3.00 ES uses the same varying counting rules as GLSL 1.00 ES.
- In GLSL 1.30 and GLSL 3.00 ES, "discard" terminates the shader.
- In GLSL 1.00 ES and GLSL 3.00 ES, both a fragment and a vertex
shader must be present.
[v2, idr]: Fix minor typo in a comment. Noticed by Ken.
[v3, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously we recorded just the GLSL version (or the max version, if
GLSL 1.10 and GLSL 1.20 programs were linked together).
[v2, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, we prohibited mixing of shading language versions if
min_version == 100 or max_version >= 130. This was technically
correct (since desktop GLSL 1.30 and beyond prohibit mixing of shading
language versions, as does GLSL 1.00 ES), but it was confusing. Also,
we asserted that all shading language versions were between 1.00 and
1.40, which was unnecessary (since the parser already checks shading
language versions) and doesn't work for GLSL 3.00 ES.
This patch changes the code to explicitly check that (a) ES shaders
aren't mixed with desktop shaders, (b) shaders aren't mixed between ES
versions, and (c) shaders aren't mixed between desktop GLSL versions
when at least one shader is GLSL 1.30 or greater. Also, it removes
the unnecessary assertion.
[v2, idr]: Slightly tweak the is_es_prog detection to occur outside the loop
instead of doing something special on the first loop iteration. Suggested by
Ken.
[v3, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
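The three mixing rules (a), (b) and (c) from the commit above can be sketched roughly as follows. This is an illustration only: struct shader_info and versions_may_be_linked are hypothetical names, and the real linker reports an error rather than returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-shader record: numeric version plus an ES flag. */
struct shader_info {
   unsigned version;   /* e.g. 110, 120, 130, 300 */
   bool is_es;
};

static bool
versions_may_be_linked(const struct shader_info *sh, size_t n)
{
   assert(sh != NULL || n == 0);
   if (n == 0)
      return true;

   bool is_es = sh[0].is_es;
   unsigned min_v = sh[0].version;
   unsigned max_v = sh[0].version;

   for (size_t i = 1; i < n; i++) {
      if (sh[i].is_es != is_es)
         return false;               /* (a) ES mixed with desktop */
      if (sh[i].version < min_v)
         min_v = sh[i].version;
      if (sh[i].version > max_v)
         max_v = sh[i].version;
   }

   if (min_v == max_v)
      return true;                   /* all shaders use one version */
   if (is_es)
      return false;                  /* (b) different ES versions */
   return max_v < 130;               /* (c) desktop mix needs all < 1.30 */
}
```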
Previously we recorded just the GLSL version, with the knowledge that
100 means GLSL 1.00 ES. With the advent of GLSL 3.00 ES, this is
going to get more complex, and eventually will probably become
ambiguous (GLSL 4.00 already exists, and GLSL 4.00 ES is likely to be
created some day).
To reduce confusion, this patch simply records whether the shader is
GLSL ES as an explicit boolean.
[v2, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Note that GLSL 1.00 is selected using "#version 100", so "#version 100
es" is prohibited.
v2: Check for GLES3 before allowing '#version 300 es'
v3: Make sure a correct language_version is set in
_mesa_glsl_parse_state::process_version_directive.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Version directive handling is going to have to be used within two
parser rules, one for desktop-style version directives (e.g. "#version
130") and one for the new ES-style version directive (e.g. "#version
300 es"), so this patch moves it to a function that can be called from
both rules.
No functional change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Version directive handling is going to have to be used within two
parser rules, one for desktop-style version directives (e.g. "#version
130") and one for the new ES-style version directive (e.g. "#version
300 es"), so this patch moves it to a function that can be called from
both rules.
No functional change.
[mattst88] v2: Use intmax_t instead of int for version argument. Would
otherwise write garbage after #version since PRIiMAX was reading 64-bits
instead of 32.
[idr] v3: A later commit fixes the caller of
_glcpp_parser_handle_version_declaration to pass the correct number of
parameters. Fix it in the patch that changes the interface instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
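The v2 note above concerns matching the argument type to the PRIiMAX conversion. A small sketch of the correct pairing, using a hypothetical helper name (format_version_directive) rather than the preprocessor's actual code:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* PRIiMAX expects an intmax_t argument. Passing a plain int here would
 * make printf read more bytes than were supplied on platforms where
 * intmax_t is wider than int. */
static int
format_version_directive(char *buf, size_t size, intmax_t version)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size, "#version %" PRIiMAX, version);
}
```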
This patch turns on the following features for GLSL ES 3.00:
- Array constructors, whole array assignment, and array comparisons.
- Second and third operands of ?: may be arrays.
- Use of "in" and "out" qualifiers on globals.
- Bitwise and modulus operators.
- Integral vertex shader inputs.
- Range-checking of literal integers.
- array.length method.
- Function calls may be constant expressions.
- Integral varyings must be qualified with "flat".
- Interpolation and centroid qualifiers may not be applied to vertex
shader inputs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
GLSL ES 3.00 adds the following keywords over GLSL 1.00: uint,
uvec[2-4], matNxM, centroid, flat, smooth, various samplers, layout,
switch, default, and case.
Additionally, it reserves a large number of keywords, some of which
were already reserved in versions of desktop GL that Mesa supports,
some of which are new to Mesa.
A few of the reserved keywords in GLSL ES 3.00 are keywords that are
supported in all other versions of GLSL: attribute, varying,
sampler1D, sampler1DShadow, sampler2DRect, and sampler2DRectShadow.
This patch updates the lexer to handle all of the new keywords
correctly when the language being parsed is GLSL 3.00 ES.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
This patch expands the lexer KEYWORD macro to take two additional
arguments: the GLSL ES versions in which the given keyword was first
reserved, and supported, respectively. This will allow us to
trivially add support for GLSL 3.00 ES keywords, even though the set
of GLSL 3.00 ES keywords is neither a subset nor a superset of the
keywords corresponding to any desktop GLSL version.
The new KEYWORD macro makes use of the
_mesa_glsl_parse_state::is_version() function, so it accepts 0 as
meaning "unsupported" (rather than 999, which we used previously).
Note that a few keywords ("packed" and "row_major") are supported
*either* when GLSL 1.40 is in use or when ARB_uniform_buffer_object
support is enabled. Previously, we handled these by cleverly taking
advantage of the fact that the KEYWORD macro didn't parenthesize its
arguments in the usual way. Now they are handled more
straightforwardly, with a new macro, KEYWORD_WITH_ALT.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previous to this patch, we were not very consistent about the errors
we generate when a shader tried to use a feature that is prohibited in
the current GLSL version. Some error messages failed to mention the
GLSL version currently in use (or did so inaccurately), and some error
messages failed to mention the first GLSL version in which the given
feature is allowed.
This patch reworks all of the error checks to use the check_version()
function, which produces error messages in a standard form
(approximately "$FEATURE forbidden in $CURRENT_GLSL_VERSION
($REQUIRED_GLSL_VERSION required).").
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
With the advent of GLSL 3.00 ES, the version checks we perform in the
GLSL compiler (to determine which language features are present) will
become more complicated. To reduce the complexity, this patch adds
functions check_version() and is_version() to _mesa_glsl_parse_state.
These functions take two version numbers: a desktop GLSL version and a
GLSL ES version, and return a boolean indicating whether the GLSL
version being compiled is at least the required version. So, for
example, is_version(130, 300) returns true if the GLSL version being
compiled is at least desktop GLSL 1.30 or GLSL 3.00 ES.
The check_version() function additionally produces an error message if
the version check fails, informing the user of which GLSL version(s)
support the given feature.
[v2, idr]: Add PRINTFLIKE annotation to the new method. The numbering of the
parameters is correct because GCC is silly.
[v3, idr]: Fix copy-and-paste error in the comment before
_mesa_glsl_parse_state::is_version. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
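A sketch of the is_version() semantics described in the commit above, written as plain C for illustration. The struct and its field names are assumptions; the real function is a member of _mesa_glsl_parse_state. Treating 0 as "unsupported" follows the KEYWORD macro description earlier in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state. */
struct parse_state {
   bool es_shader;              /* true when compiling GLSL ES */
   unsigned language_version;   /* e.g. 130 or 300 */
};

/* Pick the requirement that matches the shader's dialect, then compare.
 * A requirement of 0 means the feature is unsupported in that dialect. */
static bool
is_version(const struct parse_state *st,
           unsigned required_glsl, unsigned required_glsl_es)
{
   assert(st != NULL);

   unsigned required = st->es_shader ? required_glsl_es : required_glsl;
   return required != 0 && st->language_version >= required;
}
```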
Fixes a bug where version_string would be left uninitialized if no
GLSL "#version" directive was used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
This will be useful in generating more helpful error messages,
especially with the addition of GLSL 3.00 ES support.
[v2, idr]: Rename ctx parameter to mem_ctx
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, we stored the GLSL language version in the
glsl_symbol_table struct. But this was unnecessary--all
glsl_symbol_table needs to know is whether functions and variables
have separate namespaces (they do in GLSL 1.10 only).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Adding this now makes it easier to develop and test GLES3 features, since we
can do initial development and testing using desktop GL. Later GLSL compiler
patches check for either ctx->Extensions.ARB_ES3_compatibility or
_mesa_is_gles3 to allow certain features (i.e., "#version 300 es").
[v2, idr]: Just edits to the commit message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, the user could send in a pointer that was not created
by mesa. When we dereferenced that pointer, there would be an
exception.
Now we keep a set of pointers and verify that the pointer
exists in that set before dereferencing it.
Note: This fixes several crashing gles3conform tests.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
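The validation idea in the commit above (only dereference pointers that were previously recorded) can be sketched as below. This uses a small fixed-size array with a linear scan purely for illustration; the names and the capacity are hypothetical, and Mesa itself uses a proper set data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 64   /* arbitrary capacity for this sketch */

struct ptr_set {
   const void *items[MAX_TRACKED];
   size_t count;
};

static bool
ptr_set_contains(const struct ptr_set *s, const void *p)
{
   assert(s != NULL);
   for (size_t i = 0; i < s->count; i++) {
      if (s->items[i] == p)
         return true;
   }
   return false;
}

/* Record a pointer at creation time. Returns false if the set is full. */
static bool
ptr_set_add(struct ptr_set *s, const void *p)
{
   if (ptr_set_contains(s, p))
      return true;
   if (s->count == MAX_TRACKED)
      return false;
   s->items[s->count++] = p;
   return true;
}

/* Forget a pointer at destruction time by moving the last entry down. */
static void
ptr_set_remove(struct ptr_set *s, const void *p)
{
   assert(s != NULL);
   for (size_t i = 0; i < s->count; i++) {
      if (s->items[i] == p) {
         s->items[i] = s->items[--s->count];
         return;
      }
   }
}
```

A caller would check ptr_set_contains() on a user-supplied handle and report an error instead of dereferencing it when the check fails.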
Note: The GL/GLES3 web man pages don't seem to properly
document glWaitSync's error when the sync object is invalid.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From: git://people.freedesktop.org/~anholt/hash_table
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: minor rework for mesa]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
u_rect.h said these should move to a different file, and u_surface seems
a better home.
Leave #include "util/u_surface.h" to avoid having to touch thousands of
files.
Reviewed-by: Brian Paul <brianp@vmware.com>
- Re-implement os_time_get in terms of os_time_get_nano() for consistency
- Use CLOCK_MONOTONIC as recommended
- Only use clock_gettime on Linux for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
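A sketch of the layering described in the commit above, assuming a POSIX system with clock_gettime() and assuming the coarser call returns microseconds. The function names time_get_nano and time_get are placeholders for illustration, not the actual os_time implementation.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from an unspecified starting point, using the monotonic
 * clock so the value is unaffected by wall-clock adjustments. */
static int64_t
time_get_nano(void)
{
   struct timespec ts;
   int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

   assert(ret == 0);
   (void)ret;   /* silence unused warning when NDEBUG is defined */
   return (int64_t)ts.tv_sec * INT64_C(1000000000) + (int64_t)ts.tv_nsec;
}

/* Microsecond variant derived from the nanosecond one, so both share a
 * single time source. */
static int64_t
time_get(void)
{
   return time_get_nano() / 1000;
}
```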
Otherwise the driver announces 4096 vertex shader constants and other
way too high limits.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GL/gl.h provides some definitions (GL_FALSE, GL_ONE, etc) that have
the same value as other gl headers but are represented differently
(0 vs 0x0 and 1 vs 0x1).
This causes compiler warnings about redefining such definitions when
including GL/gl.h with other gl headers.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=57802
Signed-off-by: Brian Paul <brianp@vmware.com>
Several issues actually:
- Fix a regression in unsigned normalized in the rescaling
[0, 255] to [0, 256]
- Ensure we use signed shifts where appropriate (instead of
unsigned shifts)
- Refactor the code slightly -- move all the logic inside
lp_build_lerp_simple().
This change, plus an adjustment in the tolerance of signed normalized
results in piglit fbo-blending-formats fixes bug 57903
Reviewed-by: Brian Paul <brianp@vmware.com>
They need to be converted to the native integer type to prevent garbage
in higher order bits from being printed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the draw_vs_set_constants() and draw_gs_set_constants()
functions and the draw->vs.aligned_constants,
draw->vs.aligned_constant_storage and draw->vs.const_storage_size
fields. None of it was used.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Commit 4097308 fixed the build in a questionable way. It worked at the
time, but, as Ian pointed out, the fix would likely fail at a future
commit due to the indeterminism of parallel builds. And that's exactly
what happened; the fix no longer works. `mm -j4` on Fedora 17 fails for
me.
The problem is that there is no rule for program_parse.tab.h. To fix that,
this patch adds a rule that makes program_parse.tab.c depend on
program_parse.tab.h. Technically, the c file does not depend on the
h file. However, because the two files are generated together by a single
invocation of Bison, any rule that forces execution of Bison is
sufficient.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I'd written most of this ages ago, but never finished it off.
This passes 115/130 piglit tests so far. I'll look into the
others as time permits.
v1.1: fix calloc return check as suggested by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.
Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
v2: Check the timestamp reset flag in the VS, which is apparently
getting set fairly regularly in the range we watch, resulting in
negative numbers getting added to our 32-bit counter, and thus large
values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
impossible to satisfy. Add a comment to the AOP defs from Ken's
review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
Fixes flat shading for AA lines. demos/src/trivial/line-smooth is a
test case which hits this.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
x11_screen.c includes xf86drm.h, which comes from libdrm-dev.
This patch fixes this build error.
Compiling src/gallium/state_trackers/egl/x11/x11_screen.c ...
src/gallium/state_trackers/egl/x11/x11_screen.c:30:21: fatal error: xf86drm.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Serious Sam 3 had a shader hitting this path, but it's used rarely so it
didn't show a significant performance difference (n=7). It does reduce
compile time massively, though -- one shader goes from 14s compile time
and 11723 instructions generated to .44s and 499 instructions.
Note that some shaders lose 16-wide mode because we don't support
16-wide and pull constants at the moment (generally, things looping over
a few-element array where the loop isn't getting unrolled). Given that
those shaders are being generated with 15-20% fewer instructions, it
probably outweighs the loss of 16-wide.
The gen7 send-from-GRF path is sufficiently different from the perspective of
IR generation and optimization that I just made it a separate opcode.
v2: fix whitespace, rebase on Ken's recent refactor.
As of gen7, we can skip the header on some messages, and this can make
optimization on those messages much nicer when you've got GRFs instead of MRFs
as the source.
This is a temporary hack. I believe the only way of properly fixing this
is to check buffer overflow just before fetching based on addresses,
instead of number of vertices/instances. This change simply allows tests
that stress buffer overflows to complete without asserting, and should
not affect valid rendering.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to clamp vertex buffer fetch based on its size, not based on the
user specified max index hint.
This matches draw_pt_fetch_run() above.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
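One way to picture clamping by buffer size instead of by the max index hint, as the commit above describes. This is a sketch under assumed parameters (byte offset, stride and element size); the function names are hypothetical and the draw module's own code is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole elements that can be fetched from a buffer of
 * buffer_size bytes, starting at offset and stepping by stride. */
static size_t
fetchable_vertex_count(size_t buffer_size, size_t offset,
                       size_t stride, size_t elem_size)
{
   assert(stride > 0);
   if (elem_size > buffer_size || offset > buffer_size - elem_size)
      return 0;   /* not even the first element fits */
   return (buffer_size - offset - elem_size) / stride + 1;
}

/* Clamp an index so the fetch stays inside the buffer. */
static size_t
clamp_fetch_index(size_t index, size_t count)
{
   assert(count > 0);
   return index < count ? index : count - 1;
}
```

With a 100-byte buffer, offset 4, stride 16 and 12-byte elements, six vertices fit, so any index above 5 is clamped to 5 regardless of what the caller passed as a hint.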
A single vertex size is chosen for the whole pipeline. So the number of
geometry shader outputs must also be taken into consideration.
Reviewed-by: Brian Paul <brianp@vmware.com>
There is more work necessary to properly support buffers in shaders, but
this gets things a bit further along.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes fdo bug 57755 and most of the failures of piglit fbo-blending-formats
GL_EXT_texture_snorm.
GL_INTENSITY_SNORM is still failing, but problem is probably elsewhere,
as GL_R8_SNORM works fine.
Now that _mesa_BindFramebuffer does the right thing in ES contexts when the
gl_extensions::ARB_framebuffer_object bit is set, the Intel driver doesn't
need this hack.
No piglit or GLES2 conformance regressions observed on IVB, and this
patch (and the previous) fix es3conform's framebuffer_srgb_draw and
transform_feedback_misc tests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Desktop OpenGL implementations that support either
GL_ARB_framebuffer_object or OpenGL 3.0 must require names from
glGenFramebuffers for glBindFramebuffer. We have enforced this rule for
quite some time. However, OpenGL ES 1.0, 2.0, and 3.0 implementations
are required to allow user-defined names (e.g., not from
glGenFramebuffers{OES,}).
The Intel drivers have hacked around this by not enabling
GL_ARB_framebuffer_object in an ES context. Instead, just pick the
correct behavior in _mesa_BindFramebuffer based on the context API.
Chad pointed out in a review e-mail:
"I'd like to point out, though, that glBindFramebufferEXT and
glBindRenderbufferEXT are still broken on desktop GL because they
don't accept user-genned names. But that fix belongs to a different
series."
Currently glBindFramebufferEXT is an alias for glBindFramebuffer.
Unaliasing the two functions presents some difficulty, so we'll have to
revisit this eventually.
v2: Perform same check in _mesa_BindRenderbuffer too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
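The per-API rule described above can be sketched like this. It is only an illustration of the decision, not the body of _mesa_BindFramebuffer: context_api and bind_name_is_acceptable() are hypothetical, and the desktop case assumes a context that exposes GL_ARB_framebuffer_object or OpenGL 3.0.

```cpp
// Hypothetical stand-in for the context API; not Mesa's gl_api enum.
enum context_api { CONTEXT_DESKTOP_GL, CONTEXT_GLES };

// Decide whether a framebuffer name may be bound.
// name_from_gen says whether the name was returned by glGenFramebuffers.
bool bind_name_is_acceptable(context_api api, unsigned name,
                             bool name_from_gen)
{
   // Name 0 always refers to the default framebuffer.
   if (name == 0 || name_from_gen)
      return true;

   // User-defined names are only acceptable in ES contexts.
   return api == CONTEXT_GLES;
}
```

Keying the check on the context API, rather than on whether an extension bit is set, is what lets the driver enable the extension in ES contexts too.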
The NV formulation of primitive restart is turned on/off with
glEnableClientState/glDisableClientState. These two functions don't
exist in core contexts, which means that GL_NV_primitive_restart is
essentially useless...even broken.
However, leaving it on causes oglconform's primitive-restart-nv tests to
run in OpenGL 3.1 contexts, which results in them all failing. This
patch causes 29 subtests to go from "fail" to "not run".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I keep accidentally trying to use it. "fs" is a sensible name for
fragment shader debugging, and "wm" is...not. It's also more symmetric
with "vs".
Leave INTEL_DEBUG=wm because old habits die hard.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also remove the recently added and overloaded LLVM_CXXFLAGS from CXXFLAGS.
Note: This is a candidate for the stable branches.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
After walking our IR instructions (Mesa or GLSL), we don't want to also
mark the start of the FB/URB writes or whatever as being that IR. This
can end up being misleading when the end of the IR visit got copy
propagated out to a later instruction in the URB writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VP generation doesn't set up the output reg strings, so if you
didn't happen to get these values as 0 on the stack, you'd lose.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bison -o parameter expects a .c file.
The corresponding .h filename is obtained
by removing the extension of the initial .c.
This was breaking compilation on Ubuntu 12.04
libmesa_dricore_intermediates/libmesa_dricore.a(program_parse.tab.o): In
function `_mesa_parse_arb_program':
external/mesa/src/mesa/program/program_parse.y:2682: multiple definition
of `_mesa_parse_arb_program'
libmesa_dricore_intermediates/libmesa_dricore.a(lex.yy.o):external/mesa/src/mesa/program/program_parse.y:2682:
first defined here
Signed-off-by: Adrian Marius Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
In my testing I haven't found any cases where we get a null context
pointer, but it might still be possible. Check for null just to be safe.
Note: This is a candidate for the stable branches.
Only fail if GLX_SAMPLE_BUFFERS_ARB or GLX_SAMPLES_ARB are non-zero.
We were already doing this in the older swrast/glx code.
This fixes a piglit/waffle problem where we'd always fail to get a
visual/config and report the test as "skip".
Note: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were warning when there was no current context and we're about
to delete a renderbuffer, but that happens fairly often and isn't
really a problem.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=57754
Note: This is a candidate for the stable branches.
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
This required an update for the query storage in llvmpipe, there
can now be an active query per query type, so an occlusion query
can run at the same time as a time elapsed query.
Based on PIPE_QUERY_TIME_ELAPSED patch from Dave Airlie.
v2: fix up piglits for timers (also from Dave Airlie)
a) if we don't render anything the result is 0, so just
return the current time
b) add missing screen get_timestamp callback.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
we need to rely on util code for fetching those, just like before
9f06061d50.
Fixes bugs 57699 and 57756.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Tell LLVM the exact alignment we can guarantee, based on the fs block
dimensions, pixel format, and the alignment of the resource base pointer
and stride.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fs shader now depends on the color buffer formats. The shader key was
extended to accommodate this, but llvmpipe_update_derived needs to be
updated to check the framebuffer dirty flag.
This fixes bug 57674.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I fixed the only known bugs on r500 with 0222b2bd41.
Now there are no piglit regressions with Hyper-Z and all apps I tested seem
to work.
To summarize how it works:
- Only one process can use it at a time. This is a hardware limitation.
- The first process to clear a zbuffer gets the exclusive access to use
Hyper-Z.
- Compositors don't use any zbuffer, so they won't steal it, but some web
browsers do, so make sure there's no web browser running if you want your
game to use Hyper-Z.
- There's no need to restart an app which couldn't get the access to Hyper-Z.
Just quit the app which took it, the driver can turn it on for the other app
in the middle of rendering.
- If an app gets the access to Hyper-Z, it prints "radeon: Acquired Hyper-Z"
to stdout.
r300-r400:
Hyper-Z will be enabled by default on r300-r400 once sufficient testing is
done with piglit and Lightsmark at least.
Be sure to set the env var RADEON_HYPERZ and run piglit with parameters: -c 0
This fixes wrong rendering in Lightsmark and
the piglit/depthstencil-render-miplevels.
I think I fixed Hyper-Z. So far every app seems to work like a charm.
The following commit broke the i965 build:
commit 4a486f8bf2
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Nov 23 18:31:42 2012 +0100
glx/dri2: add and use new driver hook flush_with_flags
That commit added a forward declaration of enum __DRI2throttleReason to
dri_interface.h. C++ 98 does not allow forward declarations of enums.
The fix: Move the enum's definition to earlier in the file.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
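The shape of the fix can be sketched as below. The names are hypothetical stand-ins, not the contents of dri_interface.h; the only point is that the enum is fully defined before anything refers to it, which C and C++ 98 both accept.

```cpp
// Hypothetical names; the real enum and hook live in dri_interface.h.
// The enum is defined before its first use because a bare
// "enum throttle_reason;" declaration is not valid C++ 98.
enum throttle_reason {
   THROTTLE_SWAPBUFFER,
   THROTTLE_COPYSUBBUFFER,
   THROTTLE_FLUSHFRONT
};

struct flush_hooks {
   // The parameter type is complete at this point.
   int (*flush_with_flags)(unsigned flags, enum throttle_reason reason);
};

// Example hook: report the reason it was called with.
int record_reason(unsigned flags, enum throttle_reason reason)
{
   (void) flags;
   return static_cast<int>(reason);
}
```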
The border clamping code is unnecessary, since we don't care if a wrapped
coord value is -1 or <-1 (same for length vs. >length), in either case the
border handling code will mask out the offset and replace the texel value with
the border color.
Note that technically this is not entirely correct. Omitting clamping on the
float coords means that flt->int conversion may result in undefined values for
values of very large magnitude.
However there's no reason we should honor this here since:
a) we don't care for that for ordinary wrap modes in the aos code when
converting coords and the problem is worse there (as we've got only
effectively 24 instead of 32bits)
b) at least in some cases the clamping was done already in int space hence
doing nothing to fix that problem.
c) with sse2 flt->int conversion with such values results in 0x80000000 which
is just perfect (for clamp to border - not so much for the ordinary clamp to
edge).
Reviewed-by: Brian Paul <brianp@vmware.com>
Coverity pointed out this uninitialised class member.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
coverity pointed out this field was being used uninitialised.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
all unsigneds are >= 0 :-)
There may be an argument for leaving this in, in case someone
changes min_lod to an integer, so feel free to apply or drop.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reported by coverity scan.
v2: fix second case
Note: This is a candidate for stable branches.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I haven't confirmed this is doing the correct thing, but at
least this might make someone review it!
Reported by internal RH coverity scan.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
the critical error would use driverName.
Found by internal RH coverity scan.
Note: This is a candidate for stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit 962a1c07b4.
Further testing revealed that this commit can cause the pre-processor to enter
infinite loops. For now, simply revert this code until a cleaner,
better-tested version is available.
Previously, we were only supporting line-continuation backslash characters
within lines of pre-processor directives, (as per the specification). With
OpenGL 4.2 and GLES3, line continuations are now supported anywhere within a
shader.
While changing this, also fix a bug where the preprocessor was ignoring
line continuation characters when a line ended in multiple backslash
characters.
The new code is also more efficient than the old. Previously, we would
perform a ralloc copy at each newline. We now perform copies only at each
occurrence of a line-continuation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
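A minimal sketch of splicing line continuations anywhere in a shader, copying text only where a continuation occurs. This is not the glcpp code (which is C and uses ralloc); splice_line_continuations() is a hypothetical name, and the sketch assumes lines end in a bare "\n".

```cpp
#include <cstddef>
#include <string>

// Remove every backslash-newline pair from src. Text is appended to the
// output only once per continuation (plus once for the tail), instead of
// once per newline.
std::string splice_line_continuations(const std::string &src)
{
   std::string out;
   out.reserve(src.size());

   std::size_t start = 0;
   for (;;) {
      const std::size_t pos = src.find("\\\n", start);
      if (pos == std::string::npos)
         break;
      out.append(src, start, pos - start);
      start = pos + 2; // skip the backslash and the newline
   }
   out.append(src, start, std::string::npos);
   return out;
}
```

When a line ends in several backslashes, only the last one pairs with the newline; the ones before it stay in the output.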
When a client frame callback is executed and the client starts rendering
again, the egl event queue might not have been dispatched so that the
buffer release event for the previous frame hasn't been processed. In
that case a third buffer is allocated, even though it would be possible
to reuse the buffer that was just released.
The wl_display_dispatch_queue_pending() entry point is available from
wayland-client 1.0.2, so require that in configure.ac. Also, just
let the pkg-config macro throw its own error, which will show what version
we were looking for and failed to find.
Note: This is a candidate for stable branches.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Commit ca3ed3e024 fixed the problem where
eglMakeCurrent would trigger a getbuffer callback that then breaks the
following wl_egl_window_resize() call. However, we still need to
invalidate buffers in eglSwapBuffers, since in wayland we always swap
buffers, so the dri driver needs to come out and ask us for the next buffer
after each swapbuffer.
Note: this is a candidate for stable branches.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
These helper macros save you from writing nasty expressions like:
if ((inst->src[1].type == BRW_REGISTER_TYPE_F &&
inst->src[1].imm.f == 1.0) ||
((inst->src[1].type == BRW_REGISTER_TYPE_D ||
inst->src[1].type == BRW_REGISTER_TYPE_UD) &&
inst->src[1].imm.u == 1)) {
Instead, you simply get to write inst->src[1].is_one(). Simple.
Also, this makes the FS backend match the VS backend (which has these).
This patch also converts opt_algebraic to use the new helper functions.
As a consequence, it will now also optimize integer-typed expressions.
Reviewed-by: Eric Anholt <eric@anholt.net>
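For illustration, helpers of this kind might look like the sketch below. The types are simplified, hypothetical stand-ins (reg_type, src_reg, imm_f() and friends are not the i965 definitions); only the immediate-related fields are modeled.

```cpp
// Hypothetical stand-ins for the backend's register types and source
// operand.
enum reg_type { TYPE_F, TYPE_D, TYPE_UD };

struct src_reg {
   bool is_imm;
   reg_type type;
   union { float f; int i; unsigned u; } imm;

   // True if this operand is an immediate equal to v in its own type.
   bool matches(int v) const
   {
      if (!is_imm)
         return false;
      switch (type) {
      case TYPE_F:  return imm.f == static_cast<float>(v);
      case TYPE_D:  return imm.i == v;
      case TYPE_UD: return imm.u == static_cast<unsigned>(v);
      }
      return false;
   }

   bool is_zero() const { return matches(0); }
   bool is_one() const { return matches(1); }
};

// Small constructors for immediates of each type.
src_reg imm_f(float v)
{
   src_reg r; r.is_imm = true; r.type = TYPE_F; r.imm.f = v; return r;
}

src_reg imm_d(int v)
{
   src_reg r; r.is_imm = true; r.type = TYPE_D; r.imm.i = v; return r;
}

src_reg imm_ud(unsigned v)
{
   src_reg r; r.is_imm = true; r.type = TYPE_UD; r.imm.u = v; return r;
}
```

Because the type dispatch lives in one place, a pass written against is_one() covers float and integer immediates without spelling out each case.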
The use-after-free happened when the renderbuffer was shared by multiple
contexts and we tried to delete the renderbuffer using a context which
was previously deleted.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To fix a pipe_context::surface_destroy() use-after-free problem.
We previously added pipe_sampler_view_release() for similar reasons.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We sometimes need a rendering context when deleting renderbuffers.
Pass it explicitly instead of trying to grab a current context
(which might be NULL). The next patch will make use of this.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We used to invalidate the drawable after a call to eglSwapBuffers(),
so that a wl_egl_window_resize() would take effect for the next frame.
However, that leads to calling dri2_get_buffers() when eglMakeCurrent()
is called with the current context and surface, and a later call to
wl_egl_window_resize() would not take effect until the next buffer
swap.
Instead, add a callback from wl_egl_window_resize() back to the wayland
egl platform, and invalidate the drawable only when it is resized.
This solves a bug on wayland clients when going back to windowed mode
from fullscreen when clicking a pop up menu, where the window size
after this would be the fullscreen size.
Note: this is a candidate for stable branches.
CC: wayland-devel@lists.freedesktop.org
Completely forgot about updating Makefile when removing it. Stephane
already fixed the make build, but there were a few mentions of
lp_tile_soa left in the tree.
The gen4 simd16 workaround looks at ir->type to determine how much
storage to allocate for the simd16 value. In fragment programs,
texturing only ever returns float vec4s (unlike GLSL, which can also
have scalar floats or vector integers), so this is the right type.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to rebase colors (ex: set G=B=0) when getting GL_LUMINANCE
textures in following cases:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
A similar fix was pushed by Brian Paul for uncompressed textures
in commit: f5d0ced.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=47220
Observed no regressions in piglit and ogles2conform due to this fix.
This patch will cause failures in intel oglconform pxconv-gettex,
pxstore-gettex and pxtrans-gettex test cases. The failures are caused by
a bug in the test cases: they calculate the expected luminance value
incorrectly as L = R+G+B.
V2: Set G = 0 when getting a RG texture but returning luminance.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
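The rebase step (set G=B=0) can be sketched as below. rebase_luminance() is a hypothetical name, not the Mesa function, and the sketch assumes pixels are already unpacked to float RGBA.

```cpp
#include <cstddef>

// Zero the green and blue channels of n RGBA pixels in place, so that a
// luminance value derived from a pixel depends on red only.
void rebase_luminance(float (*rgba)[4], std::size_t n)
{
   for (std::size_t i = 0; i < n; i++) {
      rgba[i][1] = 0.0f;
      rgba[i][2] = 0.0f;
   }
}
```

After the rebase, R+G+B equals R for every pixel, so the stale G and B values can no longer leak into the luminance result.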
Drop these from the known limitations list since support was recently added
for these.
Also, fix a typo while in the area, (and the oddly missing final newline).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This test file is very similar to test 113-line-and-file-macros but uses token
pasting for cleaner quiz answers (without spaces between the digits). This
test passes thanks to the recent addition of support for pasting INTEGER
tokens, (but would have failed without that).
(Note that this test is distinct from test 059-token-pasting-integer which
pastes integers parsed from the source. Those are parsed to INTEGER_STRING
tokens and are already pasted correctly as verified by that test. The only way
to generate the INTEGER tokens which currently fail to paste is with an
internal define such as __LINE__ that results in an integer.)
Reviewed-by: Matt Turner <mattst88@gmail.com>
As recently tested in the additions to the invalid paste test, it is illegal
to paste a non-digit sequence onto the end of an integer.
The 082-invalid-paste test should now pass again.
Reviewed-by: Matt Turner <mattst88@gmail.com>
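The rule stated above (only a digit sequence may be pasted onto the end of an integer) can be sketched as a simple predicate. can_paste_onto_integer() is a hypothetical name; glcpp works on tokens, not strings, so this only illustrates the check itself.

```cpp
#include <cctype>
#include <string>

// Return true if rhs may be pasted onto the end of an integer token,
// i.e. if it is a non-empty sequence of decimal digits.
bool can_paste_onto_integer(const std::string &rhs)
{
   if (rhs.empty())
      return false;

   for (std::string::size_type i = 0; i < rhs.size(); i++) {
      if (!std::isdigit(static_cast<unsigned char>(rhs[i])))
         return false;
   }
   return true;
}
```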
The current code lets a few invalid pastes through, such as a string pasted
onto the end of an integer. Extend the invalid-paste test to catch some of
these.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This time creating a new _token_list_create_with_one_integer function
modeled after the existing _token_list_create_with_one_space function
(both implemented with new _token_list_create_with_one_ival).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This function is getting a little long to read. Simplify it by pulling
up one assignment from every condition.
Reviewed-by: Matt Turner <mattst88@gmail.com>
These tokens are easy to expand by just looking at the current, tracked
location values, (and no need to look anything up in the hash table).
Add a test which verifies __LINE__ with several values, (and verifies __FILE__
for the single value of 0). Our testing framework isn't sophisticated enough
here to have a test with multiple file inputs.
This commit fixes part of es3conform's preprocess16_frag test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This should help avoid confusion now that we're using the gl_api enum
to distinguish between core and compatibility APIs. The
corresponding enum value for core APIs is API_OPENGL_CORE.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The current implementation was close but not fully correct: several
operations that should be done in floating point were being done in
integer.
Fixes piglit fbo-clear-formats GL_ARB_texture_float
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Untested (couldn't get the piglit test to run even with version overrides),
but it seemed blatantly wrong.
In any case, it only affects an error path, and if that path is ever hit,
all hope is probably lost anyway.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This adds array (1d,2d) texture support to llvmpipe.
Though probably should do something about 1d array textures requiring gobs
of memory (this issue is not strictly limited to arrays but it is probably
worse there).
Initial code by Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Support 1d and 2d array textures (including shadow samplers),
and (as a side effect mostly) also shadow cube samplers.
Seems to pass the relevant piglit tests both for sampling and rendering
to (though some require version overrides).
Since we don't support render target indices rendering to array textures
is still restricted to a single layer at a time.
Also, the min/max layer in the sampler view (which is unnecessary for GL)
is ignored (always use all layers).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now dead code.
Also had to remove the show_tiles/show_subtiles because now the color
buffers are always stored in their native format, so there is no longer
an easy way to paint the tile sizes.
Depth-stencil buffers are still swizzled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Update llvmpipe_is_format_supported and llvmpipe_is_format_unswizzled
so that only the formats that we can render without swizzling are
advertised.
We can still render all D3D10 required formats except
PIPE_FORMAT_R11G11B10_FLOAT, which still needs to be implemented in a
future change.
Removal of rendertarget swizzling will be done in a subsequent change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is buggy (it was giving wrong results for some of the formats with
padding), and util_format_description::is_array already does precisely
what's intended.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is what we want in practice.
The only change is in PIPE_FORMAT_R8SG8SB8UX8U_NORM, which no longer is
considered an array format.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes various format manipulation for big-endian
architectures.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes various format manipulation for big-endian
architectures.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds two more functions in type conversions header:
* lp_build_bswap: construct a call to llvm.bswap intrinsic for an
element
* lp_build_bswap_vec: byte swap every element in a vector based on the
input and output types.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
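As a scalar analogue of what the two helpers do, the sketch below swaps the bytes of one 32-bit element and then of every element in an array. It is plain C++ rather than LLVM IR building code; bswap32() and bswap32_vec() are hypothetical names, not the gallivm functions.

```cpp
#include <cstddef>
#include <cstdint>

// Reverse the byte order of a single 32-bit element.
std::uint32_t bswap32(std::uint32_t v)
{
   return (v >> 24) |
          ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) |
          (v << 24);
}

// Reverse the byte order of every element in an array, in place.
void bswap32_vec(std::uint32_t *elems, std::size_t n)
{
   for (std::size_t i = 0; i < n; i++)
      elems[i] = bswap32(elems[i]);
}
```

Note that the swap is per element: the order of elements within the vector is unchanged, only the bytes inside each one are reversed.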
This patch fixes the vector constant generation used for vector shuffle
for big-endian machines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch ensures that the NJ bit in the Altivec VSCR register is cleared
so that denormal numbers are handled as expected by the IEEE standard.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds Altivec intrinsics for float vector types. It changes
the SSE-specific definitions to platform-neutral ones and adds the calls
to the Altivec intrinsic builder.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds the correct vector addition and subtraction intrinsics when
using Altivec with PPC. The current code uses the default path, and the LLVM
backend ends up issuing carry-out arithmetic instructions where saturating
ones are expected.
It also includes a fix for PowerPC where char are unsigned by default,
resulting in bogus values for vector shifting.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
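For reference, the saturating behaviour the message above asks for looks like this on a single unsigned 8-bit lane. This is a scalar sketch with hypothetical names, not the Altivec intrinsics; add_wrap_u8() is included only as a contrast and is not meant to model the instruction the backend was emitting.

```cpp
#include <cstdint>

// Saturating add: results above 255 clamp to 255.
std::uint8_t add_sat_u8(std::uint8_t a, std::uint8_t b)
{
   const unsigned sum = static_cast<unsigned>(a) + b;
   return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Saturating subtract: results below 0 clamp to 0.
std::uint8_t sub_sat_u8(std::uint8_t a, std::uint8_t b)
{
   return static_cast<std::uint8_t>(a > b ? a - b : 0);
}

// Wrapping add, for contrast: the result is taken modulo 256.
std::uint8_t add_wrap_u8(std::uint8_t a, std::uint8_t b)
{
   return static_cast<std::uint8_t>(static_cast<unsigned>(a) + b);
}
```

The lanes are spelled as std::uint8_t rather than char on purpose: plain char is unsigned by default on PowerPC, which is the second problem this commit mentions.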
This patch adds PPC Altivec support for pack/unpack operations using Altivec
supported vector type (8xi8, 16xi16, 4xi32, 4xf32).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages don't
need them.
This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Compiling shaders requires several main steps:
1. Generating VS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code
This patch splits out step 4 into a separate class named "vec4_generator."
There are several reasons for doing so:
1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace vec4_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.
2. It reduces the size of the vec4_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")
3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
vec4_visitor and create IR, while ADD() in vec4_generator() can
create brw_instructions. (Patches for this upcoming.)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Final code generation should never fail. This is a bug, and there
should be no user-triggerable cases where this could occur.
Also, we're not going to have a fail() method after the split.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_vs_prog is very generic: it just calls out to
get a compiled program and then uploads it.
This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
During compilation, we allocate a bunch of things: the IR needs to last
at least until code generation...and then the program store needs to
last until after we upload the program.
For simplicity's sake, just keep it all around until we upload the
program. After that, it can all be freed.
This will also save a lot of headaches during the upcoming refactoring.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We used to steal it out of the brw_compile struct, but that won't be
initialized in time soon (and is eventually going away).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We used to steal it out of the brw_compile struct...but vec4_visitor
isn't going to have one of those in the future.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This leaves only the final code generation stage in brw_vec4_emit.cpp,
moving the payload setup, run(), and brw_vs_emit functions to brw_vec4.cpp.
The fragment shader backend puts these functions in brw_fs.cpp, so this
patch also helps with consistency.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I ran across this while running a glGenerateMipmap() test.
_meta_GenerateMipmap sets MESA_META_TRANSFORM, which causes
_mesa_meta_begin to try and set a default orthographic projection.
Unfortunately, if the drawbuffer isn't set up, ctx->DrawBuffer->Width
and Height are 0, which just causes a GL_INVALID_VALUE error.
Fixes oglconform's fbo/mipmap.automatic, mipmap.manual, and
mipmap.manualIterateTexTargets.
Reviewed-by: Brian Paul <brianp@vmware.com>
The rest of the plumbing was in place already.
I have tested this by turning on all GL 3.1 features.
The drivers not supporting GL 3.1 will fail to create a core profile
as they should.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add a DEBUG_FREED_MEMORY option to help catch use-after-free errors.
Add debug_memory_check() function which can be periodically called to
check that all known blocks are good.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Will allow formats with padding, e.g. RGBX.
Will now allow swizzled formats as long as the alpha is channel 3.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And add test cases to ensure that this works
- 110 verifies that glcpp rejects #elif<digits> which glcpp
previously accepted.
- 111 verifies that glcpp accepts #if followed immediately by
(, +, -, !, or ~.
- 112 does the same as 111 but for #elif.
See 17f9beb6 for #if change.
Reviewed-by: Carl Worth <cworth@cworth.org>
radeonsi now supports Z16 and doesn't fail these assertions anymore.
This partially reverts commit 7bba4879bb, but
leaves the error messages in place to allow diagnosing such problems even with
non-debugging builds.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes this SCons build error on Mac OS X if X11 is found.
NameError: name 'ws_xlib' is not defined:
File "SConstruct", line 144:
duplicate = 0 # http://www.scons.org/doc/0.97/HTML/scons-user/x2261.html
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/SConscript", line 34:
SConscript('gallium/SConscript')
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/gallium/SConscript", line 135:
'targets/libgl-xlib/SConscript',
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/gallium/targets/graw-xlib/SConscript", line 9:
ws_xlib,
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the ARB_vertex_type_2_10_10_10_rev specification:
"The error INVALID_ENUM is generated by VertexP*, NormalP*,
TexCoordP*, MultiTexCoordP*, ColorP*, or SecondaryColorP if <type>
is not UNSIGNED_INT_2_10_10_10_REV or INT_2_10_10_10_REV."
Fixes 7 subcases of oglconform's packed-vertex test.
v2: Add "gl" prefix to error messages (pointed out by Brian).
Also rebase atop the ctx plumbing.
Reviewed-by: Brian Paul <brianp@vmware.com>
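The check quoted from the specification can be sketched as below. The function name is hypothetical and the constants are local copies of the GL enum values, defined here so that the sketch does not need the GL headers.

```cpp
// Local copies of the relevant GL enum values (assumption: defined here
// instead of including GL headers).
const unsigned ENUM_UNSIGNED_INT_2_10_10_10_REV = 0x8368;
const unsigned ENUM_INT_2_10_10_10_REV = 0x8D9F;
const unsigned ERR_NO_ERROR = 0;
const unsigned ERR_INVALID_ENUM = 0x0500;

// Hypothetical validator for the <type> argument of the packed vertex
// entry points: anything other than the two packed types is an error.
unsigned check_packed_vertex_type(unsigned type)
{
   if (type == ENUM_UNSIGNED_INT_2_10_10_10_REV ||
       type == ENUM_INT_2_10_10_10_REV)
      return ERR_NO_ERROR;

   return ERR_INVALID_ENUM;
}
```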
Traditionally, OpenGL has had two separate equations for converting from
signed normalized fixed-point data to floating point data. One was used
primarily for vertex data, while the other was primarily for texturing
and framebuffer data.
However, ES 3.0 and GL 4.2 change this, declaring there's only one
equation to be used in all cases. Unfortunately, it's the other one.
v2: Correctly convert 0b10 to -1.0, as pointed out by Chris Forbes.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
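The two conversions can be sketched as follows, based on my reading of the GL specifications rather than on the patch itself: the older vertex-data rule maps c to (2c + 1) / (2^b - 1), while the rule that ES 3.0 and GL 4.2 apply everywhere maps c to max(c / (2^(b-1) - 1), -1.0). The function names are hypothetical, and bits is assumed to be between 2 and 31.

```cpp
// Older vertex-data rule: (2c + 1) / (2^b - 1). Zero is not
// representable exactly.
float snorm_to_float_legacy(int c, int bits)
{
   return (2.0f * static_cast<float>(c) + 1.0f) /
          static_cast<float>((1u << bits) - 1u);
}

// Unified rule: max(c / (2^(b-1) - 1), -1.0). The most negative input
// would land below -1.0, so it is clamped.
float snorm_to_float_unified(int c, int bits)
{
   const float f = static_cast<float>(c) /
                   static_cast<float>((1u << (bits - 1)) - 1u);
   return f < -1.0f ? -1.0f : f;
}
```

For the 2-bit alpha field, the unified rule sends the bit pattern 0b10 (the value -2) to -1.0 via the clamp, which is the case the v2 note calls out.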
The rules for converting these values actually depend on the current
context API and version. The next patch will implement those changes.
v2: Mark ctx as const, as suggested by Brian.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Perform this count the same way as elsewhere in this file, per
Brian Paul's review.
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously, this function would assert if the format didn't fit an expected
4-channel format size.
It now works with any format type and any number of channels.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages don't
need them.
This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.
v2: rzalloc p, as suggested by Eric.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Compiling shaders requires several main steps:
1. Generating FS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code
This patch splits out step 4 into a separate class named "fs_generator."
There are several reasons for doing so:
1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace fs_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.
2. It reduces the size of the fs_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")
3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
fs_visitor and create IR, while ADD() in fs_generator() can
create brw_instructions. (Patches for this upcoming.)
Furthermore, this patch changes the order of operations slightly.
Rather than doing steps 1-4 for SIMD8, then 1-4 for SIMD16, we now:
- Do steps 1-3 for SIMD8, then repeat 1-3 for SIMD16
- Generate final assembly code for both modes together
This is because the frontend work can be done independently, but final
assembly generation needs to pack both into a single program store to
feed the GPU.
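The new ordering can be sketched roughly as below. This is only an illustration: front_end(), generate_assembly() and the string "instructions" are hypothetical stand-ins, not the i965 driver's functions or data structures.

```python
# Hypothetical stand-ins for the compile steps described above; not the i965 API.

def front_end(width):
    """Steps 1-3 for one dispatch width: build IR, optimize, allocate registers."""
    return {"width": width, "insts": [f"simd{width}_inst"]}


def generate_assembly(programs):
    """Step 4: pack every compiled mode into a single program store."""
    store = []
    offsets = {}
    for prog in programs:
        # Remember where this mode's code starts inside the shared store.
        offsets[prog["width"]] = len(store)
        store.extend(prog["insts"])
    return store, offsets


def compile_fragment_shader():
    # The front-end work for each width is independent of the other.
    simd8 = front_end(8)
    simd16 = front_end(16)
    # Final code generation then sees both modes together.
    return generate_assembly([simd8, simd16])


store, offsets = compile_fragment_shader()
```

The point of the sketch is only that step 4 runs once, over both modes, instead of once per mode.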
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Final code generation should never fail. This is a bug, and there
should be no user-triggerable cases where this could occur.
Also, we're not going to have a fail() method in a moment.
v2: Just abort() rather than assert, to cover the NDEBUG case
(suggested by Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
All we really need is a memory context and the instruction list; passing
a backend_visitor is just convenient at times.
This will be necessary two patches from now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_wm_prog is very generic: it just calls out to
get a compiled program and then uploads it.
This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We used to steal it out of the brw_compile struct...but fs_visitor
isn't going to have one of those in the future.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Also change it from a brw_fragment_program to a gl_fragment_program,
since that seems to be what everything wants anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We can easily recover it from prog, and this makes it clear that we
aren't passing additional information in.
v2: Use an if-statement rather than the ?: operator (suggested by Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Also, rather than having brw_wm_fs_emit poke at it directly, make it a
parameter to the fs_visitor constructor.
All other changes generated by search and replace (with occasional
whitespace fixup).
v2: Make dispatch_width const (as suggested by Paul); fix doxygen
mistake (pointed out by Eric); update for rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we only have the one backend, there's no real point in keeping
this separate. Moving it should allow some future simplifications.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Everybody determines this by checking if fp's OutputsWritten field
contains the FRAG_RESULT_DEPTH bit. Rather than having payload setup
check this and set the computes_depth flag, we can just do the check in
the only place that actually used it: emit_fb_writes().
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No longer have to split fetching into quads dynamically if mip levels
are not the same for all quads (aos sampling still always splits due
to performance reasons).
Instead handle multiple mip levels further down, minification etc. takes
this into account.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also adds some code to handle per-quad lods for more than 4-wide fetches,
because otherwise I'd have to integrate the texelFetch function into
the splitting stuff... (but it is not used yet outside texelFetch).
Passes piglit fs-texelFetch-2D; fails fs-texelFetchOffset-2D due to what I
believe is a test error (results are undefined for out-of-bounds fetches; we
return whatever is at offset 0, whereas the test expects [0,0,0,1]).
Texel offsets are only handled by texelFetch for now, though the interface
can handle it for everything.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
v2 (Kayden): Move the enable into an existing intel->gen >= 4 block
(as suggested by Ian).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Implements BGRA swizzle, sign recovery, and normalization
as required by ARB_vertex_type_2_10_10_10_rev.
V2: Ported to the new VS backend, since that's all that's left;
fixed normalization.
V3: Moved fixups out of the GLSL-only path, so it works for FF/VP too.
V4 (Kayden): Rework ES3 normalization, don't heap allocate registers;
tidy comments.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flag the need for various workarounds to be applied by
the vertex shader.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Next few patches build on this to add other workarounds
for packed formats.
V2: rename BRW_ATTRIB_WA_COMPONENTS to BRW_ATTRIB_WA_COMPONENT_MASK;
V3 (Kayden): remove separate bit for ES3 signed normalization
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Always use R10G10B10A2_UINT; most of the other formats we'd like
don't actually work on the hardware. Will emit w/a for scaling,
sign recovery and BGRA swizzle in the VS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until we have proper 'make dist' this is an improvement of the current
situation, because each time some old Makefiles got converted to automake
we had to update the tarballs target.
NOTE: This is a candidate for the 9.0 branch.
Cc: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
We can't support IF statements in 16-wide on these. To get back to 16-wide
for these shaders, we need to support predicate on discard instructions in the
backend IR, which is something we've sort of got on the list to do anyway.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55828
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 774fb90db3 introduced a ralloc context to
each user of struct brw_compile, but for this one a NULL context was used,
causing the later ralloc_free(mem_ctx) to not do anything.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55175
NOTE: This is a candidate for the stable branches.
We have a special case where non-shadow comparison with LOD requires using a
SIMD16 vec4 in an 8-wide shader, which appears in the register allocator as a
size 8 vgrf.
Fixes assertions in various piglit tests and webgl conformance.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56521
Since a signed 2-bit integer can only represent -1, 0, or 1, it is
tempting to simply convert it directly to a float. This maps it
onto the correct range of [-1.0, 1.0]. However, it gives different
values compared to the usual equation:
(2.0 * 1.0 + 1.0) * (1.0 / 3.0) = +1.0 (same)
(2.0 * 0.0 + 1.0) * (1.0 / 3.0) = +0.33333333... (different)
(2.0 * -1.0 + 1.0) * (1.0 / 3.0) = -0.33333333... (different)
According to the GL_ARB_vertex_type_2_10_10_10_rev extension, signed
normalization is performed using equation 2.2 from the GL 3.2
specification, which is:
f = (2c + 1)/(2^b - 1). (2.2)
Comments below that equation state: "In general, this representation is
used for signed normalized fixed-point parameters in GL commands, such
as vertex attribute values." Which is what we're doing here.
The 3.2 specification goes on to declare an alternate formula:
f = max{c/(2^(b-1) - 1), -1.0} (2.3)
which is closer to the existing code, and maps the end points to exactly
-1.0 and 1.0. Comments below the equation state: "In general, this
representation is used for signed normalized fixed-point texture or
framebuffer values." Which is *not* what we're doing here.
It then states: "Everywhere that signed normalized fixed-point
values are converted, the equation used is specified." This is the real
clincher: the extension explicitly specifies that we must use equation
2.2, not 2.3. So we need to do (2x + 1) / 3.
This matches the behavior expected by oglconform's packed-vertex test,
and is correct for desktop GL (pre-4.2). It's not correct for ES 3.0,
but a future patch will correct that.
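As a quick check of the arithmetic, the two equations quoted above can be evaluated side by side. This is a plain-Python sketch, not the driver code; the function names are made up for illustration. The loop also covers -2, the fourth bit pattern of a two's-complement 2-bit field, which the text above does not list.

```python
def snorm_eq_2_2(c, bits):
    """Equation 2.2: f = (2c + 1) / (2^b - 1)."""
    return (2 * c + 1) / (2 ** bits - 1)


def snorm_eq_2_3(c, bits):
    """Equation 2.3: f = max(c / (2^(b-1) - 1), -1.0)."""
    return max(c / (2 ** (bits - 1) - 1), -1.0)


# Compare both conversions for every value a 2-bit signed field can hold.
for c in (-2, -1, 0, 1):
    print(c, snorm_eq_2_2(c, 2), snorm_eq_2_3(c, 2))
```

With b = 2, equation 2.2 divides by 3 while equation 2.3 divides by 1, which is why only the end points agree.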
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
For the 10-bit components, the divisor was incorrect. A 10-bit signed
integer can represent -2^9 through 2^9 - 1, which leads to the following
ranges:
(float)value.x -> [ -512, 511]
2.0F * (float)value.x -> [-1024, 1022]
2.0F * (float)value.x + 1.0F -> [-1023, 1023]
So dividing by 511 would incorrectly scale it to approximately:
[-2.001956947, 2.001956947]. To correctly scale to [-1.0, 1.0], we need
to divide by 1023.
This correctly implements the desktop GL rules. ES 3.0 has different
rules, but those will be implemented in a separate patch.
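The ranges above are easy to reproduce. The snippet below is an illustrative sketch only (scale_10bit is a hypothetical helper, not a Mesa function) that applies (2x + 1) / divisor to the two extremes of a signed 10-bit value.

```python
def scale_10bit(value, divisor):
    """Apply (2x + 1) / divisor to a signed 10-bit component."""
    return (2.0 * value + 1.0) / divisor


lowest, highest = -512, 511  # range of a signed 10-bit integer

# Old divisor: the result overshoots [-1.0, 1.0] by about a factor of two.
wrong = (scale_10bit(lowest, 511.0), scale_10bit(highest, 511.0))

# Corrected divisor: the end points land exactly on -1.0 and 1.0.
right = (scale_10bit(lowest, 1023.0), scale_10bit(highest, 1023.0))
```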
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
The bug was found by Coverity.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives us checking of our arguments (no more passing 1 operand to
BRW_OPCODE_MUL!), at the cost of a couple of extra parens.
v2: Rebase on gen6-if fix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a regression in the brw_fs_fp.cpp change. We just need to return
something good enough to get the IR generation to the end without crashing,
but ir->type isn't initialized and we wanted something of the coordinate's
type anyway.
Fixes around 30 piglit cases on my ilk system in drawpixels and framebuffer
blit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The theory of the guardband is that you extend the clip volume to avoid
expensive clipping computation, and just let fragments outside the viewport
get clipped by the drawable's bounds. But if a smaller-than-window-size
viewport is set, and we don't also happen to have a scissor set, then
rendering could incorrectly extend outside of the viewport when it should have
been clipped to the viewport.
Fixes the new piglit triangle-guardband-viewport test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.0 branch.
When you're comparing to the spec, you're trying to immediately see what
numbered dword of the packet your bit ends up in.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.0 branch.
All Intel code is compiled with -std=c99. There is no excuse to not use
designated initializers.
As a nice benefit, the code is now more friendly to grep. Without
designated initializers, psychic prowess is required to find the
initialization of DRI extension function pointers with grep. I have
observed several people, when they first encounter the DRI code, fail at
statically chasing the DRI function pointers due to this problem.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The dri directory is compiled with -std=c99. There is no excuse to not use
designated initializers.
As a nice benefit, the code is now more friendly to grep. Without
designated initializers, psychic prowess is required to find the
initialization of DRI extension function pointers with grep. I have
observed several people, when they first encounter the DRI code, fail at
statically chasing the DRI function pointers due to this problem.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For a packed depth/stencil buffer on separate stencil hardware, the
separate depth miptree is set up with alignment of 4,4 and the separate
stencil miptree is set up with alignment of 8,8. We can't just use the
irb->draw_{x,y} offsets for stencil, since that is the offset in the
depth miptree.
Fixes 12 piglit depthstencil testcases on ivb.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Given that we have the mask information here (assuming the rebase is to
the same tiling, which is safe), we can just save a set of miptrees and
offsets and the global intra-tile offset in the context and cut out a
bunch of logic. This will also save emitting the next fix I need to do
twice.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Fixes a theoretical problem where we had an aligned depth buffer and a
misaligned stencil buffer with a matching tile offset, so we would fail
to rebase depth even after the needed tile offset changed due to the
rebase of stencil.
It should also fix double-rebase of a misaligned packed depth/stencil
renderbuffer, which may have been a performance issue.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
We were always passing 0 for one of the two fields, and the code just used
whichever one wasn't 0.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I noticed these in the next patch where these paths were using the Face
of a teximage but didn't have array handling.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Apparently this was accidentally marked as unimplemented, and thus not
put in the dispatch table.
Fixes 7 es3conform tests:
- copy_buffer_parameters
- copy_buffer_data
- copy_buffer_usage
- pixel_buffer_object_bind
- pixel_buffer_object_parameteriv
- pixel_buffer_object_texture_read
- pixel_buffer_object_usage
v2: Also update the DispatchSanity test for this change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only legacy OpenGL allows the use of non-gen'd names. Core profiles
and ES 3 both require the use of glGenQueries().
Note that BeginQuery doesn't exist in ES 1 or ES 2.
Fixes es3conform's occlusion_query_invalid_beginquery test.
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
GL_READ_FRAMEBUFFER and GL_DRAW_FRAMEBUFFER are valid targets in ES 3.
Fixes 23 es3conform framebuffer_blit tests. Two more go from fail to
crash, but that appears to be because they actually run now.
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Calling glTexParameteri() with pname GL_TEXTURE_MAX_LEVEL and either a
target of GL_TEXTURE_RECTANGLE or a negative value previously generated
GL_INVALID_OPERATION. However, GL_INVALID_VALUE seems more appropriate.
Fixes oglconform's api-error/negative.glTexParameter and es3conform's
sgis_texture_lod_basic_error.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
The new brw_reg always had type BRW_REGISTER_TYPE_F, rather than
inheriting the original type of the ATTR file register.
In the past, this hasn't been a problem since we only execute this code
when fixing up GL_FIXED attributes, which always have float types.
However, we'll soon be using it for ARB_vertex_type_2_10_10_10_rev support,
which uses D and UD types.
Reviewed-by: Eric Anholt <eric@anholt.net>
For GLES1 and GLES2, brwCreateContext neglected to validate the requested
context version received from the DRI layer. If DRI requested an OpenGL
ES2 context with version 3.9, we provided it one.
Before this fix, the switch statement that validated the requested GL
context flavor was an ugly #ifdef copy-paste mess. Instead of reproducing
the copy-paste mess for GLES1 and GLES2, I first refactored it. Now the
switch statement is readable.
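The kind of check that was missing might look like the sketch below. The table of accepted versions and the function name are assumptions chosen for illustration; they are not taken from brwCreateContext.

```python
# Assumed set of accepted (major, minor) versions per API, for illustration only.
SUPPORTED_VERSIONS = {
    "GLES1": {(1, 0), (1, 1)},
    "GLES2": {(2, 0)},
}


def validate_context_request(api, major, minor):
    """Return True only if the requested version is known for that API."""
    return (major, minor) in SUPPORTED_VERSIONS.get(api, set())
```

With a check like this, a request for an ES2 context with version 3.9 is rejected instead of being silently accepted.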
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It seems that -DNDEBUG and other flags might still be leaked through
those variables, so strip those off there as well.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
In addition to registers used by instructions, fs_visitor maintains
direct references to certain "special" values used for inputs/outputs.
When I added VGRF compaction, I overlooked these, believing that these
direct references weren't used once instructions were generated. That
was wrong. For example, pixel_x/y are used in virtual_grf_interferes(),
which is called by optimization passes and register allocation.
This patch treats all of them as used and patches them after compacting.
While it's not strictly necessary to patch all of them (as some aren't
used after emitting code), it seems safer to simply fix them all.
Fixes oglconform's textureswizzle/advanced.shader.targets, piglit's
glsl-fs-lots-of-tex, and glean's texCombine on pre-Gen6 hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56790
Reviewed-by: Eric Anholt <eric@anholt.net>
The goal of that change was to skip counting things that aren't actually
outputs from the VS to the FS. However, explicit_location isn't set in
the case of linker-assigned locations (the common case), so basically
varying component counting got disabled. At this stage of the linker,
we've already ensured that var->location is set, so we can just look at
it without worrying.
Fixes i965 assertion failure with the new
piglit glsl-max-varyings --exceed-limits.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51545
Reviewed-by: Brian Paul <brianp@vmware.com>
The diff looks funny, but it's moving the integer vs non-integer check
below the _mesa_source_buffer_exists() check that ensures
_ColorReadBuffer is non-null, so we get a GL_INVALID_OPERATION instead
of a segfault. This looks like it had regressed in the
_mesa_error_check_format_and_type() changes, which removed the first of
the two duplicated checks for the source buffer. Fixes segfault in the
new piglit ARB_framebuffer_object/negative-readpixels-no-rb.
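The ordering issue can be shown with a simplified model. The names below (ColorBuffer, check_read_pixels) are hypothetical and the checks are reduced to the two that matter here; this is not the Mesa readpixels code.

```python
GL_NO_ERROR = "GL_NO_ERROR"
GL_INVALID_OPERATION = "GL_INVALID_OPERATION"


class ColorBuffer:
    """Minimal stand-in for the color read buffer."""

    def __init__(self, is_integer):
        self.is_integer = is_integer


def check_read_pixels(read_buffer, dst_is_integer):
    # Existence check first: otherwise the comparison below would
    # dereference a missing buffer (None here, a NULL pointer in C).
    if read_buffer is None:
        return GL_INVALID_OPERATION
    # Only now is it safe to compare integer vs. non-integer formats.
    if read_buffer.is_integer != dst_is_integer:
        return GL_INVALID_OPERATION
    return GL_NO_ERROR
```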
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45877
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we're using the new backend, we may actually put things into push
constants if you have too many uniform values uploaded. Also, correctly
account for texture rectangle params and drop the old special case for the
0.0/1.0 params from the old backend.
MinGW has snprintf.
The patch fixes these warnings with the MinGW SCons build.
src/gallium/auxiliary/util/u_snprintf.c:459:1: warning: no previous prototype for ‘util_vsnprintf’ [-Wmissing-prototypes]
src/gallium/auxiliary/util/u_snprintf.c:1436:1: warning: no previous prototype for ‘util_snprintf’ [-Wmissing-prototypes]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Brian Paul <brianp@vmware.com>
If an instruction reads from a constant register that contains
immediates using an invalid swizzle, we can avoid generating MOV
instructions to fix up the swizzle by loading the immediates into a
different constant register that can be read using a valid swizzle.
This only affects r300 and r400 cards.
For example:
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
MAD temp[4].xy, const[0].xy__, const[1].xz__, input[0].xy__;
========== Before this change would be lowered to: =========
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
MOV temp[0].x, const[1].x___;
MOV temp[0].y, const[1]._z__;
MAD temp[4].xy, const[0].xy__, temp[0].xy__, input[0].xy__;
========== After this change is lowered to: ===============
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
CONST[2] = { 0.0000 -3.5000 2.5000 0.0000 }
MAD temp[4].xy, const[0].xy__, const[2].yz__, input[0].xy__;
============================================================
This change reduces one of the Lightsmark shaders from 133 to 91
instructions.
v2:
- Fix crash caused by swizzles with only inline constants.
Use per asic golden values.
Programming this register doesn't seem to be strictly
necessary on SI, but programming it wrong leads to
rendering issues or reduced performance so just
go ahead and program the golden values explicitly
to avoid any potential problems down the road.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For precise lts support I had to do some magic with the library names, which works fine
as long as the libraries from pkg-config are used.
The parts with src/gallium/targets/va-*/Makefile will not apply on the master branch,
but do apply to the 9.0 branch.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Fixes regression introduced in 9078441072.
Targets for making lex.yy.c, program_parse.tab.c and program_parse.tab.h
got moved into their own Makefile.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was added in version 22 of the GL_ARB_sync spec.
Fixes gles3conform's sync_error_waitsync_timeout test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the other range checks on index already return the proper error,
INVALID_VALUE.
Fixes gles3conform's instanced_arrays_invalid test.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_optimize.c's brw_opcodes table was a copy of brw_disasm.c's
opcode_descs table, but with an additional field: is_arith. Now that
I've deleted that, the two are identical. Keep the one in brw_disasm.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
All users of basic block analysis simply create their own local
variables. Nobody uses the visitor-wide field.
Reviewed-by: Eric Anholt <eric@anholt.net>
The old brw_remove_grf_to_mrf_moves() pass is obsolete and replaced by
fs_visitor::compute_to_mrf().
The old brw_remove_duplicate_mrf_moves() pass is obsolete and replaced
by fs_visitor::remove_duplicate_mrf_writes().
The remaining pass, brw_set_dp4_dependency_control(), is currently
unused, but could be, so I'm leaving it for now.
Reviewed-by: Eric Anholt <eric@anholt.net>
At this point, it's just gl_shader_program. Nobody even uses it; even
the program that creates them only returns gl_shader_program pointers.
Reviewed-by: Eric Anholt <eric@anholt.net>
The passthrough pipeline needs to check index values (which might be passed
through) as they can be invalid (which causes crashes and various assertion
failures if the clip code runs). Obviously, rendering won't be well-defined,
but those bogus indices might come directly from apps.
There were already debug printfs which reported the out-of-bounds indices but
we really ought to not crash.
While checking at that point doesn't seem like the most efficient solution,
it seems there isn't really another appropriate function to do it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assert that the CB format is valid and default to
the INVALID hw format rather than ~0U when the format
doesn't match for non-debug builds.
v2: use INVALID hw format rather than ~0U
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Assert that the DB format is valid and default to
the INVALID hw format rather than ~0U when the format
doesn't match for non-debug builds.
v2: use INVALID hw format rather than ~0U
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is necessary for backwards compatibility with pre-SI for stencil.
Fixes a number of stencil related piglit tests, and real apps using stencil.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
On Gen6-7, we don't compact clip planes, and nr_userclip_plane_consts
is the last bit set, so iterating from i = 0..nr_userclip_plane_consts
covers all active clip planes and is the right thing to do.
However, that doesn't work at all on Gen4-5. Since we compact
clip planes, we skip over ones which aren't active (via the continue
statement). We also set nr_userclip_plane_consts to the number of
active clip planes, which means that we end the loop after checking that
many bits. If the set of clip planes wasn't contiguous, this means we'd
fail to find the last few.
By changing the iteration to MAX_CLIP_PLANES, we correctly find all of
the active clip planes.
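The failure mode is easy to reproduce in a small model. The sketch below is not the driver loop; MAX_CLIP_PLANES = 8 and the function name are assumptions for illustration.

```python
MAX_CLIP_PLANES = 8  # assumed limit for this sketch


def find_active_planes(enable_mask, loop_bound):
    """Collect enabled plane indices while iterating i over [0, loop_bound)."""
    active = []
    for i in range(loop_bound):
        if not (enable_mask & (1 << i)):
            continue  # skip planes that are not enabled
        active.append(i)
    return active


mask = 0b00100101                      # planes 0, 2 and 5 enabled
active_count = bin(mask).count("1")    # number of active planes

# Bounding the loop by the active count stops before reaching plane 5.
buggy = find_active_planes(mask, active_count)

# Bounding it by MAX_CLIP_PLANES visits every bit and finds all three.
fixed = find_active_planes(mask, MAX_CLIP_PLANES)
```

Here the enabled set is not contiguous, so the count-bounded loop ends after index 2 and misses plane 5.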
Fixes regressions since 66c8473e02 (replacing the old VS backend) in
Piglit's spec/glsl-1.20/execution/clipping/fixed-clip-enables and
oglconform's mustpass(basic.clip) and userclip(basic.allCases).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56791
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no compaction, so we can drop that code and simply use 'i'.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen4-5 compacts clip planes and Gen6-7 doesn't, it makes sense to
split them into separate code paths. This patch simply copies the code
to both halves; the next commits will simplify it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous 1023-entry chaining hash table never resized, so it was very
inefficient when there were many objects live. While one could have an even
more efficient implementation than this (keep an array for genned names with
packed IDs, or take advantage of the fact that key == hash or key ==
*(uint32_t *)data to store less data), this is fairly fast, and I want a nice
replacement hash table for other parts of Mesa, too.
It improves Minecraft performance 12.3% +/- 1.4% (n=9), dropping hash lookups
from 8% of the profile to 0.5%.
I also tested cairo-gl, which should be a pessimal workload for this hash
table: around 247000 FBOs created and destroyed, only around 65 live at any
time, and few lookups of them between creation and destruction. No
statistically significant performance difference at n=76 (mean 20.3/20.4
seconds, sd 2.8/3.2 seconds). If I remove the >20 seconds outliers that
appear to be due to thermal throttling, there's possibly a .97% +/- 0.31%
performance win (n=61/59). The choice of cutoff for outliers feels a lot like
cooking the data, but I've gone through this process 3 times for minor
iterations of the code with the same conclusion each time.
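For readers unfamiliar with the idea, here is a minimal sketch of a table that grows as it fills, in contrast to a fixed-size chaining table. It is not Mesa's implementation: linear probing, the 70% load threshold, and the class name are all assumptions, and deletion is left out to keep it short.

```python
class ResizingTable:
    """Open-addressing table with linear probing that doubles when ~70% full."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def _probe(self, slots, key):
        # Walk forward from the home slot until the key or an empty slot is found.
        index = hash(key) % len(slots)
        while slots[index] is not None and slots[index][0] != key:
            index = (index + 1) % len(slots)
        return index

    def _grow(self):
        # Rehash every live entry into a table twice the size.
        bigger = [None] * (len(self.slots) * 2)
        for entry in self.slots:
            if entry is not None:
                bigger[self._probe(bigger, entry[0])] = entry
        self.slots = bigger

    def insert(self, key, value):
        # Grow before the load factor would exceed 70%, so probing always ends.
        if (self.count + 1) * 10 > len(self.slots) * 7:
            self._grow()
        index = self._probe(self.slots, key)
        if self.slots[index] is None:
            self.count += 1
        self.slots[index] = (key, value)

    def lookup(self, key):
        entry = self.slots[self._probe(self.slots, key)]
        return entry[1] if entry is not None else None


table = ResizingTable()
for name in range(1, 1001):
    table.insert(name, f"obj{name}")
```

Because the table resizes, probe sequences stay short even with many live objects, which is the property the fixed 1023-entry table lacked.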
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Mesa's chaining hash table for object names is slow, and this should be much
faster. I namespaced the functions under _mesa_*, to avoid visibility
troubles that we may have had before with hash_table_* functions.
v2: Move .c file to main/, const a few things, clean up loop conditions,
add/extend some comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
sparc/clip.c got moved to sparc/sparc-clip.c to avoid doing this workaround in
the parent directory.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
While simplifying mesa/Makefile.am, the more important feature of this commit
is allowing a file with the same name to appear in both main/ and program/.
v2: [chadv] Add changes to Android makefiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com> (v2)
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch moves the rules for libmesa_st_mesa.a from Android.mk to
Android.libmesa_st_mesa.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch moves the rules for libmesa_dricore.a from Android.mk to
Android.libmesa_dricore.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch moves the rules for host executable mesa_gen_matypes from
Android.mk to Android.mesa_gen_matypes.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch moves the rules for the host and target libmesa_glsl_utils.a
from Android.mk to Android.libmesa_glsl_utils.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
They were always used with the corresponding *_FILES variables now that
automake handles rule generation.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It was pretty broken with array textures, where the array size (height or
depth depending on the target) shouldn't be magnified.
The guessing also no longer fails with 1D and cube textures.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
MaxLog2 led to bugs, because it didn't work well with 1D and 3D textures.
NOTE: This is a candidate for the stable branches.
v2: correct the comment at MaxNumlevels
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This might have a slight overhead but handling mip offsets more like
the width (and image) strides should make some things easier (mip level
being just part of the offset calculation) later.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is preparation work for using mip level offsets + base_ptr for texture
sampling instead of per-mip pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
I can never remember what "AB" means, and having to constantly consult
the docs is annoying. Just add comments to the top which explain each
of the abbreviations.
Previously, we used these XML annotations to make the code generation
scripts aware of any instances where the Mesa implementation of a
function had a prefix other than "_mesa_". Now that all of the mesa
implementation functions have been renamed to match the XML, we only
need to handle exec="skip", exec="dynamic", and the default case of
exec="mesa".
Acked-by: Brian Paul <brianp@vmware.com>
Previously, we used the mesa_name XML attribute to make the code
generation scripts aware of any instances where the Mesa
implementation of a function had a different function name suffix than
the primary name in the XML. Now that all of the Mesa implementation
functions have been renamed to match the XML, this attribute is no
longer necessary.
Acked-by: Brian Paul <brianp@vmware.com>
This patch changes the use of const in the type signatures of
_mesa_ShaderSource() and _mesa_TransformFeedbackVaryings(), to match
the type signatures in the GL spec. This avoids warnings when
building the code-generated api_exec.c file.
Note: previously we avoided the build warnings because these functions
were being type-checked against ShaderSourceARB and
TransformFeedbackVaryingsEXT; those functions are semantically
equivalent, but have fewer const qualifiers in their type signatures.
Acked-by: Brian Paul <brianp@vmware.com>
Vector indexing on matrices generates several copies of the
constant matrix; for instance, vec=mat4[i][j] generates:
vec=mat4[i].x;
vec=(j==1)?mat4[i].y;
vec=(j==2)?mat4[i].z;
vec=(j==3)?mat4[i].w;
In the case of constant matrices, the mat4[i] expression generates a
copy of the 16 elements of the matrix 4 times; indirect addressing
also prevents some conservative CSE algorithms (like the one in LLVM)
from factoring the mat4[i] expression.
This patch makes the vec_index_to_cond pass generate:
temp = mat4[i];
vec=temp.x;
vec=(j==1)?temp.y;
vec=(j==2)?temp.z;
vec=(j==3)?temp.w;
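The lowered form can be modelled with a short Python sketch. It only illustrates the semantics (one hoisted column lookup followed by a chain of conditional selects); the function name `lower_vec_index` and the list-of-lists matrix are assumptions made for illustration and are not part of the GLSL compiler.

```python
def lower_vec_index(matrix, i, j):
    """Model vec = mat4[i][j] without variable indexing on the vector."""
    # Hoist the column lookup once, analogous to `temp = mat4[i]`.
    temp = matrix[i]
    # Start from component x, then conditionally overwrite for j == 1, 2, 3.
    vec = temp[0]
    for component in range(1, len(temp)):
        vec = temp[component] if j == component else vec
    return vec


# Hypothetical 4x4 matrix stored as four columns of four components.
mat4 = [[10 * col + row for row in range(4)] for col in range(4)]
```

Because `temp` is read once, the column is not re-materialised for each comparison, which is the point of the change.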
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were accidentally setting bit 14 in DWord 2 (which is Reserved/MBZ)
rather than bit 14 in DWord 3 (which is AA Line Distance Mode).
There's also no reason to ever set it to legacy mode; the bit is only
used when drawing antialiased lines anyway. Set it unconditionally.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The parameters and operation of this function changed, but I didn't
bother to change the prologue comment.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This reverts commit 0d61f879a1.
Assigning the FS inputs to the 12 bit field is fine since we don't care
about the higher FS inputs. Maybe I'll revisit silencing the compiler
warning another day.
By moving the HASH_LINE rule out of control_line: and into line:, we avoid
adding control_line's additional \n (as seen in the first hunk).
mattst88: Carl and I determined independently of Fabian that the 091
test needed to be modified identically to this, and our patch to fix the
test was more complicated.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51506
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we were accepting garbage after #else and #endif tokens when
the previous preprocessor conditional evaluated to false (eg, #if 0).
When the preprocessor hits a false conditional, it switches the lexer
into the SKIP state, in which it ignores non-control tokens. The parser
pops the SKIP state off the stack when it reaches the associated #elif,
#else, or #endif. Unfortunately, that meant that it only left the SKIP
state after lexing the entire line containing the #token and thus
would accept garbage after the #token.
To fix this we use a mid-rule, which is executed immediately after the
#token is parsed.
NOTE: This is a candidate for the stable branch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56442
Fixes: preprocess17_frag.test from oglconform
Reviewed-by: Carl Worth <cworth@cworth.org> (glcpp-parse.y)
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian reported seeing:
r600_texture.c: In function ‘r600_texture_create_object’:
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
this should wrap over them fine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This contains the evergreen support.
Support is possible on rv670 upwards and the code in here
should work, but it doesn't and I haven't debugged it to
figure out why.
Beyond just adding support for the cube map array sampling,
r600 resinfo isn't conformant with the GL specification,
which states the number of layers should be returned for
the textureSize, so we have to track in an external
constant buffer the layers for each sampler if we need
them in the shader.
v2: only update the sampler constants if the sampler views have changed,
as suggested by Marek.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
draw_delete_geometry_shader() seems to be the real one.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
util_pack_z_stencil was being unconditionally invoked for all formats,
causing an assertion failure for Z32_FLOAT_S8X24_UINT.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Alpha is also 1 for formats like R32G32_FLOAT.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We must multiply the factor against the destination, not the source.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For drivers with native integer / SM4 support this is just a hindrance.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GL_POINT_BIT state attribute GL_POINT_SPRITE_COORD_ORIGIN
is only supported on OpenGL-2.0 or later. Prevent glPopAttrib()
from trying to restore it on OpenGL-1.4 implementations which
support GL_ARB_POINT_SPRITE, as otherwise the sequence...
glPushAttrib(GL_POINT_BIT);
glPopAttrib();
throws a GL_INVALID_ENUM error in glPopAttrib().
See also commit f778174ea1
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since cf438f5375e242, we store actual integers for the attribute data.
We just need to reinterpret the GLfloat array as a GLint/GLuint array
so we can read the proper data.
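As a rough illustration of what "reinterpret" means here, the Python sketch below views float-typed storage as 32-bit integers without any numeric conversion. The names `attrib` and `as_int_view` are hypothetical, and the sketch assumes the platform's `array` type codes 'f', 'i' and 'I' are all 4 bytes wide.

```python
from array import array

# Storage declared as four floats, standing in for a GLfloat[4] attribute.
attrib = array('f', [0.0, 0.0, 0.0, 1.0])


def as_int_view(storage, signed=True):
    """Return a view of the same bytes typed as 32-bit integers."""
    # Cast through bytes so the bit patterns are kept, not converted.
    return memoryview(storage).cast('B').cast('i' if signed else 'I')


# Write integer attribute values directly into the float-typed storage.
ints = as_int_view(attrib)
ints[0], ints[1], ints[2], ints[3] = 1, -2, 3, 4
```

Reading the storage back through `as_int_view` returns the integers that were stored, whereas reading it as floats yields unrelated values.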
Fixes oglconform's glsl-vertex-attrib/basic.VertexAttribI[1234][u]i
subtests (after fixing an unrelated bug in those test cases).
v2: Use the COPY_4V macro to be concise.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com> [v1]
This adds support to the softpipe texture sampler and tgsi exec.
In order to handle the extra input to the texture sampling,
I've had to expand the interfaces to take a c1 value for storing
the texture compare value for the TEX2 case.
v1.1: add comments (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds mesa state tracker support for the new extension,
along with glsl->tgsi conversion to use the new opcodes
where appropriate.
v2: fix assert found running textureSize tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the texture target and capability along
with 3 new opcodes required to support this extension.
As this extension requires some texture opcodes with samp + 5 args,
we need to use another src register; this is only required
for the TEX, TXL and TXB opcodes to implement this spec.
TEX2 is required for shadow cube map arrays
TXL2 is required for cube map array sampler + explicit lod
TXB2 is required for cube map array sampler + lod bias
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds all the new builtins + the new sampler types,
and hooks them up if the extension is supported.
v2: fix missing signatures for grad/lod
fix missing textureSize clarifications
fix compare vs starts with usage
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the mesa core + texture + fbo support for the
texture cube map array extension.
v2:
add comment to _mesa_num_tex_faces related to cube map arrays (Brian)
drop wrong comment cut-n-paste (Brian)
fix / 6 maximum check issue (Kenneth)
coalesce some array case statements (Kenneth)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While developing cube map array support I found that we didn't
support this properly, also piglit didn't test for it at all.
I've submitted a test to piglit to check for this, and this
fixes explicit lod and lod bias with cube maps.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For cube map arrays I'll need another driver private constant
buffer, and looking forward to UBOs. So clean up with some
defines, that can be modified when adding cube map array and ubos
later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is common for complicated shaders, particularly code-generated ones, to
have a big array of uniforms or attributes, and a prologue in the shader that
dereferences from the big array to more informatively-named local variables.
Then there will be some small control flow operation (like a ? : statement),
and then use of those informatively-named variables. We were emitting extra
MOVs in these cases, because copy propagation couldn't reach across control
flow.
Instead, implement dataflow analysis on the output of the first copy
propagation pass and re-run it to propagate those extra MOVs out.
On one future Steam release, reduces VS+FS instruction count from 42837 to
41437. No statistically significant performance difference (n=48), though, at
least at the low resolution I'm running it at.
shader-db results:
total instructions in shared programs: 722170 -> 702545 (-2.72%)
instructions in affected programs: 260618 -> 240993 (-7.53%)
Some shaders do get hurt by up to 2 instructions, because a choice to copy
propagate instead of coalesce or something like that results in a dead write
sticking around. Given that we already have instances of those instructions
in the affected programs (particularly unigine), we should just improve dead
code elimination to fix the problem.
I've no idea why there isn't a piglit that triggers this behaviour,
but while enabling TBOs for softpipe and r600g, I noticed all the
integer tests failed. I tracked it back to the TXF returning a float
when it should be returning an int. This fixed it and I haven't
seen any regressions in a full piglit run on softpipe.
http://bugs.freedesktop.org/55010
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a frame callback is not destroyed when destroying a surface, its
handler function will be invoked if the surface was destroyed after the
callback was requested but before it was invoked, causing a write on
freed memory.
This can happen if eglDestroySurface() is called shortly after
eglSwapBuffers().
Note: This is a candidate for stable branches.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These were only used for geometry shader support back in the days before
the new GLSL compiler. Future geometry shader support will not use
these.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes an uninitialized pointer read defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The new code-generated version of _mesa_create_exec_table() populates
the entire dispatch table (except for dynamic functions) by itself; it
no longer calls separate functions to initialize parts of the dispatch
table. This patch removes those no-longer-needed functions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch adjusts makefiles to cause src/mesa/main/api_exec.c to be
generated using src/mapi/glapi/gen/gl_genexec.py. There should be no
functional change.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This script generates the file api_exec.c, which contains just the
function _mesa_create_exec_table(), based on the XML files in
src/mapi/glapi/gen.
The following XML attributes, in particular, are used:
- "es1" indicates functions that should be available in ES1 contexts.
- "es2" indicates functions that should be available in ES2/ES3
contexts.
- "exec" indicates which Mesa function should be dispatched to. E.g.
if the GL function is glFoo(), then:
- exec="mesa" (the default) dispatches to _mesa_Foo().
- exec="check" dispatches to _check_Foo().
- exec="es" dispatches to _es_Foo().
- exec="loopback" dispatches to loopback_Foo().
- exec="skip" or exec="dynamic" causes this function to be skipped;
either it is not yet supported ("skip"), or its dispatch table
entry will be dynamically populated based on GL state ("dynamic").
- "desktop" indicates functions that should be available in desktop GL
(non-ES) contexts.
- "deprecated" indicates functions that should not be available in
core contexts.
- "mesa_name" indicates functions whose implementation in Mesa has a
different suffix than the corresponding GL function name.
The generated code looks roughly like this (showing just a single
statement in each block for brevity):
struct _glapi_table *
_mesa_create_exec_table(struct gl_context *ctx)
{
struct _glapi_table *exec;
exec = _mesa_alloc_dispatch_table(_gloffset_COUNT);
if (exec == NULL)
return NULL;
if (_mesa_is_desktop_gl(ctx)) {
SET_ActiveProgramEXT(exec, _mesa_ActiveProgramEXT);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || _mesa_is_gles3(ctx)) {
SET_BeginQueryARB(exec, _mesa_BeginQueryARB);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES) {
SET_GetPointerv(exec, _mesa_GetPointerv);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES || ctx->API == API_OPENGLES2) {
SET_ActiveTextureARB(exec, _mesa_ActiveTextureARB);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES2) {
SET_AttachShader(exec, _mesa_AttachShader);
/* other functions not shown */
}
if (ctx->API == API_OPENGL) {
SET_Accum(exec, _mesa_Accum);
/* other functions not shown */
}
if (ctx->API == API_OPENGL || ctx->API == API_OPENGLES) {
SET_AlphaFunc(exec, _mesa_AlphaFunc);
/* other functions not shown */
}
if (ctx->API == API_OPENGLES) {
SET_AlphaFuncxOES(exec, _es_AlphaFuncx);
/* other functions not shown */
}
return exec;
}
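The mapping from the "exec" attribute to the dispatched function name can be sketched in Python as follows. This is not the code in gl_genexec.py; `PREFIXES`, `dispatch_target` and `emit_set` are hypothetical names, and the sketch ignores "mesa_name" suffix adjustments and the per-API grouping.

```python
# Prefix used for each exec value, as listed in the description above.
PREFIXES = {
    "mesa": "_mesa_",
    "check": "_check_",
    "es": "_es_",
    "loopback": "loopback_",
}


def dispatch_target(gl_name, exec_attr="mesa"):
    """Return the Mesa function name for glFoo, or None if it is skipped."""
    if exec_attr in ("skip", "dynamic"):
        return None
    return PREFIXES[exec_attr] + gl_name


def emit_set(gl_name, exec_attr="mesa"):
    """Return one SET_ statement, or None for skipped functions."""
    target = dispatch_target(gl_name, exec_attr)
    if target is None:
        return None
    return "SET_{0}(exec, {1});".format(gl_name, target)
```

For example, `emit_set("Accum")` produces the same statement as the `SET_Accum` line in the generated code shown above.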
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch updates gl_XML.py to parse the new XML attributes "exec",
"desktop", "deprecated", and "mesa_name", which will be needed to code
generate _mesa_create_exec_table().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_XML.py's gl_function class keeps track of an entry_point_api_map
property that tracks, for each set of aliased functions, which ES1 or
ES2 version the given function name first appeared in.
This patch aggregates that information together across aliased
functions, into an easier-to-use api_map property.
Future patches will use this information when code generating
_mesa_create_exec_table(), to determine which set of dispatch table
entries should be populated based on the API.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some of the functions that we store in the dispatch table are declared
as non-static in their .c files and are inserted into the dispatch
table directly by _mesa_create_exec_table(). Other functions are
declared as static, and are inserted into the dispatch table by a
dedicated function that lives in the same .c file
(e.g. _mesa_loopback_init_api_table() in api_loopback.c).
This patch makes all of these functions non-static, and creates
appropriate prototypes for them, so that in future patches we can
populate the entire dispatch table using a single code-generated
function.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When the XML lists one or more GL api functions as aliases for another
GL function, the mesa function that implements the functionality is
usually named after the canonical version of the function (the one
that is the target of the aliases). For example, FogCoordd is listed
as an alias of FogCoorddEXT, and the Mesa function implementing the
functionality is called loopback_FogCoorddEXT.
However, there are exceptions. For example, Enablei is listed as an
alias of EnableIndexedEXT, but the Mesa function implementing the
functionality is called _mesa_EnableIndexed.
To account for these anomalies, this patch annotates the XML with
"mesa_name" attributes, which describe how to adjust the function name
to find the corresponding Mesa function.
For example:
<function name="EnableIndexedEXT" mesa_name="-EXT">...</function>
<function name="IsProgramNV" mesa_name="-NV+ARB">...</function>
means that EnableIndexedEXT is implemented by a Mesa function called
_mesa_EnableIndexed, and IsProgramNV is implemented by a Mesa function
called _mesa_IsProgramARB.
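A minimal Python sketch of how such an annotation could be applied is shown below. It handles only the two forms seen in the examples ("-OLD" and "-OLD+NEW"); the function name `apply_mesa_name` and the exact attribute grammar are assumptions, not the generator's actual implementation.

```python
import re


def apply_mesa_name(gl_name, mesa_name=None):
    """Adjust a GL function name according to a mesa_name annotation."""
    if not mesa_name:
        return gl_name
    # Assumed grammar: "-SUFFIX" optionally followed by "+NEWSUFFIX".
    match = re.fullmatch(r"-([A-Za-z0-9]+)(?:\+([A-Za-z0-9]+))?", mesa_name)
    if match is None:
        raise ValueError("unsupported mesa_name: %r" % mesa_name)
    remove, add = match.group(1), match.group(2) or ""
    if not gl_name.endswith(remove):
        raise ValueError("%r does not end with %r" % (gl_name, remove))
    return gl_name[:-len(remove)] + add
```

Prepending "_mesa_" to the result gives the names quoted above, e.g. `"_mesa_" + apply_mesa_name("EnableIndexedEXT", "-EXT")`.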
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine the name of the Mesa function
that should be stored in each dispatch table entry.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped when the API is desktop GL.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
dispatched to ES-specific implementations. exec="es" indicates that
the ES-specific implementation has a name beginning with "_es_"
(e.g. _es_QueryMatrixxOES), and exec="check" indicates that the
ES-specific implementation has a name beginning with "_check_"
(e.g. _check_GetTexGenxvOES).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
dispatched to functions in api_loopback.c.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped because Mesa dispatches them differently depending on GL
state.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped because they aren't implemented by Mesa.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped in core contexts.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were already doing this for some GLX extensions, but not others.
This patch makes our use of window_system="glX" consistent.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch standardizes the category names used in the glapi XML files
to begin each extension name with the prefix "GL_" or "GLX_". There
is no functional change, because these category names are not used in
the generated code.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows the GLES1.1 dispatch sanity test to be run on all builds,
even builds that do not include GLES1 support.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
fragprog_inputs_read is a 12-bit bitfield so check the assigned value.
MSVC warns on the assignment. Not easy to fix but let's do a sanity check.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The decompression is done in-place and only the compressed tiles are
decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
The texture unit is programmed to use non-displayable tiling and depth
ordering of samples, so that it can fetch the texture in the native DB format.
The latest version of the libdrm surface allocator is required for stencil
texturing to work. The old one didn't create the mipmap tree correctly.
We need a separate mipmap tree for stencil, because the stencil mipmap
offsets are not really depth offsets/4.
There are still some known bugs, but this should save some memory and it also
improves performance a little bit in Lightsmark (especially with low
resolutions; tested with Radeon HD 5000).
The DB->CB copy is still used for transfers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The functions were broken, because they converted ints to floats.
Now we can finally advertise OpenGL 3.0. ;)
In this commit, the vbo module also tracks the type for each attrib
in addition to the size. It can be one of FLOAT, INT, UNSIGNED_INT.
One small ugliness is that the vertex attribs are declared as floats even though
there may be integer values. The code just copies integer values into them
without any conversion.
This implementation passes the glVertexAttribI piglit test which I am going
to commit in piglit soon. The test covers vertex arrays, immediate mode and
display lists.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: cosmetic changes as suggested by Brian
Integer textures generate invalid operation in glGenerateMipmap.
So, the code related to integer textures is now redundant.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Khronos has reached a conclusion and disallowed the following texture formats in
glGenerateMipmap():
(a) ASTC textures
(b) integer internal formats (e.g., RGBA8UI, RG16I)
(c) textures with stencil formats (e.g., STENCIL_INDEX8)
(d) textures with packed depth/stencil formats (e.g., DEPTH24_STENCIL8)
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9471
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing gl-3.1/genned-names.
v2: Fix a missing return value.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's usually forced to 1 by the surface format, but sometimes we actually have
alpha present because it's the only format available.
Fixes piglit texwrap bordercolor tests for OpenGL 1.1, GL_EXT_texture_sRGB and
GL_ARB_texture_float.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the index buffer is full of values like "0 1 2 3", but basevertex is 4, we
need to upload at least vertex data for elements 4 5 6 7. Whether we also
upload 0 1 2 3 is a question of whether there are VBOs present or not -- see
the code setting start_vertex_bias in brw_draw_upload.c.
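The arithmetic in the example above can be sketched in Python. The helper `required_vertex_range` is hypothetical and only computes the minimum range of vertices that must be available; it does not model the VBO-dependent choice made via start_vertex_bias.

```python
def required_vertex_range(indices, basevertex):
    """Return the inclusive (first, last) vertex referenced by a draw."""
    # Each index is offset by basevertex before the vertex is fetched.
    first = min(indices) + basevertex
    last = max(indices) + basevertex
    return first, last
```

With indices "0 1 2 3" and a basevertex of 4, the draw references vertices 4 through 7, so at least those elements have to be uploaded.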
Fixes piglit draw-elements*base-vertex user_varrays
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, if we had a set of prims passed in with a num_instances varying
between them, we wouldn't upload enough (or too much!) from user vertex
arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_draw_upload.c start_vertex_bias code has support for doing the rebase
without rewriting the index buffer by applying a basevertex. It looks like
vbo_rebase_prims() is not equipped to handle basevertex.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We haven't been only tracking raw GRF-GRF moves since the constant propagation
merge, and also the extension for source modifiers and uniforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Given that we handle similarly-regioned GRF registers for our copy
propagation from our UNIFORM file, there's no reason not to allow it.
The shader-db impact is negligible -- +90 instructions total, 2 shaders helped
and 7 hurt (slightly increased register pressure increased spilling), but this
is to prevent regression in other shaders when fixing copy_propagation to
reduce register pressure in the shaders that are hurt here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we put the register coalescing in between the two, then we end up with code
sequences involving dead writes that the dead code elimination doesn't know
how to remove. In place of making dead code elimination smart (which we
should do, too), make it less important for the moment.
shader-db results:
total instructions in shared programs: 722240 -> 721275 (-0.13%)
instructions in affected programs: 50573 -> 49608 (-1.91%)
(no shaders regressed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During code generation, we create tons of temporary variables, many of
which get immediately killed and are never used. Later optimization and
analysis passes, such as compute_live_intervals, loop over all the
virtual GRFs. By compacting them, we can save a lot of overhead.
Reduces compilation time in L4D2's largest fragment shader from 10.2
seconds to 5.2 seconds (50%). Drops compute_live_variables() from
10-12% of another game's startup time to 8%.
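The idea of compaction can be sketched in Python as a renumbering pass. This is an illustrative model only: instructions are represented as tuples of virtual GRF numbers, and `compact_virtual_grfs` is a hypothetical name rather than the driver's function.

```python
def compact_virtual_grfs(instructions, num_vgrfs):
    """Renumber used virtual GRFs densely; return (instructions, new count)."""
    # Mark every virtual GRF that some instruction still references.
    used = [False] * num_vgrfs
    for inst in instructions:
        for reg in inst:
            used[reg] = True

    # Assign new, contiguous numbers to the survivors in their old order.
    remap = {}
    for old in range(num_vgrfs):
        if used[old]:
            remap[old] = len(remap)

    # Rewrite each instruction to use the compacted numbering.
    rewritten = [tuple(remap[reg] for reg in inst) for inst in instructions]
    return rewritten, len(remap)
```

Later passes that loop over all virtual GRFs then iterate over the compacted count instead of the original one.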
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The function list was generated from glcorearb.h for GL 4.3.
Note that many GL 4.X functions are commented out, and indicate
that they need to be added to Mesa's XML.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We also no longer call _swrast_CreateContext, _tnl_CreateContext
or _swsetup_CreateContext when creating the context.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
If a GL function was introduced in a later GL version than the
context we are testing, then it is okay if it is set to the
_mesa_generic_nop function.
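The rule can be expressed as a small Python predicate. The function name and the integer version encoding (for example 31 for GL 3.1) are assumptions made for this sketch, not the test's actual data structures.

```python
def dispatch_entry_ok(is_nop, introduced_version, context_version):
    """Return True if a dispatch entry passes the sanity rule above."""
    # A real implementation is always acceptable.
    if not is_nop:
        return True
    # A nop is tolerated only for functions newer than the context.
    return introduced_version > context_version
```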
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will be used by GL CORE contexts to differentiate functions that
can be set to nop from functions that are required for a particular
context version.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This function can be re-added with an actual implementation
when ARB_geometry_shader4 is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This function can be re-added with an actual implementation
when ARB_geometry_shader4 is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
ProgramParameteri will be required for ARB_geometry_shader4
or GLES3. Don't enable this function until either of those
is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These functions are part of GL 4.3. Moving this will allow
ProgramParameteriARB to alias ProgramParameteri.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These EXT_separate_shader_objects functions will no longer be
enabled for CORE profiles:
* UseShaderProgramEXT
* ActiveProgramEXT
* CreateShaderProgramEXT
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes the Android build after the move of builtin_stubs.cpp into
the builtin_compiler subdirectory. This patch is untested.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note this by itself is not enough to fix scons build -- it will fail
until you remove:
rm -rf build/*/glsl/builtin_compiler
because that node was a file before, but it will now be a directory.
This also means that bisecting across this change will require wiping
the build directory.
The builtin_compiler binary is used during the build process to generate
code for the builtin GLSL functions. Since this binary needs to be run
on the build host, it must not be cross-compiled.
This patch fixes the build system to compile a second version of the
source files and the builtin_compiler binary itself for the build
system. It does so by defining the CC_FOR_BUILD and CXX_FOR_BUILD
variables, which are searched for by the configure script and point to
the location of native C and C++ compilers.
In order for this to work properly, builtin_function.cpp is removed
from BUILT_SOURCES, otherwise the build system would try to generate it
before having had a chance to descend into the builtin_compiler
subdirectory. With the builtin_compiler and glsl_compiler now being
generated at different stages, the build instructions for glsl_compiler
can be simplified a bit.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Variable indexing of non-uniform arrays only exists in GLSL. Likewise,
OPCODE_CAL/OPCODE_RET only existed to try and support GLSL's function
calls. We don't use Mesa IR for GLSL, and these features are explicitly
disallowed by ARB_vertex_program/ARB_fragment_program and never
generated by ffvertex_prog.c.
Since they'll never happen, there's no need to check for them, which
saves us from walking through all the Mesa IR instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Rather than having two separate backends, just create a small layer that
translates the subset of Mesa IR used for ARB_vertex_program and fixed
function programs to the Vec4 IR. This allows us to use the same
optimization passes, code generator, register allocator as for GLSL.
v2: Incorporate Eric's review comments.
- Fix use of uninitialized src_swiz[] values in the SWIZZLE_ZERO/ONE
case: just initialize it to 0 (.x) since the value doesn't matter
(those channels get writemasked out anyway).
- Properly reswizzle source register's swizzles, rather than overwriting
the swizzle.
- Port the old brw_vs_emit code for computing .x of the EXP2 opcode.
- Update comments, removing mention of NV_vertex_program, etc.
- Delete remaining #warning lines and debug comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Properly use "conditionalmod" pre-Gen6, rather than the incorrectly
copy-and-pasted "BRW_CONDITIONAL_G".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will become necessary once we start supporting ARB programs and
fixed function in this backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch removes the generated files api_exec_es1.c,
api_exec_es1_dispatch.h, and api_exec_es1_remap_helper.h (and the
source files and build rules used to generate them), since they are no
longer used. GLES1 now uses the same dispatch table layout as all the
other APIs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies context creation code for GLES1 to use
_mesa_create_exec_table() (which is used for all other APIs) instead
of the GLES1-specific _mesa_create_exec_table_es1().
There is a slight change in functionality. As a result of a mistake
in the code generation of _mesa_create_exec_table_es1(), it does not
include glFlushMappedBufferRangeEXT or glMapBufferRangeEXT (this is
because when support for those two functions was added in commit
762d9ac, src/mesa/main/APIspec.xml wasn't updated). With this patch,
glFlushMappedBufferRangeEXT and glMapBufferRangeEXT are properly
included in the dispatch table. Accordingly, dispatch_sanity.cpp is
modified to expect these two functions to be present.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Leave GLES1.1 dispatch sanity test disabled when not building
GLES1 support.
Currently, _mesa_create_exec_table() (in api_exec.c) is used for all
APIs except GLES1. In GLES1, _mesa_create_exec_table_es1() (a code
generated function) is used instead.
In principle, this shouldn't be necessary. It should be possible for
api_exec.c to contain the logic for populating the dispatch table for
all APIs.
This patch paves the way for using _mesa_create_exec_table() instead
of _mesa_create_exec_table_es1(), by making _mesa_create_exec_table()
(and the functions it calls) expose the correct subset of desktop GL
functions for GLES1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch creates a header querymatrix.h, to allow functions defined
in querymatrix.c to be used from other .c files. It also switches
from the nonstandard GL_APIENTRY to GLAPIENTRY.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Don't declare _mesa_Get{Integer,Float}v in querymatrix.c.
Instead, just include main/get.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch adds the usual boilerplate (copyright notice and guards
against redundant inclusion) to es1_conversion.h. It also moves the
definition of GL_APIENTRY from es1_conversion.c.
This allows es1_conversion.h to be safely included from other .c files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use copyright notice from src/mesa/main/es_generator.py (the
script that used to generate this file).
Previously dispatch table-related code was generated from gl_API.xml,
so it did not include slots for GLES1-only functions (such as those
taking fixed-point arguments).
This patch generates dispatch table-related code from
gl_and_es_API.xml, so that GLES1-only functions are included. This
paves the way for future patches that will unify the GLES1 dispatch
table with the dispatch tables for the other APIs.
The following generated files are affected:
- glapi_x86.S
- glapi_x86-64.S
- glapi_sparc.S
- glprocs.h
- glapitemp.h
- glapitable.h
- glapi_gentable.c
- dispatch.h
- remap_helper.h
Since this change affects makefiles, a full rebuild is required.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Adjust dependencies to ensure that generated files will be rebuilt
whenever any ES-related XML source files are changed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when code-generating aliased functions in glapitemp.h, we
weren't consistent about which function alias we used to obtain the
parameter names, with the risk that we would generate incorrect code
like this:
    KEYWORD1 void KEYWORD2 NAME(Foo)(GLint x)
    {
        (void) x;
        DISPATCH(Foo, (x), (F, "glFoo(%d);\n", x));
    }
    KEYWORD1 void KEYWORD2 NAME(FooEXT)(GLint y)
    {
        (void) x;
        DISPATCH(Foo, (x), (F, "glFooEXT(%d);\n", x));
    }
At the moment there are no aliased functions with mismatched parameter
names, so this isn't a problem. But when we introduce GLES1
functions into the dispatch table, there will be
(MapBufferRange/MapBufferRangeEXT). This patch paves the way for that
by fixing the code generation script to handle the mismatch correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This ensures that GLES1-only typedefs are available in these files.
In a future patch, this will allow us to expand the dispatch table to
include GLES1-only functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In commits bad96f6 and e7dd2e5 I added the following aliases:
- ClampColor -> ClampColorARB
- VertexAttribDivisor -> VertexAttribDivisorARB
But I neglected to update check_table.cpp, causing "make check" to
fail for non-shared-glapi builds.
This patch removes the functions that are now aliased from
check_table.cpp, so that "make check" works correctly again.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When copy_array_to_vbo_array encountered an array with src_stride == 0
and dst_stride != 0, we would replicate out the single element to the
whole size (max - min + 1). This is unnecessary: we can simply upload
one copy and set the buffer's stride to 0.
Decreases vertex upload overhead in an upcoming Steam for Linux title.
Prior to this patch, copy_array_to_vbo_array appeared very high in the
profile (Eric quoted 20%). After the patch, it disappeared completely.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
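The stride-0 upload described in the copy_array_to_vbo_array commit above
can be sketched roughly as follows. This is only an illustration: the
function name, signature, and return convention are hypothetical and are
not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the driver's actual function. Copies vertex
 * data into dst and returns the stride the destination buffer should
 * be bound with. */
static size_t
demo_upload_array(unsigned char *dst, const unsigned char *src,
                  size_t src_stride, size_t dst_stride,
                  size_t elem_size, size_t count)
{
   assert(dst && src && elem_size > 0);

   if (src_stride == 0) {
      /* Constant attribute: upload a single copy. A stride of 0 makes
       * every vertex fetch the same element. */
      memcpy(dst, src, elem_size);
      return 0;
   }

   /* General case: copy each element to its destination slot. */
   for (size_t i = 0; i < count; i++)
      memcpy(dst + i * dst_stride, src + i * src_stride, elem_size);
   return dst_stride;
}
```

With src_stride == 0 the cost no longer depends on (max - min + 1), which
is where the saving comes from.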
This essentially reverts the following:
commit c625aa19cb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Feb 18 10:37:43 2011 +0000
intel: extend current vertex buffers
While working on optimizing an upcoming Steam title, I broke this code.
Eric expressed his doubts about this optimization, and noted that the
original commit offered no performance data.
I ran before and after benchmarks on Xonotic and Citybench, and found
that this code made no difference. So, remove it to reduce complexity
and make future work simpler.
Reviewed-by: Eric Anholt <eric@anholt.net>
The problem was we set VRAM|GTT for relocations of STATIC resources.
Setting just VRAM increases the framerate 4 times on my machine.
I rewrote the switch statement and adjusted the domains for window
framebuffers too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
By removing the array size, the static assertion to check for missing
elements can do its job properly. This will catch cases where a new
Mesa format is added but the swrast texfetch code isn't updated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
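A minimal sketch of why dropping the explicit array size lets the static
assertion catch a missing entry. The enum, table, and macro names below
are hypothetical stand-ins, not Mesa's own definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for a format enum and a per-format table. */
enum demo_format {
   DEMO_FORMAT_NONE,
   DEMO_FORMAT_RGBA8,
   DEMO_FORMAT_RGB565,
   DEMO_FORMAT_COUNT
};

struct demo_fetch_entry {
   enum demo_format format;
   int bytes_per_texel;
};

/* No explicit size: the compiler sizes the array from its initializers. */
static const struct demo_fetch_entry demo_fetch_table[] = {
   { DEMO_FORMAT_NONE,   0 },
   { DEMO_FORMAT_RGBA8,  4 },
   { DEMO_FORMAT_RGB565, 2 },
};

#define DEMO_ELEMENTS(a) (sizeof(a) / sizeof((a)[0]))

/* Had the table been declared as demo_fetch_table[DEMO_FORMAT_COUNT],
 * missing initializers would be zero-filled and this could never fail. */
_Static_assert(DEMO_ELEMENTS(demo_fetch_table) == DEMO_FORMAT_COUNT,
               "fetch table is missing an entry");

static int
demo_bytes_per_texel(enum demo_format f)
{
   assert(f < DEMO_FORMAT_COUNT);
   return demo_fetch_table[f].bytes_per_texel;
}
```

Adding a new enumerator before DEMO_FORMAT_COUNT without extending the
table now fails at compile time.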
On r6xx/r7xx, shader resource management needs to make sure that the
shader does not go over the GPR limit. Each ASIC has a maximum number
of registers that can be split between shader stages. For each stage,
the shader must not use more registers than the limit programmed.
v2: Print an error message when discarding draw. Don't add another
boolean to context structure, but rather propagate the discard
boolean through the call chain.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is a regression since b3921e1f53.
The array stores VS outputs, not FS inputs.
Now llvmpipe can do 32 varyings too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
For Intel, expose it only if gen >= 4.
For Gallium, expose it only if PIPE_CAP_SM3 is advertised.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: update relnotes-9.1
v3: use align_malloc and align_free for malloced buffers in r300g
v4: document the new CAP in the docs
This allows updating only a subrange of buffer bindings.
set_vertex_buffers(pipe, start_slot, count, NULL) unbinds buffers in that
range. Binding NULL resources unbinds buffers too (both buffer and user_buffer
must be NULL).
The meta ops are adapted to only save, change, and restore the single slot
they use. The cso_context can save and restore only one vertex buffer slot.
The clients can query which one it is using cso_get_aux_vertex_buffer_slot.
It's currently set to 0 (the Draw module breaks if it's set to non-zero).
It should decrease the CPU overhead when using a lot of meta ops, but
the drivers must be able to treat each vertex buffer slot as a separate
state (only r600g does so at the moment).
I can imagine this also being useful for optimizing some OpenGL use cases.
Reviewed-by: Brian Paul <brianp@vmware.com>
It was defined as an empty function since Nov 2010 and was ultimately
removed completely.
See xserver commit 1cb0261
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
With the explicit NUM_TEXTURE_TARGETS array size, the assertion that
Elements(targets) == NUM_TEXTURE_TARGETS would pass even if elements
were missing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Patch adds additional singlesample config with 565 color buffer,
24 bit depth and 8 bit stencil buffer. This makes Quadrant benchmark
work on Android. Tested with Sandybridge and Ivybridge machines.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is instead of the pair of GLenums for format and type that were
previously used. This is necessary for the Intel drivers to expose sRGB
framebuffer formats.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
There is no gl_format in Mesa that corresponds to this arrangement, so I
have a very hard time believing that this works.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, if the server didn't send a GLX_FRAMEBUFFER_SRGB_CAPABLE_EXT
tag, it would still be set to GLX_DONT_CARE (which is -1). Set it to
GL_FALSE instead.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Maciej Wieczorek <maciej.t.wieczorek@intel.com>
This fixes an issue where glsl_to_tgsi_visitor::get_opcode() would emit the
wrong opcode because the register type was GLSL_TYPE_ARRAY/STRUCT instead of
GLSL_TYPE_FLOAT/INT/UINT/BOOL, so the function would use the float opcodes for
operations on integer or boolean values dereferenced from an array or
structure. Assertions have been added to get_opcode() to prevent this bug
from reappearing in the future.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
The 2x and 4x MSAA cases are completely broken. The lfdptr instruction returns
garbage there.
The 8x MSAA case is broken on Cayman, though at least the result looks somewhat
correct.
Only the 8x MSAA case works on Evergreen and is enabled.
llvm-3.2svn r166772 no longer requires RTTI for lib/Support.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
LLVM 3.1+ no longer provides "extern unsigned llvm::StackAlignmentOverride"
and friends for configuring code generation options such as stack
alignment.
So I restrict assigning llvm::StackAlignmentOverride and the other
variables to LLVM 3.0 only, and wrote similar code using
TargetOptions.
This patch fixes a segfault in WINE when using llvmpipe built with LLVM 3.1.
Signed-off-by: Alexander V. Nikolaev <avn@daemon.hole.ru>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
This is a leftover from when we had to split those two functions due to
the separate BO validation step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"Active" is an already-used term for the query being between
glBeginQuery() and glEndQuery(), while this is tracking whether the
start of the packet pair for emitting state has been inserted into the
current batchbuffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Put the back face colour right after the front face colour in the LDS parameter
space.
Fixes 18 piglit tests related to two sided lighting.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
It's required. The CP uses this to properly allocate new
contexts. Also do a CS partial flush since we are updating
CONFIG regs which are single state.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Use printf instead of debug_printf to be consistent with print
statements in rest of unit tests.
This also fixes the lack of print output with the MinGW build of
u_format_compatible_test.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Global initializers using the ?: operator with at least one non-constant
operand generate ir_if statements. For example,
    float foo = some_boolean ? 0.0 : 1.0;
becomes:
    (declare (temporary) float conditional_tmp)
    (if (var_ref some_boolean)
        ((assign (x) (var_ref conditional_tmp) (constant float (0.0))))
        ((assign (x) (var_ref conditional_tmp) (constant float (1.0)))))
This pattern is necessary because the second or third arguments could be
function calls, which create statements (not expressions).
The linker moves these global initializers into the main() function.
However, it incorrectly had an assertion that global initializer
statements were only assignments, calls, or temporary variable
declarations. As demonstrated above, they can be if statements too.
Other than the assertion, everything works fine. So remove it.
Fixes new Piglit test condition-08.vert, as well as an upcoming
game that will be released on Steam.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider the following code, which reinterprets a register as a
different type:
mov(8) g6<1>F g1.4<0,4,1>.xF
and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD
Copy propagation would notice that we can replace the use of g6 with
g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD
type, incorrectly generating:
and(8) g5<1>.xUD g6<4,4,1>.xF 0x7fffffffUD
Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode
test with my new Mesa IR -> Vec4 IR translator.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider the following code sequence:
mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mov.sat(8) m1<1>.xyF g4<4,4,1>F
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions. Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination. While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.
When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source. It ensures that all
necessary channels are available (possibly written by several
instructions). In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.
This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code. However, it should be possible
with GLSL programs as well.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
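The writemask fix in the compute-to-MRF commit above comes down to a
bitwise AND. A minimal sketch, assuming the usual one-bit-per-channel
encoding; the macro and function names are hypothetical, not the ones used
in the backend.

```c
#include <assert.h>

/* Assumed encoding: one bit per channel. */
#define DEMO_WRITEMASK_X    0x1u
#define DEMO_WRITEMASK_Y    0x2u
#define DEMO_WRITEMASK_Z    0x4u
#define DEMO_WRITEMASK_W    0x8u
#define DEMO_WRITEMASK_XYZW 0xfu

/* Hypothetical helper: the writemask the rewritten instruction should
 * carry once its destination is redirected to the MRF. */
static unsigned
demo_mrf_writemask(unsigned scan_inst_writemask, unsigned mov_dst_writemask)
{
   assert((scan_inst_writemask | mov_dst_writemask) <= DEMO_WRITEMASK_XYZW);

   /* Keep only the channels the MOV actually wanted to write. */
   return scan_inst_writemask & mov_dst_writemask;
}
```

For the sequence in the commit message, the first mul (writing all four
channels) ends up with .xy and the second with .zw, so neither clobbers
the other.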
Previously, we used lookahead patterns to differentiate:
#define FOO(x) function macro
#define FOO (x) object macro
Unfortunately, our rule for function macros:
{HASH}define{HSPACE}+/{IDENTIFIER}"("
relies on infinite lookahead, and apparently triggers a Flex bug where
the generated code overflows a state buffer (see YY_STATE_BUF_SIZE).
There's no need to use infinite lookahead. We can simply change state,
match the identifier, and use a single character lookahead for the '('.
This apparently makes Flex not generate the giant state array, which
avoids the buffer overflow, and should be more efficient anyway.
Fixes piglit test 17000-consecutive-chars-identifier.frag.
NOTE: This is a candidate for every release branch ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
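The single-character lookahead idea from the glcpp commit above, shown as
a hand-written C sketch rather than a Flex rule. The function and enum
names are hypothetical; the real change lives in the lexer's start
conditions.

```c
#include <assert.h>
#include <ctype.h>

enum demo_macro_kind { DEMO_MACRO_OBJECT, DEMO_MACRO_FUNCTION };

/* s points at the macro name, i.e. just past "#define" and the
 * horizontal whitespace that follows it. */
static enum demo_macro_kind
demo_classify_define(const char *s)
{
   assert(s && (isalpha((unsigned char) *s) || *s == '_'));

   /* Consume the identifier itself... */
   while (isalnum((unsigned char) *s) || *s == '_')
      s++;

   /* ...then look at exactly one character: '(' with no intervening
    * space means a function-like macro. */
   return *s == '(' ? DEMO_MACRO_FUNCTION : DEMO_MACRO_OBJECT;
}
```

However long the identifier is, the decision needs only the one character
that follows it.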
While copying the values into the batch space, we advance the param
pointer. The debug code then tries to iterate over all the uploaded
values, starting at param...which is now the end of the uploaded data,
rather than the start.
This patch saves a pointer to the start of push constant space before
it gets altered and switches the debug code to use that.
Tested by uncommenting the code and examining the output of
glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
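The push constant debug fix above follows a common pattern: save the
start pointer before the copy loop advances it. A minimal sketch with
hypothetical names; returning a sum stands in for the debug printout.

```c
#include <assert.h>

/* Hypothetical stand-in for the upload loop. Returns the sum that a
 * debug dump of the uploaded values would see. */
static float
demo_upload_and_dump(float *param, const float *values, int count)
{
   assert(param && values && count >= 0);

   float *const start = param;   /* saved before param is advanced */

   for (int i = 0; i < count; i++)
      *param++ = values[i];      /* param ends up one past the data */

   float sum = 0.0f;
   for (int i = 0; i < count; i++)
      sum += start[i];           /* iterate from start, not from param */
   return sum;
}
```

Reading from param instead of start would walk the untouched space after
the upload, which matches the all-zero output the commit describes.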
Since ES3.0 is backward compatible with 2.0, we check that all the 2.0
functions and additional 3.0 functions exist.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we just printed the dispatch table index and the user had
to convert it to a function name. That was a pain because when
FEATURE_remap_table is defined, the assignment of functions to
dispatch table entries is done at run time.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously this function was only implemented for non-shared-glapi
builds. Since the function is only intended for debugging purposes we
use a simple O(n) algorithm.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
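A linear lookup of the kind the commit above describes might look like
this. The commit text here does not name the function, so the table
layout and every identifier below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table mapping dispatch offsets to function names. */
struct demo_entry {
   const char *name;
   int offset;
};

static const struct demo_entry demo_table[] = {
   { "glNewList",  0 },
   { "glEndList",  1 },
   { "glCallList", 2 },
};

/* O(n) scan: acceptable for a debugging aid that is never called on a
 * hot path. Returns NULL if no entry has the given offset. */
static const char *
demo_get_proc_name(int offset)
{
   assert(offset >= 0);

   for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
      if (demo_table[i].offset == offset)
         return demo_table[i].name;
   }
   return NULL;
}
```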
When specifying per-target CFLAGS (e.g., ralloc_test_CFLAGS) AM_CFLAGS
are not used. AM_CPPFLAGS should be used for includes anyway.
Fixes a build problem since 41b14d125:
CC ralloc_test-ralloc.o
In file included from ../../../src/glsl/ralloc.c:42:0:
../../../src/glsl/ralloc.h:57:27: fatal error: main/compiler.h: No such file or directory
Acked-by: Paul Berry <stereotype441@gmail.com>
Catches problems such as (in the gles3 branch)
glcpp-parse.y: In function '_glcpp_parser_handle_version_declaration':
glcpp-parse.y:1990:39: warning: format '%lli' expects argument of type
'long long int', but argument 4 has type 'int' [-Wformat]
As a side-effect, remove ralloc.c's likely/unlikely macros and just use
the ones from main/compiler.h.
NOTE: This is a candidate for the release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the problem where configure from the tarball would report missing
files:
$ ./configure
configure: error: cannot find install-sh, install.sh, or shtool in bin
NOTE: This is a candidate for the 9.0 branch.
4-bit and 3-bit quantization values differ significantly for
values other than 0 and 1.
Fixes piglit draw-pixels for softpipe/llvmpipe.
NOTE: Probably a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This fixes an issue where glsl_to_tgsi_visitor::get_opcode() would emit the
wrong opcode because the register type was GLSL_TYPE_ARRAY/STRUCT instead of
GLSL_TYPE_FLOAT/INT/UINT/BOOL, so the function would use the float opcodes for
operations on integer or boolean values dereferenced from an array or
structure. Assertions have been added to get_opcode() to prevent this bug
from reappearing in the future.
This silences a zillion GCC warnings like:
../../../src/mesa/main/pack.c: In function '_mesa_pack_rgba_span_from_uints':
../../../src/mesa/main/pack.c:560:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The layer dimension of array textures is not subject to mipmap minification.
OTOH we were missing an assertion for the depth dimension.
Fixes assertion failures with piglit {f,v}s-textureSize-sampler1DArrayShadow.
For some reason, they only resulted in piglit 'warn' results for me, not
failures.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56211
NOTE: This is a candidate for the stable branches.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
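To illustrate the array-texture commit above: mipmap minification shrinks
the width of a 1D array texture but leaves the layer count alone. A
minimal sketch with hypothetical names, assuming the usual
max(1, size >> level) rule.

```c
#include <assert.h>

struct demo_level_size {
   unsigned width;
   unsigned layers;
};

/* Assumed minification rule: halve per level, never below 1. */
static unsigned
demo_minify(unsigned size, unsigned level)
{
   unsigned s = size >> level;
   return s ? s : 1;
}

/* Hypothetical helper: size of one mip level of a 1D array texture. */
static struct demo_level_size
demo_1d_array_level_size(unsigned width0, unsigned layers, unsigned level)
{
   assert(level < 32);

   /* Only the width is minified; the layer count is the same at every
    * level. */
   struct demo_level_size s = { demo_minify(width0, level), layers };
   return s;
}
```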
Cuts down the while loop iterations from 4600 to 380 commits at the
moment.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function is only useful for the ARB_{vertex,fragment}_program
extensions, which we don't expose in core contexts.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
glGetPointerv was de-deprecated in GL 4.3, because GL 4.3 adds
functionality from KHR_debug and ARB_debug_output, which require
glGetPointerv.
This patch modifies _mesa_create_exec_table() to populate
glGetPointerv in the dispatch table for core contexts.
Technically this is not in compliance with the spec--what we really
ought to do for core contexts is expose glGetPointerv only when a GL
4.3 context is in use or one of the two extensions is present.
However, it seems silly to go to that extra work, since the only
client-visible effect would be for glGetPointerv to raise an
INVALID_OPERATION error instead of an INVALID_ENUM error. Besides,
the other functions set up by _mesa_create_exec_table() only depend on
the API in use, not on the GL version or extensions supported.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no reason to have separate slots in the dispatch table for
these two functions, since they are synonymous.
Note: previous to this patch, we never populated the dispatch table
slot for VertexAttribDivisor, which was ok, since it is not required
until 3.3. After this patch, both functions will be usable provided
that the ARB_instanced_arrays extension is present.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no reason to have separate slots in the dispatch table for
these two functions, since they are synonymous.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the previous two commits, this fixes piglit
GL_ARB_occlusion_query2/api.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's a similar test below, but it's not the same: that one checks whether
this query object is already active (potentially on another target).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We should use the latter since we're freeing the memory with free(),
not the gallium FREE() macro.
This fixes a mismatch when using the gallium debug memory functions.
NOTE: This is a candidate for the 9.0 branch.
We need to create BOs suitable for cursor usage that we can map and
write data into. The kms dumb ioctls are all we need for this, so drop
the dependency on libkms.
Given the usecase we have of trying to measure timestamps across individual
draw calls, flushing will totally mess up what people are trying to measure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The theory I had when I wrote the code was that you wanted to minimize latency
on your queries because the app was going to ask soon. Only, it turns out
that everybody batches up their queries and asks for the results later (often
after the next SwapBuffers!), so this was a pessimization.
Until now, I had no workload where it mattered enough to benchmark. Recently
I started playing some Minecraft, which uses tons of queries to decide whether
to render chunks of the terrain. For that app, avoiding the flush in the
query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace
capture of it (confirmed in game by watching the fps meter found by pressing
F3, 15/16 -> 20/21 fps).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
otherwise some compilers will throw error
"error: format not a string literal and no format arguments"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
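One common way to satisfy the "format not a string literal" diagnostic
quoted above is to pass the message through an explicit "%s" format.
Whether the commit does exactly this is an assumption; the wrapper below
is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical logging helper. Writes msg into buf and returns the
 * number of characters snprintf reports. */
static int
demo_log(char *buf, size_t size, const char *msg)
{
   assert(buf && msg);

   /* snprintf(buf, size, msg) would use msg itself as the format string:
    * compilers with -Werror=format-security reject it, and any '%' in
    * msg would be interpreted as a conversion. */
   return snprintf(buf, size, "%s", msg);
}
```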
If GL_BASE_LEVEL==0 and GL_MAX_LEVEL==0 that's a pretty good hint that
there'll be a single mipmap level in the texture.
Google Earth sets the texture's state this way before the first glTexImage
call. This saves a bit of texture memory.
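The level-count hint above can be sketched as follows. The function name
is hypothetical, and falling back to a full mipmap chain in the other
case is an assumption made for illustration, not something the commit
states.

```c
#include <assert.h>

/* Hypothetical helper: how many mipmap levels to allocate up front. */
static unsigned
demo_guess_num_levels(unsigned base_level, unsigned max_level,
                      unsigned width, unsigned height)
{
   assert(width > 0 && height > 0);

   /* The application has told us it only wants level 0. */
   if (base_level == 0 && max_level == 0)
      return 1;

   /* Assumed fallback: a full chain down to 1x1. */
   unsigned size = width > height ? width : height;
   unsigned levels = 1;
   while (size > 1) {
      size >>= 1;
      levels++;
   }
   return levels;
}
```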
Fixes piglit tests "unpack-teximage2d --pbo=* --format=GL_BGRA" on
Sandybridge+.
The fastpath was checking an incomplete set of pixel unpack state. This
patch adds checks for all the fields of gl_pixelstore_attrib that affect
2D texture uploads. Also, it begins permitting the case where
GL_UNPACK_ROW_LENGTH is 0.
Ideally, we would just ask a unicorn to JIT this fastpath for us in
a way that safely handles the unpacking state. Until then, it's safer if
only a small set of situations activate the fastpath.
v2: Use _mesa_is_bufferobj(), per Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It doesn't provide the cross-process buffer sharing that a window system
pixmap could otherwise support and we don't have anything left that uses
this type of surface.
The 0.99.0 Wayland release changes the event API to provide a thread-safe
mechanism for receiving events specific to a subsystem (such as EGL) and
we need to use it in the EGL platform.
The Wayland protocol now also requires a commit request to make changes
take effect; issue that from eglSwapBuffers.
Now that we've replaced all the variable settings other than reg_width, it's
easy to hang on to this (the expensive part of setting up the allocator).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
scenes to have all other scenes have 0 duration -- includes additional
rendering of scene description text that normally doesn't appear in that
scene)
v2: Allow allocation of all but g0/g1 of the payload.
v3: Pull count_to_loop_end() out to a helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, recommended v3)
Based on split_virtual_grfs(), we choose the same set every time, so set it in
stone. This will help us avoid regenerating the somewhat expensive
class/register set setup every compile.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is derived from the FS visitor code for the same, but tracks each channel
separately (otherwise, some typical fill-a-channel-at-a-time patterns would
produce excessive live intervals across loops and cause spilling).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375
(crash -> failure, can turn into pass by forcing unrolling still)
These messages always have m0 = g0 and m1 = offset, and write has m2 = data.
Avoids regression in opt_compute_to_mrf() with a change to scratch writes to
set up the data as an MRF write in the IR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_bblock_link -> bblock_link
fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock)
fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes confusion by the upcoming live variable analysis which saw e.g. use
of temp.w when only temp.xyz were initialized in the basic block, and
concluded that temp.w must have come from outside of the block (even though it
was never initialized anywhere).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to a string mismatch, INTEL_swap_event wasn't listed among GLX
extensions for the connection, even when present on both client and
server. That is, glXQueryServerString and glXGetClientString reported the
extension, but glXQueryExtensionsString did not.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56057
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Per commentary and direction in the LLVM community, support for ppc64 is
going into MCJIT rather than the old JIT. There is no existing support
in prior llvm versions, so no need to specify LLVM version numbers.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The GCC c99 standard on Cygwin sets __STRICT_ANSI__ and symbols such as
strdup are not available.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Note that we are missing the ARB_internalformat_query extension, which
provides the glGetInternalformativ function needed by GL ES 3.0.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The relevant ES2 code is always in Mesa. Always building the tests
ensures that things aren't accidentally broken when people don't build
with --enable-es2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code is twisty, and the comment before most of the blocks was actually
giving me the opposite impression from its intention: We want to apply as much
of our offset as possible through coarse tile-aligned adjustment, since we can
do so independently per buffer, and apply the minimum we can through
fine-grained drawing offset x/y, since it has to agree between all buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
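The intention described above, as much offset as possible through the
tile-aligned part and only the remainder through the fine x/y offset, can
be sketched for one axis. The names are hypothetical and a power-of-two
tile width is assumed.

```c
#include <assert.h>

/* Hypothetical helper: split an offset into a coarse, tile-aligned part
 * (adjustable independently per buffer) and a fine remainder (which has
 * to agree between all buffers). */
static void
demo_split_offset(unsigned x, unsigned tile_w,
                  unsigned *tile_x, unsigned *fine_x)
{
   /* Assumption: tile_w is a non-zero power of two. */
   assert(tile_w && (tile_w & (tile_w - 1)) == 0);
   assert(tile_x && fine_x);

   *tile_x = x & ~(tile_w - 1);   /* as much as possible goes here */
   *fine_x = x & (tile_w - 1);    /* the minimum that is left over */
}
```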
There are a number of places where some obscure piece of the code is not
currently worth fixing, and we have some workaround behavior available. It's
nicer for users to do some lame workaround than to just assert, but without
asserts we never knew when the workaround was at fault.
This should give us a nice compromise: Execute the workaround, but mention
that the obscure workaround was hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note: mapi_abi can consume API information from either XML or a .csv
file. A side effect of this change is that the ES1 and ES2 API
printers can only be used with XML input now. That's ok, since the
.csv input format is only used for the OpenVG API.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the ES1, ES2, and shared GLAPI printers passed a list of
function names to the base class constructor, which was used by the
_override_for_api() function to loop over all the API functions and
adjust their 'hidden' and 'handcode' attributes as appropriate for the
API flavour being code-generated.
This patch lifts the loop from _override_for_api() into its caller,
and makes it into a polymorphic function, so that the derived classes
can customize its behaviour directly. In a future patch, this will
allow us to override the 'hidden' and 'handcode' attributes based on
information from the XML rather than a list of functions.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, _get_api_entries() would make a deep copy of each element
in the entries table before modifying the 'hidden' and 'handcode'
attributes. This was unnecessary, since the entries aren't used again
after this function. Removing the copy simplifies the code, because
it is no longer necessary to adjust the alias pointers to point to the
copied entries.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently mapi_abi.py uses hardcoded lists of function names (in
gles_api.py) to determine which functions need to be included in the
GLES 1 or GLES 2 API. This patch removes a sanity check which
verified that all GLES functions listed in the hardcoded lists were
actually present in the XML.
Later patches in this series will modify mapi_abi.py to determine
which functions need to be included in the GLES 1 or GLES 2 API based
directly on the XML. Once that is done, the sanity check will be
redundant. Removing the sanity check now will simplify the patches to
come.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, the set of functions which exist in GLES1 or GLES2 is
determined by hardcoded lists of function names in gles_api.py. This
patch encodes that information into the XML files using new
attributes, es1 and es2.
The es1 attribute denotes the first version of GLES 1 in which the
function exists (e.g. es1="1.1" means the function exists in GLES 1.1
but not GLES 1.0). "none" (the default) means the function is not
available in any version of GLES 1.
The es2 attribute denotes the first version of GLES 2/3 in which the
function exists (e.g. es2="2.0" means the function exists in both GLES
2.0 and GLES 3.0). "none" (the default) means the function is not
available in any version of GLES 2 or GLES 3.
Note that since GLES 3 is a strict superset of GLES 2, there is no
need for a separate attribute for it; instead, 'es2="3.0"' should be
used to denote functions that are present in GLES 3 but not GLES 2.
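As a rough illustration of how a script might interpret these attributes,
here is a small Python sketch. The XML snippet, the element and function
names, and the helper available() are made up for the example; only the
es1/es2 attributes and the "none" default come from the text above:

```python
import xml.etree.ElementTree as ET

# Made-up sample data; real entries live in the glapi XML files.
SAMPLE = """
<api>
  <function name="ExampleCommon" es1="1.0" es2="2.0"/>
  <function name="ExampleLate" es1="1.1"/>
  <function name="ExampleES3" es2="3.0"/>
  <function name="ExampleDesktop"/>
</api>
"""


def _version(text):
    """Turn "1.1" into (1, 1) so versions compare numerically."""
    major, minor = text.split(".")
    return (int(major), int(minor))


def available(func, attr, api_version):
    """True if func exists in api_version according to the es1/es2 attribute."""
    first = func.get(attr, "none")      # "none" is the default
    if first == "none":
        return False
    return _version(first) <= _version(api_version)


funcs = {f.get("name"): f for f in ET.fromstring(SAMPLE.strip())}
```

For example, available(funcs["ExampleES3"], "es2", "2.0") is False, while
the same query for "3.0" is True.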
This patch only adds information about GLES versions 1.0, 1.1, and
2.0.
Later patches will modify the python code generation scripts to use
this information rather than the hardcoded lists in gles_api.py.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
An unfortunate quirk of Python 2 is that there are two types of
classes: "classic" classes (which are backward compatible with some
unfortunate design decisions made early in Python's history), and
"new-style" classes. Classic classes have a number of limitations
(for example they don't support super()) and are unavailable in Python
3. There's really no reason to use classic classes, except in
unmaintained legacy code. For more information see
http://www.python.org/download/releases/2.2.3/descrintro/.
This patch upgrades the Python code in src/mapi/glapi/gen to use
exclusively new-style classes.
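For illustration, a new-style class simply derives from object, which is
what makes super() usable under Python 2. The class names below are
hypothetical and not taken from the generator scripts:

```python
class Printer(object):          # new-style: explicitly derives from object
    def __init__(self, name):
        self.name = name


class ES1Printer(Printer):
    def __init__(self):
        # super() requires a new-style class hierarchy in Python 2.
        super(ES1Printer, self).__init__("es1")


p = ES1Printer()
```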
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that ARB programs and fixed function are routed through the new
backend, shader might be NULL. Don't do INTEL_DEBUG=perf support in
that case, since it relies on shader->compiled_once.
Since INTEL_DEBUG=perf wasn't previously supported, this maintains the
status quo. It might be nice to support it someday, however.
This could be moved to brw_shader_program instead of brw_shader, but
it appears even prog can be NULL in that case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
MaxIfDepth of 0 means "flatten all the time", not "never flatten".
This is only desirable on hardware that can't support control flow;
software rasterization and most hardware drivers want this.
This alters behavior for swrast as well as i915. Tested on i915.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The previous patch removed the producer of things in this file.
Since there aren't any, we can remove it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All flags are now gone, so we can stop storing and passing this around.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nobody ever set the flag, which makes this dead code.
v2: Leave the ureg_DECL_fs_input_cyl function in place, even though it's
unused, since VMWare uses it for their internal projects.
Reviewed-by: Eric Anholt <eric@anholt.net>
GLSL doesn't use the program code anymore. Accordingly, there were no
consumers of these flags, so there's no need to define them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were only part of NV_fragment_program, so we can kill them.
The fact that PROGRAM_NAMED_PARAM appears in r200_vertprog.c is rather
comedic, but also demonstrates that people just spam the various types
of parameters everywhere because they're confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without NV programs, there's no need for the compatible_program_targets
function. A simple (non-)equality check will do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also remove a leftover remnant from NV_vertex_program.
v2: Update for Imre's get changes.
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Previously, Mesa used nvprogram.c's _mesa_GetVertexAttribPointervNV()
function to implement this GL call. There was also a second
implementation in varray.c, _mesa_GetVertexAttribPointervARB(), which
was entirely unused.
The varray.c variant has an additional assertion and checks the index
against ctx->Const.VertexProgram.MaxAttribs rather than
MAX_VERTEX_GENERIC_ATTRIBS. However, that variable is defined to the
same value, so it should be fine.
This will allow us to kill the duplicate function.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also kill the resulting dead code for display list handling.
v2: Also kill dlist's OPCODE_REQUEST_RESIDENT_PROGRAMS_NV.
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
The NamedParameter functions were introduced in NV_fragment_program, and
are not shared with any other extensions.
Although this patch appears to remove the LocalParameter functions, it
does not: the ARB_fragment_program section also set them up. Now we
simply initialize them a single time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No hardware drivers support this, it's obsolete, and unlikely to be
useful without NV_vertex_program, which is gone now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
dri2DrawableGetMSC(), dri2WaitForMSC() and dri2WaitForSBC() were
inadvertently changed to return 0 on success. This resulted in the callers
returning an error to the client.
Restore the previous behavior and also check that the reply pointers are
valid before accessing them.
Reviewed-by: Eric Anholt <eric@anholt.net>
Note that _mesa_GetVertexAttribPointervNV() is actually
glGetVertexAttribPointerv(), which operates on the generic attributes. The
geometry shader initialization looks like arbitrary cruft to me.
Reviewed-by: Brian Paul <brianp@vmware.com>
Note that the MAP2 getters were missing from the implementation. Neat.
v2: Rebase on top of get.c changes.
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
It wasn't supported in hardware, and the comments in the code indicated no
known uses (similar to my experience on Intel) and a possible intent to remove
it.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were holding on to this code because we were aware that NWN 1 had some
support for vertex programs -- no other linux programs I've come across would
use it (since other software also has ARB_vp or GLSL support). Only, it turns
out that NWN doesn't even give us any vertex programs. Given that we have
known issues where the extension has never been fully supported, just give up
on it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46795
Reviewed-by: Brian Paul <brianp@vmware.com>
- stopped using util_color
- reformatted to occupy fewer characters per line.
- used memcpy for the border color
- used pipe_color_union in the state structure
And the clear color too, though that may be an issue only with GL_RGB if it's
actually RGBA in the driver.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: The types of st_translate_color parameters were changed to gl_color_union
and pipe_color_union as per Brian's comment.
configure.ac would previously refuse to complete if libX11 wasn't
installed, even if we'd disabled GLX and weren't building an X11 EGL
platform. Make the check simply set the no_x variable that's used (but
never set) immediately below for what looks like this very case.
Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Commit a010215463 removed the ES2-specific dispatch table and
remap_helper. Since we now use dispatch.h, which is generated from
gl_and_es_API.xml, we need to generate a matching remap_helper from the
same XML.
Note: This is a candidate for the 9.0 branch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
lp_build_rsqrt initially did not do any newton-raphson step. This meant that
precision was only ~11 bits, but this handled both input 0.0 and +infinity
correctly. It did not however handle input 1.0 accurately, and denormals
always generated infinity result.
Doing a newton-raphson step increased precision significantly (but notably
input 1.0 still doesn't give output 1.0), however this fails for inputs
0.0 and infinity (both result in NaNs).
Try to fix this up by using cmp/select but since this is all quite fishy
(and still doesn't handle denormals) disable for now. Note that even with
workarounds it should still have been faster since the fallback uses sqrt/div
(which both use the usually unpipelined and slow divider hw).
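The failure mode can be reproduced with scalar floats. The following
Python sketch is only an illustration: the function names are made up,
and an ordinary branch stands in for the vector cmp/select:

```python
import math


def nr_step(x, y):
    """One Newton-Raphson refinement of the estimate y ~= 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def nr_step_guarded(x, y):
    """Keep the raw estimate where the refinement would produce NaN."""
    if x == 0.0 or math.isinf(x):
        return y
    return nr_step(x, y)


refined = nr_step(4.0, 0.49)          # rough estimate of 1/sqrt(4) = 0.5
zero_case = nr_step(0.0, math.inf)    # 0 * inf yields NaN
inf_case = nr_step(math.inf, 0.0)     # inf * 0 yields NaN
```

The refinement moves 0.49 closer to 0.5, but both special inputs turn
into NaN unless the raw estimate is selected instead.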
Also add some more test values to lp_test_arit and test lp_build_rcp() too while
there.
v2: based on José's feedback, avoid hacky infinity definition which doesn't
work with msvc (unfortunately using INFINITY won't cut it either on non-c99
compilers) in lp_build_rsqrt, and while here fix up the input infinity case
too (it's disabled anyway). Only test infinity input case if we have c99,
and use float cast for calculating reference rsqrt value so we really get
what we expect.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
"get_transfer + transfer_map" becomes "transfer_map".
"transfer_unmap + transfer_destroy" becomes "transfer_unmap".
transfer_map must create and return the transfer object and transfer_unmap
must destroy it.
transfer_map is successful if the returned buffer pointer is not NULL.
If transfer_map fails, the pointer to the transfer object remains unchanged
(i.e. doesn't have to be NULL).
Acked-by: Brian Paul <brianp@vmware.com>
Only the first 'nr_cbufs' color buffers in the pipe_framebuffer_state are
valid. The rest of the color buffer pointers might be uninitialized.
Fixes a regression in the piglit fbo-srgb-blit test since changes in the
gallium blitter code.
NOTE: This is a candidate for the 9.0 branch (just to be safe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This should improve our ability to register allocate without spilling.
Unfortunately, due to the live variable analysis being ignorant of loops, we
still have register allocation failures on some programs.
v2: Add more context to the comment explaining the function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, we'd spill one reg, then continue on without actually register
allocating, then assertion fail when we tried to use a vgrf number as a
register number.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To validate this code, I ran piglit -t vs quick.tests with the "go spill
everything" debugging code enabled. There was only one regression:
glsl-vs-unroll-explosion simply ran out of registers. This should be
fine in the real world, since no one actually spills every single
register.
NOTE: This is a candidate for the 9.0 branch. Even if it proves to have
bugs, it's likely better than simply failing to compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_grf_array_access_to_scratch() calculates scratch buffer offsets in
bytes. However, emit_scratch_read/write() expects the base_offset
parameter to be measured in OWords.
As a result, a shader using a scratch read/write offset greater than
zero (in practice, a shader containing more than one variable in
scratch) would use too large an offset, frequently exceeding the
available scratch space.
This patch corrects the mismatch by removing spurious conversion from
OWords to bytes in move_grf_array_access_to_scratch().
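The unit mismatch can be sketched as follows. This is an illustrative
Python model with made-up function names, relying only on the fact that
an OWord is 16 bytes:

```python
OWORD_BYTES = 16  # an OWord is 16 bytes


def scratch_byte_address(base_offset_owords):
    """Stand-in for the emitter: it interprets its argument as OWords."""
    return base_offset_owords * OWORD_BYTES


def caller_buggy(offset_owords):
    # Bug: converts to bytes although the callee expects OWords.
    return scratch_byte_address(offset_owords * OWORD_BYTES)


def caller_fixed(offset_owords):
    # Fix: pass the OWord count through unchanged.
    return scratch_byte_address(offset_owords)
```

Offset zero is unaffected, which matches the observation that only
shaders with a non-zero scratch offset were hit.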
This is based on a patch by Paul Berry.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Version 12 of the EGL_KHR_create_context spec changed this behavior.
NOTE: This is a candidate for the 9.0 branch
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This brings us into accordance with the official Python style guide
(http://www.python.org/dev/peps/pep-0008/#indentation).
To preserve the indentation of the c code that is generated by these
scripts, I've avoided re-indenting triple-quoted strings (unless those
strings appear to be docstrings).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should fix MSVC build, as windows.h also defines CONST.
CONST usage in get.c is not new, so probably this just appeared now due
to changes in the includes.
This got broken by:
7182a1f glapi: rename/move GL_POLYGON_OFFSET_BIAS to its extension
section
Fix it by appending the _EXT suffix to the enum in the test too.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
This will be needed by the next patch, which will switch to using
the parameter descriptor- and hash tables generated by the script.
The hash algorithm remains the same, the output parameter descriptor
table format changes slightly. There the TYPE_API_MASK entries are
removed and an invalid NULL entry is inserted at the beginning. This is
ok, as get.c:find_value() doesn't rely on TYPE_API_MASK any more to
detect an invalid enum.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The following enums used to be extensions but later became part of the
core specification. The _EXT/_ARB versions of these are not present in
the current XML spec files, only defined in GL/glext.h.
Later we'll need to look up these in a python script using the XML spec.
As a preparation for that remove the _EXT,_ARB suffix from these enums
and rename GL_DISTANCE_ATTENUATION_EXT to GL_POINT_DISTANCE_ATTENUATION.
Naturally, all enums keep their numerical values.
Note that similar renames shouldn't be necessary in the future: in case
of a new extension the XML spec is updated with the new _EXT/_ARB etc.
name and this name is added to the enum table in get.c. Later the
extension may become part of the core spec, at which point the name w/o
the _EXT/_ARB suffix is added to the XML spec and the table in get.c
remains the same.
GL_BLEND_DST_ALPHA_EXT
GL_BLEND_DST_RGB_EXT
GL_BLEND_SRC_ALPHA_EXT
GL_BLEND_SRC_RGB_EXT
GL_COLOR_SUM_EXT
GL_COMPRESSED_TEXTURE_FORMATS_ARB
GL_CURRENT_FOG_COORDINATE_EXT
GL_CURRENT_SECONDARY_COLOR_EXT
GL_DISTANCE_ATTENUATION_EXT
GL_FOG_COORDINATE_ARRAY_EXT
GL_FOG_COORDINATE_ARRAY_STRIDE_EXT
GL_FOG_COORDINATE_ARRAY_TYPE_EXT
GL_FOG_COORDINATE_SOURCE_EXT
GL_FRAGMENT_SHADER_DERIVATIVE_HINT_ARB
GL_PACK_IMAGE_HEIGHT_EXT
GL_PACK_SKIP_IMAGES_EXT
GL_SECONDARY_COLOR_ARRAY_EXT
GL_SECONDARY_COLOR_ARRAY_SIZE_EXT
GL_SECONDARY_COLOR_ARRAY_STRIDE_EXT
GL_SECONDARY_COLOR_ARRAY_TYPE_EXT
GL_UNPACK_IMAGE_HEIGHT_EXT
GL_UNPACK_SKIP_IMAGES_EXT
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
When traversing the hash table looking up an enum that is invalid we
eventually reach the first element in the descriptor array. By looking
at the type of that element, which is always TYPE_API_MASK, we know that
we can stop the search and return error. Since this element is always
the first it's enough to check for its index being 0 without looking at
its type.
Later in this patchset, when we generate the hash tables during build
time, this will allow us to remove the TYPE_API_MASK and related flags
completely.
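A simplified Python sketch of a lookup that stops on index 0. The hash
function, the linear probing and the sample values are assumptions made
for the illustration and do not mirror get.c:

```python
def build_table(descriptors, size):
    """descriptors[0] is a reserved invalid entry; slots hold indices."""
    table = [0] * size                       # 0 means "empty slot"
    for index in range(1, len(descriptors)):
        slot = descriptors[index]["pname"] % size
        while table[slot] != 0:              # linear probing (an assumption)
            slot = (slot + 1) % size
        table[slot] = index
    return table


def find_value(descriptors, table, pname):
    """Return the descriptor for pname, or None for an invalid enum."""
    slot = pname % len(table)
    while True:
        index = table[slot]
        if index == 0:                       # reached the first element: stop
            return None
        if descriptors[index]["pname"] == pname:
            return descriptors[index]
        slot = (slot + 1) % len(table)


# Made-up descriptors; 10 and 18 collide in a table of size 8.
descriptors = [None, {"pname": 10}, {"pname": 18}, {"pname": 3}]
table = build_table(descriptors, size=8)
```

The table must be larger than the number of descriptors so that every
probe sequence eventually reaches an empty slot.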
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The glGet hash was initialized only once for a single GL API, even if
the application later created a context for a different API. This
resulted in glGet failing for otherwise valid parameters in a context
if that parameter was invalid in another context created earlier.
Fix this by using a separate hash table for each API.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
This should be named GL_POLYGON_OFFSET_BIAS_EXT and listed under the
EXT_polygon_offset section. (Solution by Ian Romanick)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
POLY_OFFSET_DB_FMT_CNTL is moved to the framebuffer state, because it only
depends on the zbuffer format.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
This is not so trivial, because we disable blending if the dual src
blending is turned on and the number of color outputs is less than 2.
I decided to create 2 command buffers in the blend state object and just
switch between them when needed, because there are other states unrelated
to blending (like the color mask) and those shouldn't be changed
(the old code had it wrong).
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
r600_command_buffer is not an atom.
The "atoms" have evolved into state slots (or groups of state slots) where
you can bind states. There is a fixed amount of atoms (state slots)
in the context.
The command buffers are nothing like that. They represent states, not state
slots.
We could probably give r600_atom a better name someday.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The invalidate event support is a careful dance between driver and loader,
where both have to say they can handle it, and then the loader reports
invalidate events for the driver so the driver can do the optimization.
The EGL code doesn't report __DRIuseInvalidateExtension to the driver, so it
has no responsibility to call the driver's invalidate function, and the driver
is doing the glViewport hack because it assume. This is not
the only time invalidate would need to be called (we need it *any* time an
invalidate event comes down the pipe, but we don't watch for them), so just
stop calling the driver's function.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
This behavior mostly matches glx_dri2. It's slightly complicated in
comparison because EGL exposes the implementation limits in the EGL config.
Note that platform_x11 was the only one setting swap_available, so the move of
the MaxSwapInterval into it is appropriate.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
It's been in place but never enabled since 2010. Note how one piece called a
DRI2 function, suggesting never being tested.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri_interface.h comes from our tree, so why litter our tree with ifdefs for
older versions of it?
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri_interface.h comes from our tree, so why litter our tree with ifdefs for
older versions of it?
I left in the DRI_TEX_BUFFER_VERSION ifdefs, which is broken and uncompiled
(the version wasn't bumped from 2 to 3 when the patch was landed), but I don't
know what should be done with it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I'm going to transition a bunch of the protocol to using XCB so we can stop
rolling it ourselves.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The EGLNative* types are all defined to be pointers across all our EGL
implementations, but in the X11 platform they're actually just XIDs (32-bit
integers).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit 006c1a3c65 introduced a call to
clock_gettime, but failed to include <time.h>, breaking the build in
some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since df4a88ac, the check for compressed formats has been
unnecessary. And ever since cb72ec5f, the build has been broken with
FEATURE_ES. Remove it, as it does nothing.
Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Use a simple chaining hash table for the ACP. This is not really very good,
because we still do a full walk of the tree per destination write, but it
still reduces fp-long-alu runtime from 5.3 to 3.9s.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This means that we don't get constant prop across into the first block after a
BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it
across control flow at some point. More importantly, with the next commit it
will help avoid O(n^2) with instruction count runtime for shaders that have
many constant moves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes a giant pile of code newly dead. It also fixes TXB on newer
chipsets, which has been totally broken (I now have a piglit test for that).
It passes the same set of Ian's ARB_fragment_program tests. It also improves
high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
optimization and having 8-wide along with 16-wide shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I don't know of any programs that would need more than this. The larger
programs I've seen have neared 100 instructions. This prevents excessive
runtimes of automatic tests that attempt to test up to the exposed maximums
(like fp-long-alu).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_fp doesn't go through the GLSL optimizer, and these were things you see
frequently thanks to conditionals being lowered to SLT/SGE and MUL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be reused from the ARB_fp compiler. I touched up the pre-gen6 path
to not overwrite dst in the first instruction, which prevents the need for
aliasing checks (we'll need that in the ARB_fp compiler, but it actually
hasn't been needed in this codebase since the revert of the nasty old
MOV-avoidance code). I also made the conditional_mod between gen6 and
pre-gen6 consistent, which shouldn't matter except for denorm/(+/-)0
comparisons where the choice between left and right hand side of the
comparison changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll want to reuse this for ARB_fp handling.
v2: Fold the remaining bit of emit_texcoord back into visit(ir_texture).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Applications may destroy HDC at any time. So always get a HDC as needed.
Fixes lack of presents with Solidworks eDrawings when screen resolution is
changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
'#extension foo: enable' is harmless. The functionality is only
actually enabled if the extension is supported. The shader won't use
the functionality if it's not supported, so we're fine.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The diff looks weird, but this moves the code from the first 'if
(ctx->Const.GLSLVersion < 130)' block down into the second block. It
also moves some variable declarations closer to their use.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When an occlusion query was active, the derived DB state wasn't changed
for u_blitter even though all the occlusion queries were suspended.
It's fixed by moving the state update into the emit functions, which are
called whenever queries are stopped or suspended.
pipe_resource can be shared between contexts, so we shouldn't modify its
description. Instead, let's use the resource "views" (sampler views and
surfaces), where we can freely change almost any property of a resource.
The idea here is to not flag _NEW_VARYING_VP_INPUTS when shaders (either
GLSL or ARB vp/fp) are in use. If either TNL or TexEnv programs are
active, at least one stage is using fixed function.
On Pineview, fixes 20 Piglit, 60 oglconforms, and 7 ES 1.1 conformance
tests, as well as missing textures in Xonotic. These were all
regressions since commit fb4a34e60e.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49127
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54807
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When using u_blitter, the state was being saved from saved_*, but we
don't use that. So after u_blitter resumed we got some corrupted
state in.
So let's just remove the saved_* stuff. I thought it was weird but
harmless, it's actually broken.
This function is only present in GLES1 and in the OpenGL compatibility
profile.
Fixes the following "make check" failure:
[----------] 1 test from DispatchSanity_test
[ RUN ] DispatchSanity_test.GLES2
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn
compression/decompression unavailable
dispatch_sanity.cpp:122: Failure
Value of: table[i]
Actual: 0x4de54e
Expected: (_glapi_proc) _mesa_generic_nop
Which is: 0x41af72
i = 321
[ FAILED ] DispatchSanity_test.GLES2 (4 ms)
[----------] 1 test from DispatchSanity_test (4 ms total)
NOTE: This is a candidate for stable release branches.
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Tested-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since we started doing fixups for different render target formats,
this has been an issue. Instead just don't do anything, when the
program gets emitted later it'll get the correct fixup.
Fixes a bunch of piglit tests.
This simply avoids some failed assertions but there's no reason to
call the driver hooks for storing a tex image if its size is zero.
Note: This is a candidate for the stable branches.
413c49141 added an optimisation to improve the performance of teximage
under a limited set of circumstances. If GL_EXT_unpack_subimage has been
used then we must also skip this optimisation since the optimised
codepath does not take the packing values into consideration.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I think libtool should be handling this for us, but the build fails for
Jordan because libdricommon (a static library, which uses expat) appears
before -lexpat on the linker command.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we considered all registers as candidates for spilling.
This was counterproductive--for any registers that have already been
removed from the interference graph, there is no benefit to spilling
them, since they don't contribute to register pressure.
This patch ensures that we will only try to spill registers that are
still in the interference graph after register allocation has failed.
This is consistent with the recommendations of the paper "Retargetable
Graph-Coloring Register Allocation for Irregular Architectures", on
which our register allocator is based.
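The idea can be sketched in Python as below. This is a simplified model:
it uses the classic "degree < k" simplification test rather than the test
from the paper, and the function name and graph are made up:

```python
def spill_candidates(graph, k):
    """Return nodes left in the graph once simplification gets stuck.

    graph: dict mapping node -> set of interfering nodes (symmetric).
    k: number of available registers.
    """
    remaining = {n: set(adj) for n, adj in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:     # trivially colorable: remove it
                for other in remaining.pop(node):
                    remaining[other].discard(node)
                changed = True
    return set(remaining)


# Four mutually interfering registers plus one that only touches "a".
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
candidates = spill_candidates(graph, k=3)
```

Here "e" is removed during simplification, so it is not considered for
spilling; only the four registers still in the graph are.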
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Wine or a windows app changes fpucw to 0x7f, causing doubles to be equivalent
to floats, which broke the calculation of FPS.
We should be very careful about using doubles in Mesa.
Henri Verbeet adds:
For reference, this is done by for example d3d9 when a D3D device is
created without D3DCREATE_FPU_PRESERVE set. In the general case
applications can do all kinds of terrible things to the FPU control
word of course.
[ RUN ] EnumStrings.LookUpByNumber
enum_strings.cpp:43: Failure
Value of: _mesa_lookup_enum_by_nr(everything[i].value)
Actual: "GL_COMPRESSED_RGBA_S3TC_DXT3_ANGLE"
Expected: everything[i].name
Which is: "GL_COMPRESSED_RGBA_S3TC_DXT3_EXT"
enum_strings.cpp:43: Failure
Value of: _mesa_lookup_enum_by_nr(everything[i].value)
Actual: "GL_COMPRESSED_RGBA_S3TC_DXT5_ANGLE"
Expected: everything[i].name
Which is: "GL_COMPRESSED_RGBA_S3TC_DXT5_EXT"
[ FAILED ] EnumStrings.LookUpByNumber (2 ms)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55505
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The EGL_NOK_swap_region2 spec states that the rectangles are specified
with a bottom-left origin within a surface coordinate space also with a
bottom left origin, so this patch ensures the rectangles are flipped
before passing them on to dri2_copy_region.
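A minimal sketch of the flip, assuming rectangles are (x, y, width,
height) tuples and that the target expects a top-left origin. The
function name is illustrative:

```python
def flip_rects(rects, surface_height):
    """Convert bottom-left-origin rectangles to a top-left origin."""
    return [(x, surface_height - y - h, w, h) for (x, y, w, h) in rects]


flipped = flip_rects([(0, 0, 10, 20), (5, 70, 10, 30)], surface_height=100)
```

A rectangle touching the bottom edge ends up at y = surface_height -
height, and one touching the top edge ends up at y = 0.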
Fixes piglit's egl-nok-swap-region test.
Tested-by: Matt Turner <mattst88@gmail.com>
A compressed texture image size doesn't have to be a multiple of the
compressed block size (only sub-images do). Fixes issues when building
compressed mipmaps because we often wind up with non-block-size images
for the higher mipmap levels.
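To see why this comes up, the following Python sketch walks a mipmap
chain and rounds each level up to whole blocks. The 4x4 block size is an
assumption made for the example, and the function name is made up:

```python
def mip_chain(width, height, block=4):
    """Return (width, height, blocks_wide, blocks_high) for each level."""
    levels = []
    while True:
        blocks_w = (width + block - 1) // block   # round up to whole blocks
        blocks_h = (height + block - 1) // block
        levels.append((width, height, blocks_w, blocks_h))
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return levels


chain = mip_chain(16, 16)
```

The 2x2 and 1x1 levels are smaller than one block, yet each still
occupies a full block of storage.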
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=55445
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Sven Arvidsson <sa@whiz.se>
We were previously using the TGSI input index, which can exceed the number of
parameters passed from the vertex shader via the parameter cache. Now we use
a separate index which only counts those parameters.
Prevents piglit regressions with the following fix.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Port the 'glcpp: fix abuse of yylex' commit to Android.mk
Also, since the Android.*.mk are sourced in a global namespace,
the local-y-to-c-and-h is prefixed with the LOCAL_MODULE name.
The initial fix commit is 53d46bc787
There's also a bugzilla for this: 54947
Signed-off-by: Negreanu Marius Adrian <adrian.m.negreanu@intel.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Commit 8d9778589f added all-targets to the
LLVM_COMPONENTS list, but this component does not exist with LLVM 2.8.
Adding all-targets is not necessary for any drivers, and it seems to be
left over from earlier versions of the commit mentioned above.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
The items are ordered in the item list by their offsets, with the lowest
offset coming first in the list. The old code was assuming that new
items being added to the list would always have a greater offset than
the first item in the list, however this is not always the case.
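A small Python sketch of an insertion that keeps the list ordered even
when the new offset is lower than the head. A Python list of (offset,
payload) tuples stands in for the real item list, and add_item is a
made-up name:

```python
import bisect


def add_item(items, offset, payload):
    """Insert into a list of (offset, payload) tuples sorted by offset."""
    offsets = [o for (o, _) in items]
    pos = bisect.bisect_right(offsets, offset)   # 0 if below the first item
    items.insert(pos, (offset, payload))
    return items


items = [(16, "a"), (32, "b")]
add_item(items, 8, "c")     # lower than the current first item
add_item(items, 24, "d")    # lands between existing items
```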
This can be used to initialize the CB* registers for buffers without a
radeon_surface.
v2:
- Get correct group_bytes value from r600_screen
- Stop setting unnecessary fields
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This also fixes a lot of tests, especially all the clip-and-scissor-blit MSAA
piglit tests.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The original blit function is extended and the other functions reuse it.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes this build error on Cygwin.
Explicit dependency `src/glsl/builtins/tools/texture_builtins.py' not
found, needed by target
`build/cygwin-x86-debug/glsl/builtin_function.cpp'.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Often, the original shader IR isn't terribly interesting because a lot
of crucial optimizations haven't been done (such as inlining built-ins).
ir_to_mesa used to print this out for us, but since we don't use it, we
have to do it ourselves.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The anonymous namespace should keep these private classes to file scope,
preventing clashes with other symbols of the same name elsewhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From SandyBridge PRM, volume 2 Part 1, section 12.2.3, BLEND_STATE:
DWord 1, Bit 30 (AlphaToOne Enable):
"If Dual Source Blending is enabled, this bit must be disabled"
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Change the format to MAJOR.MINOR[FC]
For example: 2.1, 3.0FC, 3.1
The FC suffix indicates a forward compatible context, and
is only valid for versions >= 3.0.
Examples:
2.1: GL Legacy/Compatibility context
3.0: GL Legacy/Compatibility context
3.0FC: GL Core Profile context + Forward Compatible
3.1: GL Core Profile context
3.1FC: GL Core Profile context + Forward Compatible
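The format above could be parsed roughly as follows. This is only a sketch: the function name and the decision to reject any other trailing text are assumptions, and it does not reproduce Mesa's actual parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for the MAJOR.MINOR[FC] format.
 * Returns true on success and fills in the outputs. */
static bool parse_version(const char *s, int *major, int *minor,
                          bool *fwd_compat)
{
    int consumed = 0;

    assert(s && major && minor && fwd_compat);

    /* %n records how many characters the numeric part used. */
    if (sscanf(s, "%d.%d%n", major, minor, &consumed) != 2)
        return false;

    if (s[consumed] == '\0') {
        *fwd_compat = false;
        return true;
    }

    /* The FC suffix is only valid for versions >= 3.0. */
    if (strcmp(s + consumed, "FC") == 0 && *major >= 3) {
        *fwd_compat = true;
        return true;
    }
    return false;
}
```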
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
intelDestroyContext will eventually be called, and it will clean things
up. The call to brwInitVtbl is moved earlier so that
intelDestroyContext can call the device-specific destructor. This also
makes the code look more like the i915 code.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54301
_glapi_table is a struct full of named function pointers, while the generated
code just wants to treat it as an array of function pointers. Cast to avoid
the compiler warning.
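A small sketch of the pattern, using a hypothetical three-entry table rather than the real _glapi_table. It assumes the struct holds nothing but same-typed function pointers with no padding, which is what makes indexing it like an array workable in practice.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*proc_t)(void);

/* Hypothetical stand-in for a dispatch table: a struct made up only
 * of named function pointers, all of the same type. */
struct dispatch_table {
    proc_t Begin;
    proc_t End;
    proc_t Flush;
};

/* Store `fn` in the slot at `index`, treating the table as an array.
 * The explicit cast is what silences the pointer-type warning. */
static void set_by_index(struct dispatch_table *table, size_t index,
                         proc_t fn)
{
    proc_t *slots = (proc_t *) table;

    assert(index < sizeof(*table) / sizeof(proc_t));
    slots[index] = fn;
}

/* Trivial function used to fill a slot. */
static void noop(void)
{
}
```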
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This test is only built when shared-glapi is used. Because of changes
elsewhere in the tree that were necessary to make shared-glapi work
correctly with GLX, it's not feasible to make the test function both ways.
The list of expected functions originally came from the functions set by
api_exec_es2.c. This file no longer exists in Mesa (but api_exec_es1.c
is still generated). It was the generated file that configured the
dispatch table for ES2 contexts. This test verifies that all of the
functions set by the old api_exec_es2.c (with the recent addition of VAO
functions) are set in the dispatch table and everything else is a NOP.
When adding ES2 (or ES3) extensions that add new functions, this test
will need to be modified to expect dispatch functions for the new
extension functions.
v2: Expect VAO functions be non-NOP.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When building with shared-glapi, we can just use Mesa's _mesa_warning without
problems. stubs.cpp is only used when shared-glapi is not used.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Allow GL_ARB_shader_objects functions in core profile because we
still expose the extension string there. Don't allow
glBindFragDataLocation in GLES3 because it's not part of that API.
Based (mostly) on review comments from Eric Anholt.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This isn't used by this patch, but it will be necessary for several
follow-on patches. Separating this out will make it easier to reorder
patches later.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function is not the same as glGetProgramiv.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The most recent commit that touched this function,
commit b1d0fe022d
Author: Chad Versace <chad.versace@linux.intel.com>
Date: Wed Sep 26 11:05:12 2012 -0700
intel: Fix segfault in intel_texsubimage_tiled_memcpy
did fix the segfault, but introduced yet another bug. From Anholt: """You
need to still test format/type, because that's the incoming format (e.g.
GL_RGBA/GL_FLOAT) that you're trying to memcpy."""
This patch re-introduces the checks on the incoming format and type.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
In commit 091eb15b69, Jordan changed get_temp_image_type() to use
_mesa_get_format_datatype() instead of returning GL_FLOAT. That has
several possible return values: GL_FLOAT, GL_INT, GL_UNSIGNED_INT,
GL_SIGNED_NORMALIZED, and GL_UNSIGNED_NORMALIZED.
We do want to use GL_INT/GL_UNSIGNED_INT for integer formats. However,
we want to continue using GL_FLOAT for the normalized fixed-point types.
There isn't any code in pack.c to handle GL_(UN)SIGNED_NORMALIZED.
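The intended mapping can be sketched as below. The enum is a local stand-in for the five datatypes listed above (it does not use the real GL enum values), and the function name is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the datatypes named above. */
enum datatype {
    DT_FLOAT,
    DT_INT,
    DT_UNSIGNED_INT,
    DT_SIGNED_NORMALIZED,
    DT_UNSIGNED_NORMALIZED
};

/* Pick the temporary image type: keep integer types as they are and
 * use float for everything else, including normalized fixed-point. */
static enum datatype temp_image_type(enum datatype format_datatype)
{
    switch (format_datatype) {
    case DT_INT:
    case DT_UNSIGNED_INT:
        return format_datatype;
    case DT_FLOAT:
    case DT_SIGNED_NORMALIZED:
    case DT_UNSIGNED_NORMALIZED:
        return DT_FLOAT;
    }
    assert(!"unexpected datatype");
    return DT_FLOAT;
}
```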
Fixes oglconform's fboarb advanced.blit.copypix, which was regressed by
commit 091eb15b69.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes all gl_config's with swapMethod=GLX_SWAP_COPY_OML. When
page flipping, we are unable to comply with swap-copy semantics.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This failed when all the uploads to occur were uniform-type vertex data (like
glColor4f being active across a DrawArrays), because it would upload 1 element
instead of 1 element per vertex. There was no citation for how this code
helped any particular application, and it breaks ETQW, so just remove it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47170
NOTE: This is a candidate for the 9.0 and 8.0 branches.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
The only symbols that need to be public (those in intel_screen.c that the
loader looks for) are already marked public. Saves 100k of compiled driver
size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function segfaulted when a game called glTexSubImage2D on a texture
with internalformat/format/type = GL_SLUMINANCE8/GL_BGRA/GL_UNSIGNED_BYTE.
The function only supports MESA_FORMAT_ARGB8888 and returns early if it
detects an unsupported format. Clearly, its detection condition was
insufficient. This patch fixes it to explicitly check for
MESA_FORMAT_ARGB8888.
Note: This is a candidate for the 9.0 branch (fixes 413c491).
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Haswell supports EXT_texture_swizzle and legacy DEPTH_TEXTURE_MODE
swizzling by setting SURFACE_STATE entries. This means we don't have to
bake the swizzle settings into the shader code by emitting MOV
instructions, and thus don't have to recompile shaders whenever the
swizzles change.
Unfortunately, we can't handle GL_ALPHA this way: unlike all the others,
which store the comparison result in the .r channel (and possibly others
as well), GL_ALPHA puts it in the .a channel. The GLSL 1.30+ style
functions which return a float always simply return the .r channel,
which would be zero if we handled this as a surface override. In this
case, fall back to doing it the old way. DEPTH_TEXTURE_MODE = GL_ALPHA
isn't an interesting performance path anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes valgrind errors in piglit test
oes_compressed_etc1_rgb8_texture-miptree: an invalid write in
_mesa_store_compressed_store_texsubimage() at line 4406 and invalid reads
in texcompress_etc_tmp.h:etc1_parse_block().
The calculation of the size of the temporary etc1 buffer allocated by
intel_miptree_map_etc1() was incorrect. Sometimes the allocated buffer was
too small, sometimes too large. This patch corrects the size to that
expected by _mesa_store_compressed_store_texsubimage().
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Do all error checking of glTexSubImage, glCopyTexSubImage and
glCompressedTexSubImage's xoffset, yoffset, zoffset, width, height, and
depth params in one place.
If a subtexture region isn't aligned to the compressed block size,
return GL_INVALID_OPERATION, not GL_INVALID_VALUE.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of tracking the inferred state changes separately
just check if queued and emitted states are the same.
This patch just reworks the update of the SPI map between
vs and ps, but there are probably more cases like this.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
GLES 3 supports sRGB functionality, but it does not expose the
GL_FRAMEBUFFER_SRGB enable/disable bit. Instead the implementation
is expected to behave as though that bit is always enabled.
This patch ensures that ctx->Color.sRGBEnabled (the internal variable
tracking GL_FRAMEBUFFER_SRGB) is initially true in GLES 2/3 contexts,
and that it cannot be modified through the GLES 3 API.
This is safe for GLES 2, since ctx->Color.sRGBEnabled has no effect on
non-sRGB formats, and GLES 2 doesn't support any sRGB formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, meta logic was saving and restoring the value of
GL_FRAMEBUFFER_SRGB in an ad-hoc fashion. As a result, it was not
properly disabled and/or restored for some meta operations.
This patch causes GL_FRAMEBUFFER_SRGB to be saved/restored in the
conventional way of meta-ops (using _mesa_meta_begin() and
_mesa_meta_end()). It is now reliably saved/restored for
_mesa_meta_BlitFramebuffer, _mesa_meta_GenerateMipmap, and
decompress_texture_image, and preserved for all other meta ops.
Fixes piglit tests "ARB_framebuffer_sRGB/blit renderbuffer
{linear_to_srgb,srgb} scaled {disabled,enabled}".
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GLES3 supports sRGB formats, but it does not support the
GL_FRAMEBUFFER_SRGB enable/disable flag (instead it behaves as if this
flag is always enabled). Therefore, meta ops that need to disable
GL_FRAMEBUFFER_SRGB will need a backdoor mechanism to do so when the
API is GLES3.
We were already doing a similar thing for GL_MULTISAMPLE, which has
the same constraints.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch reduces the time spent in glTexImage and glTexSubImage by
over 5x on Sandybridge for the workload described below.
It adds a new fast path for glTexImage2D and glTexSubImage2D,
intel_texsubimage_tiled_memcpy, which is optimized for Google Chrome's
paint rectangles. The fast path is implemented only for 2D GL_BGRA
textures for chipsets with a LLC.
=== Performance Analysis ===
Workload description:
Personalize your google.com page with a wallpaper. Start chromium
with flags "--ignore-gpu-blacklist --enable-accelerated-painting
--force-compositing-mode". Start recording with chrome://tracing. Visit
google.com and wait for page to finish rendering. Measure the time spent
by process CrGpuMain in GLES2DecoderImpl::HandleTexImage2D and
HandleTexSubImage2D.
System config:
cpu: Sandybridge Mobile GT2+ (0x0126)
kernel 3.4.9 x86_64
chromium 21.0.1180.89 (154005)
Statistics:
              | N  Median  Avg     Stddev
--------------|---------------------------
before (msec) | 8  472.5   463.75  72.6
after (msec)  | 8   78.0    79.6    5.7
Arithmetic difference at 95.0% confidence:
-384.1 +/- 55.2 msec
-82.8% +/- 11.9%
Ratio at 95.0% confidence:
5.81 +/- 0.119
v2:
- Replace check for `intel->gen >= 6` with `intel->has_llc`, per
danvet.
- Fix typo in comment, s/throuh/through/.
- Swap 'before' and 'after' rows in stat table.
v3:
- If the current batch references the bo, then flush batch before mapping
the bo. Found by Chris.
- Restrict supported texture images to level 0 of target
GL_TEXTURE_2D. This avoids an arithmetic bug in calculating image
offsets within the miptree, found by Paul. This restriction does not
diminish this patch's benefit to Chrome OS performance.
- Use fewer instructions for bit6 swizzling, suggested by Paul.
- Remove erroneous comment about Y-tiling, for Paul.
- Print perf_debug messages when flushing and stalling.
- Update stats in commit message; run workload under a release build
rather than a debug build.
Note: This is a candidate for the 9.0 branch.
Acked-by: Eric Anholt <eric@anholt.net>
CC: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A game we're working with leaves scissoring enabled, but frequently sets
the scissor rectangle to the size of the whole screen. In that case,
scissoring has no effect, so it's safe to go ahead with a fast clear.
Chad believes this should help with Oliver McFadden's "Dante" as well.
v2/Chad: Use the drawbuffer dimensions rather than the miptree slice
dimensions. The miptree slice may be slightly larger due to alignment
restrictions.
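The check can be sketched like this: scissoring is a no-op when it is disabled or when the scissor rectangle covers the whole drawbuffer. The struct and function names are hypothetical, and the sketch ignores integer overflow for very large rectangles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scissor rectangle in window coordinates. */
struct rect {
    int x, y, width, height;
};

/* Return true if scissoring cannot clip anything in a drawbuffer of
 * the given size, so a fast clear would be safe. */
static bool scissor_is_noop(bool scissor_enabled, struct rect scissor,
                            int fb_width, int fb_height)
{
    assert(fb_width >= 0 && fb_height >= 0);

    if (!scissor_enabled)
        return true;

    return scissor.x <= 0 && scissor.y <= 0 &&
           scissor.x + scissor.width >= fb_width &&
           scissor.y + scissor.height >= fb_height;
}
```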
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-and-tested-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Fixes an assertion failure when compiling certain shaders that need both
pull constants and register spilling:
brw_eu_emit.c:204: validate_reg: Assertion `execsize >= width' failed.
NOTE: This is a candidate for release branches.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e2249e8c4d (i965/blorp: Add
support for blits between SRGB and linear formats) changed blorp to
always configure surface states in linear format (even if the
underlying surface is sRGB). This allowed sRGB-to-linear and
linear-to-sRGB blits to occur without causing the image to be
inappropriately brightened or darkened.
However, it broke sRGB MSAA resolves, since they rely on the
destination buffer format being sRGB in order to ensure that samples
are averaged together in sRGB-correct fashion.
This patch fixes the problem by instead configuring the source buffer
to use the *same* format as the destination buffer. This ensures that
the image won't be brightened or darkened, but preserves proper sRGB
averaging.
Fixes piglit tests "EXT_framebuffer_multisample/accuracy srgb".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55265
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
There are a few automake files that reference $(X11_INCLUDES) such as
src/glx/Makefile.am but configure.ac wasn't declaring the variable for
substitution. This would break builds of glx if libxcb, for example, was
installed in its own prefix since AM_CFLAGS wouldn't coincidentally
list the needed include path in that case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch is a band-aid fix for a bug in commit 5fd67fa (i965/blorp:
Reduce alignment restrictions for stencil blits), which causes
multisampled stencil blits to work incorrectly on Sandy Bridge.
When blitting to or from a normal stencil buffer, we have to use a
coordinate transformation that swizzles coordinates to account for the
fact that stencil buffers use W tiling, but the most similar tiling
format available for textures and render targets is Y tiling. The
differences between W and Y tiling cause pixels to be scrambled within
a block of size 8x4 (width x height) as measured relative to a W tile,
or 16x2 as measured relative to a Y tile. So in order to make sure
that pixels at the edges of the blit aren't lost, we need to align the
rendering rectangle (and the buffer sizes) to multiples of the 8x4
block size. This alignment happens in the brw_blorp_blit_params
constructor, whereas the determination of how to swizzle the
coordinates happens during code generation, in the
brw_blorp_blit_program class.
When blitting to or from a multisampled stencil buffer, the coordinate
swizzling is more complex, because it has to account for the
interleaving pattern of samples, which uses 4x4 blocks for 4x MSAA and
8x4 blocks for 8x MSAA. The end result is that if multisampling is in
use, the 16x2 block size (relative to a Y tile) needs to be expanded
to 16x4, and the corresponding size relative to a W tile expands to
8x8.
The problem doesn't affect Ivy Bridge severely enough to crop up in
Piglit tests because on Ivy Bridge we have to disable multisampling
when blitting *to* a multisampled stencil buffer (the blorp compiler
generates code to compensate for the fact that multisampling is
disabled). However I suspect a bug is still present because we don't
disable multisampling when blitting *from* a multisampled stencil
buffer.
This patch fixes the problem by doubling the vertical alignment
requirement when blitting to or from a multisampled stencil buffer,
and multisampling has not been disabled.
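A rough sketch of the alignment rule, expressed in W-tile coordinates: 8x4 blocks normally, 8x8 when multisampling is in use. The struct, macros, and function name are hypothetical and do not mirror the brw_blorp_blit_params constructor.

```c
#include <assert.h>
#include <stdbool.h>

/* Both macros require `a` to be a power of two. */
#define ALIGN_DOWN(v, a) ((v) & ~((a) - 1))
#define ALIGN_UP(v, a)   (((v) + (a) - 1) & ~((a) - 1))

/* Hypothetical rendering rectangle: [x0, x1) by [y0, y1). */
struct blit_rect {
    unsigned x0, y0, x1, y1;
};

/* Expand the rectangle outward to whole blocks so that pixels at the
 * edges are not lost. The vertical alignment doubles for MSAA. */
static struct blit_rect align_stencil_rect(struct blit_rect r,
                                           bool multisampled)
{
    const unsigned align_x = 8;
    const unsigned align_y = multisampled ? 8 : 4;
    struct blit_rect out;

    assert(r.x0 <= r.x1 && r.y0 <= r.y1);

    out.x0 = ALIGN_DOWN(r.x0, align_x);
    out.y0 = ALIGN_DOWN(r.y0, align_y);
    out.x1 = ALIGN_UP(r.x1, align_x);
    out.y1 = ALIGN_UP(r.y1, align_y);
    return out;
}
```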
In the long run I would like to rework the brw_blorp_blit_params
constructor--it's difficult to follow and has had several subtle bugs
like this one. However this band-aid fix should be suitable for
cherry-picking to release branches.
Fixes Piglit tests "unaligned-blit {2,4} stencil {msaa,upsample}" on
Sandy Bridge.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Recent versions of GCC report a warning for the implicit conversion from
int to float:
ff_fragment_shader.cpp:897:3: warning: narrowing conversion of '(1 << ((int)rgb_shift))' from 'int' to 'float' inside { } is ill-formed in C++11 [-Wnarrowing]
This is because floats cannot precisely represent all possible 32-bit
integer values. However, texenv code is all expected to be floating
point, so this should not be a problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
From the OpenGL Registry:
"2012/08/13: specs named GL_ARB_debug_group, GL_ARB_debug_label, and
GL_ARB_debug_output2 were published in error during the initial OpenGL 4.3
release. All functionality in these documents was combined into
the extension GL_KHR_debug. They have been withdrawn from the registry,
and a few other extensions were renumbered to avoid holes in the numbering
scheme."
pipe_draw_info::indexed determines if it should be indexed and not
the presence of an index buffer.
This fixes crashes in r300g.
NOTE: This is a candidate for the stable branches.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A call to glGenerateMipmap() follows the generation of a relevant
shader program in setup_glsl_generate_mipmap().
To support all texture targets and to avoid compiling shaders
every time, per target shader programs are compiled on demand
and saved for the next call.
Fixes float-texture(mipmap.manual):
See Comment 6: https://bugs.freedesktop.org/show_bug.cgi?id=54296
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Blorp has to convert rectangle coordinates from integers to floats in
order to send them down the GPU pipeline. Recent versions of GCC
issue a warning for this, since a float is not capable of precisely
representing all possible 32-bit integer values. Suppress the warning
with an explicit type cast in the case of blorp, since rectangle
coordinates will never be large enough to cause a loss of precision.
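As a sketch of why the explicit cast is safe for small values: an IEEE-754 single-precision float represents every integer up to 2^24 exactly, and loses precision only beyond that. The helper name and the assertion bound are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a rectangle coordinate to float with an
 * explicit cast, which tells the compiler the narrowing is intended. */
static float coord_to_float(int32_t v)
{
    /* Integers with magnitude up to 2^24 convert without rounding. */
    assert(v >= -(1 << 24) && v <= (1 << 24));
    return (float) v;
}
```

Values above 2^24, such as 16777217, round to a neighboring representable float, which is the loss of precision the warning is about.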
Reviewed-by: Eric Anholt <eric@anholt.net>
Given that it exists between a push/pop of instruction state, this call
can only affect the MOV or ADD instruction generated just below it.
Neither of those instructions is predicated, so it makes no sense to
ask for the inverse predicate.
This fixes grumblings from the simulator debugger, which was
complaining about an invalid predicate.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 42723d88d intended to override an S3TC internalFormat to a
generic compressed format when the application requested online
compression of uncompressed data. Unfortunately, it also broke
pre-compressed textures when libtxc_dxtn isn't installed but the
extensions are forced on.
Both glCompressedTexImage2D() and glTexImage2D() call teximage(), which
calls _mesa_choose_texture_format(), hitting this override code. If we
have actual S3TC source data, we can't treat it as any other format, and
need to avoid the override.
Since glCompressedTexImage2D() passes in a format of GL_NONE (which is
illegal for glTexImage), we can use that to detect the pre-compressed
case and avoid the overrides.
Fixes a regression since 42723d88d3.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes colorspace issues in L4D2 when multisampling is enabled (the
scene was far too dark, but the flashlight area was way too bright).
The nVidia and AMD binary drivers both allow this kind of blit.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
MSAA resolves and other blit-like operations ignore SRGB state anyway,
so we should be able to safely allow resolves between compatible
SRGB/linear formats like SRGBA8 and RGBA8888.
This matches the behavior of the nVidia and AMD binary drivers.
Fixes completely black rendering when using multisampling in L4D2.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use uint64_t for the total_size variable, per Jose.
Also add two earlier checks for exceeding the max texture size.
For example a 1K^3 RGBA volume would overflow the lpr->image_stride
variable.
Use simple algebra to avoid overflow in intermediate values.
So instead of "x * y > z" use "x > z / y".
This should work if we happen to be on a platform that doesn't have
64-bit types.
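The division trick can be sketched as follows, using only unsigned 32-bit style arithmetic. The function and parameter names are hypothetical, and this is not the llvmpipe code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if width * height * depth * bytes_per_texel would exceed
 * max_bytes, without ever forming a product that could overflow.
 * Each step applies "x > z / y" instead of "x * y > z". */
static bool texture_too_large(unsigned width, unsigned height,
                              unsigned depth, unsigned bytes_per_texel,
                              unsigned max_bytes)
{
    unsigned remaining;

    assert(bytes_per_texel > 0);

    if (width == 0 || height == 0 || depth == 0)
        return false;           /* empty image always fits */

    remaining = max_bytes / bytes_per_texel;
    if (width > remaining)
        return true;

    remaining /= width;
    if (height > remaining)
        return true;

    remaining /= height;
    return depth > remaining;
}
```

For the 1K^3 RGBA case, the naive product is 2^32 bytes, which wraps to 0 in a 32-bit variable, while the division form correctly reports that it exceeds a 1 GB limit.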
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was already (correctly) supported for glGetSamplerParameter paths.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Initializing the regalloc state is expensive, and since it is always
the same for every compile we only need to initialize it once per
context. This should help improve shader compile times for the driver.
Compute shaders fetch data from vertex buffers via the texture cache, so
we need to make sure the texture cache is flushed.
v2:
- Fix rebase mistake
- Fix spelling in comment
Reviewed-by: Marek Olšák <maraeo@gmail.com>
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
For buffers (which is what is being used for RATs), the
COLOR*_DIM.WIDTH_MASK field needs to be set to the low 16-bits of the
buffer size, and the COLOR*_DIM.HEIGHT_MAX needs to be set to the
high bits.
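The split itself is simple bit arithmetic, sketched below. The struct and function names are hypothetical, and any further encoding the hardware may expect (units, bias, field widths) is not stated above and is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of values destined for the two DIM fields. */
struct dim_fields {
    uint32_t low16;     /* low 16 bits of the buffer size */
    uint32_t high;      /* remaining high bits */
};

static struct dim_fields split_buffer_size(uint32_t size)
{
    struct dim_fields f;

    f.low16 = size & 0xffff;
    f.high = size >> 16;

    /* Sanity check: the two halves recombine to the original size. */
    assert(((f.high << 16) | f.low16) == size);
    return f;
}
```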
Reviewed-by: Marek Olšák <maraeo@gmail.com>
- add OpenCL state tracker Clover
- add XvMC state tracker
- remove progs
directory got moved into its own repository mesa/demos
- remove vf
directory removed with abda64efce
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
If the gallium driver implements the can_create_resource() function, call
it to do proxy texture size checks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Used to implement proxy textures. If a gallium driver doesn't implement
this function we'll just continue to use the core Mesa fallback code.
Without this hook we really have no good way to implement OpenGL proxy
textures with gallium drivers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before, the limit was 8K. For 32-bit RGBA that would require 1.5 GB
of memory (w/out mipmaps). That's well beyond the LP_MAX_TEXTURE_SIZE
of 1GB.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Simplify the code and make it more like the other glTexImage commands.
Call _mesa_legal_texture_dimensions() to validate width, height, depth.
Call ctx->Driver.TestProxyTexImage() to make sure texture is not too large.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are two aspects to texture image size checking:
1. Are the width, height, depth legal values (not negative, not larger
than the max size for the mipmap level, etc)?
2. Is the texture just too large to handle? For example, we might not be
able to really allocate memory for a 3D texture of maxSize x maxSize x
maxSize.
Previously, we did (1) via the ctx->Driver.TestProxyTextureImage() hook
but those tests are really device-independent. Now we do (2) via that
hook since the max texture memory and texture shape are device-dependent.
Also, (1) is now done outside the general texture parameter error checking
functions because of the special interaction with proxy textures. The
recently introduced PROXY_ERROR token is removed.
The teximage() and copyteximage() functions are a bit simpler now (less
if-then nesting, etc.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, move the body into a new _mesa_legal_texture_dimensions() function.
More refactoring to come.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I can't see any reason this is global (unless for debugging)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds basic flow control support for If-Then-Else blocks using
predicates (stored in the EXEC register) and a predicate stack for
nested flow control.
No regressions found in the tests of opencl-example/run_tests.sh.
Signed-off-by: Xinya Zhang <zxy_thf@hotmail.com>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
As far as I can see, the intention of the requirement that we do so is to
prevent instruction prefetch from wandering out into either unmapped memory or
memory with a different caching type, and hanging the chip. The kernel makes
sure that the page after your BO has a valid page of the same caching type,
which meets this requirement, so there's no need to waste space between our
programs (and in instruction cache) on this.
Saves another 9kb instructions in l4d2 shaders.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces l4d2 program size from 1195kb to 919kb. Improves performance by 0.22%
+/- 0.11% (n=70).
v2: Rebase on compaction v2, fix up flag reg handling (by anholt).
v3: Fix uncompaction of the flag register number.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.
v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The first cut at instruction compaction won't compact things that
would change control flow jump distances, but we do need to still be
able to walk the instruction stream, which involves jumping by 8 or 16
bytes between instructions.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It's going to get more complicated when we do instruction compaction. This
also introduces putting the program offset in the output.
v2: Use next_insn_offset in brw_get_program(), too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To do unit testing of i965, we want to be able to link against the
driver's symbols and prod them. If we don't have a separate lib from
our loadable module, libtool gets super whiny.
Acked-by: Paul Berry <stereotype441@gmail.com>
This file is used to provide stubs for the link test in gallium dri drivers.
But the same stubs without the main can be used for making unit tests for code
in a dri driver.
Acked-by: Paul Berry <stereotype441@gmail.com>
I noticed in valgrind that p->single_program_flow was used while
uninitialized. Everything else zeroed out brw_compile, but this is better
API.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes glGetStringi(GL_EXTENSIONS,.. for core contexts. Previously,
all extension names returned would be NULL.
NOTE: This is a candidate for release branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL_TEXTURE_1D, GL_TEXTURE_3D, GL_TEXTURE_RECTANGLE, and
GL_TEXTURE_GEN_S/T/R/Q don't exist in ES 1 contexts, so any meta ops
that used _mesa_meta_begin with MESA_META_TEXTURE would trigger GL
errors. One such operation is _mesa_meta_Clear().
On ES 1, we want to disable GL_TEXTURE_GEN_STR_OES instead.
Fixes the ES1 conformance test miplin.c, which was regressed by commit
08be1d288f.
NOTE: This is a candidate for the 9.0 branch.
v2: Also blacklist GL_TEXTURE_3D, per Brian's comment.
v3: Disable GL_TEXTURE_GEN_STR_OES, per Ian's comment.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54297
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Just to make it consistent with the rest of vbo, since it would
be an exported symbol anyways.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I can't see any external users, and this is a global symbol.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current code is duplicated in two places and relies on `uname` to
detect the flags. This is no good for cross-compiling, and the current
logic uses -m64 for the x32 ABI which breaks things.
Unify the code in one place, avoid `uname` completely, and add support
for the new x32 ABI.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
With dricore this symbol escapes into the namespace. It's too generic,
so we should prefix it with something just to be nice.
Should be applied to stable + 9.0
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
glcpp tried to work around yylex in its own way, but failed;
do it properly.
This fixes another crash found after fixing the first crash.
This is a candidate for 9.0 and stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids us making a global yylex symbol which will interfere with
all sorts of apps.
With libdricore, which can't do symbol visibility currently, we pollute
the namespace with this.
This is a candidate for 9.0 & stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In commit 055093e (meta: remove call to _meta_in_progress(), fix
multisample enable/disable), we created a meta_set_enable() function
that could be used by meta ops to enable and disable GL_MULTISAMPLE
even when the GLES API was in use (the GLES API doesn't support
GL_MULTISAMPLE; it behaves as if it is always enabled). This created
some unfortunate code duplication between meta_set_enable() and the
existing _mesa_set_enable() function.
This patch eliminates the duplication by creating a
_mesa_set_multisample() function, which is used by both meta ops and
_mesa_set_enable() to enable/disable GL_MULTISAMPLE.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glsl version of _mesa_meta_GenerateMipmap() would require separate
shaders for glsl 120 and 130.
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glsl path of _mesa_meta_GenerateMipmap() function would require different fragment
shaders depending on the texture target. This patch adds the code to generate
appropriate fragment shader programs at run time.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54296
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
Now using ralloc_asprintf in setup_glsl_generate_mipmap().
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Group vgt register together to avoid lockup
v3: Split multi primitive register and index bias register
v4: Bump R600_NUM_ATOMS
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Update only those sampler states which are changed in a shader stage,
instead of always updating all sampler states in the shader stage.
That requires keeping a bitmask of those states which are enabled, and those
states which are dirty at a given point (subset of enabled states).
This is similar to how sampler views, constant buffers, and vertex buffers
are handled.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
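The enabled/dirty bookkeeping described in the message above can be sketched roughly as follows. This is only an illustration under assumptions, not the driver code: `sampler_set`, `bind_sampler`, `emit_dirty_samplers`, and `MAX_SAMPLERS` are hypothetical names, and "emitting" a state is reduced to counting it.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLERS 16 /* hypothetical limit for this sketch */

struct sampler_set {
    const void *states[MAX_SAMPLERS];
    uint32_t enabled_mask; /* slots that currently hold a state */
    uint32_t dirty_mask;   /* subset of enabled_mask changed since last emit */
};

/* Bind (or unbind, with a null pointer) one slot; mark it dirty only if it changed. */
static void bind_sampler(struct sampler_set *s, unsigned slot, const void *state)
{
    assert(slot < MAX_SAMPLERS);
    if (s->states[slot] == state)
        return; /* unchanged: nothing to re-emit */
    s->states[slot] = state;
    if (state) {
        s->enabled_mask |= 1u << slot;
        s->dirty_mask |= 1u << slot;
    } else {
        s->enabled_mask &= ~(1u << slot);
        s->dirty_mask &= ~(1u << slot);
    }
}

/* Walk only the dirty bits; returns how many states were (re-)emitted. */
static unsigned emit_dirty_samplers(struct sampler_set *s)
{
    unsigned emitted = 0;
    uint32_t dirty = s->dirty_mask;
    while (dirty) {
        dirty &= dirty - 1; /* clear the lowest set bit */
        emitted++;          /* a real driver would write the state here */
    }
    s->dirty_mask = 0;
    return emitted;
}
```

With this layout, rebinding an identical state costs nothing at emit time, and the dirty mask always stays a subset of the enabled mask.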
Based on the patch called "simplify and fix flushing and synchronization"
by Jerome Glisse.
Rebased, removed unneeded code, simplified more and cleaned up.
Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem
to need it). It's only used to flush constant buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Some of the old AMDIL code was hard-coding subreg indices when creating
the VBUILD node, which was making it difficult to match the
vector_insert patterns.
ARB fragment programs use texture unit numbers directly, unlike GLSL
which has an extra indirection. If a fragment program only uses one
texture assigned to GL_TEXTURE1, SamplersUsed will only contain a single
bit, which would make us only upload a single surface/sampler state
entry. However, it needs to be the second entry.
Using _mesa_fls() instead of _mesa_bitcount() solves this. For ARB
programs, this makes num_samplers the ID of the highest texture unit
used. Since GLSL uses consecutive integers assigned by the linker,
_mesa_fls() should give the same result as _mesa_bitcount().
Fixes a regression since 85e8e9e000,
which caused GPU hangs in ETQW (and probably others), as well as
breaking piglit test fp-fragment-position.
v2: Add a comment, as suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54098
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54179
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: meng <mengmeng.meng@intel.com>
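The difference between counting set bits and finding the highest set bit, as described in the message above, can be illustrated with a small sketch. `bitcount` and `find_last_set` are stand-ins written for this example; they are assumed to follow the usual conventions (population count, and a 1-based index of the most significant set bit with 0 for an empty mask), not copied from Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a bit-count helper: number of set bits in v. */
static unsigned bitcount(uint32_t v)
{
    unsigned n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Stand-in for an fls-style helper: 1-based index of the most
 * significant set bit, or 0 when no bit is set. */
static unsigned find_last_set(uint32_t v)
{
    unsigned n = 0;
    for (; v; v >>= 1)
        n++;
    return n;
}

static void check_sampler_count_example(void)
{
    /* ARB program using only GL_TEXTURE1: one bit set, two entries needed. */
    assert(bitcount(0x2) == 1);
    assert(find_last_set(0x2) == 2);
    /* GLSL-style consecutive units 0..2: both helpers agree. */
    assert(bitcount(0x7) == 3);
    assert(find_last_set(0x7) == 3);
}
```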
ffs() finds the least significant bit set; _mesa_fls() finds the /most/
significant bit.
v2: Make it an inline function in imports.h, per Brian's suggestion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes piglit test "framebuffer-blit-levels draw stencil".
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
Previously, we aligned all stencil blit operations to multiples of the
size of a tile, since stencil buffers use W-tiling, and blorp has to
approximate this by configuring the 3D pipeline for Y-tiling and
swizzling coordinates.
However, this was unnecessarily conservative; it turns out that the
differences between W-tiling and Y-tiling are confined to 32-byte
sub-tiles within the 4k tiling pattern; the layout of these 32-byte
sub-tiles within the larger 4k tile is the same (8 sub-tiles across by
16 sub-tiles down, in column-major order). Therefore we only need to
align stencil blit operations to multiples of the sub-tile size.
Note: although the performance improvement of this change is probably
quite small, the fact that W-tiling and Y-tiling formats only differ
within 32-byte sub-tiles will be essential in a future patch to ensure
that stencil blits work correctly between parts of the miptree other
than level/layer 0. Making this change provides handy documentation
(and validation) of this fact.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
When blitting to a stencil buffer, we need to align the rectangle we
send down the rendering pipeline, to account for the fact that the
stencil buffer uses a W-tiled layout, but we are configuring its
surface state as Y-tiled.
Previously, when the stencil buffer was multisampled, we assumed that
we could reduce the amount of alignment that was necessary, since each
pixel occupies a block of 2x2 or 4x2 samples in the stencil buffer.
That would have been correct if the coordinates we were adjusting were
measured in pixels. However, the conversion from pixel coordinates to
coordinates within the interleaved buffer has already been done;
therefore the full alignment restriction applies.
Note: the reason this mistake wasn't previously uncovered by piglit
tests is because it is being masked by another mistake: the blorp
engine is using overly conservative alignment restrictions when doing
stencil blits. The overly conservative alignment restrictions will be
removed in the patch that follows. Doing this fix now will prevent
the subsequent patch from introducing regressions.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
This patch modifies intel_region_get_aligned_offset() to make the
appropriate calculation when the blorp engine sets up a W-tiled
stencil buffer using a Y-tiled SURFACE_STATE.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
When the blorp engine is performing a blit from one stencil buffer to
another, it sets up the surface state for these buffers as Y-tiled, so
it needs to be able to force intel_region_get_tile_masks() to return
the appropriate masks for a Y-tiled region.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
Fixes piglit tests "framebuffer-blit-levels {read,draw} depth".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, when performing a blit using the blorp engine, we failed
to account for the level and layer of the source and destination. As
a result, all blits would occur between miplevel 0 and layer 0 of the
corresponding textures, regardless of which level/layer was bound to
the framebuffer.
This patch passes the correct level and layer through
brw_blorp_miptrees() into the brw_blorp_blit_params data structure.
Further patches in the series will adapt
gen{6,7}_blorp_emit_surface_state to make use of these parameters.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen{6,7}_blorp_emit_surface_state assumes that the src and
dst surfaces are mapped to miplevel 0 and layer 0 (thus no surface
offset is required). This is a bug, since the user might try to blit
to and from levels/layers other than 0.
To fix this bug, it will not be sufficient to have
gen{6,7}_blorp_emit_surface_state look up the surface offset at the
time they set up the surface state, since these offsets will need to
be tweaked when blitting stencil buffers (due to the fact that stencil
buffer blits have to swizzle between W and Y tiling formats).
So, to pave the way for the bug fix, this patch causes the x and y
offsets to be computed during blit setup and stored in
brw_blorp_mip_info.
As a result of this change, brw_blorp_mip_info doesn't need to store
the level and layer anymore.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, gen{6,7}_blorp_emit_surface_state would look up the width
and height of the surface at the time they set up the surface state,
and then tweak it if necessary (it's necessary when a W-tiled surface
is being mapped as Y-tiled). With this patch, we look up the width
and height when setting up the blit, and store them in
brw_blorp_mip_info. This allows us to do the necessary tweak in the
brw_blorp_blit_params constructor (where it makes more sense). It
also reduces the need to keep track of level and layer in
brw_blorp_mip_info, so that a future patch can eliminate them
entirely.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it more convenient for blorp functions to get access to
Intel-specific data inside the renderbuffer objects.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Also add a clarifying comment for why the width/height doesn't need
adjustment for Gen7.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen6+ stencil buffers use W-tiling (a tiling arrangement which
drm and the kernel are not aware of) we need to round up the width and
height of a stencil buffer to multiples of the W-tile size (64x64)
before allocating a stencil buffer. Previously, we rounded up the
size of the base miplevel, and then computed the miptree layout based
on the rounded up size. This was incorrect, because it meant that the
total size of the miptree would not be properly W-tile aligned, and
therefore we would not always allocate enough pages.
(Note: even though the GL API doesn't allow creation of mipmapped
stencil textures, it does allow mipmapping of a combined depth/stencil
texture, and on Gen6+, a combined depth/stencil texture is internally
implemented as a pair of separate depth and stencil buffers.)
For example, on Sandy Bridge, when allocating a mipmapped stencil
texture of size 128x128, we would first round up to the nearest
multiple of 64x64 (causing no change to the size), and then compute
the miptree layout (whose size worked out to 128x196). Then we would
request an allocation of 128*196 bytes (6.125 pages), causing 7 pages
to be allocated to the texture. However, the texture needs 8 pages,
since each W-tile occupies a page, and it takes 2 W-tiles to cover a
width of 128 and 4 W-tiles to cover a height of 196.
This patch changes the order of operations so that the miptree layout
is computed first and then the total size of the miptree is rounded up
to be W-tile aligned.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
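The page arithmetic in the Sandy Bridge example above can be reproduced with a short sketch. It assumes 4096-byte pages and a 64x64-byte W-tile occupying one page, as the message states; the helper names and `SKETCH_*` constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_W_TILE_DIM 64u /* a W-tile is 64x64 bytes, i.e. one page */

static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

/* Pages obtained when only the raw byte count is rounded up. */
static uint32_t pages_for_bytes(uint32_t width, uint32_t height)
{
    return align_up(width * height, SKETCH_PAGE_SIZE) / SKETCH_PAGE_SIZE;
}

/* Pages needed once the whole miptree is W-tile aligned. */
static uint32_t pages_for_w_tiles(uint32_t width, uint32_t height)
{
    return (align_up(width, SKETCH_W_TILE_DIM) / SKETCH_W_TILE_DIM) *
           (align_up(height, SKETCH_W_TILE_DIM) / SKETCH_W_TILE_DIM);
}

static void check_stencil_pages_example(void)
{
    /* 128x196 miptree: 25088 bytes is 6.125 pages, rounded up to 7. */
    assert(pages_for_bytes(128, 196) == 7);
    /* 2 tiles across by 4 tiles down: 8 pages are actually required. */
    assert(pages_for_w_tiles(128, 196) == 8);
}
```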
Fixes piglit shaders/glsl-fs-uniform-sampler-array and many other similar
tests.
In fact, I just completed a piglit quick-driver.tests run without any GPU
lockups or even VM protection faults. Yay!
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
The value was too small by 1 in some cases (non-first of several vertex
elements interleaved in a single buffer).
Fixes intermittent incorrect geometry in many apps, e.g. piglit
spec/EXT_texture_snorm/fbo-generatemipmap-formats.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
These enums are valid only in ES1 and ES2. So far they were marked valid
incorrectly, depending on the previous API mask in the enum list.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This is basically a follow-on to 1f5b1f9846.
Basically, generate GL errors for ordinary invalid parameters for proxy
targets the same as for non-proxy targets. Only texture size and OOM
errors should be handled specially for proxies.
Note: This is a candidate for the stable branches.
Turns out we weren't doing any format checking before. Now check
the internal format and, in particular, make sure that unsized internal
formats aren't accepted.
Note: This is a candidate for the stable branches.
From the GL 4.3 spec, section 18.3.1 "Blitting Pixel Rectangles":
If SAMPLE_BUFFERS for either the read framebuffer or draw
framebuffer is greater than zero, no copy is performed and an
INVALID_OPERATION error is generated if the dimensions of the
source and destination rectangles provided to BlitFramebuffer are
not identical, or if the formats of the read and draw framebuffers
are not identical.
It is not clear from the spec whether "dimensions" should mean both
sign and magnitude, or just magnitude.
Previously, Mesa interpreted "dimensions" as meaning both sign and
magnitude, so any multisampled blit that attempted to flip the image
in the X and/or Y direction would fail.
However, Y flips are likely to be commonplace in OpenGL applications
that have been ported from DirectX applications, as a result of the
fact that DirectX and OpenGL differ in their orientation of the Y
axis. Furthermore, at least one commercial driver (nVidia) permits Y
flips, and L4D2 relies on them being permitted. So it seems prudent
for Mesa to permit them.
This patch changes Mesa to allow both X and Y flips, since there is no
language in the spec to indicate that X and Y flips should be treated
differently.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
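Interpreting "dimensions" as magnitude only, as the message above describes, could look like the following sketch. `blit_dimensions_match` is a hypothetical helper written for illustration; the actual check in Mesa may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when the source and destination rectangles have the same size,
 * ignoring the sign of each extent (so X and Y flips are accepted). */
static bool blit_dimensions_match(int srcX0, int srcY0, int srcX1, int srcY1,
                                  int dstX0, int dstY0, int dstX1, int dstY1)
{
    return abs(srcX1 - srcX0) == abs(dstX1 - dstX0) &&
           abs(srcY1 - srcY0) == abs(dstY1 - dstY0);
}

static void check_blit_flip_example(void)
{
    /* Y flip of a 64x64 region: same magnitude, so it is accepted. */
    assert(blit_dimensions_match(0, 0, 64, 64, 0, 64, 64, 0));
    /* Scaling 64x64 down to 32x32 is still rejected. */
    assert(!blit_dimensions_match(0, 0, 64, 64, 0, 0, 32, 32));
}
```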
The compiler needs to know which interpolation modes are enabled, so
it knows which values will be preloaded into the VGPRs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
At least one interpolation mode must be enabled, but the code that checks
this was not checking for perspective center.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
A previous command stream might have set any of the constant buffers,
and the previous address might no longer be valid, so the GPU might
preload constants from a random invalid address and possibly trigger a
lockup.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
* Handle arbitrary border colours.
* Use correct packing format for detecting special border colours.
Fixes piglit tex-border-1 and probably many other tests using border colours.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
According to the GLSL 4.30 specification, this is a compile time error.
Earlier specifications don't specify a behavior, but since 0 and 1 are
the only valid indices for dual source blending, it makes sense to
generate the error.
Fixes (the fixed version of) piglit's layout-12.frag.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes piglit spec/EXT_texture_snorm/fbo-generatemipmap-formats (except for
what seems like a random fluke).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Saves 96MB of wasted memory in the l4d2 demo.
v2: Rebase on compare func change, change brace style.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Saves 26.5MB of wasted memory allocation in the l4d2 demo.
v2: Rebase on compare func change, fix comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, this just avoids comparing all unused parts of param[] and
pull_param[], but it's a step toward getting rid of those giant statically
sized arrays.
v2: Actually use the new function instead of just looking at its
address. This required changing the args to const pointers.
(review by Kenneth)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't fully process the builtin uniforms, but at least
num_uniform_components reflects reality now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes an issue where the local 'table' variable was hiding the
function parameter name in glGetColorTable(..., void *table).
This should be OK as long as there's never a GL entrypoint that uses
'disp_table' as a parameter name.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Haswell moved the "Cut Index Enable" bit from the INDEX_BUFFER packet to
a new 3DSTATE_VF packet, so we need to emit that. Also, it requires us
to specify the cut index rather than assuming it's 0xffffffff.
This adds a new Haswell-specific tracked state atom to gen7_atoms.
Normally, we would create a new generation-specific atom list, but since
there's only one difference over Ivybridge so far, I chose to simply
make it return without doing any work on non-Haswell systems.
Fixes five piglit tests:
- general/primitive-restart-DISABLE_VBO
- general/primitive-restart-VBO_COMBINED_VERTEX_AND_INDEX
- general/primitive-restart-VBO_INDEX_ONLY
- general/primitive-restart-VBO_SEPARATE_VERTEX_AND_INDEX
- general/primitive-restart-VBO_VERTEX_ONLY
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
To avoid GPU lockups, registers must be emitted in a specific order
(no kidding ...). This patch reworks atom emission so that the order in
which atoms are emitted with respect to each other is always the same. We
don't have any information on what the correct order is, so the order
will need to be inferred from the fglrx command stream.
v2: add comment warning that atom order should not be taken lightly
v3: rebase on top of alphatest atom fix
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
glGetStringi(GL_EXTENSIONS) failed to respect the context's API, and so
returned all internally enabled GLES extensions from a GL context.
Likewise, glGetIntegerv(GL_NUM_EXTENSIONS) also failed to respect the
context's API.
Note: This is a candidate for the 8.0 and 9.0 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Such as
"llvmpipe (LLVM 3.1, 128 bits)"
or
"llvmpipe (LLVM 3.1, 256 bits)"
when leveraging AVX 8-wide registers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Should be at least mostly working now (with the corresponding fixes in
libdrm_radeon).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We can always use the offset and tiling mode from level 0 and restrict the
first and last mipmap level to be used in the sampler resource.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Same as earlier commit, except for "FREE"
This patch has been generated by the following Coccinelle semantic
patch:
// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it
@@
expression E;
@@
+ FREE (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- FREE(E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
type T;
@@
+ FREE ((T) E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- FREE((T) E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
@@
+ FREE (E);
- if (unlikely (E != NULL)) {
- FREE (E);
- }
@@
expression E;
type T;
@@
+ FREE ((T) E);
- if (unlikely (E != NULL)) {
- FREE ((T) E);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch has been generated by the following Coccinelle semantic
patch:
@@
expression E;
identifier I;
@@
- I = malloc(E);
+ I = calloc(1, E);
...
- memset(I, 0, sizeof *I);
Reviewed-by: Brian Paul <brianp@vmware.com>
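The effect of the semantic patch above can be sketched on a small example. `struct item` and both constructors are hypothetical. Note the transformation is only equivalent when the allocation size matches the size being cleared, which is assumed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct item {
    int id;
    int refcount;
};

/* Before: allocate, then zero the object by hand. */
static struct item *item_new_memset(void)
{
    struct item *it = malloc(sizeof *it);
    if (it)
        memset(it, 0, sizeof *it);
    return it;
}

/* After: calloc returns zero-filled memory in a single call. */
static struct item *item_new_calloc(void)
{
    return calloc(1, sizeof(struct item));
}

static void check_item_example(void)
{
    struct item *a = item_new_memset();
    struct item *b = item_new_calloc();
    assert(a && b);
    /* Both objects are entirely zero-filled, so they compare equal. */
    assert(memcmp(a, b, sizeof *a) == 0);
    free(a);
    free(b);
}
```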
This patch has been generated by the following Coccinelle semantic
patch:
// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it
@@
expression E;
@@
+ free (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- free(E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
type T;
@@
+ free ((T) E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- free((T) E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
@@
+ free (E);
- if (unlikely (E != NULL)) {
- free (E);
- }
@@
expression E;
type T;
@@
+ free ((T) E);
- if (unlikely (E != NULL)) {
- free ((T) E);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
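Since the C standard defines free() on a null pointer as a no-op, the guarded form removed by the semantic patch above collapses to the sketch below. `struct buf` and `buf_release` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
    char *data;
};

/* Release the payload without testing for a null pointer first. */
static void buf_release(struct buf *b)
{
    assert(b);
    free(b->data);  /* safe even when b->data is already null */
    b->data = NULL; /* avoid a dangling pointer and a double free */
}
```

Calling `buf_release` twice in a row on the same object is harmless, because the second call passes a null pointer to free().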
This patch has been generated by the following Coccinelle semantic
patch:
// Don't cast the return value of malloc/realloc.
//
// Casting the return value of malloc/realloc only stands to hide
// errors.
@@
type T;
expression E1, E2;
@@
- (T)
(
_mesa_align_calloc(E1, E2)
|
_mesa_align_malloc(E1, E2)
|
calloc(E1, E2)
|
malloc(E1)
|
realloc(E1, E2)
)
These calls allowed Xlib to use a custom memory allocator, but Xlib has
used the standard C library functions since at least its initial import
into git in 2003. It seems unlikely that it will grow a custom memory
allocator. The functions now just add extra overhead. Replacing them
will make future Coccinelle patches simpler.
This patch has been generated by the following Coccinelle semantic
patch:
// Remove Xcalloc/Xmalloc/Xfree calls
@@ expression E1, E2; @@
- Xcalloc (E1, E2)
+ calloc (E1, E2)
@@ expression E; @@
- Xmalloc (E)
+ malloc (E)
@@ expression E; @@
- Xfree (E)
+ free (E)
@@ expression E; @@
- XFree (E)
+ free (E)
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a long-standing omission in Mesa's texture image size checking.
We need to take the mipmap level into consideration when checking if the
width, height and depth are too large.
Fixes the new piglit max-texture-size-level test.
Thanks to Stéphane Marchesin for finding this problem.
Note: This is a candidate for the stable branches.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
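Taking the mipmap level into account when checking a dimension could look roughly like this. It is a simplified sketch under assumptions: `tex_dim_ok` and `max_levels` are hypothetical, and borders, non-power-of-two rules, and per-target limits are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* max_levels is the number of mip levels the target supports, so level 0
 * may be up to 1 << (max_levels - 1) texels in each dimension, and each
 * further level halves that limit. */
static bool tex_dim_ok(int size, int level, int max_levels)
{
    assert(max_levels > 0 && max_levels <= 31);
    if (level < 0 || level >= max_levels)
        return false;
    int max_size = 1 << (max_levels - 1 - level);
    return size >= 0 && size <= max_size;
}
```

For example, with 12 levels (a 2048-texel limit at level 0), a width of 2048 is acceptable at level 0 but too large at level 1, where the limit drops to 1024.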
According to Eric, this shouldn't matter since we don't do precompiles
using the old backend. In other words, brw->fragment_program (the
currently active program) should equal c->fp (the program currently
being compiled).
However, it's just not a good idea to access brw->fragment_program
directly in compiler code. It's totally illegal in the new backend, so
let's just not do it here either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Paul Berry <stereotype441@gmail.com>
The CodeEmitter was not setting the VGPR bit for src0, because the
instruction definition had the VCC register in the src0 slot, instead of
the actual src0 register. This has been fixed by moving the VCC
register to the end of the operand list.
Looks like converting this to a macro, returning bool, caused us to
lose the high (31st) bit result. Fixes piglit fbo-1d test. Strange
that none of the other tests I ran caught this.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54365
Tested-by: Vinson Lee <vlee@freedesktop.org>
We were already defining sqrtf where we don't have the C99 version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use 1/256 for R6xx/7xx, 1/4096 for evergreen, instead of default 1/16.
Helps to pass some piglit tests (fbo, multisample).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I wonder if the better solution is to have _mesa_meta_GenerateMipmap not
use MESA_META_ALL for the GLSL path. Even on compatibility profiles
there is no reason to save and restore fog on this path.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54295
Looks like we have an alignment issue with NPOT textures
and mipmaps. So disable NPOT textures until we figure out
what is going wrong here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reading brw->fragment_program is nonsensical in compiler code: it
contains the currently active program (if any), not the one currently
being compiled. Attempting to access it may either lead to crashes
(null pointer dereference if no program is active) or wrong results.
Fixes piglit regressions since 9ef710575b
on pre-Sandybridge hardware. The actual bug was created in commit
7b1fbc6889.
NOTE: This is a candidate for the 9.0 and 8.0 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54183
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
From ARB_sync spec:
If the value of <timeout> is zero, then ClientWaitSync does not
block, but simply tests the current state of <sync>. TIMEOUT_EXPIRED
will be returned in this case if <sync> is not signaled, even though
no actual wait was performed.
Fixes random fails of the arb_sync-timeout-zero piglit test on r600g.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
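The zero-timeout behaviour quoted from the ARB_sync spec above amounts to a non-blocking poll. A minimal sketch, assuming a hypothetical `struct fence` and `client_wait` (the enum values are placeholders, not the GL constants, and the blocking path is faked):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wait_result {
    WAIT_ALREADY_SIGNALED,
    WAIT_TIMEOUT_EXPIRED,
    WAIT_CONDITION_SATISFIED
};

struct fence {
    bool signaled;
};

static enum wait_result client_wait(struct fence *f, uint64_t timeout_ns)
{
    assert(f);
    if (f->signaled)
        return WAIT_ALREADY_SIGNALED;
    if (timeout_ns == 0)
        return WAIT_TIMEOUT_EXPIRED; /* poll only: never block */
    /* A real implementation would block here for up to timeout_ns;
     * this sketch simply pretends the fence signals in time. */
    f->signaled = true;
    return WAIT_CONDITION_SATISFIED;
}
```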
As discussed with Kristian on #wayland. Pushes the decision of components into
the dri driver, giving it greater freedom to implement YUV samplers
in hardware and to choose which mode to use.
This interface will also allow drivers like SVGA to implement YUV surfaces
without the need to sub-allocate and instead send 3 separate buffers for each
channel; this is currently not implemented.
I have tested these changes on Gallium Svga. Scott tested them on both intel
and Gallium Radeon. Kristian and Pekka tested them on intel.
v2: Fix typo in dri2_from_planar.
v3: Merge in intel changes.
Tested-by: Scott Moreau <oreaus@gmail.com>
Tested-by: Pekka Paalanen <ppaalanen@gmail.com>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Immediate operands were previously handled in the CodeEmitter, but that
code was buggy and very confusing. This commit adds a pass that simplifies
the handling of immediate operands by splitting the loading of the
immediate into a separate instruction that is bundled with the original.
The relevant POINT_SIZE registers are being set using the
pipe_rasterizer_state, so we just need to tell the shader compiler which
export type to use.
This fixes several of the glean glsl tests.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
On Android we want to add only double buffered configs for visuals.
Earlier implementation set the SurfaceType as 0 for single buffered
configs but driver still exposed these configs that were not compatible
with any egl surface type. This caused Khronos conformance test runs to
fail on Android. This patch fixes the issue by skipping single buffered
configs earlier and not exposing them.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The CALLOC() macro only takes one argument so this was being treated
as a comma expression. Simply use calloc() instead.
A follow-on patch will replace all CALLOC() calls with calloc().
NOTE: This is a candidate for the 8.0 and 9.0 branches.
_mesa_delete_renderbuffer() should free the mutex (though that may be a
no-op) and then free the renderbuffer object itself. Subclasses of
gl_renderbuffer can use this function too.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that OpenGL 3.1 is supported by at least one driver, follow
tradition and bump the major version number.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The blend state is different and the resolve single-sample buffer must have
FMASK and CMASK enabled. I decided to have one CMASK and one FMASK
per context instead of per resource.
There are new FMASK and CMASK allocation helpers and a new buffer_create
helper for that.
The color resolve on r6xx needs PT_RECTLIST. Using conventional primitive
types (triangles and quads) produces an ugly line between two diagonally
opposite corners. I guess a rectangular point sprite would work too.
This partially reverts d638da23d2.
With gallium the meta code is not always built so the call to
_meta_in_progress() was unresolved. Simply special-case the
GL_MULTISAMPLE case in the meta code. There might be other special
cases in the future given all the differences between legacy GL,
core GL, GLES, etc.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54234
and https://bugs.freedesktop.org/show_bug.cgi?id=54239
v2 (Paul Berry <stereotype441@gmail.com>): keep _meta_in_progress
function, since it's needed by the i965 driver, but don't call it from
core mesa.
Signed-off-by: Brian Paul <brianp@vmware.com>
Prior to commit 2f1869822, emit_fb_writes() looped from 0 to 3, writing
all four components of a vec4 color output. However, that broke for
smaller output types (float, vec2, or vec3). To fix that, I introduced
a new variable (output_components[]) containing the size of the output
type for each render target.
Unfortunately, I forgot to actually initialize it in the constructor,
which meant that unless a shader wrote to gl_FragColor, or the specific
output for each render target, output_components would contain a garbage
value, and we'd loop for a completely non-deterministic amount of time.
Not actually emitting any color writes seems like the right approach.
We may still need to emit a render target write (to terminate the
thread), but don't have to put in any sensible values (the shader didn't
write anything, after all).
Fixes a regression since 2f18698220.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54193
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Ian Romanick <idr@freedesktop.org>
It is possible to force S3TC extensions to be enabled. This is
generally done to support applications that will only supply
pre-compressed textures. This accounts for the vast majority of
applications.
However, there is still the possibility of an application asking for
on-line compression. In that case, generate a warning and substitute a
generic compressed format. The driver will either pick an uncompressed
format or a compressed format that Mesa can handle on-line (e.g., FXT1).
This should only cause problems for applications that request on-line
compression and read the compressed texture back. This is likely an
infinitesimal subset of an already infinitesimal subset.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix API_OPENGL_CORE handling when TEXTURE_FLOAT_ENABLED is not
defined. Based on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a purely software extension. The drivers don't need to do any
work to support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Page 407 (page 423 of the PDF) of the OpenGL 3.0 spec says (in the list
of deprecated functionality):
"Separate polygon draw mode - PolygonMode face values of FRONT and
BACK; polygons are always drawn in the same mode, no matter which
face is being rasterized."
Also modify meta to not use FRONT or BACK in a core context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We were calling through a dispatch table entry that was NULL, since the apple
variant is only on legacy desktop. Just call the function we mean instead of
indirecting through the dispatch.
v2: Use API_OPENGL_CORE.
v3: Only require desktop GL. If a driver can't support TexBOs in a non-core
context, it should not enable them.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix completely broken condition around ClearColorIiEXT and
ClearColorIuiEXT.
v3: Add special VertexAttrib handling for ES2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The comment in the code even says this is the right thing to do.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All drivers in Mesa do. This allows a lot of extension checking code to be
gutted from the function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a bug that glGetMaterial[fx]v in ES1 contexts would (try to) allow
queries of GL_AMBIENT_AND_DIFFUSE. This enum can only be used in glMaterial,
not in the get.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Also handle glDisable, glIsEnabled, glEnableClientState, and
glDisableClientState.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow glGetVertexAttribfv(0, GL_CURRENT_VERTEX_ATTRIB_ARB, param) in
OpenGL 3.1, just like OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile filtering.
v3: Allow GL_SRC_ALPHA_SATURATE as a destination factor in GLES3. Based
on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow GL_RGB10_A2UI in GLES3 based on review feedback from Eric
Anholt.
v4: Arg. Reject unsized RED and RG enums on GLES. More feedback from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile, GLES1, and GLES3 filtering.
v3: Fix the GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME query when the
attachment type is GL_NONE on GLES3. Other cleanups. Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Fix a typo in GL_TEXTURE_2D_ARRAY checking.
v4: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The Common Subexpression Elimination pass will not operate on
instructions with physical register defs, so we end up with
several redundant copies to M0 when using interpolation.
Adding a register class that only contains the M0 register allows
us to use a virtual register to represent M0, and makes it possible
for the Common Subexpression Elimination pass to remove the extra
copies.
This reduces the overhead of using the fixed function internally
in the driver.
V2: Use setup_glsl_generate_mipmap() and setup_ff_generate_mipmap()
functions to avoid code duplication.
Use glsl version when ARB_{vertex, fragment}_shader are present.
Remove redundant code.
V3: Remove redundant border related code leaving the assertion.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
the progs/util directory is now in mesa demos
replace glean with piglit
add ApiTrace
markup: replace the unordered list <ul> with a definition list <dl>
Signed-off-by: Brian Paul <brianp@vmware.com>
I've reviewed the code, and the swrast callsites remaining are all in
drawpixels/copypixels/bitmap/accum, or _swrast_BlitFramebuffer that shouldn't
be hit. A piglit run with the context setup disabled on legacy GL and GLES2
showed regressions only in the copypixels and drawpixels tests.
If the context type is forced, this reduces the shader_runner maximum heap
size for glsl-algebraic-add-add-1.shader_test from 15,137,496b to 4,165,376b.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The Fallback field of the context struct doesn't work that way on i965, and
it's the only caller of FALLBACK() in the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This code has been in the driver since the first commit. I think it was
trying to stop rendering from happening with a disabled position array. Core
mesa has since had changes to deal with disabled position arrays correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
But cap the size in bytes, to avoid depleting the whole system memory
with humongous textures.
Tested with max-texture-size piglit test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
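A rough sketch of a byte-size cap, under stated assumptions: the 1 GiB limit and the function name are hypothetical, not the values the driver uses. The multiplication is checked step by step so the running product cannot overflow:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap for this sketch: 1 GiB. */
#define SKETCH_MAX_TEXTURE_BYTES (UINT64_C(1) << 30)

/* Return true if a w x h x d texture with bytes_per_texel fits under the
 * cap. Each multiplication is guarded by a division check first. */
static bool texture_fits(uint32_t w, uint32_t h, uint32_t d,
                         uint32_t bytes_per_texel)
{
    const uint32_t dims[3] = { w, h, d };
    uint64_t bytes = bytes_per_texel;

    for (int i = 0; i < 3; i++) {
        if (dims[i] != 0 && bytes > SKETCH_MAX_TEXTURE_BYTES / dims[i])
            return false;
        bytes *= dims[i];
    }
    return bytes <= SKETCH_MAX_TEXTURE_BYTES;
}
```

Under this cap an 8192x8192 RGBA8 texture (256 MiB) is accepted, while an 8192x8192x8192 one is rejected.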
We want to check whether there are bits set outside of the valid flags.
Fixes piglit test egl-create-context-invalid-flag-gl
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
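The check described above is the usual mask-complement test. A minimal sketch, with hypothetical flag names and values rather than the real EGL tokens:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch. */
#define SKETCH_FLAG_DEBUG              0x1u
#define SKETCH_FLAG_FORWARD_COMPATIBLE 0x2u
#define SKETCH_FLAG_ROBUST_ACCESS      0x4u

#define SKETCH_VALID_FLAGS (SKETCH_FLAG_DEBUG | \
                            SKETCH_FLAG_FORWARD_COMPATIBLE | \
                            SKETCH_FLAG_ROBUST_ACCESS)

/* True if any bit outside the valid set is set. */
static bool has_invalid_flags(uint32_t flags)
{
    return (flags & ~SKETCH_VALID_FLAGS) != 0;
}
```

Testing `flags & ~valid` catches unknown bits even when they are combined with valid ones, which a test like `flags & valid` cannot do.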
Now that it's on by default, we may as well make it obey the flag,
for consistency's sake if nothing else.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Precompiling the shader at link time often allows us to avoid compiling
it at the first use. This moves the expensive compilation and
optimization process to game or level load time, rather than at draw
time, where we really can't avoid any cycles and don't want to risk
stalling the GPU.
The downside is that we have to guess the non-orthogonal state the
program will have set when it draws with the shader. Previously, we
guessed wrong for nearly every shader, so it wasn't useful. With the
recent SamplerUnits rework and this series, we've either eliminated
state or made smarter guesses, and usually get it right now.
In the L4D2 time demo, I now have 39 fragment shader recompiles and no
vertex shader recompiles. Before this series and the SamplerUnits
rework, I had 206 fragment shader recompiles and 192 vertex shader
recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression since 76d1301e8e:
I began setting SWIZZLE_XYZW for unused sampler units in the actual
program keys, since this matched the FS precompile behavior. However,
the VS precompile was expecting zero, so that commit made essentially
every vertex shader (even those not using texturing) mismatch and need
to be recompiled.
Setting them in the VS precompile key solves the issue. It also is an
improvement over our old behavior: previously we guessed that vertex
shaders didn't use any textures at all. Now we actually look to see if
the VS had any sampler uniforms and guess based on that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric added support for WM key debugging. This adds it for the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Our previous assumption, SWIZZLE_XYZW, was completely bogus for depth
textures. There are no Y, Z, or W components.
DEPTH_TEXTURE_MODE has three options:
- GL_LUMINANCE: <X, X, X, 1>
- GL_INTENSITY: <X, X, X, X>
- GL_ALPHA: <0, 0, 0, X>
The default value is GL_LUMINANCE, and most applications don't seem to
alter DEPTH_TEXTURE_MODE. Make that our precompile guess.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
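The three DEPTH_TEXTURE_MODE swizzles listed above can be written as a small lookup. This is only a sketch: the enums and the function name are hypothetical stand-ins, not the GL tokens or the driver's swizzle encoding:

```c
/* Hypothetical stand-ins for the three DEPTH_TEXTURE_MODE values. */
enum depth_mode_sketch { MODE_LUMINANCE, MODE_INTENSITY, MODE_ALPHA };

/* Per-channel source: the depth value X, constant 0, or constant 1. */
enum swz_sketch { SWZ_X, SWZ_ZERO, SWZ_ONE };

struct swizzle4_sketch { enum swz_sketch c[4]; };

/* Map a depth texture mode to its four-channel swizzle. Unknown values
 * fall back to the luminance mapping, matching the precompile guess. */
static struct swizzle4_sketch depth_swizzle(enum depth_mode_sketch mode)
{
    switch (mode) {
    case MODE_INTENSITY:
        return (struct swizzle4_sketch){{ SWZ_X, SWZ_X, SWZ_X, SWZ_X }};
    case MODE_ALPHA:
        return (struct swizzle4_sketch){{ SWZ_ZERO, SWZ_ZERO, SWZ_ZERO, SWZ_X }};
    case MODE_LUMINANCE:
    default:
        return (struct swizzle4_sketch){{ SWZ_X, SWZ_X, SWZ_X, SWZ_ONE }};
    }
}
```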
Now that most things are based on the linker-assigned index, it makes
sense to convert the arrays in the VS/WM program key as well. It seems
silly to leave them indexed by texture unit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_wm_prog_key's proj_attrib_mask field is designed to enable an
optimization for fixed-function programs, letting us avoid projecting
attributes where the divisor is 1.0.
However, for shaders, this is not useful, and is pretty much impossible
to guess when building the FS precompile key. Turning it off for
shaders should allow the precompile to work and not lose much.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
We probably want to do something more sophisticated here, but this at
least makes it through L4D2 without dumping the program cache.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Do all pre-draw hiz resolves *after* the renderbuffers are resized by
intel_prepare_render. Otherwise, we may resolve buffers that are
immediately discarded afterwards.
Fixes the assertion failure below when resizing windows in KDE and under
some unknown circumstance in Chrome OS:
intel_resolve_map.c:46: intel_resolve_map_set: Assertion
`(*tail)->need == need' failed.
Also, remove the comment that "resolves must occur [...] before setting up
any hardware state". That was true when resolves were implemented with
meta-ops, but no longer with blorp.
v2:
- Keep brw_predraw_resolve_buffers in its current position, which is
before any brw_context bits are modified. Instead, move the call to
intel_prepare_render.
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52252
Reported-by: Lu Hua <huax.lu@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
intel_renderbuffer_resolve_hiz checks if rb->mt is null, so there is no
need for the caller to do so.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This adds the FMASK and CMASK buffers. They share the same resource
with color data.
COMPRESSION and FAST_CLEAR are always enabled if both FMASK and CMASK are
allocated. We initialize the CMASK to a "compressed" state (not "fast cleared"),
so that we can keep FAST_CLEAR enabled all the time.
Both FMASK and CMASK must be present at the moment. If either one is missing,
the other one is not used.
v2: add cayman regs in the list
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The original sample positions took samples outside of the pixel boundary,
leading to dark pixels on the edge of the colorbuffer, among other things.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Drivers need to be able to communicate their actual number of bits populated
in the field in order for applications to be able to properly handle rollover.
There's a small behavior change here: Instead of reporting the
GL_SAMPLES_PASSED bits for GL_ANY_SAMPLES_PASSED (which would also be valid),
just return 1, because more bits don't make any sense.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
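To illustrate why an application needs the real counter width: a sketch of rollover-safe subtraction for a counter that is only `bits` wide. The function name is hypothetical, and it assumes the counter wrapped at most once between the two samples:

```c
#include <stdint.h>

/* Difference between two samples of a counter with `bits` valid bits.
 * Unsigned subtraction wraps modulo 2^64; masking reduces that to
 * modulo 2^bits, which is correct for at most one rollover. */
static uint64_t counter_delta(uint64_t begin, uint64_t end, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << bits) - 1);
    return (end - begin) & mask;
}
```

If the driver reported more bits than the hardware actually populates, the mask would be too wide and a wrapped counter would produce a huge bogus delta.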
When faced with this sequence:
MOV R1, c[1];
MAD R0, R2, R1.x, R1.y;
we were concluding that the MOV of R1 set up our accumulator and so we could
just use the previous result. Only, it's got R1.xyzw in it instead of the
R1.y we're looking for.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46784
NOTE: This is a candidate for the 8.0 branch.
Support version 3 as well as 2, since version 3 only adds the new format query,
which Jesse added support for in st/dri when he added it to dri_interface.h.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Since it's not used by anything anymore and no release has gone out
where it was being used.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Uses libkms instead of the DRI image cursor. Since this is the only user of the
DRI cursor and write interface, we can remove cursor surfaces entirely from
the DRI interface and, as a consequence, from the Gallium interface as
well. To make everybody happy with this we should probably add a
kms_bo_write function, which is probably wise anyway.
The only downside is that it adds a dependency on libkms; this could, however,
be replaced with the dumb_bo drm ioctl interface.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
We already changed the actual program key builder to only set these bits
on gen < 6; this patch just brings the precompile state back in line so
it doesn't mismatch every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When dumping differences in program keys, it printed messages of the
format:
[Name of thing that changed] [new]->[old]
This was terribly confusing: the right arrow implies "the value changed
from this to that", when in fact the message conveyed the opposite.
Except that some of the time, it didn't, since we accidentally swapped
the arguments to brw_debug_recompile_sampler_key. With two swaps, it
would often come out in the expected format.
This patch fixes it to properly print:
[Name of thing that changed] [old]->[new]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
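A sketch of the corrected message format, with the old value on the left of the arrow. The helper name, its signature, and the buffer-based output are hypothetical and chosen so the result is easy to inspect. The driver's own helper is not reproduced here:

```c
#include <stdbool.h>
#include <stdio.h>

/* Format "<name> <old>-><new>" into buf if the two values differ.
 * Returns true when a difference was found. */
static bool key_debug_sketch(char *buf, size_t len, const char *name,
                             int old_val, int new_val)
{
    if (old_val == new_val)
        return false;
    snprintf(buf, len, "  %s %d->%d", name, old_val, new_val);
    return true;
}
```

Keeping the parameters in (old, new) order in both the signature and the format string is what prevents the swap described above from creeping back in.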
Gallium drivers and i965 don't require special notification when
sampler uniforms change. They simply see the _NEW_TEXTURE and adjust
their indirection tables. These drivers don't want ProgramStringNotify:
it simply causes pointless recompiles.
Unfortunately, i915 still requires shader recompiles and needs
ProgramStringNotify. Rather than trying to fix that, simply change the
hook to a new, more specific one: ShaderUniformChange. On i915, this
translates to ProgramStringNotify; others simply ignore it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When assigning uniform locations, the linker assigns each sampler
uniform a sequential numerical ID. gl_shader_program::SamplerUnits maps
these sampler variable IDs to the actual texture units they reference
(specified via glUniform1i).
Previously, we encoded this mapping in the SEND instruction encoding:
the "sampler" was the texture unit number, and the binding table index
was SURF_INDEX_TEXTURE(the texture unit number). This unfortunately
meant that whenever the application changed the value of a sampler
uniform, we had to recompile the shader to change the SEND instructions.
This was horrible for the game Cogs, which repeatedly switches between
using texture unit 0 and 1. It also made fragment shader precompiles
useless: we'd do the precompile at glLinkProgram() time, before the
application called glUniform1i to set the sampler values. As soon as
it did that, we'd have to recompile, wasting time and space in the
program cache.
This patch encodes the SamplerUnits indirection in the binding table,
sampler state, and sampler default color tables. Instead of baking the
texture unit number into the shader, we bake in the sampler variable ID
assigned by the linker. Since those never change, we don't need to
recompile programs on uniform changes.
This does mean that the tables now depend on the linked shader program
being used for rendering, rather than simply representing all available
texture units. This could cause an increase in state emission.
Another plus is that the sampler state and sampler default color tables
are now compact: we only emit as many entries as there are sampler
uniforms, with no holes in the table since the new sampler IDs are
sequential. Previously we had to emit a full 16 entries every time,
since the tables tracked the state of all active texture units.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
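The indirection described above can be sketched as follows. All names here are hypothetical and the surfaces are plain integers. The point is only that the table is indexed by the linker-assigned sampler ID and looked up through the sampler-to-unit mapping, so a uniform change rewrites the table rather than the shader:

```c
#define MAX_SAMPLERS_SKETCH 16

/* Hypothetical slice of a linked program: sampler ID -> texture unit. */
struct program_sketch {
    unsigned num_samplers;
    unsigned char sampler_units[MAX_SAMPLERS_SKETCH];
};

/* Fill table[s] with the surface bound to the texture unit that sampler s
 * currently references. Only num_samplers entries are written, so the
 * table is compact. Returns the number of entries emitted. */
static unsigned build_texture_binding_table(const struct program_sketch *prog,
                                            const unsigned *unit_surfaces,
                                            unsigned *table)
{
    for (unsigned s = 0; s < prog->num_samplers; s++)
        table[s] = unit_surfaces[prog->sampler_units[s]];
    return prog->num_samplers;
}
```

When the application calls glUniform1i to repoint a sampler, only sampler_units[] changes and the table is rebuilt. The shader keeps using the same sampler ID.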
This represents the index into the sampler state table or sampler
default color table (the two are identical).
Right now, this is still the texture unit, but that will change shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we mirror the VS and WM binding tables' texture entries.
That may not continue to be true, so in preparation, pass in the binding
table and surface index as arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value of our sampler uniforms. Calling it
"texunit" clarifies this slightly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value of our sampler uniforms. Calling it
"texunit" clarifies this slightly.
Don't bother renaming fs_instruction::sampler. Although it's currently
the texture unit, this series will change that. No need for the churn.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we left the swizzle key field as zero for unused texture
units. The precompile sets all of them to SWIZZLE_NOOP, which meant
that we mismatched almost every time.
Since either works equally well, change it to SWIZZLE_NOOP to match
the precompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I can't actually understand what these mean, and they seem to
essentially say "we should simplify things", which is a nice goal but
not very specific.
Presumably things got cleaned up at some point.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes brw_shader.cpp:101:9: warning: converting to non-pointer type
'GLboolean {aka unsigned char}' from NULL [-Wconversion-null]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-with-great-enthusiasm-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering.
v3: *Really* add proper core-profile and GLES3 filtering based on review
feedback from Eric Anholt. It looks like previously there was some
rebase / merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering based on review feedback
from Eric Anholt. It looks like previously there was some rebase /
merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix handling of GL_INT and GL_UNSIGNED_INT types pre-ES3.0, and fix
handling of GL_INT_2_10_10_10_REV and GL_UNSIGNED_INT_2_10_10_10_REV in
ES3.0. Based on review comments by Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This consolidates the tests and makes the emitted error message
consistent.
v2: Rename _mesa_valid_element_type to valid_elements_type. Log the
enum string instead of the hex value in error messages. Based on review
comments from Brian Paul and Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_generic_compressed_format_to_uncompressed_format() probably wins the
prize for longest function name in Mesa.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
See comments in the code for details.
Note: we only need to special-case the generic compressed formats since
specific texture formats are error-checked earlier to see if the compression
format is compatible with the texture type.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will let us choose the actual hardware format depending on the
type of texture.
v2: fixup radeon, nouveau, intel and swrast drivers too
Reviewed-by: Eric Anholt <eric@anholt.net>
'target' was used both as a parameter of type st_texture_type and then
re-used for GL_TEXTURE_x targets. Rename the function parameter and
add a new local 'GLenum target'.
And remove an extraneous break statement.
This patch changes Mesa to use 'HAVE_DLOPEN', defined by configure and Android.mk,
instead of _GNU_SOURCE for detecting dlopen capability. This makes dlopen
work on Android too, where _GNU_SOURCE is not defined.
[mattst88] v2: HAVE_DLOPEN is sufficient for including dlfcn.h, remove
mingw/blrts checks around dlfcn.h inclusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Previously, when performing a fast depth clear, we would also clear
the miptree's resolve map. This destroyed important information,
since the resolve map contains information about needed resolves for
all levels and layers of the miptree, whereas a depth clear only
applies to a single level/layer combination at a time. As a result,
resolves would sometimes fail to occur, leading to incorrect
rendering.
Fixes rendering artifacts with shadow maps in Unigine Heaven and
Unigine Sanctuary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50270
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
There are three possible resolve map states for each (level, layer) of
a depth miptree: "needs HiZ resolve", "needs depth resolve", and
"needs neither". When HiZ was first implemented on i965, any attempt
to directly transition between "needs HiZ resolve" and "needs depth
resolve" without passing through the "needs neither" state would have
been a bug indicating that a necessary resolve hadn't been performed.
Accordingly, intel_resolve_map_set() contained an assertion to verify
that no such direct transition happened.
However, now that we support fast depth clears, there is a valid
transition from the "needs HiZ resolve" to the "needs depth resolve"
state. When doing a fast depth clear, the old state of the buffer is
irrelevant, since we are completely replacing it with the clear value,
so it is not necessary to do any resolves before clearing--we can
transition, if necessary, directly from the "needs HiZ resolve" state
to the "needs depth resolve" state.
To avoid spurious assertions in this valid case, this patch just
removes the assertion.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
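A sketch of the per-slice resolve state discussed in the two messages above. It uses a fixed 2D array and hypothetical names for brevity, which is an assumption of this example and not how the driver stores its resolve map. It shows that a fast depth clear touches only one (level, layer) entry and may overwrite "needs HiZ resolve" directly:

```c
enum resolve_need_sketch {
    NEED_NONE,
    NEED_HIZ_RESOLVE,
    NEED_DEPTH_RESOLVE
};

#define SKETCH_LEVELS 4
#define SKETCH_LAYERS 2

/* Hypothetical resolve map: one state per (level, layer). */
struct resolve_map_sketch {
    enum resolve_need_sketch need[SKETCH_LEVELS][SKETCH_LAYERS];
};

/* Record the needed resolve for one slice. There is deliberately no
 * assertion on the previous state, since a fast clear is a valid direct
 * HiZ-resolve -> depth-resolve transition. */
static void resolve_map_set_sketch(struct resolve_map_sketch *map,
                                   unsigned level, unsigned layer,
                                   enum resolve_need_sketch need)
{
    map->need[level][layer] = need;
}

/* A fast depth clear replaces one slice's contents; every other slice
 * keeps whatever resolve it still needs. */
static void fast_depth_clear_sketch(struct resolve_map_sketch *map,
                                    unsigned level, unsigned layer)
{
    resolve_map_set_sketch(map, level, layer, NEED_DEPTH_RESOLVE);
}
```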
Just use the functionality provided by the surface manager instead.
This fixes just another bunch of piglit tests.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Previously you could always glGetProgramiv one of the transform feedback
or geometry shader enums even if the extension wasn't supported.
In addition, this reverts part of bda6ad27. I think the hunks involving
GL_PROGRAM_BINARY_LENGTH_OES were spurious. Mesa has no support for any
other part of GL_OES_get_program_binary.
v2: Remove redundant return in get_programiv based on review feedback
from Matt Turner.
v3: Correctly handle UBO related enums.
v4: Emit the bad enum in the _mesa_error call based on review feedback
from Brian Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix API functions for memory objects to accept CL_MEM_READ_WRITE flag.
Signed-off-by: Blaž Tomažič <blaz.tomazic@gmail.com>
[ Francisco Jerez: Drop incorrect change in clCreateSubBuffer. ]
Fix-up the texel fetch functions so that they handle 3D coords (as used for
array textures) and remove the "f_2d" part from their names.
Helps fix swrast crashes in piglit's copyteximage test. More to come.
There was a lot of similar or duplicated code before.
To minimize this patch's size, use a forward declaration for
compressed_texture_error_check(). Move the function in the next patch.
If a proxy texture call generates a regular GL error, we should not
clear the proxy image's width/height/depth/format fields. Use a new
PROXY_ERROR token to distinguish proxy errors from regular GL errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When calling glTexImage() with a proxy target most error conditions should
generate a GL error. We were erroneously doing the proxy-error behaviour
(where we zeroed-out the image's width/height/depth/format fields) in too
many places.
There's another issue with proxy textures, but that'll be fixed in the
next patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
draw->samplers(_views) now has PIPE_SHADER_TYPES elements, instead of
PIPE_MAX_SAMPLERS as before.
Also, shader_stage must be less than PIPE_SHADER_TYPES to prevent buffer
overflow.
Trivial.
Render Target Write message should include source zero alpha value when
sample-alpha-to-coverage is enabled for an FBO with multiple render targets.
Source zero alpha value is used as fragment coverage for all the render
targets.
This patch makes the piglit tests draw-buffers-alpha-to-coverage and
alpha-to-coverage-no-draw-buffer-zero pass on Sandybridge. No
regressions are observed with piglit all.tests.
V2: Revert all the changes made in emit_color_write() function to
include src0 alpha for targets > 0. Now handling this case in an if
block.
V3: Correctly calculate the instruction length for buffer zero.
Properly handle the case of dual_src_blend when alpha-to-coverage
is enabled.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When too many uniforms are used, the error will be caught in
check_resources (src/glsl/linker.cpp).
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Benoit Jacob <bjacob@mozilla.com>
Also validate glCopyTexImage border. This fixes a bug in the APIspec.
Previously glTexImage3DOES could be passed a non-zero border without error.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile, GLES1, and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixed the piglit test arb_texture_buffer_object-negative-unsupported.
NOTE: This is a candidate for stable release branches.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This should take care of all the TexImage, TexSubImage, CopyTexImage,
CompressedTexImage3DOES, and CopyTexSubImage type paths.
v2: Add proper core-profile and GLES3 filtering.
v3: Squash the CompressedTexImage3DOES patch per review comment from
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is a bit of a hack. _mesa_meta_GenerateMipmap shouldn't even be
used in contexts where GL_GENERATE_MIPMAP doesn't exist (i.e., core
profile and ES2) because it uses fixed-function, and fixed-function
doesn't exist there either!
A GLSL-based _mesa_meta_GenerateMipmap should be available soon. When
that is available, this patch will be irrelevant and should be reverted.
v2: Change (ctx->API != API_OPENGLES2 && ctx->API != API_OPENGL_CORE) to
(ctx->API == API_OPENGL || ctx->API == API_OPENGLES) based on review
comment from Brian Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
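The v2 change swaps a negative test for a positive one. A small sketch, assuming the API enum has exactly the four values named in the message (the SKETCH_ names are hypothetical stand-ins), shows that the two forms agree for every value:

```c
#include <stdbool.h>

/* Hypothetical mirror of the four API values mentioned above. */
enum api_sketch {
    SKETCH_API_OPENGL,
    SKETCH_API_OPENGLES,
    SKETCH_API_OPENGLES2,
    SKETCH_API_OPENGL_CORE
};

/* v1: exclude the APIs without fixed-function GL_GENERATE_MIPMAP. */
static bool has_generate_mipmap_v1(enum api_sketch api)
{
    return api != SKETCH_API_OPENGLES2 && api != SKETCH_API_OPENGL_CORE;
}

/* v2: list the APIs that do have it. */
static bool has_generate_mipmap_v2(enum api_sketch api)
{
    return api == SKETCH_API_OPENGL || api == SKETCH_API_OPENGLES;
}
```

The positive form has the advantage that a newly added API value is excluded by default instead of silently included.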
Commit 77a3efc6b9 broke the Android build, which
sets its own value for GLSL_SRCDIR before including Makefile.sources.
This patch moves the override to after the include; this works because the
GLSL_SRCDIR variable is only expanded later.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The name is taken from the driver_descriptor, so it will be the same as
expected by driconf utility.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The segmentation fault occurs when DRI2 is not loaded and the
dri2_setup_screen() function dereferences dri2_dpy->dri2 (which is NULL
at this point).
This patch fixes the segmentation fault by checking that the dri2 pointer is
not NULL before dereferencing it.
Signed-off-by: Paulo Alcantara <pcacjr@profusion.mobi>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
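A minimal sketch of the guard, with hypothetical struct and function names and a made-up version threshold. The real dri2_setup_screen() logic is not reproduced. The NULL test must come first so the short-circuit && never evaluates the dereference:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the extension and display structs. */
struct dri2_ext_sketch { int version; };
struct display_sketch  { const struct dri2_ext_sketch *dri2; };

/* Query a capability that depends on the DRI2 extension. When DRI2 was
 * not loaded, dpy->dri2 is NULL and the capability is simply absent. */
static bool supports_feature_sketch(const struct display_sketch *dpy)
{
    return dpy->dri2 != NULL && dpy->dri2->version >= 3;
}
```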
Signed-off-by: Brian Paul <brianp@vmware.com>
This new operand replaces the MachineOperand flags in LLVM, which
will be deprecated soon. Eventually all instructions should have a flag
operand, but for now this operand has only been added to instructions
that need it.
SRC_DIRS was overwritten (visible in the second hunk).
Also don't require mapi/shared-glapi to be built for GLES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to enable at least one interpolation mode,
otherwise the GPU will hang.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Disable blending when dual_src_blend is enabled and number of color exports
in the current fragment shader is less than 2.
Fixes lockups with the
ext_framebuffer_multisample-alpha-to-coverage-dual-src-blend piglit test.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The generic texture formats should be accepted by the <internalformat>
parameter of TexImage1D, TexImage2D, TexImage3D, CopyTexImage1D, and
CopyTexImage2D functions. When the application specifies a generic
format, the driver is free to pick an uncompressed format.
This patch reverts the changes due to following commit:
commit a36581ccc0
mesa: do more teximage error checking for generic compressed formats
This patch fixes compressed texture format failures in intel oglconform
pxconv-gettex test case:
https://bugs.freedesktop.org/show_bug.cgi?id=47220
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't dereference NULL pointers, and if all views are NULL, don't generate an
invalid PM4 packet which locks up the GPU.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Mesa doesn't check the parameter passed to glMultiTexCoord*. It does,
however, mask the texture value to prevent out-of-bounds writes. This
patch will promote this non-conformant behavior to OpenGL ES 1. I don't
think anyone will care, and this gets some silly code out of a hot path.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
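A rough sketch of the masking idea mentioned above. The store layout, the
unit count of 8 and the function names are assumptions made for
illustration; only "mask the texture value to prevent out-of-bounds writes"
comes from the message.

```c
#include <assert.h>

/* Assumed unit count; must be a power of two so that a mask works. */
#define SKETCH_MAX_TEX_UNITS 8

/* Hypothetical per-vertex storage: one 4-component coord per texture unit. */
struct texcoord_store {
    float coord[SKETCH_MAX_TEX_UNITS][4];
};

/* Map an unchecked target value onto a valid unit index by masking. */
unsigned
masked_unit(unsigned target)
{
    return target & (SKETCH_MAX_TEX_UNITS - 1);
}

/* Store s/t for the target. A bogus target aliases a valid unit instead of
 * writing outside the array. */
void
set_texcoord2f(struct texcoord_store *store, unsigned target, float s, float t)
{
    unsigned unit = masked_unit(target);

    assert(unit < SKETCH_MAX_TEX_UNITS);
    store->coord[unit][0] = s;
    store->coord[unit][1] = t;
    store->coord[unit][2] = 0.0f;
    store->coord[unit][3] = 1.0f;
}
```

With GL_TEXTURE0 defined as 0x84C0, valid targets map to units 0..7, and an
invalid value silently lands on one of those units rather than raising an
error.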
This is required to make some of LLVM's API calls
thread safe. In particular the PassRegistry, which is
implicitly accessed while compiling shader programs.
The PassRegistry uses a mutex that is only active if
llvm_is_multithreaded() returns true.
Calling llvm_start_multithreading() makes this happen,
and by calling this function we try to make sure that
we can safely compile shaders in parallel.
Since there is also a call llvm_stop_multithreading()
in the LLVM API, we cannot guarantee that this does
not get switched off while we are relying on it being
set, but for the simpler use cases this fixes a race with
the radeon LLVM compiler we have as of today.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
In the past, when we called pipe::set_sampler_views(n) the drivers set
samplers [n..MAX] to NULL. We no longer do that. The state tracker
code was already trying to set unused sampler views to NULL to cover
that case, but the logic was broken and unnoticed until now. This patch
fixes it.
Strictly speaking, this patch shouldn't be necessary. Drivers should simply
ignore unused samplers and sampler views. But some drivers like llvmpipe (and
others?) count those things and they figure into state validation. That could
be fixed in the future.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53617
Reviewed-by: Marek Olšák <maraeo@gmail.com>
GL_INVALID_OPERATION is to be raised when querying a non-compressed
image/buffer. Since a buffer object can't have a compressed format this
query always generates an error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are gradually going to get whittled away and eventually folded into the
source files with the native type functions.
v2: Add (speculative) SConscript changes. These may be broken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the old backend, we looked at any FS attribute's proj_attrib_mask bits, not
just texcoords. Now that we have _mesa_vert_result_to_frag_attrib(), we can
fill in the other FS inputs with correct proj_attrib_mask info.
NOTE: This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46644
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL 3.1 specification explicitly allows this. Oddly, the
ARB_texture_buffer_object spec's issues section claims this isn't
allowed, but proceeds to explain that the extension simply doesn't edit
the underlying spec to allow it, and thus it didn't appear in the list
of legal texture targets.
Thus, this patch legalizes it only in 3.1+ contexts, but still returns
INVALID_ENUM in earlier contexts that expose ARB_texture_buffer_object.
Unfortunately, the behavior of the call is horrendously undefined.
Fixes oglconform's tbo/negative.textureParams test.
v2: Require desktop OpenGL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Move the _mesa_GetTexLevelParameter[iv] functions below the helper
function so the prototype is available.
This will be useful in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For cube maps, _mesa_generate_mipmap() calls this with
GL_TEXTURE_CUBE_MAP (the gl_texture_object's Target) rather than one
of the faces. This caused _mesa_max_texture_levels() to return 0, which
resulted in maxLevels == -1 and the next line's assertion to fail.
This function is called from seven places:
- fbobject.c: framebuffer_texture()
- mipmap.c: _mesa_generate_mipmap()
- texgetimage.c:
- getteximage_error_check()
- getcompressedteximage_error_check()
- texparam.c: _mesa_GetTexLevelParameteriv()
- texstorage.c: tex_storage_error_check()
All of these (or their callers) now explicitly check for invalid targets
already, so this shouldn't cause invalid targets to slip through.
(Technically _mesa_generate_mipmap() doesn't check for invalid targets,
but the API-facing _mesa_GenerateMipmapEXT() function does.)
+2 oglconforms (float-texture/mipmap.automatic and mipmap.manual)
In addition to fixing the mipmap bug, it should also cause glTexStorage
to accept GL_TEXTURE_CUBE_MAP, which is explicitly allowed by the spec.
v2: Drop alterations to callers; this is now in a patch series that adds
explicit checking to API functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven different places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels(), so
this patch should not introduce any change in behavior. Future patches
will cause the two to diverge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, they relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven different places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels() but
removed the proxy targets, as both functions explicitly rejected those
targets. This changes the order in which we check errors, which could
change whether we return INVALID_VALUE or INVALID_ENUM. However, it
shouldn't change the list of accepted targets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's possible for us to have an unused sampler bound when the fragment
shader itself doesn't use any samplers. So the assertion isn't valid.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53616
We aligned the dimensions to the blocksize, then divided by it
(in r600_blit.c), then minified, which was wrong.
The minification must be done first, not last.
This fixes piglit/fbo-generatemipmap-formats with S3TC and maybe
a bunch of other tests too. Tested on RV730.
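To see why the order matters, here is a small sketch that computes the
number of compressed blocks along one dimension both ways. The helper names
are made up for this example and are not the r600_blit.c code.

```c
#include <assert.h>

/* Size of a dimension at a given mip level, never below 1. */
unsigned
minify(unsigned size, unsigned level)
{
    unsigned s = size >> level;
    return s ? s : 1;
}

/* Round value up to a multiple of block. */
unsigned
align_up(unsigned value, unsigned block)
{
    assert(block != 0);
    return (value + block - 1) / block * block;
}

/* Order used by the fix: minify first, then align and divide. */
unsigned
blocks_minify_first(unsigned size, unsigned level, unsigned block)
{
    return align_up(minify(size, level), block) / block;
}

/* Order described as wrong: align and divide first, minify last. */
unsigned
blocks_minify_last(unsigned size, unsigned level, unsigned block)
{
    return minify(align_up(size, block) / block, level);
}
```

For a 20-texel dimension with 4-texel blocks at level 1, minifying first
gives 10 texels, which need 3 blocks. Minifying last gives 5 blocks at level
0 and then only 2 at level 1, one block short.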
This seems to be expected by the WebGL texture-mips test. The error makes
sense, but I haven't found (yet) any OpenGL documentation specifying this
error condition.
See http://bugs.freedesktop.org/show_bug.cgi?id=44912
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As with other recent changes, put the vertex and fragment sampler state
into arrays indexed by the shader type. This will let us easily add
support for other types of shaders in the future.
PIPE_MAX_SAMPLERS, PIPE_MAX_VERTEX_SAMPLERS and PIPE_MAX_GEOMETRY_SAMPLERS
were all defined to the same value (16).
In various places we're creating arrays such as
sampler_views[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS] so we were assuming
the same number of max samplers for all shader stages anyway.
Of course, drivers are still free to advertise different numbers of max
samplers for different shaders.
The previous test for result != NULL was kind of bogus since we dereferenced
the pointer earlier in the code. Now, check for result != NULL first, then
get the result->key info.
Also, remove the useless "offset +=" code at the end.
We'd end up re-using the old one and throwing away the new one anyway, but only
after a roundtrip to the kernel.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If a hole exactly matches the allocated size plus alignment, we would fail to
preserve the alignment as a hole. This would result in never being able to use
the alignment area for an allocation again.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'll likely end up with an ever increasing amount of ever smaller
holes.
Requires keeping the list ordered wrt offsets.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'd wrap around after 32 bits. The kernel currently limits GPU
virtual address space to 4GB anyway, but that will probably change sooner or
later, and this would result in confusing error messages when running out of
virtual address space even now.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
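The wrap-around can be illustrated with plain integer arithmetic. This is
only a sketch with invented function names; it does not reproduce the
winsys allocator.

```c
#include <assert.h>
#include <stdint.h>

/* End of a range computed in 32 bits: silently wraps past 4 GiB. */
uint32_t
range_end_32(uint32_t offset, uint32_t size)
{
    return offset + size;
}

/* The same computation in 64 bits keeps the carry. */
uint64_t
range_end_64(uint64_t offset, uint64_t size)
{
    return offset + size;
}
```

An allocation of 0x2000 bytes at offset 0xFFFFF000 ends at 0x1000 in the
32-bit version, which looks like a valid low address, while the 64-bit
version reports 0x100001000 and lets the caller detect that the address
space is exhausted.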
This adds support for having libGL pick a different driver for prime support.
The calling process sets the DRI_PRIME env var to the value retrieved from the
server's randr provider calls (generally DRI_PRIME=1 will be the right
answer).
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this we can embed data for the shaders (like resource
descriptors) into the PM4 stream.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I was seeing some GPU hangs that seemed to be caused by ALU instructions
writing to the same register used as the source for VTX_READ. Adding
this constraint to the VTX_READ instructions avoids this situation.
The only allowed instructions are TXQ_LZ and TXF.
TXQ_LZ is like TXQ, but without the LOD parameter (which is always zero
with MSAA textures)
The 3rd or the 4th texcoord component in TXF should contain the sample index
for a 2D_MSAA or 2D_ARRAY_MSAA texture, respectively.
The problem was that the string matching succeeded e.g. for "2D" when there
was actually "2D_MSAA" and then failed parsing "_MSAA".
To prevent similar failures in the future, let's fix this kind of error
everywhere.
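A sketch of the difference between prefix matching and whole-token
matching. The function names and the rule for what counts as a token
character are assumptions for this example, not the parser's actual
helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c can continue an identifier-like token (assumed rule). */
static bool
is_token_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Prefix match only: "2D" also matches the start of "2D_MSAA". */
bool
match_prefix(const char *text, const char *keyword)
{
    return strncmp(text, keyword, strlen(keyword)) == 0;
}

/* Whole-token match: the keyword must not be followed by a token character. */
bool
match_whole(const char *text, const char *keyword)
{
    size_t len = strlen(keyword);

    assert(len > 0);
    return strncmp(text, keyword, len) == 0 && !is_token_char(text[len]);
}
```

With whole-token matching, the order in which keywords are tried no longer
decides whether "2D_MSAA" parses correctly.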
Rename _mesa_pack_rgba_span_int to _mesa_pack_rgba_span_from_uints.
Add _mesa_pack_rgba_span_from_ints.
These separate routines allow the integer clamping to be handled
properly for signed versus unsigned integers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
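The reason for two routines can be sketched with a pair of clamping
helpers. The names and the plain [0, 255] range are assumptions for
illustration; the message does not state the exact bounds Mesa uses.

```c
#include <assert.h>
#include <stdint.h>

/* Source is unsigned: only the upper bound needs clamping. */
uint8_t
pack_ubyte_from_uint(uint32_t v)
{
    return v > 255u ? 255u : (uint8_t)v;
}

/* Source is signed: negative values clamp to 0, large values to 255. */
uint8_t
pack_ubyte_from_int(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```

The same bit pattern gives different results depending on its type:
0xFFFFFFFF is -1 as a signed value and clamps to 0, but as an unsigned
value it clamps to 255.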
We need to downsample before flushing BUFFER_FAKE_FRONT_LEFT to
BUFFER_FRONT_LEFT in intel_flush_front.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Stop repeating ourselves. Replace the 4 instances of
`driContext->driDrawablePriv` with `driDrawable`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move it from intel_screen.c to intel_context.c. Redeclare as non-static.
A future commit will use it in multiple files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unlike 1.x to 2.0, OpenGL ES 3.0 is backwards compatible with 2.0. Use the
same API flag for both. Applications that specifically want 3.0 will specify
this using the major / minor version attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just like in GLX, EGL_KHR_create_context requires DRI2 version >= 3, and
EGL_EXT_create_context_robustness requires both DRI2 version >= 3 and the
__DRI2_ROBUSTNESS extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extra block in dri2_create_context is to prevent extra white space noise
in the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add GL_ARB_invalidate_subdata to release notes at Brian's
suggestion.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Add comment explaining why minimum dimensions are set to 1 for some
texture targets. Add default case to switch statement to silence
compiler warnings and detect new texture targets. Both changes
suggested by Brian. Also use _mesa_is_desktop_gl as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Use _mesa_bufferobj_mapped instead of testing
gl_buffer_object::Pointer as suggested by Brian. Also use
_mesa_is_desktop_gl as suggested by Ken.
v3: Add a comment by the map subrange / discard range overlap test and
fix an off-by-one error noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this change _mesa_init_bufferobj_dispatch won't set function
pointers that don't exist in OpenGL ES.
v2: Use _mesa_is_desktop_gl and _mesa_is_gles3 as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata and OpenGL ES 3.0.
v2: Reject aux buffers in core context, and use _mesa_is_desktop_gl and
_mesa_is_gles3. Both suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is basically cut-and-paste from the swrast implementation, and it
could probably be (slightly) more optimal.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No driver supports this extension, and it seems unlikely that any driver
ever will. I think r300c may have supported it at one time, but that
driver has already been removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The final step of _mesa_unpack_depth_span is to take the temporary
GLfloat depth values and convert them to the desired format. When
converting to GL_UNSIGNED_INTEGER with depthMax > 0xffffff, we use
double-precision math to avoid overflow and precision problems.
Or at least that's the idea. Unfortunately
GLdouble z = depthValues[i] * (GLfloat) depthMax;
actually causes single-precision multiplication, since both operands are
GLfloats. Casting depthMax to GLdouble causes the scaling to be done
with double-precision math.
Fixes a regression in oglconform's depth-stencil basic.read.ds test
since c60ac7b179, where the expected and
actual values differed slightly. For example, 0xcfa7a6 vs. 0xcfa7a4.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49772
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
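The effect of the cast can be reproduced in isolation. The two functions
below are a sketch with made-up names; the first mirrors the expression
quoted above, the second the corrected cast.

```c
#include <assert.h>
#include <stdint.h>

/* depth_max is rounded to float before the multiply, so precision is
 * already lost even though the result is stored in a double. */
double
scale_depth_float_cast(float depth, uint32_t depth_max)
{
    double z = depth * (float) depth_max;
    return z;
}

/* Casting depth_max to double promotes the whole multiply to double. */
double
scale_depth_double_cast(float depth, uint32_t depth_max)
{
    double z = depth * (double) depth_max;
    return z;
}
```

With depth_max = 0xffffffff, the float cast rounds the scale factor up to
2^32, so a depth of 0.5 scales to 2147483648.0 instead of 2147483647.5.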
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This forces the drivers to do at least some validation of context API
and version before creating the context. In r100 and r200 drivers, this
means that they don't do any post-hoc validation.
v2: Actually reject compatibility profile 3.2+ contexts. Thanks Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It may be possible to trim the list of extensions further. These are
just the obvious extensions that add functionality that the core context
explicitly forbids. Apple's core-context extension list is *just* the
extensions on top of the core GL version. I'm not sure we want to go
that far, but removing some things that have been in core since 2.1 may
be okay.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add both top_srcdir and top_builddir to mesa asm include dirs.
These require both in-tree and build-time-generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Like in src/mesa, use GLSL_BUILDDIR/GLSL_SRCDIR to unambiguously
distinguish between in-tree and generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Also fix include paths for the generated headers.
v2: Switch to using self-explanatory BUILDDIR/SRCDIR defined from
top_builddir/top_srcdir rather than the ambiguous TOP.
v3: Add both top_builddir and top_srcdir to include flags for mesa asm.
These rely on both in-tree and build-time-generated includes.
v4: Rebased on top of 948c8f502a.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
After realizing that brw_finish_batch emitted some final PIPE_CONTROLs
to record occlusion queries, Chris noted that we probably hadn't
reserved enough space to actually emit them.
Reserving a full 60 bytes seems a bit harsh, since we only need that
much if occlusion queries are actually active. Plus, 28 bytes would be
sufficient for Gen7, and 24 for Gen4-5.
We could optimize this in the future, but it doesn't seem too critical.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen4+, brw_finish_batch() calls brw_emit_query_end(), which emits
some extra PIPE_CONTROLs to capture the current occlusion query data.
Unfortunately, it was being called *after* _intel_batchbuffer_flush
added the MI_BATCH_BUFFER_END, meaning those PIPE_CONTROLs didn't get
inside the batch.
Not only does this likely cause bogus occlusion query values, it can
also cause crashes: with the recent change to use 64-bit depth count
writes on Gen6+, we started emitting an odd-length PIPE_CONTROL, which
happened after the MI_NOOP padding. This resulted in an odd-length
batch buffer, which resulted in execbuf2 returning -EINVAL and the
application dying with an intel_do_flush_locked failure.
On older generations, finish_batch() doesn't emit any state, so this
change shouldn't have any effect.
Huge thanks to Chris Wilson for helping me figure this out.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
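The ordering problem can be modelled with a toy batch buffer. Everything
below is hypothetical: the struct, the function names and the command words
are placeholders rather than i965 code or real hardware encodings. The
sketch assumes, as the message implies, that the batch is padded to an even
number of dwords after the terminator.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BATCH_DWORDS 64

/* Placeholder command words, for illustration only. */
#define SKETCH_QUERY_END 0x51u
#define SKETCH_BATCH_END 0xE0u
#define SKETCH_NOOP      0x00u

struct batch_sketch {
    uint32_t map[SKETCH_BATCH_DWORDS];
    unsigned used;   /* dwords emitted so far */
};

void
emit_dword(struct batch_sketch *b, uint32_t dw)
{
    assert(b->used < SKETCH_BATCH_DWORDS);
    b->map[b->used++] = dw;
}

/* Order after the fix: emit the final query state first, then the
 * terminator, then pad so the total length is even. */
void
finish_and_close(struct batch_sketch *b, unsigned query_end_dwords)
{
    unsigned i;

    for (i = 0; i < query_end_dwords; i++)
        emit_dword(b, SKETCH_QUERY_END);
    emit_dword(b, SKETCH_BATCH_END);
    if (b->used & 1)
        emit_dword(b, SKETCH_NOOP);
}
```

Because the padding decision is made last, an odd-length final command can
no longer leave the batch with an odd dword count, and nothing is emitted
after the terminator except the padding.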
I want to introduce some more debug output for performance surprises, which
include fallbacks but aren't necessarily software rasterization. Leave
INTEL_DEBUG=fall in place for those that have used that flag before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Avoid an INVALID_OPERATION error when decompressing a rectangle texture.
Setting mipmap level limits for those textures is an error, and meta code
must not trigger it, since that would mislead the user.
[v3/Kayden]: Resolve conflicts due to Eric picking a subset of Pauli's
original changes.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampler objects are perfect for meta operations. A sampler object
is a separate state object that shadows the sampling state in the texture
object. With a sampler object, mipmap generation can maintain the same
sampling state for all subsequent generation requests.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So far, sampler queries are made only for enabled texture units. But if
any code queried the sampler before checking the texture unit state, that
would result in a NULL dereference.
Making the inline helper easier to use by adding a NULL check makes a lot of
sense, because the compiler is likely to combine the checks for the current
texture.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In tune with previous patches. Again, there is duplicated information
in the function parameters that is good to remove.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Size and format information is always stored in the gl_texture_image
structure. That makes it preferable to remove the duplicate information from
the parameters to make the interface easier to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The gl_texture_image structure always holds the size and internal format
before the TexImage driver hook is called. Passing the same information in
function parameters only duplicates it, making the interface
harder to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 6882381a2e added a dependency on a
newer version of xcb, but the version check wasn't added in all the
necessary places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 9f5a5d541d.
Fixes the following build error on GCC 4.2.3:
cc1plus: error: unrecognized command line option "-Wno-narrowing"
The GCC Manual incorrectly stated that commit 9f5a5d54 would be safe for
old versions of GCC.
Reported-by: Andy Furniss <andyqos@ukfsn.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The var!=softpipe->fs_variant assertion was failing because we weren't
nulling the softpipe->fs_variant pointer when binding a new shader.
Since softpipe->fs_variant depends on the current fs, it's of no use
when a new FS is bound.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53318
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
After we attach a new renderbuffer in this function we need to make
sure Mesa's update_framebuffer() gets called.
Fixes crash in WebGL conformance/textures/texture-attachment-formats.html,
but the test still fails for other reasons.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53316
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Add -Wno-narrowing to CXXFLAGS for gcc.
It is safe to add this flag even for versions of gcc that don't recognize
it. From the GCC Manual [1]: "[GCC] allows the use of new -Wno- options
with old compilers".
This removes warnings of the form
warning: narrowing conversion of X from 'int' to 'float' inside { } is
ill-formed in C++11 [-Wnarrowing]
in ff_fragment_shader.cpp and gen6_blorp.cpp. When building
i965, I observed no other difference in the build output.
[1] http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes WebGL conformance/uniforms/uniform-default-values.html crash.
We need to check for the null view pointer before accessing view->texture.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53317
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Always downsample before mapping, even if the map mode contains
GL_MAP_INVALIDATE_RANGE_BIT. If we neglect to downsample when only
a subrect is mapped then the upsample in intel_miptree_unmap_multisample
may write garbage to the region outside the subrect.
(Eric gave my patch e88cfbb a conditional reviewed-by with the condition
that it always downsample before mapping. I forgot to make that change
before pushing the patch.)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes the glsl skinning demo regression since changing to the new GLSL
compiler, and is part of fixing piglit gl-2.0-edgeflag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50079
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there was an edge flag or a two-side-color pair present, we'd end up
mismatched and read values from earlier in the VUE for later FS inputs.
v2: Fix regression in gles2conform shaders generating point size. (change by
anholt)
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 branch.
If the application has requested reset notification, then
dri2_convert_glx_attribs will initialize this to the correct value.
Otherwise, it's supposed to initialize this to NO_NOTIFICATION, but
doesn't when num_attribs == 0. (The consensus seems to be that we
should make it do so, but that's more invasive, so I'm pushing this for
now.)
Fixes a regression since a8724d85f8
where trying to run OilRush_x86 or apitrace heaven_x64 would result in:
dri_util.c:221: dri2CreateContextAttribs: Assertion `!"Should not get
here."' failed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53076
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch changes the i915 and i965 drivers to use the fixed-function
version of meta clear when running on ES 1.1. This fixes rendering errors
seen with Google Maps, Angry Birds and Gallery3D on the Android platform.
Change 88128516d4 exposes all extensions
internally regardless of GL flavour, so the check
against ARB_fragment_shader does not work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50333
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This removes the CS stall on Ivybridge.
On Sandybridge, the depth stall needs to be preceded by a non-zero
post-sync op, which requires a CS stall, which needs a stall at
scoreboard. Emit the full workaround.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
I don't know if it was possible to trigger this bug -- we don't merge
saturates into the math instruction because we're bad at coalescing currently,
and there's nothing generating these with predicates. Still, let's avoid
future bugs when we do smarter codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was ridiculous. We were ignoring the inst->header.saturate flag in the
case of math and only math. On gen4, we would leave inst->header.saturate in
place if it happened to be set, which would end up being applied to the
implicit mov and thus trash the first argument. On gen6, we would overwrite
inst->header.saturate with the saturate flag from the argument, which was not
set appropriately in brw_vec4_emit.cpp, and was only not a bug due to our
incompetence at coalescing saturate moves.
By ripping the argument out and making saturate work just like all the other
brw_eu_emit.c code generation, we can avoid both these classes of bugs.
Fixes piglit fog-modes, and the new specific fs-saturate-exp2 case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48628
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a chance for brw_wm_emit.c to screw up and pass (1 << 4) instead of
1, which would get converted to 0 when stored. Instead, use stdbool which
converts nonzero to true/1 like we want.
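A short sketch of the two storage behaviours. It assumes the old
destination was a 1-bit unsigned field; the struct and function names are
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed old layout: a 1-bit unsigned field. */
struct flags_bitfield {
    unsigned saturate:1;
};

/* New layout: a stdbool flag. */
struct flags_bool {
    bool saturate;
};

/* Storing into the 1-bit field keeps only the lowest bit of the value. */
unsigned
store_in_bitfield(unsigned value)
{
    struct flags_bitfield f;

    f.saturate = value;
    return f.saturate;
}

/* Storing into a bool turns any nonzero value into 1. */
unsigned
store_in_bool(unsigned value)
{
    struct flags_bool f;

    f.saturate = value;
    return f.saturate;
}
```

Passing (1 << 4) therefore reads back as 0 from the bitfield but as 1 from
the bool.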
Otherwise, conditional rendering always takes the fallthrough "render it
anyway" case unless the application had itself done a check or wait on the
query.
Fixes intel oglconform's conditional_render advanced.nofbo.readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
I happened to notice this while looking at a blit pass in l4d2, which had an
optional push/pop around framebuffer srgb setting. It didn't matter in the
end, but the fix is sitting in my tree now.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
You can't practically have desktop OpenGL and OpenGL ES on the same system
without this. The benefits of not having it (e.g., a more compact dispatch
table) are irrelevant.
v2: Don't mark shared-glapi as experimental. Review suggestion by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These are largely based on the src/mapi/glapi/tests. However,
shared-glapi provides less external visibility into the dispatch table,
so there is less to test. Also, shared-glapi does not implement
_glapi_get_proc_name, so that test was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. Avoiding the use of glapitable.h avoids the lies. The only
entries used in this code are entries that are ABI. For these, the ABI
offset can be used directly.
Since this code is in src/glx, it can't use src/mesa/main/dispatch.h to
get the pretty names for these offsets.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. There are two completely separate code generation paths used to
assign dispatch offset. Neither has any clue about the other.
Unsurprisingly, they can't agree on what offsets to assign.
This adds a bunch of overhead to __glXNewIndirectAPI, but this function
is called at most once.
The test ExtensionNopDispatch was removed. There was just no way to
make this test work with the information provided in shared-glapi.
Since indirect_glx.c uses _glapi_get_proc_offset now, it was also
impossible to make the tests work without shared-glapi. So much pain.
This fixes indirect rendering with shared-glapi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes 'make check' with --enable-shared-glapi. This test cannot work
in that environment.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
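The debugging approach described above can be sketched in plain C. This is
only a CPU-side simulation, not driver code: the slot layout, the poison value
and every example_* name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one query-object slot: two 32-bit words. */
struct example_query_slot {
   uint32_t lo;
   uint32_t hi;
};

#define EXAMPLE_POISON 0xdeadbeefu

/* Fill the slot with a known pattern before the write under test. */
static void example_poison_slot(struct example_query_slot *slot)
{
   assert(slot);
   slot->lo = EXAMPLE_POISON;
   slot->hi = EXAMPLE_POISON;
}

/* Simulates a write that only stores the low 32 bits. */
static void example_write32(struct example_query_slot *slot, uint64_t value)
{
   slot->lo = (uint32_t)value;
}

/* Simulates a full 64-bit write. */
static void example_write64(struct example_query_slot *slot, uint64_t value)
{
   slot->lo = (uint32_t)value;
   slot->hi = (uint32_t)(value >> 32);
}

/* Returns 1 if the high word still holds the poison pattern. */
static int example_high_word_untouched(const struct example_query_slot *slot)
{
   return slot->hi == EXAMPLE_POISON;
}
```

A 32-bit write leaves the poison in the high word, which is the symptom
described above; a 64-bit write replaces it.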
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates the complexity in one place, which is important
because it's about to get even more complicated.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
PIPE_CONTROL has variable length, depending upon generation and whether
we want to do 32-bit or 64-bit data writes. Make it explicit, rather
than hiding a length of 4 in the #define for _3DSTATE_PIPE_CONTROL.
Generated by s/3DSTATE_PIPE_CONTROL/3DSTATE_PIPE_CONTROL | (4 - 2)/g.
This is equivalent since the #define used to have | 2 in it. A grep
through the sources shows that all instances have been converted, so
it's safe to remove the | 2 from the #define.
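A minimal sketch of the explicit-length form, assuming the length field holds
the total dword count minus two (which is what "(4 - 2)" above suggests). The
opcode constant and the helper name are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative opcode bits only; the real value lives in the driver headers. */
#define EXAMPLE_PIPE_CONTROL 0x7a000000u

/* Build the header dword for a packet that is num_dwords long in total.
 * The length field is assumed to hold (total dwords - 2). */
static uint32_t example_pipe_control_header(unsigned num_dwords)
{
   assert(num_dwords >= 2);
   return EXAMPLE_PIPE_CONTROL | (num_dwords - 2);
}
```

With this, a 4-dword packet gets the same header the old #define produced
with its hidden "| 2", and a longer packet simply passes a larger count.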
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike the FS side in the previous commit, this does variable indexing just
fine, using the same code as we used for other variable-indexed pull
constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Variable array indexing isn't finished, because the lowering pass
turns it all into conditional moves of constant index accesses so I
can't test it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I wanted to add the surface index as a variable value for UBO support,
and a reg seemed like the obvious way to go. This exposes more of the
information to CSE, which we'll probably want to apply to pull
constant loads for UBOs eventually (you might access 4 floats in a
row, each of which would produce an oword block read of the same
block).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit GL_ARB_uniform_buffer_object/dlist.
v2: Use the .ui fields instead of .i for type consistency (review by Brian
Paul)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ARB spec lets you get away with the default block counting against the
blocks for combined size limits. The core spec says you need to be able to
support the maximum size of default block *and* the maximum size of each
uniform block. I see no reason that any driver would have a problem with
that.
Fixes gl 3.1/minmax (with an associated fix to the test)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only propagating it to the API when the variable was a matrix type,
but we were still tripping over it in lower_ubo_reference when it was set on a
vector.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were getting the base offset of a vec2, not of a vec2[2] like the quoted
spec text says we should.
v2: Fix swapped then/else cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were returning the index into the UniformBlocks of one of the
linked shaders, when it's supposed to be the program global index.
Fixes piglit getactiveuniformsiv-uniform_block_index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In between glGenBuffers() and glBindBuffer(), the buffer object points to this
dummy buffer with a name of 0, and a glBindBufferBase() would point to that.
It seems pretty clear, given that glBindBufferBase() only cares about the
current size of the buffer at render time, that it should bind up the buffer
that you passed in instead of pointing it at this useless dummy buffer.
However, what should glBindBufferRange() do? As of this patch, it will
promote the genned buffer to a proper buffer like it had been
glBindBuffer()ed, and then detect that the size is greater than the buffer's
current size of 0 and throw INVALID_VALUE. It seems like the most reasonable
answer here.
Note that this also changes the behavior of these two on non-glGenBuffers() bo
names. We haven't yet set up the error throwing for glBindBuffers() on gl
3.1+, and my assumption is that these two functions should inherit their
behavior on un-genned names from glBindBuffers().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping
the ir_variable as the variable part of the offset from handle_rvalue(),
and track the constant offsets from that with a plain old integer value,
avoiding a bunch of temporary variables in the array and struct handling.
Also, fix file description doxygen.
v3: Fix a row vs col typo, and fix spelling in a comment.
Reviewed-by: Eric Anholt <eric@anholt.net>
For the UBO lowering pass, I want to see the whole dereference chain for
replacing, not the innermost ir_dereference_variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers will probably want to be able to take UBO references in a
shader like:
uniform ubo1 {
float a;
float b;
float c;
float d;
}
void main() {
gl_FragColor = vec4(a, b, c, d);
}
and generate a single aligned vec4 load out of the UBO. For Intel,
this involves recognizing the shared offset of the aligned loads and
CSEing them out. Obviously that involves breaking things down to
loads from an offset from a particular UBO first. Thus, the driver
doesn't want to see
variable_ref(ir_variable("a")),
and even more so does it not want to see
array_ref(record_ref(variable_ref(ir_variable("a")),
"field1"), variable_ref(ir_variable("i"))).
where a.field1[i] is a row_major matrix.
Instead, we're going to make a lowering pass to break UBO references
down to expressions that are obvious to codegen, and amenable to
merging through CSE.
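To illustrate why lowering to explicit offsets helps CSE, here is a small
sketch. It assumes the four floats are packed at consecutive 4-byte offsets
and that loads are 16 bytes wide; the example_* helpers are hypothetical and
not part of the lowering pass.

```c
#include <assert.h>

#define EXAMPLE_VEC4_BYTES 16u

/* Byte offset of the index-th consecutive float member (4-byte packing). */
static unsigned example_float_offset(unsigned index)
{
   return index * 4u;
}

/* Start of the aligned 16-byte block that contains byte_offset. */
static unsigned example_vec4_block(unsigned byte_offset)
{
   return byte_offset & ~(EXAMPLE_VEC4_BYTES - 1u);
}

/* Returns 1 if the first `count` floats all fall in the same aligned block,
 * meaning their loads could be merged into a single vec4 load. */
static int example_loads_share_block(unsigned count)
{
   assert(count > 0);
   unsigned first = example_vec4_block(example_float_offset(0));
   for (unsigned i = 1; i < count; i++) {
      if (example_vec4_block(example_float_offset(i)) != first)
         return 0;
   }
   return 1;
}
```

For a, b, c and d the block start is 0 in every case, so once the references
are expressed as "load from offset N of this UBO" the four loads are visibly
identical after alignment.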
v2: Fix some partial thoughts in the ir_binop comment (review by Kenneth)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting var->location from pointing at the program's UniformBlocks to
pointing at the linked shader's UniformBlocks, I missed this change. It
usually worked out in the end because the two lists happen to be the same in
many testcases.
Fixes a valgrind complaint on
oglconform ubo-compile.cpp advanced.std140.2stage
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As we get into supporting GL 3.x core, we come across more and more features
of the API that depend on the version number as opposed to just the extension
list. This will let us more sanely do version checks than "(VersionMajor == 3
&& VersionMinor >= 2) || VersionMajor >= 4".
v2: Fix a bad <= 30 check.
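A sketch of the simplification, assuming the combined version is encoded as
major * 10 + minor (an assumption suggested by the "<= 30" check mentioned in
v2). The helper names are hypothetical.

```c
#include <assert.h>

/* Combine major/minor into one comparable number, e.g. 3.2 becomes 32. */
static unsigned example_gl_version(unsigned major, unsigned minor)
{
   assert(minor < 10);
   return major * 10 + minor;
}

/* The old-style check quoted above. */
static int example_at_least_32_old(unsigned major, unsigned minor)
{
   return (major == 3 && minor >= 2) || major >= 4;
}

/* The same check against the combined version number. */
static int example_at_least_32_new(unsigned major, unsigned minor)
{
   return example_gl_version(major, minor) >= 32;
}
```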
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This turns on window system MSAA.
This patch changes the id of many GLX visuals and configs, but that
couldn't be prevented. I attempted to preserve the id's of extant configs
by appending the multisample configs to the end of the extant ones. But
somewhere, perhaps in the X server, the configs are reordered with
multisample configs interspersed among the singlesample ones.
Test results:
Tested with xonotic and `glxgears -samples 1` on Ivybridge.
No piglit regressions on Ivybridge.
On Sandybridge, passes 68/70 of oglconform's
winsys multisample tests. The two failing tests are:
multisample(advanced.pixelmap.depth)
multisample(advanced.pixelmap.depthCopyPixels)
These tests hang the gpu (on kernel 3.4.6) due to
a glDrawPixels/glReadPixels pair on an MSAA depth buffer. I don't expect
realworld apps to do that, so I'm not too concerned about the hang.
On Ivybridge, passes 69/70. The failing case is
multisample(advanced.line.changeWidth).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This function felt sloppy, so this patch cleans it up a little bit.
- Rename `color` to `i`. It is not a color value, only an iterator int.
- Move `depth_bits[0] = 0` into the non-accum loop because that is where
it is used. The accum loop later overwrites depth_bits[0].
- Rename `depth_factor` to `num_depth_stencil_bits`.
- Redefine `msaa_samples_array` as static const because it is never
modified. Rename to `singlesample_samples`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If either argument to driConcatConfigs(a, b) is null or the empty list,
then simply return the other argument as the resultant list.
All callers were accomplishing that same behavior anyway. And each caller
accomplished it with the same pattern. So this patch moves that external
pattern into the function.
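The behavior can be sketched as follows. This is not the driConcatConfigs
source: the config type, the function names and the simplified ownership
rules (inputs are never freed here) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical config type; lists are NULL-terminated pointer arrays. */
struct example_config {
   int id;
};

/* Number of entries in a list; a NULL list counts as empty. */
static size_t example_count(struct example_config **list)
{
   size_t n = 0;
   if (list) {
      while (list[n])
         n++;
   }
   return n;
}

/* If either list is NULL or empty, return the other one unchanged.
 * Otherwise return a newly allocated concatenation (caller frees it). */
static struct example_config **
example_concat_configs(struct example_config **a, struct example_config **b)
{
   size_t na = example_count(a);
   size_t nb = example_count(b);

   if (na == 0)
      return b;
   if (nb == 0)
      return a;

   struct example_config **all = malloc((na + nb + 1) * sizeof(*all));
   if (!all)
      return NULL;

   for (size_t i = 0; i < na; i++)
      all[i] = a[i];
   for (size_t i = 0; i < nb; i++)
      all[na + i] = b[i];
   all[na + nb] = NULL;
   return all;
}
```

Callers no longer need their own "is one side empty?" test before
concatenating.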
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
DRI2 configs were constructed in intelInitScreen2. That function already
does too much, so move verbatim the code for creating configs to a new
function, intel_screen_make_configs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add two new functions: intel_miptree_{map,unmap}_multisample, to which
intel_miptree_{map,unmap} dispatch. Only mapping flat, renderbuffer-like
miptrees are supported.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the opencoded construction and destruction of intel_miptree_map into
new functions, intel_miptree_attach_map and intel_miptree_release_map.
This patch prevents code duplication in a future commit that adds support
for mapping multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of intel_miptree_map into a new function,
intel_miptree_map_singlesample. Now intel_miptree_map dispatches to the
new function. A future commit adds a multisample variant.
Ditto for intel_miptree_unmap.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function intel_renderbuffer_set_needs_downsample. It is a no-op
except on multisample winsys buffers shared with DRI2.
Mark the needed downsamples with the new function at two locations:
- Immediately after drawing is complete.
- After blitting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Define a function, brw_blorp_blit_miptrees, that simply wraps
brw_blorp_blit_params + brw_blorp_exec with C calling conventions. This
enables intel_miptree.c, in a following commit, to perform blits with
blorp for the purpose of downsampling multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Immediately after obtaining, with DRI2GetBuffersWithFormat, the DRM buffer
handle for a DRI2 buffer, we wrap that DRM buffer handle with a region and
a miptree. This patch additionally allocates an accompanying multisample
miptree if the DRI2 buffer is multisampled.
Since we do not yet advertise multisample GL configs, the code for
allocating the multisample miptree is currently inactive.
This patch adds the following fields to intel_mipmap_tree:
singlesample_mt
needs_downsample
and the following function stubs:
intel_miptree_downsample
intel_miptree_upsample
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the logic for creating the ancillary hiz and mcs miptrees for winsys
and non-texture renderbuffers from intel_alloc_renderbuffer_storage to
intel_miptree_create_for_renderbuffer. Let's try to isolate complex
miptree logic to intel_mipmap_tree.c.
Without this refactor, code duplication would be required along the
intel_process_dri2_buffer codepath in order to create the mcs miptree.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new param, num_samples, to intel_create_renderbuffer and
intel_create_private_renderbuffer.
No multisample GL config is yet advertised, so the value of num_samples is
currently 0. For server-owned winsys buffers, gl_renderbuffer::NumSamples
is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rename quantize_num_samples to intel_quantize_num_samples and change the
first param from struct intel_context* to struct intel_screen*. The
function will later be used by intelCreateBuffer, which is not bound to
any context but is bound to a screen.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The comment referred to intel_tex_image_map/unmap, but should more
accurately refer to intel_miptree_map/unmap.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
v2: Note that GLSL 4.3 has not been started, and that
ARB_compute_shader has been started in Gallium drivers.
Signed-off-by: Jason Wood <sandain@hotmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
KHR extension name is reserved for Khronos ratified extensions, and there is
no such thing as EGL_KHR_surfaceless_{gles1,gles2,opengl}. Replace these
three extensions with EGL_KHR_surfaceless_context since that extension
actually exists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Since support for swrast version 2 was added (f55d027a), it has also been
required. In swrast_driver_extensions, version 2 is set for __DRI_SWRAST
extension. Remove the spurious version checks sprinkled through the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously an error would be generated if any attributes were specified when
creating a non-desktop OpenGL context. This was a mistake, and it will
prevent old drivers from working with new EGL libraries that add support for
the createContextAttribs interface. Instead, match the behavior of
EGL_KHR_create_context: allow versions that make sense, reject non-zero flags.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit f0cecd43d6 moved the VUE map computation to be only once, at
VS compile time. However, it did so in slightly the wrong place: it
made the one call to brw_vue_compute_map happen right before the
allocation of dummy slots for replaced point sprite coordinates, causing
a different VUE map to be generated (at least on Ironlake).
Fixes a regression in Piglit's point-sprite test on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46489
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider a texture call such as:
textureLod(s, coordinate, log2(...))
First, we begin setting up the sampler message by loading the texture
coordinates into MRFs, starting with m2. Then, we realize we need the
LOD, and go to compute it with:
ir->lod_info.lod->accept(this);
On Gen4-5, this will generate a SEND instruction to compute log2(),
loading the operand into m2, and clobbering our texcoord.
Similar issues exist on Gen6+. For example, nested texture calls:
textureLod(s1, c1, texture(s2, c2).x)
Any texturing call where evaluating the subexpression trees for LOD or
shadow comparator would generate SEND instructions could potentially
break. In some cases (like register spilling), we get lucky and avoid
the issue by using non-overlapping MRF regions. But we shouldn't count
on that.
Fixes four Piglit test regressions on Gen4-5:
- glsl-fs-shadow2DGradARB-{01,04,07,cumulative}
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52129
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the textureRect support and GL_CLAMP workarounds, it's grown
sufficiently that it deserves its own function. Separating it out
makes the original function much more readable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Setting the texture offset bits in the message header involves very
specific hardware register descriptions. As such, I feel it's better
suited for the lower level "generate" layer that has direct access to
the weird register layouts, rather than at the fs_inst abstraction layer.
This also parallels the approach I took in the VS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use an atom for sampler state. Does not provide new functionality
or fix any bug. Just a step toward a fully atom-based r600g.
v2: Split seamless on r6xx/r7xx into its own atom. Make sure it's
emitted after the sampler and with a pipeline flush before it; otherwise
it does not take effect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
...to look like update_fragment_samplers() code, as with the previous
commit. The next step would be to merge the two functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Less code. And as with softpipe, if/when we consolidate the pipe_context
functions for binding sampler state, this will make the llvmpipe changes
trivial.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The functions for setting samplers and sampler views for vertex,
fragment and geometry shaders were nearly identical. Now they
use shared code.
In the future, if the pipe_context functions for setting samplers
and sampler views for vert/frag/geom/compute are combined, this
will make updating the softpipe driver a snap.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Combine separate arrays for vertex/fragment/geometry samplers, etc into
one array indexed by PIPE_SHADER_x.
This allows us to collapse separate code for vertex/fragment/geometry
state into loops over the shader stage. More to come.
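A rough sketch of the layout change, using made-up names in place of the
PIPE_SHADER_x enums and the softpipe context fields:

```c
#include <assert.h>

/* Hypothetical stand-ins for the PIPE_SHADER_x stage indices. */
enum example_shader_stage {
   EXAMPLE_SHADER_VERTEX,
   EXAMPLE_SHADER_FRAGMENT,
   EXAMPLE_SHADER_GEOMETRY,
   EXAMPLE_SHADER_COUNT
};

#define EXAMPLE_MAX_SAMPLERS 16

/* One array indexed by stage replaces three per-stage arrays. */
struct example_context {
   const void *samplers[EXAMPLE_SHADER_COUNT][EXAMPLE_MAX_SAMPLERS];
   unsigned num_samplers[EXAMPLE_SHADER_COUNT];
};

/* A single bind function serves every stage. */
static void example_bind_samplers(struct example_context *ctx,
                                  enum example_shader_stage stage,
                                  unsigned count,
                                  const void *const *samplers)
{
   assert(stage < EXAMPLE_SHADER_COUNT);
   assert(count <= EXAMPLE_MAX_SAMPLERS);
   for (unsigned i = 0; i < count; i++)
      ctx->samplers[stage][i] = samplers[i];
   ctx->num_samplers[stage] = count;
}

/* Per-stage code collapses into a loop over the stage index. */
static unsigned example_total_samplers(const struct example_context *ctx)
{
   unsigned total = 0;
   for (int stage = 0; stage < EXAMPLE_SHADER_COUNT; stage++)
      total += ctx->num_samplers[stage];
   return total;
}
```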
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes uninitialized pointer read defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Merge the vertex/fragment versions of the cso_set/save/restore_samplers()
functions. Now we pass the shader stage (PIPE_SHADER_x) to the function
to indicate vertex/fragment/geometry samplers. For example:
cso_single_sampler(cso, PIPE_SHADER_FRAGMENT, unit, sampler);
This results in quite a bit of code reduction, fewer CSO functions and
support for geometry shaders.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Fixes uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_OES_mapbuffer extension is supported by OpenGL ES 1 and ES 2 so return
GL_MAP_WRITE_BIT for both ES versions, not just ES 1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before, the GLSL parser was getting rebuilt every time that scons was
run. The problem was scons was expecting a glsl_parser.hpp file but
we were generating a glsl_parser.h file.
Signed-off-by: Brian Paul <brianp@vmware.com>
Windowed speed is of course way too slow, but fullscreen
works like a charm now.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Using the writemask in the sampler results in packed
VGPRs. For now just sample all components and let
llvm choose the right one.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The backend is multiplying the offset by the number of
elements anyway, so doing it twice just makes everything
crash.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The patch makes the SCons build with Intel Compiler successful.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Framebuffer blit needs to setup texture sampling with no reference to the
user's texturing state, and a sampler object lets us avoid a bunch of changes
to the user's state setup.
We don't bother caching the sampler object since we're changing parameters in
it based on the filtering option to glBlitFramebuffer().
Fixes piglit GL_ARB_sampler_objects/framebufferblit and rendering in l4d2 (our
setting of srgb decode wasn't being respected due to the user's sampler object
being active).
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Sampler objects can be used to shadow texture object state without
modifying the original application state. The decompression path feels a bit
like a path where caching shouldn't happen, but since everything else is
cached already, I decided to cache the sampler state too.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To allow the meta module to use sampler objects, the Mesa GL functions need
to be visible to and linkable from the meta module.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast needs to pass the sampler object into all texture fetching functions
to use the correct sampling state when a sampler object is bound to the unit.
The changes were made using a semi-manual regular expression replace.
v2: Fix NULL deref in _swrast_choose_triangle(), because the _Current
values aren't set yet, so we need to look at our texObj2D. (anholt)
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To allow meta acceleration operations to use sampler objects the
ARB_sampler_objects extension needs to be mandatory for all drivers.
Because the extension doesn't have any hardware dependencies it is
trivial to implement.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
CompareFailValue is part of the sampler state, which needs to be read from
the bound sampler object if one is present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The fixed function fragment shader generator was incorrectly reading texture
sampling state directly from the texture object. To make sure that
ARB_sampler_objects works correctly, the shader generator has to use the
bound sampler if one exists.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Preparation for the mandatory support of ARB_sampler_objects. I have tested
this patch with rv280 only.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When I build-tested the radeon changes I noticed two warnings about a format
size mismatch on 64-bit. I decided to clean them up to make relevant
compiler warnings easier to spot.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_sampler_objects is a very simple, software-only extension to support. I
want to make it a mandatory extension for Mesa drivers to allow the meta
module to use it.
This patch adds support for the extension to nouveau. It is a completely
untested search-and-replace patch, except for flagging the texture state as
needing to be recomputed when a sampler object is present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
sRGBDecode state is part of sampler object state but mesa was missing
handlers to access the state. This patch adds the support for required
state changes and queries.
GL_EXT_texture_sRGB_decode issue 4:
"4) Should we add forward-looking support for ARB_sampler_objects?
RESOLVED: YES
If ARB_sampler_objects exists in the implementation, the sampler
objects should also include this parameter per sampler."
Fixes piglit GL_ARB_sampler_objects/GL_EXT_texture_sRGB_decode.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GL_DEPTH_TEXTURE_MODE isn't meant to be part of sampler state based on
compatibility profile specifications.
OpenGL specification 4.1 compatibility 20100725 3.9.2:
"... The values accepted in the pname parameter
are TEXTURE_WRAP_S, TEXTURE_WRAP_T, TEXTURE_WRAP_R, TEXTURE_MIN_FILTER,
TEXTURE_MAG_FILTER, TEXTURE_BORDER_COLOR, TEXTURE_MIN_LOD,
TEXTURE_MAX_LOD, TEXTURE_LOD_BIAS, TEXTURE_COMPARE_MODE, and
TEXTURE_COMPARE_FUNC. Texture state listed in table 6.25 but not listed here and
in the sampler state in table 6.26 is not part of the sampler state, and remains in the
texture object."
The list of states is in Table 6.24 "Textures (state per texture
object)" rather than the Table 6.25 mentioned in the specification text.
The same text can be found in the 3.3 compatibility specification.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch allows GL_SAMPLES to be set to either 0 or 1 on i965
platforms that don't support MSAA (those prior to Gen6). Setting
GL_SAMPLES=1 has the same effect as setting it to 0 on these platforms
(because MSAA is unsupported), but is distinguishable via the GL API.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50165
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
EXT_framebuffer_multisample is a required subpart of
ARB_framebuffer_object, which means that we must support it even on
platforms that don't support MSAA. Fortunately
EXT_framebuffer_multisample allows for this by allowing GL_MAX_SAMPLES
to be set to 1.
This leads to a tricky quirk in the GL spec: since
glRenderbufferStorageMultisample() accepts any value for its
"samples" parameter up to and including GL_MAX_SAMPLES, that means
that on platforms that don't support MSAA, GL_SAMPLES is allowed to be
set to either 0 or 1. On platforms that do support MSAA, GL_SAMPLES=1
is not used; 0 means no MSAA, and 2 or higher means MSAA.
In other words, GL_SAMPLES needs to be interpreted as follows:
=0 no MSAA (possible on all platforms)
=1 no MSAA (only possible on platforms where MSAA unsupported)
>1 MSAA (only possible on platforms where MSAA supported)
This patch modifies all MSAA-related code to choose between
multisampling and single-sampling based on the condition (GL_SAMPLES >
1) instead of (GL_SAMPLES > 0) so that GL_SAMPLES=1 will be treated as
"no MSAA".
Note that since GL_SAMPLES=1 implies GL_SAMPLE_BUFFERS=1, we can no
longer use GL_SAMPLE_BUFFERS to distinguish between MSAA and non-MSAA
rendering.
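The interpretation table above reduces to a single comparison. A minimal
sketch, with hypothetical names:

```c
#include <assert.h>

/* How a GL_SAMPLES value is treated, following the table above. */
enum example_sample_mode {
   EXAMPLE_SINGLE_SAMPLED, /* GL_SAMPLES of 0 or 1 */
   EXAMPLE_MULTISAMPLED    /* GL_SAMPLES greater than 1 */
};

/* Choose based on (samples > 1), not (samples > 0), so that a value of 1
 * on non-MSAA platforms is still treated as "no MSAA". */
static enum example_sample_mode example_sample_mode(unsigned num_samples)
{
   return num_samples > 1 ? EXAMPLE_MULTISAMPLED : EXAMPLE_SINGLE_SAMPLED;
}
```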
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly the whole function body was contained in the 'else' branch. The
'if' branch did one thing: return early with an error. Clean things up by
moving all the code out of the 'else' branch. Decreases max nesting level
from 4 to 3.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After commit "intel: Convert to using private depth/stencil buffers", we
request from DRI2GetBuffersWithFormat only the front left and back left
buffers. We no longer request depth and stencil buffers.
Assert that in intelAllocateBuffer and remove the related dead code.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These assignments caused CFLAGS specified on the configure line to
appear twice in the final CFLAGS. Removing them makes the behavior
reasonable -- USER_CFLAGS are appended at the end of CFLAGS, allowing
the builder to override flags added by configure.ac like
-fno-strict-aliasing.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Even on s390{,x} where there's no video card, you still want this so GLX
protocol works.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This reverts commit 5d5af7d359.
It turns out the issue this was supposed to fix merely counter-acted
a bug in the hardware driver that I wasn't aware of.
The resource_resolve is not supposed to do sRGB conversion, period.
(This would violate the requirement that source and destination must
be of the same format).
There is no point in emitting aux scissor values if we
a) never enable them
b) never set the actual values
plus it is enough to have that aux scissor enable reg (which we never set to
enable) in one place not two.
There were several problems with these functions (which are a remnant
of dri1 hyperz mostly - should bring it back somehow someday).
First, it would always do a swrast clear if the buffer to clear was a fbo.
Second, for buffers whose clear we wouldn't handle (I guess aux/accum?), we
would still have tried to clear them later, even though we had already
cleared them with swrast.
This addresses one issue raised in bug #51658 discovered by Eugene St Leger.
The assert is bogus since there's no problem with texture width/height being
2048 (the width/height programmed is width/height minus one).
OTOH though the programmed size for scissor rect should be width/height
minus one too otherwise bad things may happen (as it is inclusive, and there's
not enough bits for more than a value of 2047).
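The arithmetic can be sketched as follows, assuming an 11-bit field (inferred
from the 2047 maximum mentioned above). The names are hypothetical and no
real register layout is implied.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field width: 11 bits can hold values 0..2047. */
#define EXAMPLE_SIZE_FIELD_BITS 11u
#define EXAMPLE_SIZE_FIELD_MAX ((1u << EXAMPLE_SIZE_FIELD_BITS) - 1u)

/* Returns 1 if value can be stored in the field without overflow. */
static int example_fits_size_field(unsigned value)
{
   return value <= EXAMPLE_SIZE_FIELD_MAX;
}

/* Program a dimension as (size - 1), which makes the bound inclusive. */
static uint32_t example_encode_size(unsigned size)
{
   assert(size >= 1);
   assert(example_fits_size_field(size - 1));
   return size - 1;
}
```

A 2048-wide texture encodes to 2047 and fits; writing 2048 itself into the
same field would not.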
SI does not support 64-bit immediates natively, but llvm will generate
i64 immediates when indexing loads and stores (since SI has 64-bit
pointers). The i64 indices will always be small enough to fit into
32-bits (i.e. the high 32 bits will always be all zeros), so we can
treat these index values as 32-bits.
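In C terms, the narrowing relied on above looks like this. It is only an
illustration of the bit-level argument, not the backend code, and the helper
names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the high 32 bits are all zero. */
static int example_fits_in_u32(uint64_t value)
{
   return (value >> 32) == 0;
}

/* Treat a 64-bit index as a 32-bit immediate. This is lossless only
 * because the high half is known to be zero. */
static uint32_t example_narrow_index(uint64_t index)
{
   assert(example_fits_in_u32(index));
   return (uint32_t)index;
}
```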
In tablegen, if two patterns match, the one that comes first in the file
is given preference. We want the SMRD IMM pattern to be given
preference, because it encodes the pointer offset in its immediate
field, which saves us an add instruction.
I ended up having to add rallocing of the ast_type_qualifier in order
to avoid pulling in ast.h for glsl_parser_extras.h, because I wanted
to track an ast_type_qualifier in the state.
Fixes piglit ARB_uniform_buffer_object/row-major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Yes, you get to say things like "layout(row_major, column_major)" and
get column major.
Part of fixing piglit ARB_uniform_buffer_object/row_major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like a stripped-down version of glGetActiveUniform that just
returns the name, since the other return values (type and size) of
that function are now meant to be handled with
glGetActiveUniformsiv().
Fixes piglit ARB_uniform_buffer_object/getactiveuniformname
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous implementation required a flag in _mesa_glsl_parse_state
and a line of code to initialize it for every version of the shading
language we intend to support. As we look to add 150, 330, 400, 410,
420, and beyond, this gets rather unwieldy.
This patch retains the switch statement (to reject, say, #version 111),
but removes all the bits. Code to check for ctx->API == API_OPENGL_CORE
could easily be added to the 110 and 120 cases to reject those.
v2: Use _mesa_is_desktop_gl to preserve the existing behavior in the
presence of the new API_OPENGL_CORE enumeration.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Fixes some failures in getteximage-formats.
v2: Remove stray include, and drop extra test for encoding == GL_SRGB --
_mesa_get_srgb_format_linear() returns the same format if it wasn't SRGB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48120
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 8.0 branch.
It was using state->Const.GLSL_100ES, which is set if the driver
supports ARB_ES2_compatibility or we're in ES2 mode. Instead, it should
use state->language_version, as that represents the actual GLSL version
of the shader being compiled.
Since the correct logic is < 120 && !100, just make it == 110.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will need to get refactored when we add support for core profiles
or forward-compatible contexts, but we may as well have it in the
meantime. This allows us to override the GLSL version and experiment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move installing osmesa.pc to drivers/osmesa, where it belongs better
This also restores the installation of gl.pc if we are building osmesa at the
same time as libGL, which was broken in commit 39785488 when the .pc
installation was converted to automake
v2:
Remove HAVE_OSMESA_DRIVER automake conditional, it's now pointless as we
will only be building in the drivers/osmesa directory if the condition it
checked was true.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this build failure with Intel Compiler.
src/gallium/auxiliary/util/u_format_tests.c(903): error: floating-point operation result is out of range
{PIPE_FORMAT_R16_FLOAT, PACKED_1x16(0xffff), PACKED_1x16(0x7c01), UNPACKED_1x1( NAN, 0.0, 0.0, 1.0)},
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that ir_quadop_vector exists, ir_last_binop and ir_last_opcode are
no longer the same. Only one place currently uses this enumeration, and
already handles ir_quadop_vector correctly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
It's more convenient to use shortcuts like glsl_type::bvec2_type than
the longwinded glsl_type::get_instance(GLSL_TYPE_BOOL, 2, 1).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
The hardware supports this format with no known quirks, so we may as
well enable it.
Alpha blending is not supported until Sandybridge, but as far as I can
tell, OpenGL doesn't require alpha blending on SNORM formats. Plus, we
already expose R8G8B8A8_SNORM which has a similar restriction.
Fixes 6 piglit texwrap-2D-*SNORM* cases,
gl-3.1/required-sized-texture-formats, and 10 oglconform snorm-textures
subcases
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: fix tiling for small pitches, that finally makes
glxgears and readPixSanity work
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The format member of pipe_surface may differ from that of the
pipe_resource, which is used to communicate, for instance, whether
sRGB encode should be enabled in the resolve operation or not.
Fixes resolve to sRGB surfaces in mesa/st when GL_FRAMEBUFFER_SRGB
is disabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
sRGBEnabled should affect both textures and renderbuffers, so we need
to check/update the pipe_surface format for both.
Fixes, for instance, rendering appearing too bright in wine applications
using sRGB multisample renderbuffers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the check for pixel transfer ops. If any RGB/depth scale/bias
is in effect, it'll be applied in the glTexImage step.
If drawing stencil pixels we need to disable pixel transfer so that
alpha scale/bias are not applied to the stencil data.
These issues were spotted by Roland.
Fixes Blender performance issues reported in
http://bugs.freedesktop.org/show_bug.cgi?id=47375
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Barto <mister.freeman@laposte.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No functional change. This patch modifies intel_miptree_alloc_mcs to
allocate the 4x MCS buffer using MESA_FORMAT_R8 instead of
MESA_FORMAT_A8. In principle it doesn't matter, since we only access
the buffer using MCS-specific hardware mechanisms, so all that's
important is to use a format with the correct size. However,
MESA_FORMAT_A8 has enough unusual behaviours that it seems prudent to
avoid it.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It seems reset is not required for setting the max_wm_threads to 80
on gen6 GT2.
Increases performance in the Counter-Strike: Source video stress test
by 7.18% (n=5).
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
The VCC register is tricky because the SALU views it as 64-bit, but the
VALU views it as 1-bit. In order to deal with this we've added some
special bitcast and binary operations to help convert from the 64-bit
SALU view to the 1-bit VALU view and vice versa.
If you want to change your compiler arguments, just set CFLAGS/CXXFLAGS.
Having Mesa have this separate variable is a great way to have your arguments
not thoroughly propagated to all compiler invocations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In all current uses, it was appended to CFLAGS, which already had -m32. If
you want some other flag supplied to compiler invocations, there's
CFLAGS/CXXFLAGS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change. This patch modifies brw_blorp_blit.cpp to use
the ROUND_DOWN_TO macro instead of open-coded bit manipulations, for
clarity.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
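For readers unfamiliar with the idiom in the ROUND_DOWN_TO commit above, a
minimal sketch of such a macro follows. The definition shown is an assumption
about its general shape (mask off the low bits), valid only for power-of-two
alignments; round_down_to_checked() is a hypothetical wrapper added for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: clear the low bits so that value becomes a multiple of alignment.
 * Only correct when alignment is a power of two. */
#define ROUND_DOWN_TO(value, alignment) ((value) & ~((alignment) - 1))

/* Hypothetical wrapper that checks the power-of-two precondition. */
uint32_t
round_down_to_checked(uint32_t value, uint32_t alignment)
{
   /* A power of two has exactly one bit set. */
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return ROUND_DOWN_TO(value, alignment);
}
```

Compared with open-coded masking at each call site, the macro names the
intent and keeps the alignment value visible.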
The emit->key.fkey info is only valid if we're generating a fragment shader.
We should not look at it if we're generating a vertex shader.
When generating a vertex shader, the value of emit->key.fkey.num_textures was
garbage and the loop over num_textures would read invalid data. At best
this would cause us to emit an unused constant. At worst, we could segfault.
Just by dumb luck, fkey.num_textures was usually a smallish integer.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Recently more files were removed from control to be auto-generated
in the dricore library. Android build was not able to locate the
new files if they were not created beforehand.
LOCAL_SRC_FILES includes some of those files and Android.gen.mk
re-defines this variable by filtering out the auto-generated files.
Unfortunately for this variable it is not the same to have the SRCDIR
variable defined as the current directory.
By re-defining SRCDIR for the autotools build the Android build system
is happy again and the new files were actually removed from the sources
to use the auto generated versions.
Also patch d5c1801a01 was partially reverted as the files
can not be compiled to the LOCAL_PATH, instead they should live on the
intermediates folder so that a clean can wipe them out.
v3: [chad] Fix the definition of SRCDIR in libdricore/Makefile.am.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Daniel Charles <daniel.charles@intel.com>
XGetImage() will generate a BadMatch error if the source window isn't
visible. When that happens, create a new XImage. Fixes piglit 'select'
test failures with swrast/xlib driver.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Always allocate space for the inverse matrix in _math_matrix_ctr()
since we were always calling _math_matrix_alloc_inv() anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When computing a matrix inverse, if the determinant is too small we could hit
a divide by zero. There's a check to prevent this (we basically give up on
computing the inverse and return the identity matrix.) This patch loosens
this test to fix a lighting bug reported by Lars Henning Wendt.
v2: use abs(det) to handle negative values
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Lars Henning Wendt <lars.henning.wendt@gris.tu-darmstadt.de>
The sendc instruction causes the fragment shader thread to wait for
any dependent threads (i.e. threads rendering to overlapping pixels)
to complete before sending the message. We need to use sendc on the
first render target write in order to guarantee that fragment shader
outputs are written to the render target in the correct order.
Previously, we only used the "sendc" instruction when writing to
binding table index 0. This did the right thing for fragment shaders,
because our fragment shader back-ends always issue their first render
target write to binding table index 0. However, it did the wrong
thing for blorp, which performs its render target writes to binding
table index 1.
A more robust solution is to use sendc for all render target writes.
This should not produce any performance penalty, since after the first
sendc, all of the dependent threads will have completed.
For more information about sendc, see the Ivy Bridge PRM, Vol4 Part3
p218 (sendc - Conditional Send Message), and p54 (TDR Registers).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A lot of code was still differentiating between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in the i915 and 965 drivers over to use _mesa_is_user_fbo()
and _mesa_is_winsys_fbo().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A lot of code was still differentiating between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in core mesa, the state tracker, and src/mesa/program over
to use _mesa_is_user_fbo() and _mesa_is_winsys_fbo().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL(R) ES Shading Language
Version 1.00 Revision 17 (12 May, 2009)
> 4.6.1 The Invariant Qualifier
> ... To force all output variables to be invariant, use the pragma
> #pragma STDGL invariant(all)
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already provided these files on 'make install', but only created a
'libglapi.so' in the top-level lib/ convenience folder. We used to
create all three, but at some point in the build system churn, it broke.
Various applications (like the ES2 conformance suite) seem to link
against libglapi.so.0, so without these links, setting LD_LIBRARY_PATH
and LIBGL_DRIVERS_PATH can lead to using /usr/lib/libglapi.so.0 with
/home/whatever/libGL.so, which leads to API calls getting routed
incorrectly (i.e. glCompileShader -> _mesa_LinkProgramARB), which leads
to rage problems.
Preserve developer sanity...install links.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since osmesa now has been converted to Makefile.am, an appropriate install: rule
is generated to install the shared library, so we no longer need to do that in
src/mesa/Makefile.old
This leaves nothing in src/mesa/Makefile.old but the tags: rule, so move that to
Makefile.am and remove Makefile.old
Also, nothing now uses OSMESA_LIB_GLOB anymore, so remove it
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 6c6803f28d removed xm_image.[ch], and removed
xm_image.c, but not xm_image.h from the Makefile, this was subsequently carried over
into Makefile.am
Remove xm_image.h from Makefile.am. This allows 'make dist' to succeed, even if it
doesn't do anything useful
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MCS buffers use 32 bits per pixel in 8x MSAA, and 8 bits per pixel in
4x MSAA. This patch adjusts the format we use to allocate the buffer
so that enough memory is set aside for 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code to emit 3DSTATE_SAMPLE_MASK was already correct for 8x
MSAA--this patch just removes an assertion that would have prevented
it from being used for 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch updates the blorp functions encode_msaa() and decode_msaa()
to properly handle the encoding of IMS MSAA buffers when
num_samples=8.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When operating in persample dispatch mode, the blorp engine would
previously assume that subspan N always represented sample N (this is
correct assuming 4x MSAA and a 16-wide dispatch). In order to support
8x MSAA, we must compute which sample is associated with each subspan,
using the "Starting Sample Pair Index" field in the thread payload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to an IMS MSAA surface on Gen7, blorp sets up the
rendering pipeline as though it were rendering to a single-sampled
surface; accordingly it must adjust the size of the primitive it sends
down the pipeline to account for the interleaving of samples in an IMS
surface.
This patch modifies the size adjustment code to properly handle 8x
MSAA, which makes room for the extra samples by using an interleaving
pattern that is twice as wide as 4x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a num_samples argument to the blorp function
manual_blend(), allowing it to be told how many samples need to be
blended together. Previously it assumed 4x MSAA, since that was all
we supported.
We also bump up LOG2_MAX_BLEND_SAMPLES from 2 to 3, so that
manual_blend() will be able to handle 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the client program uses glDrawBuffer() or glDrawBuffers() to
select more than one color buffer for drawing into, and then performs
a blit, we need to blit into every single enabled draw buffer.
+2 oglconforms.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50407
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch rearranges the order of steps performed by a blorp blit
from this:
- Sync up state of window system buffers.
- Find buffers.
- Find miptrees.
- Make sure buffer formats match.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
To this:
- Sync up state of window system buffers.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Find buffers.
- Make sure buffer formats match.
- Find miptrees.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
The steps are the same, but they are now performed in an order that
will make it possible to implement correct DrawBuffers support. Note
that the last four steps are now in a separate function
(do_blorp_blit), since they will need to be executed repeatedly when
DrawBuffers support is added.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, the blorp engine would fall back to swrast if the source
or destination of a blit had no associated miptree. This was
unnecessary, since _mesa_BlitFramebufferEXT() already takes care of
making the blit silently succeed if there are no buffers bound, so the
fallback paths could never actually happen in practice.
Removing these fallback paths will simplify the implementation of
correct DrawBuffers support in blorp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch modifies the order of operations in the blorp engine so
that clipping and scissoring are performed before adjusting the
coordinates to account for the difference in origin convention between
window system buffers and framebuffer objects. Previously, we would
do clipping and scissoring after adjusting for origin conventions, so
we would get scissoring wrong in window system buffers.
Fixes Piglit test "fbo-scissor-blit window".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When checking that the source and destination dimensions match, we
don't need to store the width and height in variables; doing so just
risks confusion since right after the check, we do clipping and
scissoring, which may alter the width and height.
No functional change.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
On Gen6, multisampled null render targets don't seem to work
properly--they cause the GPU to hang. So, as a workaround, we render
into a dummy color buffer.
Fortunately this situation (multisampled rendering without a color
buffer) is rare, and we don't have to waste too much memory, because
we can give the workaround buffer a very small pitch.
Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4}
depth-computed *" on Gen6.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The HW docs say that the width and height of null render targets need
to match the width and height of the corresponding depth and/or
stencil buffers, and that they need to be marked as Y-tiled. Although
leaving these values at 0 doesn't seem to cause any ill effects, it
seems wise to follow the documented requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we used the number of samples in draw buffer 0 to
determine whether to set up the 3D pipeline for multisampling. Using
the visual is cleaner, and has the benefit of working properly when
there is no color buffer.
Fixes all piglit tests "EXT_framebuffer_multisample/no-color" on Gen7.
On Gen6, the "depth-computed" variants of these tests still fail; this
will be addressed in a later patch.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch ensures that Visual.samples and Visual.sampleBuffers are
set correctly even in the case where there is no color buffer.
Previously, these values would retain their default value of 0 in this
circumstance, even if the depth or stencil buffer was multisampled.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Mesa misses a few checks when compiling on a uclibc system,
which causes it to fall back on glibc-isms. This patch
addresses those issues.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
The kernel streamout support was supposed to get into 3.3 along with
the tiling change and thus use the same kernel driver version bump
(2.13) to report to userspace that streamout registers were supported.
This is not what happened. Since streamout kernel support did not
bump the kernel driver version, rely on the 2.14 version bump
to know whether streamout is enabled or not. This means you need at
least a 3.4 kernel.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The error was being set on the non-error path, rather
than the error path.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For 'non-legacy' contexts we will want to generate an error
if an uninstalled function is called.
The effect of this change will be that we can avoid installing
legacy functions, and they will then generate an error as
needed for deprecated functions in GL >= 3.1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 2d4b77c7 (automake: Convert src/mesa/drivers/x11/Makefile to
automake, 2012-06-12) dropped the old Makefile, which used GL_LIB, and
replaced it with a Makefile.am hard-coding the name "GL". This broke
handling of --enable-mangling and --with-gl-lib-name options which
depend on GL_LIB to specify the GL library name.
Use "@GL_LIB@" in src/mesa/drivers/x11/Makefile.am to configure the
library name. Also use this approach to simplify src/glx/Makefile.am
and drop the HAVE_MANGLED_GL conditional. While at it, fix the
compatibility link we create in "lib" for the software-only driver to
use version GL_MAJOR instead of hard-coding "1".
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
This fixes the piglit EXT_framebuffer_multisample/bitmap tests.
Note that we must not rely on ctx->DrawBuffer when flushing the cache, because
that's already updated with a new framebuffer. We want to draw into the old
framebuffer where glBitmap was called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Testing shows that the standard JIT engine retrofitted with AVX support is quite
stable and as capable of handling AVX instructions as MC-JIT is.
And the old JIT is much more memory efficient, as we don't need to
allocate one engine instance per shader, as we do for MC-JIT due to its
incompleteness.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When X is running it is necessary for pipe_loader to authenticate with
DRM, in order to be able to use the device.
This makes it possible to run OpenCL programs while X is running.
v2:
- Fix C++ style comments
- Drop Xlib-xcb dependency
- Close the X connection when done
- Split auth code into separate function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Calling glDeleteShader() should mark shaders as pending for deletion,
but shouldn't decrement the refcount every time. Otherwise, repeated
glDeleteShader() is not safe.
This is particularly bad since glDeleteProgram() frees shaders: if you
first call glDeleteShader() on the shaders attached to the program (thus
decrementing the refcount), then call glDeleteProgram(), it would try
to free them again (decrementing the refcount another time), causing
a refcount > 0 assertion to fail.
Similar to commit d950a778.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the pack type is not supported, use _mesa_problem
rather than asserting.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_is_integer_format is moved to formats.c and renamed
as _mesa_is_enum_format_integer.
_mesa_is_format_unsigned, _mesa_is_type_integer,
_mesa_is_type_unsigned, and _mesa_is_enum_format_or_type_integer
are added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.2svn r160587 moved createBoundsCheckingPass from
lib/Transforms/Scalar to lib/Transforms/Instrumentation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Except for a couple of explicit uses, _mesa_inv_sqrtf was disabled since
its addition in 2003 (see f9b1e524).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Temporarily disabled since 2003 (see 386578c5b).
This saves us from calling sqrt() 128 times to generate the sqrttab in
one_time_init().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by compiler warning:
i830_texstate.c:131:28: warning: argument to 'sizeof' in 'memset' call
is the same expression as the destination; did you mean to
dereference it? [-Wsizeof-pointer-memaccess]
memset(state, 0, sizeof(state));
~~~~~ ^~~~~
On 64-bit systems, memset here would write an extra 4 bytes.
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
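The warning quoted in the commit above is about taking sizeof of a pointer
instead of the object it points to. A minimal sketch of the corrected
pattern, using a hypothetical struct in place of the driver's real state
type:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a block of texture state registers. */
struct tex_state_sketch {
   unsigned regs[8];
};

void
clear_state_sketch(struct tex_state_sketch *state)
{
   assert(state != NULL);
   /* sizeof(state) would be the size of the pointer itself;
    * sizeof(*state) is the size of the object being cleared. */
   memset(state, 0, sizeof(*state));
}
```

Writing sizeof(*state) also keeps the call correct if the type of state
changes later.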
This can potentially cut shader program size by a factor of 4 for 4-wide
execution, or 2 for 8-wide execution. While these ratios aren't
quite reached for more complex shaders, it can be close.
Could not really measure a performance difference so far except for trivial
shaders (glxgears).
There seems to be a fair amount of unnecessary moves generated, especially
at the beginning; it might be possible to optimize those away somehow.
Things aren't quite as clean, some additional stuff needs to be done for
keeping both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance, looking at the generated
shaders this is actually less than I'd think it would be - both 4 and 8-wide
shaders, despite containing a loop actually have about 10% more instructions
in total, and will have roughly 50% more executed instructions (though mostly
cheap ones). Need to figure out how to reduce overhead...
v2: keep complex interpolation for 4-wide mode, adapt to interface changes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This thread count is only supposed to be enabled when "WIZ Hashing Disable in
GT_MODE register enabled." I've always been confused whether that means the
bit in the register should be 1 or 0. For my IVB GT2's register 0x7008 value
of 0x0, this appears to work fine.
Improves l4d2 performance at 640x480 by 0.88 +/- 0.11% (n=88). Improves
performance with rasterization at 1280x1024 by 1.45% +/- 0.36% (n=6).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we finally have a list of uniform blocks in the linked shader
program, we can tell what their indices are.
Fixes piglit GL_ARB_uniform_buffer_object/getuniformblockindex.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At this point in the linking, we've totally lost track of the struct
gl_uniform_buffer that this pointed to in the original unlinked
shader, so we do a nasty n^2 walk to find the new one based on the
variable name.
Note that these point into the shader's list of gl_uniform_buffers,
not the linked program's.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to propagate the UBO fields to the uniform storage records
before we can handle the other pnames.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a single entrypoint that maps from a series of names to the
indices of those names within the active uniforms list. Each index is
like glGetUniformLocation()'s return value, except that it doesn't
encode an array offset.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the upcoming GL_ARB_uniform_buffer_object changes, the only
other caller that will want the cooked value is state_tracker.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We're going to need this structure to cross-validate the uniform
blocks between shader stages, since unused ir_variables might get
dropped. It's also the place we store the RowMajor qualifier, which
is not part of the GLSL type (since that would cause a bunch of type
equality checks to fail).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Someone tried to be clever and "optimized" add_vertex_data2() to just use
two points for the texture coordinates and then reuse individual
components. Sadly this is not how matrix multiplication works.
Fixes rendercheck -t tmcoords
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Previously, on Gen7, when texturing from a depth or stencil surface,
the blorp engine would configure the 3D pipeline as though the input
surface was non-multisampled, and perform the necessary coordinate
transformations in the fragment shader to account for the IMS layout.
This meant outputting a lot of extra fragment shader code, and it
raised some uncertainty about how to deal with very large surfaces.
This patch modifies blorp to configure the 3D pipeline properly for
IMS layout when reading from depth and stencil surfaces.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, on Gen7, compute_msaa_layout_for_pipeline() would verify
that IMS layout is not used. However, now that we configure
SURFACE_STATE correctly for IMS surfaces, IMS layout is available.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch modifies gen7_set_surface_num_multisamples() to set up the
SURFACE_STATE appropriately for texturing from IMS format MSAA
surfaces (which are only used on Gen7 for depth and stencil buffers).
Since the function now sets more than just the number of multisamples,
it's been renamed to gen7_set_surface_msaa().
This will make it possible to remove some kludginess from the blorp
engine.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling a compressed multisampled surface, we can take a
shortcut to downsample any pixels that were completely covered by a
single primitive. In this case, the first color value we fetch is the
correct final color for the downsampled pixel, so we can skip the rest
of the blending operation.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling an integer-format buffer on Gen7, we need to use the
"avg" instruction rather than the "add" instruction, to ensure that we
don't overflow the range of 32-bit integers. Also, we need to use the
proper register type (BRW_REGISTER_TYPE_D or BRW_REGISTER_TYPE_UD) for
intermediate color data and for writing to the render target.
Note: this patch causes blorp to use the proper register type for all
operations (downsampling, upsampling, and ordinary blits). Strictly
speaking, this is only necessary for downsampling, because the other
operations exclusively use MOV instructions on the color data. But
it's simpler to use the proper register type in all cases.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When downsampling from an MSAA image to a single-sampled image, it is
inevitable that some loss of numerical precision will occur, since we
have to use 32-bit floating point registers to hold the intermediate
results while blending. However, it seems reasonable to expect that
when all samples corresponding to a given pixel have the exact same
color value, there will be no loss of precision.
Previously, we averaged samples as follows:
blend = (((sample[0] + sample[1]) + sample[2]) + sample[3]) / 4
This had the potential to lose numerical precision when all samples
have the same color value, since ((sample[0] + sample[1]) + sample[2])
may not be precisely representable as a 32-bit float, even if the
individual samples are.
This patch changes the formula to:
blend = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
This avoids any loss of precision in the event that all samples are
the same, by ensuring that each addition operation adds two equal
values.
As a side benefit, this puts the formula in the form we will need in
order to implement correct blending of integer formats.
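
The two formulas can be compared with a minimal C sketch. The function names are illustrative and do not come from the blorp code. With four equal samples, each addition in the pairwise form doubles a value, which is exact in binary floating point barring overflow; whether the sequential form actually loses bits depends on the value and the rounding mode, so the sketch only reports its error.

```c
#include <assert.h>

/* Old formula: left-to-right sum. The partial sum of three equal
 * samples (3 * c) may not be representable and can be rounded. */
static float blend_sequential(float s0, float s1, float s2, float s3)
{
    return (((s0 + s1) + s2) + s3) / 4.0f;
}

/* New formula: pairwise sum. With equal samples every addition adds
 * two equal values, so no rounding occurs (barring overflow). */
static float blend_pairwise(float s0, float s1, float s2, float s3)
{
    return ((s0 + s1) + (s2 + s3)) / 4.0f;
}

/* Error of the old formula for a pixel whose four samples all equal c. */
static float sequential_error_uniform(float c)
{
    return blend_sequential(c, c, c, c) - c;
}

/* Invariant the pairwise form is meant to provide: four equal samples
 * blend back to exactly the same value. */
static void check_uniform_exact(float c)
{
    assert(blend_pairwise(c, c, c, c) == c);
}
```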
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From the Ivy Bridge PRM, Vol4 Part3 p152:
"The avg instruction performs component-wise integer average of
src0 and src1 and stores the results in dst. An integer average
uses integer upward rounding. It is equivalent to increment one to
the addition of src0 and src1 and then apply an arithmetic right
shift to this intermediate value."
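
As a rough CPU-side model of the behavior quoted above, the sketch below computes floor((src0 + src1 + 1) / 2) using a 64-bit intermediate so the sum cannot overflow. The helper names are hypothetical, and the nested four-sample blend is an assumption about how such an instruction might be combined, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2) for a signed value, written without relying on the
 * implementation-defined result of right-shifting a negative number. */
static int64_t floor_half(int64_t v)
{
    return v >= 0 ? v / 2 : -((-v + 1) / 2);
}

/* Integer average with upward rounding: floor((a + b + 1) / 2). */
static int32_t avg_round_up(int32_t a, int32_t b)
{
    int64_t r = floor_half((int64_t)a + (int64_t)b + 1);

    /* The average of two 32-bit values always fits in 32 bits. */
    assert(r >= INT32_MIN && r <= INT32_MAX);
    return (int32_t)r;
}

/* Four-sample blend built from pairwise averages, so no intermediate
 * ever exceeds the 32-bit range. */
static int32_t blend4_int(int32_t s0, int32_t s1, int32_t s2, int32_t s3)
{
    return avg_round_up(avg_round_up(s0, s1), avg_round_up(s2, s3));
}
```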
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kill_emitted variable was duplicating the functionality of
gl_fragment_program::UsesKill. There's no need for both.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the code for setting this flag for GLSL programs was
duplicated in three places: brw_link_shader(), glsl_to_tgsi_visitor,
and ir_to_mesa_visitor. In addition to the unnecessary duplication,
there was a performance problem on i965: brw_link_shader() set the
flag before doing its final round of optimizations, which meant that
if the optimizations managed to eliminate all the discard operations,
the flag would still be set, resulting (at least in theory) in slower
performance.
This patch consolidates all of the code that sets UsesKill for GLSL
programs into do_set_program_inouts(), which already is doing a
similar job for UsesDFdy, and which occurs after i965's final round of
optimizations.
Non-GLSL programs (ARB programs and the state tracker's glBitmap
program) are unaffected.
Reviewed-by: Eric Anholt <eric@anholt.net>
Move it to native_wayland_drm_bufmgr_helper.c which only gets compiled when
wayland is enabled and which already includes the right headers.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The cube sampler generates two-dimensional texture coordinates and
hence passes NULL for the array for the third one. The actual 2D
sampler, lower in the pipe, knew not to use that array since it
didn't need it. But the samplers have become single-texel and the
coordinate array dereference has been moved up one step, to a level
where the code does not know only two coordinates are used. Hence the
segfault.
The simplest fix by far is to add a third dummy coordinate array in
the call to the next pipe step, which will be dereferenced to a
harmless 0 which then will be happily ignored by the sampler.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=52250
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We also reuse EGL_TEXTURE_RGBA and EGL_TEXTURE_RGB, adding only the new
planar YUV texture formats: EGL_TEXTURE_Y_U_V_WL, EGL_TEXTURE_Y_UV_WL and
EGL_TEXTURE_Y_XUXV_WL.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
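
The idea can be sketched as follows. The struct and function names are hypothetical stand-ins for the real program key and for gl_fragment_program::UsesDFdy; the point is only that the FBO flag enters the key when the shader depends on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the fragment program; only the flag we care about. */
struct frag_prog {
    bool uses_dfdy;
};

/* Stand-in for the wm program key used to look up compiled shaders. */
struct wm_key {
    bool render_to_fbo;
};

static struct wm_key make_key(const struct frag_prog *fp, bool drawing_to_fbo)
{
    struct wm_key key;

    assert(fp != NULL);
    memset(&key, 0, sizeof(key));
    /* Only let the FBO state affect the key if the shader uses dFdy(). */
    key.render_to_fbo = drawing_to_fbo && fp->uses_dfdy;
    return key;
}

/* True if switching between an FBO and the window system framebuffer
 * yields a different key, i.e. would force a recompile. */
static bool needs_recompile_on_fbo_switch(bool uses_dfdy)
{
    struct frag_prog fp = { uses_dfdy };
    struct wm_key a = make_key(&fp, false);
    struct wm_key b = make_key(&fp, true);

    return a.render_to_fbo != b.render_to_fbo;
}
```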
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch updates the ir_set_program_inouts_visitor so that it also
sets gl_fragment_program::UsesDFdy.
This is a bit of a hack (since dFdy() isn't an input or an output),
but there's no other obvious visitor to squeeze this functionality
into, and it would be silly to create a brand new visitor just for
this purpose.
v2: use local 'fprog' var to avoid repeated casting.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This boolean will allow it to avoid unnecessarily recompiling shaders
that don't use dFdy().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unigine Heaven (at least) has a bug where it incorrectly uses the
GL_ARB_blend_func_extended extension.
Dual source blending allows two color outputs per render target;
individual shader outputs can be assigned to be either the first or
second blending input by setting the 'index' via one of two methods:
- An API call: glBindFragDataLocationIndexed()
- The GLSL 'layout' qualifier provided by GL_ARB_explicit_attrib_location
Both of these only work on user defined fragment shader outputs; it's an
error to use either on built-in outputs like gl_FragData.
Unigine uses gl_FragData and gl_FragColor exclusively, and doesn't even
attempt to use either method to set index == 1. However, it does set
the blending function to SRC1 enums, which requires a fragment shader
output with index == 1 or else rendering is undefined.
In other words, enabling ARB_blend_func_extended causes Unigine to
render incorrectly, resulting in an apparent regression, even though our
driver code (as far as I can tell) is perfectly fine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if we were spilling the result of a texture call, we would store
all 4 regs, then for each use of one of those regs as the source of an
instruction, we would unspill all 4 regs even though only one was needed.
In both lightsmark and l4d2 with my current graphics config, the shaders that
produce spilling do so on split GRFs, so this doesn't help them out. However,
in a capture of the l4d2 shaders with a different snapshot and playing the
game instead of using a demo, it reduced one shader from 2817 instructions to
2179, due to choosing a now-cheaper texture result to spill instead of piles
of texcoords.
v2: Fix comment noted by Ken, and fix the if condition associated with it for
the current state of what constitutes a partial write of the destination.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There's one instance of a potential behavior change: propagate_constants may
now propagate into a part of a vgrf after a different part of it was
overwritten by a send that returns multiple registers. I don't think we ever
generate IR that meets that condition, but it's something to note if we bisect
a behavior change to this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In these places, we care about any sort of send that hits more than one reg,
not just textures. We don't yet have anything else returning more than one
reg, so there's no change.
v2: Use mlen instead of is_tex() for the is-it-a-send check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"count" is a more useful name, since most of the time we're using it for
looping over the variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
OpenGL specification 3.3 (page 196), section 4.1.3 says:
"If drawbuffer zero is not NONE and the buffer it references has an
integer format, the SAMPLE_ALPHA_TO_COVERAGE and SAMPLE_ALPHA_TO_ONE
operations are skipped."
This should work properly even if there are other draw buffers that
are not in integer format.
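
A minimal sketch of the rule quoted above, using hypothetical structures rather than Mesa's own: only draw buffer zero is inspected, so the formats of the other draw buffers do not matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DRAW_BUFFERS 8

/* Hypothetical description of one color draw buffer. */
struct draw_buffer {
    bool bound;           /* false models a draw buffer set to NONE */
    bool integer_format;
};

struct framebuffer {
    struct draw_buffer color[MAX_DRAW_BUFFERS];
};

/* Alpha-to-coverage and alpha-to-one are skipped only when draw
 * buffer zero is not NONE and has an integer format. */
static bool skip_sample_alpha_ops(const struct framebuffer *fb)
{
    assert(fb != NULL);
    return fb->color[0].bound && fb->color[0].integer_format;
}

/* Helper for building small test framebuffers with two buffers. */
static struct framebuffer make_fb(bool buf0_bound, bool buf0_int, bool buf1_int)
{
    struct framebuffer fb;

    memset(&fb, 0, sizeof(fb));
    fb.color[0].bound = buf0_bound;
    fb.color[0].integer_format = buf0_int;
    fb.color[1].bound = true;
    fb.color[1].integer_format = buf1_int;
    return fb;
}
```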
This patch makes the following piglit tests pass on mesa:
int-draw-buffers-alpha-to-coverage
int-draw-buffers-alpha-to-one
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch churns a lot because it needs to change 4-wide filters into
single pixel filters, since each fragment may use a different filter.
The only case not entirely supported is the anisotropic filtering.
Not sure what we want to do there, since a full quad is required by
that filter.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 3.0 spec, section 4.3.3, in the documentation for
CopyPixels():
"An INVALID_OPERATION error will be generated if the object bound
to READ_FRAMEBUFFER_BINDING is framebuffer complete and the value
of SAMPLE_BUFFERS is greater than zero."
The same applies to CopyTexImage...() and CopyTexSubImage...()
functions, since they are defined in terms of CopyPixels().
Previously we were generating an INVALID_FRAMEBUFFER_OPERATION error
in these cases.
Fixes piglit tests
"EXT_framebuffer_multisample/negative-{copypixels,copyteximage}".
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Issues fixed:
- set_vs_sampler_views for evergreen is now properly implemented.
- Added the missing inval_texture_cache call for evergreen.
- have_depth_texture was sometimes incorrectly set to false on evergreen even
if there were depth textures in other shader stages. To fix this, set it
to true once and never set it to false again. It's stupid, but it matches
the r600 code. The proper fix is left to another patch.
- Optimization: The sampler views which aren't changed aren't updated.
This is a leftover from:
commit fe1fd67556
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Jul 8 03:10:37 2012 +0200
r600g: don't flush depth textures set as colorbuffers
If only some buffers are changed, the other ones don't have to be re-emitted.
This uses bitmasks of enabled and dirty buffers just like
emit_constant_buffers does.
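
The enabled/dirty bitmask pattern can be sketched roughly as below. The structure, slot count, and function names are assumptions for illustration and are not the r600g code; the emit step is modeled by a per-slot counter.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 16

struct buffer_state {
    uint32_t enabled_mask;          /* slots that currently hold a buffer */
    uint32_t dirty_mask;            /* slots changed since the last emit */
    int      emit_count[NUM_SLOTS]; /* demo only: emits per slot */
};

/* Bind or unbind a slot and mark just that slot dirty. */
static void bind_buffer(struct buffer_state *st, unsigned slot, int present)
{
    assert(slot < NUM_SLOTS);
    if (present)
        st->enabled_mask |= 1u << slot;
    else
        st->enabled_mask &= ~(1u << slot);
    st->dirty_mask |= 1u << slot;
}

/* Emit only slots that are both enabled and dirty; returns how many
 * slots were emitted and clears the dirty mask. */
static unsigned emit_dirty(struct buffer_state *st)
{
    uint32_t pending = st->dirty_mask & st->enabled_mask;
    unsigned emitted = 0;

    while (pending) {
        unsigned slot = 0;

        while (!(pending & (1u << slot)))
            slot++;
        st->emit_count[slot]++;
        pending &= ~(1u << slot);
        emitted++;
    }
    st->dirty_mask = 0;
    return emitted;
}
```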
* Also add mcjit in the non-OpenCL case.
* Replace hardcoded llvm-config with $LLVM_CONFIG everywhere.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Helps spotting and removing the obsolete generated files, which otherwise break
the build.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is necessary for linking the llvmpipe tests. It appears this
dependency was introduced by the "wider native register" changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It's been broken (using NULL getBuffersWithFormat() instead of
getBuffers()) due to a copy and paste error for a year now.
GetBuffersWithFormat has been around since 2009, so I don't feel any
guilt in not supporting it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This means that GLX buffer sharing of these no longer works. On the
other hand, just *look* at this code reduction.
v2:
- [chad] Fix intelCreateBuffer for gen < 6. When the branch for
!screen->hw_has_separate_stencil was taken,
intel_create_private_renderbuffer was incorrectly not used.
- [chad] Remove all code in intel_process_dri2_buffer for processing
depth, stencil, and hiz buffers. That code is now dead.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
commit '7250cd506baa0bd4649b30d87509cdd0cbc06a57'
changes struct gbm_bo, renaming its 'pitch' to 'stride'.
This applies to Gallium.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Previously, if you ran make followed by make check it would work, but
if you just ran make check the test program would fail to compile.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Squashed commit of the following:
commit 7acb7b4f60dc505af3dd00dcff744f80315d5b0e
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:46:31 2012 +0100
draw: Don't use dynamically sized arrays.
Not supported by MSVC.
commit 5810c28c83647612cb372d1e763fd9d7780df3cb
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:44:16 2012 +0100
gallivm,llvmpipe: Don't use expressions with PIPE_ALIGN_VAR().
MSVC doesn't accept expressions in _declspec(align(...)). Use a
define instead.
commit 8aafd1457ba572a02b289b3f3411e99a3c056072
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 9 17:41:56 2012 +0100
gallium/util: Make u_cpu_detect.h header C++ safe.
commit 5795248350771f899cfbfc1a3a58f1835eb2671d
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Jul 2 12:08:01 2012 +0100
gallium/util: Add ULL suffix to large constants.
As suggested by Andy Furniss: it looks like some old gcc versions
require it.
commit 4c66c22727eff92226544c7d43c4eb94de359e10
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Truly disable INF/NAN tests on MSVC.
Thanks to Brian for spotting this.
commit 8bce274c7fad578d7eb656d9a1413f5c0844c94e
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 13:39:07 2012 +0100
gallium/util: Disable INF/NAN tests on MSVC.
Somehow they are not recognized as constants.
commit 6868649cff8d7fd2e2579c28d0b74ef6dd4f9716
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 15:05:24 2012 +0200
gallivm: Cleanup the 2 x 8 float -> 16 ub special path in lp_build_conv.
No behaviour change intended, like 7b98455fb40c2df84cfd3cdb1eb7650f67c8a751.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 5147a0949c4407e8bce9e41d9859314b4a9ccf77
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 5 14:28:19 2012 +0200
gallivm: (trivial) fix issues with multiple-of-4 texture fetch
Some formats can't handle non-multiple of 4 fetches I believe, but
everything must support length 1 and multiples of 4.
So avoid going to scalar fetch (which is very costly) just because length
isn't 4.
Also extend the hack to not use shift with variable count for yuv formats to
arbitrary length (larger than 1) - doesn't matter how many elements we
have we always want to avoid it unless we have variable shift count
instruction (which we should get with avx2).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 87ebcb1bd71fa4c739451ec8ca89a7f29b168c08
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jul 4 02:09:55 2012 +0200
gallivm: (trivial) fix typo for wrap repeat mode in linear filtering aos code
This would lead to bogus coordinates at the edges.
(undetected by piglit because this path is only taken for block-based
formats).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 3a42717101b1619874c8932a580c0b9e6896b557
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue Jul 3 19:42:49 2012 +0100
gallivm: Fix TGSI integer translation with AVX.
commit d71ff104085c196b16426081098fb0bde128ce4f
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Jun 29 15:17:41 2012 +0100
llvmpipe: Fix LLVM JIT linear path.
It was not working properly because it was looking at the JIT function
before it was actually compiled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit a94df0386213e1f5f9a6ed470c535f9688ec0a1b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Jun 28 18:07:10 2012 +0100
gallivm: Refactor lp_build_broadcast(_scalar) to share code.
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 66712ba2731fc029fa246d4fc477d61ab785edb5
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 17:30:13 2012 +0100
gallivm: Make LLVMContextRef a singleton.
There are many places inside LLVM that depend on it. Too many to attempt
to fix.
Reviewed-by: Brian Paul <brianp@vmware.com>
commit ff5fb7897495ac263f0b069370fab701b70dccef
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 28 18:15:27 2012 +0200
gallivm: don't use 8-wide texture fetch in aos path
This appears to be a slight loss usually.
There are probably several reasons for that:
- fetching itself is scalar
- filtering is pure int code hence needs splitting anyway, same
for the final texel offset calculations
- texture wrap related code, which can be done 8-wide, is slightly more
complex with floats (with clamp_to_edge) and float operations generally
more costly hence probably not much faster overall
- the code needed to split when encountering different mip levels for the
quads, adding complexity
So, just split always for aos path (but leave it 8-wide for soa, since we
do 8-wide filtering there when possible).
This should certainly be revisited if we'd have avx2 support.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ce8032b43dcd8e8d816cbab6428f54b0798f945d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:41:19 2012 +0200
gallivm: (trivial) don't extract fparts variable if not needed
Did not have any consequences but unnecessary.
commit aaa9aaed8f80dc282492f62aa583a7ee23a4c6d5
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 18:09:06 2012 +0200
gallivm: fix precision issue in aos linear int wrap code
now not just passes at a quick glance but also with piglit...
If we do the wrapping with floats, we also need to set the
weights accordingly. We can potentially end up with different
(integer) coordinates than what the integer calculations would
have chosen, which means the integer weights calculated previously
in this case are completely wrong. Well at least that's what I think
happens, at least recalculating the weights helps.
(Some day really should refactor all the wrapping, so we do whatever is
fastest independent of 16bit int aos or 32bit float soa filtering.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit fd6f18588ced7ac8e081892f3bab2916623ad7a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 27 11:15:53 2012 +0100
gallium/util: Fix parsing of options with underscore.
For example
GALLIVM_DEBUG=no_brilinear
which was being parsed as two options, "no" and "brilinear".
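
A minimal sketch of the intended matching rule, assuming option names consist of alphanumerics and underscores and that anything else separates them. has_option() is a hypothetical helper written for this illustration, not the function in gallium/util.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Characters that may appear inside an option name. */
static bool is_name_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return true if 'name' appears in 'str' as a whole option. */
static bool has_option(const char *str, const char *name)
{
    size_t name_len;

    assert(str != NULL && name != NULL);
    name_len = strlen(name);

    while (*str) {
        const char *start;

        /* Skip separators, then collect one whole token. */
        while (*str && !is_name_char(*str))
            str++;
        start = str;
        while (*str && is_name_char(*str))
            str++;

        if (name_len > 0 && (size_t)(str - start) == name_len &&
            strncmp(start, name, name_len) == 0)
            return true;
    }
    return false;
}
```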
commit 09a8f809088178a03e49e409fa18f1ac89561837
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 26 15:00:14 2012 +0100
gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit e59bdcc2c075931bfba2a84967a5ecd1dedd6eb0
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed May 16 15:00:23 2012 +0100
draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
Cherry-picked from 9af1ba565d
commit df6070f618a203c7a876d984c847cde4cbc26bdb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 27 14:42:53 2012 +0200
gallivm: (trivial) fix typo in faster aos linear int wrap code
no longer crashes, now REALLY tested.
commit d8f98dce452c867214e6782e86dc08562643c862
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 18:20:58 2012 +0200
llvmpipe: (trivial) remove bogus optimization for float aos repeat wrap
This optimization for nearest filtering on the linear path generated
likely bogus results, and the int path didn't have any optimizations
there since the only shader using force_nearest apparently uses
clamp_to_edge not repeat wrap anyway.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit c4e271a0631087c795e756a5bb6b046043b5099d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 23:01:52 2012 +0200
gallivm: faster repeat wrap for linear aos path too
Even if we already have scaled integer coords, it's way faster to use
the original float coord (plus some conversions) rather than use URem.
The choice of what to do for texture wrapping is not really tied to int
aos or float soa filtering though for some modes there can be some gains
(because of easier weight calculations).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 1174a75b1806e92aee4264ffe0ffe7e70abbbfa3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 14:39:22 2012 +0200
gallivm: improve npot tex wrap repeat in linear soa path
URem gets translated into series of scalar divisions so
just about anything else is faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit f849ffaa499ed96fa0efd3594fce255c7f22891b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 26 00:40:35 2012 +0100
gallivm: (trivial) fix near-invisible shift-space typo
I blame the keyboard.
commit 5298a0b19fe672aebeb70964c0797d5921b51cf0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:24:28 2012 +0200
gallivm: add new intrinsic helper to deal with arbitrary vector length
This helper will split vectors which are too large for the hw, or expand
them if they are too small, so a caller of a function using intrinsics which
uses such sizes need not split (or expand) the vectors manually and the
function will still use the intrinsic instead of dropping back to generic
llvm code. It can also accept scalars for use with pseudo-vector intrinsics
(only useful for float arguments, all x86 scalar simd float intrinsics use
4vf32).
Only used for lp_build_min/max() for now (also added the scalar float case
for these while there). (Other basic binary functions could use it easily,
whereas functions with a different interface would need different helpers.)
Expanding vectors isn't widely used, because we always try to use
build contexts with native hw vector sizes. But it might (or not) be nicer
if this wouldn't need to be done, the generated code should in theory stay
the same (it does get hit by lp_build_rho though already since we
didn't have an intrinsic for the scalar lp_build_max case before).
v2: incorporated Brian's feedback, and also made the scalar min/max case work
instead of crash (all scalar simd float intrinsics take 4vf32 as argument,
probably the reason why it wasn't used before).
Moved to lp_bld_intr based on José's request, and passing intrinsic size
instead of length.
Ideally we'd derive the source type info from the passed in llvm value refs
and process some llvmtype return type so we could handle intrinsics where
the source and destination type isn't the same (like float/int conversions,
packing instructions) but that's a bit too complicated for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 01aa760b99ec0b2dc8ce57a43650e83f8c1becdf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 16:19:18 2012 +0200
gallivm: (trivial) increase max code size for shader disassembly
64kB was just short of what I needed (which caused a crash) hence
increase to 96kB (should probably be smarter about that).
commit 74aa739138d981311ce13076388382b5e89c6562
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:53:29 2012 +0100
gallivm: simplify aos float tex wrap repeat nearest
just handle pot and npot the same. The previous pot handling
ended up with exactly the same instructions plus 2 more (leave it
in the soa path though since it is probably still cheaper there).
While here also fix an issue which would cause a crash after an assert.
commit 0e1e755645e9e49cfaa2025191e3245ccd723564
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:29:24 2012 +0100
gallivm: (trivial) skip floor rounding in ifloor when not signed
This was only done for the non-sse41 case before, but even with
sse41 this is obviously unnecessary (some callers already call
itrunc in this case anyway but some might not).
commit 7f01a62f27dcb1d52597b24825931e88bae76f33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 25 11:23:12 2012 +0100
gallivm: (trivial) fix bogus comments
commit 5c85be25fd82e28490274c468ce7f3e6e8c1d416
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Jun 20 11:51:57 2012 +0100
translate: Free elt8_func/elt16_func too.
These were leaking.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 0ad498f36fb6f7458c7cffa73b6598adceee0a6c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 15:55:34 2012 +0200
gallivm: fix bug for tex wrap repeat with linear sampling in aos float path
The comparison needs to be against length not length_minus_one, otherwise
the max texel is never chosen (for the second coordinate).
Fixes piglit texwrap-1D-npot-proj (and 2D/3D versions).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit d1ad65937c5b76407dc2499b7b774ab59341209e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Jun 19 16:13:43 2012 +0200
gallivm: simplify soa tex wrap repeat with npot textures and no mip filtering
Similar to what is already done in aos sampling for the float path (but not
the int path since we don't get normalized float coordinates there).
URem is expensive and the calculation is done trivially with
normalized floats instead (at least with sse41-capable cpus).
(Some day should probably do the same for the mip filter path but it's much
more complicated there hence the gain is smaller.)
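
As a scalar illustration of the idea (the real code generates vector LLVM IR), repeat wrapping of a normalized coordinate can be computed from its fractional part instead of an integer remainder. The function names are hypothetical, and the final clamp is an assumption added to guard against the fractional part rounding up to 1.0.

```c
#include <assert.h>

/* floor() for values that fit in an int, without needing libm. */
static int floor_to_int(float v)
{
    int i = (int)v;              /* truncates toward zero */

    return ((float)i > v) ? i - 1 : i;
}

/* Repeat wrap from a normalized float coordinate: take the fractional
 * part, scale by the texture size, truncate. Works for NPOT sizes. */
static int wrap_repeat_float(float s, int size)
{
    float frac;
    int texel;

    assert(size > 0);
    frac = s - (float)floor_to_int(s);        /* in [0, 1] */
    texel = (int)(frac * (float)size);
    return texel >= size ? size - 1 : texel;  /* guard frac == 1.0 */
}

/* Reference version using an integer remainder on an unnormalized
 * coordinate, i.e. the costly operation being avoided. */
static int wrap_repeat_rem(int icoord, int size)
{
    int r;

    assert(size > 0);
    r = icoord % size;
    return r < 0 ? r + size : r;
}
```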
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit e1e23f57ba9b910295c306d148f15643acc3fc83
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:38:56 2012 +0200
llvmpipe: (trivial) remove duplicated function declaration
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 07ca57eb09e04c48a157733255427ef5de620861
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 20:37:34 2012 +0200
llvmpipe: destroy setup variants on context destruction
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit ed0003c633859a45f9963a479f4c15ae0ef1dca3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 18 16:25:29 2012 +0100
gallivm: handle different ilod parts for multiple quad sampling
This fixes filtering when the integer part of the lod is not the same
for all quads. I'm not fully convinced of that solution yet as it just
splits the vector if the levels to be sampled from are different.
But otherwise we'd need to do things like some minify steps, and getting
mip level base address separately anyway hence it wouldn't really look
like much of a win (and making the code even more complex).
This should now give identical results to single quad sampling.
commit 8580ac4cfc43a64df55e84ac71ce1a774d33c0d2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:14:47 2012 +0200
gallivm: de-duplicate sample code common to soa and aos sampling
There doesn't seem to be any reason why this code dealing with cube face
selection, lod and mip level calculation is separate in aos and
soa sampling, and I am sick of having to change it in both places.
commit fb541e5f957408ce305b272100196f1e12e5b1e8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 14 18:15:41 2012 +0200
gallivm: do mip filtering with per quad lod_fpart
This gives better results for mip filtering, though the generated code might
not be optimal. For now it also creates some artifacts if the lod_ipart isn't
the same for all quads, since instead of using the same mip weight for all
quads as previously (which just caused non-smooth gradients) this now will
use the right weights but with the wrong mip level in this case (can easily
be seen with things like texfilt, mipmap_tunnel).
v2: use logic helper suggested by José, and fix issue with negative lod_fpart
values
commit f1cc84eef7d826a20fab6cd8ccef9a275ff78967
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 13 18:35:25 2012 +0200
gallivm: (trivial) fix bogus assert in lp_build_unpack_broadcast_aos_scalars
commit 7c17dbae8ae290df9ce0f50781a09e8ed640c044
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:11:14 2012 +0100
util: Reimplement half <-> float conversions.
Removed u_half.py used to generate the table for previous method.
Previous implementation of float to half conversion was faulty for
denormals and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
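
For reference, a table-free half to float conversion that handles zeros, denormals, Inf and NaN can be sketched as follows. This is a generic bit-manipulation version written for illustration, assuming IEEE 754 binary16 and binary32 layouts; it is not the code added to util.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t man  = h & 0x3ffu;
    uint32_t bits;
    float f;

    assert(sizeof(float) == sizeof(uint32_t));

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                        /* signed zero */
        } else {
            /* Denormal: shift the mantissa up until the implicit
             * leading one appears, adjusting the exponent to match. */
            int e = -1;

            do {
                man <<= 1;
                e++;
            } while (!(man & 0x400u));
            man &= 0x3ffu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1fu) {
        /* Inf (mantissa zero) or NaN (mantissa nonzero, kept as payload). */
        bits = sign | 0x7f800000u | (man << 13);
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + (127 - 15)) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof(f));
    return f;
}
```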
commit 7762f59274070e1dd4b546f5cb431c2eb71ae5c3
Author: James Benton <jbenton@vmware.com>
Date: Tue Jun 12 12:12:16 2012 +0100
tests: Updated tests to properly handle NaN for half floats.
commit fa94c135aea5911fd93d5dfb6e6f157fb40dce5e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:33:10 2012 +0200
gallivm: do mip level calculations per quad
This is the final piece which shouldn't change the rendering output yet.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 23cbeaddfe03c09ca18c45d28955515317ffcf4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 9 00:54:21 2012 +0200
gallivm: do per-quad cube face selection
Doesn't quite fix the piglit cubemap test (not sure why actually)
but doing per-quad face selection is doing the right thing and
definitely an improvement.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit abfb372b3702ac97ac8b5aa80ad1b94a2cc39d33
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 11 18:22:59 2012 +0200
gallivm: do all lod calculations per quad
Still no functional change but lod is now converted to scalar after
lod calculations.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 519368632747ae03feb5bca9c655eccbc5b751b4
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:46:10 2012 +0100
gallivm: Added support for half-float to float conversion in lp_build_conv.
Updated various utility functions to support this change.
commit 135b4d683a4c95f7577ba27b9bffa4a6fbd2c2e7
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 16:02:46 2012 +0100
gallivm: Added function for half-float to float conversion.
Updated lp_build_format_aos_array to support half-float source.
commit 37d648827406a20c5007abeb177698723ed86673
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:55:18 2012 +0100
util: Updated u_format_tests to rigidly test half-float boundary values.
commit 2ad18165d96e578aa9046df7c93cb1c3284d8c6b
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:54:16 2012 +0100
llvmpipe: Updated lp_test_format to properly handle Inf/NaN results.
commit 78740acf25aeba8a7d146493dd5c966e22c27b73
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 14:53:30 2012 +0100
util: Added functions for checking NaN / Inf for double and half-floats.
commit 35e9f640ae01241f9e0d67fe893bbbf564c05809
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:05:13 2012 +0200
gallivm: Fix calculating rho for 3d textures for the single-quad case
Discovered by accident, this looks like a very old typo bug.
commit fc1220c636326536fd0541913154e62afa7cd1d8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 24 21:04:59 2012 +0200
gallivm: do calcs per-quad in lp_build_rho
Still convert to scalar at the end of the function.
commit 50a887ffc550bf310a6988fa2cea5c24d38c1a41
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 21 23:21:50 2012 +0200
gallivm: (trivial) return scalar in lp_build_extract_range for length 1 vectors
Our type system on top of llvm's one doesn't generally support vectors of
length 1, instead using scalars. So we should return a scalar from this
function instead of having to bitcast the vector with length 1 later elsewhere.
commit 80c71c621f9391f0f9230460198d861643324876
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 17:49:15 2012 +0100
draw: Fixed bad merge error
commit c47401cfad0c9167de20ff560654f533579f452c
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:29:30 2012 +0100
draw: Updated store_clip to store whole vectors instead of individual elements.
commit 2d9c1ad74b0b0b41861fffcecde39f09cc27f1cf
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:28:32 2012 +0100
gallivm: Added lp_build_fetch_rgba_aos_array.
A version of lp_build_fetch_rgba_aos which is targeted at simple array formats.
Reads the whole vector from memory in one, instead of reading each element
individually.
Tested with mesa tests and demos.
commit ff7805dc2b6ef6d8b11ec4e54aab1633aef29ac8
Author: James Benton <jbenton@vmware.com>
Date: Tue May 22 15:27:40 2012 +0100
gallivm: Added lp_build_pad_vector.
This function pads a vector with undef to a desired length.
commit 701f50acef24a2791dabf4730e5b5687d6eb875d
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 17:27:19 2012 +0100
util: Added util_format_is_array.
This function checks whether a format description is in a simple array format.
commit 5e0a7fa543dcd009de26f34a7926674190fa6246
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:13:47 2012 +0100
draw: Removed draw_llvm_translate_from and draw/draw_llvm_translate.c.
This is "replaced" by adding an optimised path in lp_build_fetch_rgba_aos
in an upcoming patch.
commit 8c886d6a7dd3fb464ecf031de6f747cb33e5361d
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 15:02:31 2012 +0100
draw: Modified store_aos to write the vector as one, not individual elements.
commit 37337f3d657e21dfd662c7b26d61cb0f8cfa6f17
Author: James Benton <jbenton@vmware.com>
Date: Wed May 16 14:16:23 2012 +0100
draw: Changed aos_to_soa to use lp_build_transpose_aos.
commit bd2b69ce5d5c94b067944d1dcd5df9f8e84548f1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 19:14:27 2012 +0100
draw: Changed soa_to_aos to use lp_build_transpose_aos.
commit 0b98a950d29a116e82ce31dfe7b82cdadb632f2b
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:45 2012 +0100
gallivm: Added lp_build_transpose_aos which converts between aos and soa.
commit 69ea84531ad46fd145eb619ed1cedbe97dde7cb5
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 18:57:01 2012 +0100
gallivm: Added lp_build_interleave2_half aimed at AVX unpack instructions.
commit 7a4cb1349dd35c18144ad5934525cfb9436792f9
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 22 11:54:14 2012 +0100
gallivm: Fix build on Windows.
MC-JIT not yet supported there.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit afd105fc16bb75d874e418046b80d9cc578818a1
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:17:26 2012 +0100
llvmpipe: Added an error counter to lp_test_conv.
Useful for keeping track of progress when fixing errors!
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit b644907d08c10a805657841330fc23db3963d59c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:16:46 2012 +0100
llvmpipe: Changed known failures in lp_test_conv.
To comply with the recent fixes to lp_bld_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit d7061507bd94f6468581e218e61261b79c760d4f
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:14:38 2012 +0100
llvmpipe: Added fixed point types tests to lp_test_conv.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 146b3ea39b4726dbe125ac666bd8902ea3d6ca8c
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:26:35 2012 +0100
llvmpipe: Changed lp_test_conv src/dst alignment to be correct.
Now based on the define rather than a fixed number.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit f3b57441f834833a4b142a951eb98df0aa874536
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:06:44 2012 +0100
gallivm: Fixed erroneous optimisation in lp_build_min/max.
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
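The range issue described above can be sketched in plain C. This is only an illustration of why a "normalized means 0 to 1" shortcut is invalid for signed types; struct norm_type and min_with_zero are hypothetical and do not mirror the gallivm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type descriptor with only the two flags needed here. */
struct norm_type {
    bool norm;  /* values are normalized */
    bool sign;  /* values are signed, so the range is -1..1 */
};

static float plain_min(float a, float b)
{
    return a < b ? a : b;
}

/*
 * min(a, 0) with a shortcut: for unsigned normalized values (0..1) the
 * result is always 0, so no comparison is needed. For signed normalized
 * values (-1..1) the shortcut would be wrong, so fall back to a real min.
 */
static float min_with_zero(struct norm_type t, float a)
{
    if (t.norm && !t.sign)
        return 0.0f;
    return plain_min(a, 0.0f);
}
```

With an unsigned normalized type, min_with_zero returns 0 for any input in 0..1; with a signed normalized type and an input of -0.5 it has to return -0.5.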
commit a0613382e5a215cd146bb277646a6b394d376ae4
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:04:49 2012 +0100
gallivm: Compensate for lp_const_offset in lp_build_conv.
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit a3d2bf15ea345bc8a0664f8f441276fd566566f3
Author: James Benton <jbenton@vmware.com>
Date: Fri May 18 16:01:25 2012 +0100
gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
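The commit message does not say where the overflow occurred, so the following is only a sketch of one common hazard in this kind of conversion, not the actual fix. A 32-bit float cannot represent 2^32 - 1 exactly (it rounds up to 2^32), so scaling in single precision can exceed the destination range. The hypothetical helper float_to_unorm avoids this by scaling in double precision.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a float to an unsigned normalized integer with the given bit
 * width, clamping the input to 0..1 first (hypothetical helper).
 */
static uint32_t float_to_unorm(float x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    if (!(x > 0.0f))          /* also catches NaN */
        return 0;
    if (x > 1.0f)
        x = 1.0f;

    /* 2^bits - 1 is exactly representable in a double for bits <= 32. */
    const double max_value = (double)((UINT64_C(1) << bits) - 1);

    /* Round to nearest; the result never exceeds max_value. */
    return (uint32_t)((double)x * max_value + 0.5);
}
```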
commit e7b1e76fe237613731fa6003b5e1601a2e506207
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 21 20:07:51 2012 +0100
gallivm: Fix build with LLVM 2.6
Trivial, and useful.
commit d3c6bbe5c7f5ba1976710831281ab1b6a631082d
Author: José Fonseca <jfonseca@vmware.com>
Date: Tue May 15 17:15:59 2012 +0100
gallivm: Enable MCJIT/AVX with vanilla LLVM 3.1.
Add the necessary C++ glue, so that we don't need any modifications
to the soon to be released LLVM 3.1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
commit 724a019a14d40fdbed21759a204a2bec8a315636
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 22:04:06 2012 +0100
gallivm: Use HAVE_LLVM 0x0301 consistently.
commit af6991e2a3868e40ad599b46278551b794839748
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon May 14 21:49:06 2012 +0100
gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.
Trivial.
commit 6f8a1d75458daae2503a86c6b030ecc4bb494e23
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Apr 2 22:14:15 2012 -0700
gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 62555b6ed8760545794f83064e27cddcb3ce5284
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 27 21:51:17 2012 -0700
gallivm: Fix method overriding in raw_debug_ostream.
Use matching type qualifiers to avoid method hiding.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit 6a9bd784f4ac68ad0a731dcd39e5a3c39989f2be
Author: Vinson Lee <vlee@freedesktop.org>
Date: Tue Mar 13 22:40:52 2012 -0700
gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit b674955d39adae272a779be85aa1bd665de24e3e
Author: Vinson Lee <vlee@freedesktop.org>
Date: Mon Mar 5 22:00:40 2012 -0800
gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 11ab69971a8a31c62f6de74905dbf8c02884599f
Author: Vinson Lee <vlee@freedesktop.org>
Date: Wed Feb 29 21:20:53 2012 -0800
Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 339960c82d2a9f5c928ee9035ed31dadb7f45537
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon May 14 16:19:56 2012 +0200
gallivm: (trivial) fix assertion failure for mipmapped 1d textures
In lp_build_rho, we may end up with a 1-element vector (for mipmapped 1d
textures), but in this case we require the type to be a non-vector type,
so need a cast.
commit 9d73edb727bd6d196030dc3026b7bf0c574b3e19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:12:07 2012 +0200
gallivm: prepare for per-quad lod calculations for large vectors
to be able to handle multiple quads at once in texture sampling and still
do lod calculations per quad, it is necessary to get the per-quad derivatives
into the lp_build_rho function.
Until now these derivative values were just scalars, which isn't going to work.
So we now use vectors, and since the interface needs to change we also do some
different (slightly more efficient) packing of the values.
For 8-wide vectors the packed derivative values for 3 coords would look like
this; the layout scales to an arbitrary (multiple of 4) vector size:
ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____
The second vector will be unused for 1d and 2d textures.
To facilitate future changes the derivative values are put into a struct, since
quite some functions just pass these values through.
The generated code seems to be very slightly better for 2d textures (with
4-wide vectors) than before with sse2 (if you have a cpu with physical 128bit
simd units - otherwise it's probably not a win).
v2: suggestions from José, rename variables, add comments, use swizzle helper
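The packed layout shown above can be sketched in plain C, using float arrays in place of LLVM vectors. struct quad_derivs and pack_derivs are hypothetical names used only for illustration; the unused lanes are written as 0 here, whereas the real code can leave them undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Per-quad derivatives of the s, t, r coordinates (hypothetical struct). */
struct quad_derivs {
    float dsdx, dsdy, dtdx, dtdy, drdx, drdy;
};

/*
 * Pack derivatives for num_quads quads into two vectors of 4 * num_quads
 * floats each:
 *   vec0: ds_dx ds_dy dt_dx dt_dy  (per quad)
 *   vec1: dr_dx dr_dy  --    --    (per quad, only needed for 3d textures)
 */
static void pack_derivs(const struct quad_derivs *quads, size_t num_quads,
                        float *vec0, float *vec1)
{
    assert(num_quads > 0);

    for (size_t q = 0; q < num_quads; q++) {
        vec0[4 * q + 0] = quads[q].dsdx;
        vec0[4 * q + 1] = quads[q].dsdy;
        vec0[4 * q + 2] = quads[q].dtdx;
        vec0[4 * q + 3] = quads[q].dtdy;

        vec1[4 * q + 0] = quads[q].drdx;
        vec1[4 * q + 1] = quads[q].drdy;
        vec1[4 * q + 2] = 0.0f;  /* unused lane */
        vec1[4 * q + 3] = 0.0f;  /* unused lane */
    }
}
```

With two quads this produces the 8-wide layout from the message; with one quad it degenerates to the 4-wide case.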
commit 0aa21de0d31466dac77b05c97005722e902517b8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu May 10 18:10:31 2012 +0200
gallivm: add undefined swizzle handling to lp_build_swizzle_aos
This is useful for vectors with "holes", it lets llvm choose the most
efficient shuffle instructions if some elements aren't needed without having to
worry what elements to manually pick otherwise.
commit 00faf3f370e7ce92f5ef51002b0ea42ef856e181
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri May 4 17:25:16 2012 +0100
gallivm: Get the LLVM IR optimization passes before JIT compilation.
MC-JIT engine compiles the module immediately on creation, so the optimization
passes were being run too late.
So now we create a target data layout from a string, that matches the
ABI parameters reported by the compiler.
The backend optimization passes were always being run, so the performance
improvement is modest (3% on multiarb mesa demo).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 40a43f4e2ce3074b5ce9027179d657ebba68800a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed May 2 16:03:54 2012 +0200
gallivm: (trivial) fix wrong define used in lp_build_pack2
should fix stack-smashing crashes.
commit e6371d0f4dffad4eb3b7a9d906c23f1c88a2ab9e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 30 21:25:29 2012 +0200
gallivm: add perf warnings when not using intrinsics with 256bit vectors
Helper functions using integer sse2 intrinsics could split the vectors with AVX
instead of using generic fallback (which should be faster).
We don't actually expect to hit these paths (hence don't fix them up to actually
do the vector splitting) so just emit warnings (for those functions where it's
obvious doing split/intrinsic is faster than using generic path).
Only emit warnings for 256bit vectors since we _really_ don't expect to hit
arbitrary large vectors which would affect a lot more functions.
The warnings do not actually depend on avx since the same logic applies to
plain sse2 too (but of course again there's _really_ no reason we should hit
these functions with 256bit vectors without avx).
commit 8a9ea701ea7295181e846c6383bf66a5f5e47637
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:37:07 2012 +0200
gallivm: split vectors manually for avx in lp_build_pack2 (v2)
There's 2 reasons for this:
First, there's a llvm bug (fixed in 3.1) which generates tons of byte
inserts/extracts otherwise, and second, more importantly, we want to use
pack intrinsics instead of shuffles.
We do this in lp_build_pack2 and not the calling code (aos sample path)
because potentially other callers might find that useful too, even if
for larger sequences of code using non-native vector sizes it might be
better to manually split vectors.
This should boost texture performance in the aos path considerably.
v2: fix issues with intrinsics types with old llvm
commit 27ac5b48fa1f2ea3efeb5248e2ce32264aba466e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:26:22 2012 +0200
llvmpipe: refactor lp_build_pack2 (v2)
prettify, and it's unnecessary to assert when there's no intrinsic due to
unsupported bit width - the shuffle path will work regardless.
In contrast, lp_build_packs2 should only rely on lp_build_pack2 doing the
clamping for element sizes for which there is a sse2 intrinsic.
v2: fix bug spotted by Jose regarding the intrinsic type for packusdw
on old llvm versions.
commit ddf279031f0111de4b18eaf783bdc0a1e47813c8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue May 1 20:13:59 2012 +0200
gallivm: add src width check in lp_build_packs2()
not doing so would skip clamping even if no sse2 pack instruction is
available, which is incorrect (in theory only, such widths would also always
hit an (unnecessary) assertion in lp_build_pack2()).
commit e7f0ad7fe079975eae7712a6e0c54be4fae0114b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 27 15:57:00 2012 +0200
gallivm: (trivial) fix crash-causing typo for npot textures with avx
commit 28a9d7f6f655b6ec508c8a3aa6ffefc1e79793a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Apr 25 19:38:45 2012 +0200
gallivm: (trivial) remove code mistakenly added twice.
commit d5926537316f8ff67ad0a52e7242f7c5478d919b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Apr 24 21:16:15 2012 +0200
gallivm: add a new avx aos sample path (v2)
Try to avoid mixing float and int address calculations. This does texture wrap
modes with floats, and then the offset calculations still with ints (because
of lack of precision with floats, though we could do some effort to make it work
with not too large (16MB) textures).
This also handles wrap repeat mode with npot-sized textures differently than
either the old soa or aos int path (likely way faster but untested).
Otherwise the actual address wrap code is largely similar to the soa path (not
quite the same as this one also has some int code), it should get used by avx
soa sampling later as well but doesn't handle more complex address modes yet
(this will also have the benefit that we can use aos sampling path for all
texture address modes).
Generated code for that looks reasonable, but still does not split vectors
explicitly for fetch/filter, which means we still get hit by the llvm bug (fixed
upstream) that generates hundreds of pinsrb/pextrb instead of two shuffles.
It is not obvious though if it's much of a win over just doing address calcs
4-wide but with ints, even if it is definitely far fewer instructions on avx.
piglit's texwrap seems to look exactly the same but tests
neither the non-normalized nor the npot cases.
v2: fix comments, prettify based on Brian's and Jose's feedback.
commit bffecd22dea66fb416ecff8cffd10dd4bdb73fce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 19 01:58:29 2012 +0200
gallivm: refactor aos lp_build_sample_image_nearest/linear
split them up to separate address calculations and fetching/filtering.
Need this for being able to do 8-wide float address calcs and 4-wide
fetch/filter later (for avx). Plus the functions were very big scary monsters
anyway (in particular lp_build_sample_image_linear).
commit a80b325c57529adddcfa367f96f03557725c4773
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Apr 16 17:17:18 2012 +0200
gallivm: fix lp_build_resize when truncating width but expanding vector size
Missed this case which I thought was impossible - the assertion for it was
right after the division by zero...
(AoS) texture sampling may ask us to do this, for things like 8 4x32int
vectors to 1 32x8int vector conversion (eventually, we probably don't want
this to happen).
commit f9c8337caa3eb185830d18bce8b95676a065b1d7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Apr 14 18:00:59 2012 +0200
gallivm: fix cube maps with larger vectors
This makes the branchless cube face selection code work with larger vectors.
Because the complexity is quite high (cannot really be improved it seems,
per-face selection would reduce complexity a lot but this leads to errors
unless the derivatives are calculated all from the same face which almost
doubles the work to be done) it is still slower than the branching version,
hence only enable this with large vectors.
It doesn't actually do per-quad face selection yet (only makes sense with
matching lod selection, in fact it will select the same face for all pixels
based on the average of the first four pixels for now) but only different
shuffles are required to make it work (the branching version actually should
work with larger vectors too now thanks to the improved horizontal add but of
course it cannot be extended to really select the face per-quad unless doing
branching per quad).
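To illustrate the difference between selecting a face per pixel and selecting one face from the average of four pixels, here is a minimal scalar sketch. The enum ordering, the tie-breaking rules and the function names are assumptions made for the example, not the gallivm implementation.

```c
#include <assert.h>

/* Face ordering assumed for this sketch. */
enum cube_face {
    FACE_POS_X, FACE_NEG_X,
    FACE_POS_Y, FACE_NEG_Y,
    FACE_POS_Z, FACE_NEG_Z
};

static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Per-pixel selection: the face is given by the major axis and its sign. */
static enum cube_face select_face(float x, float y, float z)
{
    float ax = absf(x), ay = absf(y), az = absf(z);

    if (ax >= ay && ax >= az)
        return x >= 0.0f ? FACE_POS_X : FACE_NEG_X;
    if (ay >= az)
        return y >= 0.0f ? FACE_POS_Y : FACE_NEG_Y;
    return z >= 0.0f ? FACE_POS_Z : FACE_NEG_Z;
}

/*
 * Selection from four pixels: sum the coordinates (same major axis as the
 * average) and pick a single face for all four.
 */
static enum cube_face select_face_quad(const float xs[4], const float ys[4],
                                       const float zs[4])
{
    float sx = 0.0f, sy = 0.0f, sz = 0.0f;

    for (int i = 0; i < 4; i++) {
        sx += xs[i];
        sy += ys[i];
        sz += zs[i];
    }
    return select_face(sx, sy, sz);
}
```

A pixel near a cube edge can pick a different face on its own than the group of four does, which is why derivatives have to be computed consistently from the same face.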
commit 7780c58869fc9a00af4f23209902db7e058e8a66
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 21:11:12 2012 +0100
llvmpipe: (trivial) fix compiler warning
and also clarify comment regarding availability of popcnt instruction.
commit a266dccf477df6d29a611154e988e8895892277e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 30 14:21:07 2012 +0100
gallivm: remove unneeded members in lp_build_sample_context
Minor cleanup, the texture width, height, depth aren't accessed in their
scalar form anywhere. Makes it more obvious those values should probably be
fetched already vectorized (but this requires more invasive changes)...
commit b678c57fb474e14f05e25658c829fc04d2792fff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 29 15:53:55 2012 +0100
gallivm: add a helper for concatenating vectors
Similar to the extract_range helper intended to get around slow code generated
by llvm for 128bit insertelements.
Concatenating two 128bit vectors this way will result in a single vinsertf128
operation rather than two 64bit stores plus one 128bit load, though it might be
mildly useful for other purposes as well.
commit 415ff228bcd0cf5e44a4c15350a661f0f5520029
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 19:41:15 2012 +0100
gallivm: add a custom 2x8f->1x16ub avx conversion path
Similar to the existing 4x4f->1x16ub sse2 path, shaves off a couple
instructions (min/max mostly) because it relies on pack intrinsics clamping.
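The clamping that the pack instructions provide can be emulated in scalar C. The sketch below models the saturation of packssdw (signed 32-bit to signed 16-bit) followed by packuswb (signed 16-bit to unsigned 8-bit); the function names are hypothetical, and the float-to-int step of the real conversion path is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation, 32 -> 16 bits, as done per element by packssdw. */
static int16_t pack_signed_saturate_i32(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Unsigned saturation, signed 16 -> unsigned 8 bits, as done by packuswb. */
static uint8_t pack_unsigned_saturate_i16(int16_t v)
{
    if (v > UINT8_MAX)
        return UINT8_MAX;
    if (v < 0)
        return 0;
    return (uint8_t)v;
}

/* Chaining both packs clamps to 0..255 without a separate min/max. */
static uint8_t int32_to_ub(int32_t v)
{
    return pack_unsigned_saturate_i16(pack_signed_saturate_i32(v));
}
```

Because out-of-range values such as 300 or -5 already come out as 255 and 0, an explicit clamp before the pack is redundant.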
commit 78c08fc89f8fbcc6dba09779981b1e873e2a0299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 18:44:07 2012 +0100
gallivm: add avx arithmetic intrinsics
Add all avx intrinsics for arithmetic functions (with the exception
of the horizontal add function which needs another look).
Seems to pass basic tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
commit a586caa2800aa5ce54c173f7c0d4fc48153dbc4e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 15:31:35 2012 +0100
gallivm: add avx logic intrinsics
Add the blend intrinsics for 8-wide float and 4-wide double vectors.
Since we lack 256bit int instructions these are used for int vectors as well,
though obviously not for byte or word element values.
The comparison intrinsics aren't extended for avx since these are only used
for pre-2.7 llvm versions.
commit 70275e4c13c89315fc2560a4c488c0e6935d5caf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 28 00:40:53 2012 +0100
gallivm: new helper function for extract shuffles.
Based on José's idea as we can need that in a couple places.
Note that such shuffles should not be used lightly, since data layout
of <4 x i8> is different to <16 x i8> for instance, hence might cause
data rearrangement.
commit 4d586dbae1b0c55915dda1759d2faea631c0a1c2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 18:27:25 2012 +0100
gallivm: (trivial) don't overallocate shuffle variable
using wrong define meant huge array...
commit 06b0ec1f6d665d98c135f9573ddf4ba04b2121ad
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 27 17:54:20 2012 +0100
gallivm: don't do per-element extract/insert for vector element resize
Instead of doing per-element extract/insert if the src vectors
and dst vector differ in total size (which generates atrocious code)
first change the src vectors size by using shuffles to destination
vector size.
We can still do better than that on AVX for packing to color buffer
(by exploiting pack intrinsics characteristics hence eliminating the
need for some clamps) but this already generates much better code.
v2: incorporate feedback from José, Keith and use shuffle instead of
bitcasts/extracts. Due to llvm deficiencies the latter cause all data
to get moved to GPRs and back in pieces (even though the data in the
regs actually stays the same...).
commit c9970d70e05f95d3f52fe7d2cd794176a52693aa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 19:33:19 2012 +0000
gallivm: fix bug in simple position interpolation
Accidental use of position attribute instead of just pixel coordinates.
Caused failures in piglit glsl-fs-ceil and glsl-fs-floor.
commit d0b6fcdb008d04d7f73d3d725615321544da5a7e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 23 15:31:14 2012 +0000
gallivm: fix emission of ceil opcode
lp_build_ceil seems more appropriate than lp_build_trunc.
This never seems to be hit though, as someone performs some ceil
to floor magic.
commit d97fafed7e62ffa6bf76560a92ea246a1a26d256
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 22 11:46:52 2012 +0000
gallivm: new vectorized path for cubemap calculations
should be faster when adapted to multiple quads as only selection masks need to be different.
The code is more or less a per-pixel version adapted to only do it per quad.
A per pixel version would be much simpler (could drop 2 selects, 6 broadcasts and the messy
horizontal add of 3 vectors at the expense of only 2 more absolute value instructions -
would also just work for arbitrarily large vectors).
This version doesn't yet work with larger vectors because the horizontal add isn't adjusted
to be able to work with 2x4 vectors (and also because face selection wouldn't be done per
quad just per block though that would be only a correctness issue just as with lod selection).
The downside is this code is quite a bit slower. On a Core2 it can be sped up by disabling the
hw blend instructions for selection and using logicop fallbacks instead, but it is still slower
than the old code, hence leave that in for now. Probably will choose one or the other version
based on vector length in the end.
commit b375fbb18a3fd46859b7fdd42f3e9908ea4ff9a3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 21 14:42:29 2012 +0000
gallivm: fix optimized occlusion query intrinsic name
commit a9ba0a3b611e48efbb0e79eb09caa85033dbe9a2
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Mar 21 16:19:43 2012 +0000
draw,gallivm,llvmpipe: Call gallivm_verify_function everywhere.
commit f94c2238d2bc7383e088b8845b7410439a602071
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 18:54:10 2012 +0000
gallivm: optimize calculations for cube maps a bit
this does some more vectorized calculations and uses horizontal adds if possible.
A definite win with sse3 otherwise it doesn't seem to make much of a difference.
In any case this is arithmetically identical, cannot handle larger vectors.
Should be useful as a reference point against larger vector version later...
commit 21a2c1cf3c8e1ac648ff49e59fdc0e3be77e2ebb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 15:16:27 2012 +0000
llvmpipe: slight optimization of occlusion queries
using movmskps when available.
While this is slightly better for cpus without popcnt we should
really sum the vectors ourselves (it is also possible to cast to i4 before
doing the popcnt but that doesn't help that much either since llvm
is using some optimized popcnt version for i32)
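A portable scalar model of the movmskps-plus-popcnt idea: collect the sign bit of each of the four 32-bit mask lanes into a 4-bit integer, then count the set bits to get the number of pixels that passed. The helper names are hypothetical; real code would use the SSE instruction rather than this loop.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates movmskps: bit i of the result is the sign bit of lane i. */
static unsigned movmsk4(const uint32_t mask[4])
{
    unsigned bits = 0;

    for (unsigned i = 0; i < 4; i++)
        bits |= (unsigned)(mask[i] >> 31) << i;
    return bits;
}

/* Counts the set bits of a 4-bit value (stand-in for popcnt). */
static unsigned popcount4(unsigned bits)
{
    unsigned count = 0;

    assert(bits < 16);
    for (; bits != 0; bits >>= 1)
        count += bits & 1u;
    return count;
}
```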
commit 5ab5a35f216619bcdf55eed52b0db275c4a06c1b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 20 13:32:11 2012 +0000
llvmpipe: fix occlusion queries with larger vectors
need to adjust casts etc.
commit ff95e6fdf5f16d4ef999ffcf05ea6e8c7160b0d5
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Mar 19 20:15:25 2012 +0000
gallivm: Restore optimization passes.
commit 57b05b4b36451e351659e98946dae27be0959832
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:34:22 2012 +0000
llvmpipe: use existing min2 macro
commit bc9a20e19b4f600a439f45679451f2e87cd4b299
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 19:07:27 2012 +0000
llvmpipe: add some safeguards against really large vectors
As per José's suggestion, prevent things from blowing up if some cpu
would have 1024bit or larger vectors.
commit 0e2b525e5ca1c5bbaa63158bde52ad1c1564a3a9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:31:08 2012 +0000
llvmpipe: fix mask generation for uberwide vectors
this was the only piece preventing 16-wide vectors from working
(apart from the LP_MAX_VECTOR_WIDTH define that is), which is the maximum
as we don't get more pixels in the fragment shader at once.
Hence adjust that so things could be tested properly with that size
even though there seems to be no practical value.
commit 3c8334162211c97f3a11c7f64e9e5a2a91ad9656
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 18:19:41 2012 +0000
llvmpipe: fix the simple interpolation method with larger vectors
so both methods actually _really_ work now. Makes textures look
nice with larger vectors...
commit 1cb0464ef8871be1778d43b0c56adf9c06843e2d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 17:26:35 2012 +0000
llvmpipe: fix mask generation and position interpolation with 8-wide vectors
trivial bugs, with these things start to look somewhat reasonable.
Textures though have some swizzling issues it seems.
commit 168277a63ef5b72542cf063c337f2d701053ff4b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 16:04:03 2012 +0000
llvmpipe: don't overallocate variables
we never have more than 16 (stamp size) / 4 (minimum possible vector size).
(With larger vectors those variables are still overallocated a bit.)
commit 409b54b30f81ed0aa9ed0b01affe15c72de9abd2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:56:48 2012 +0000
llvmpipe: add some 32f8 formats to lp_test_conv
Also add the ability to handle different sized vectors.
commit 55dcd3af8366ebdac0af3cdb22c2588f24aa18ce
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Mar 19 15:47:27 2012 +0000
gallivm: handle different sized vectors in conversion / pack
only fully generic path for now (extract/insert per element).
commit 9c040f78c54575fcd94a8808216cf415fe8868f6
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sun Mar 18 00:58:28 2012 +0100
llvmpipe: fix harmless use of uninitialized values
commit 551e9d5468b92fc7d5aa2265db9a52bb1e368a36
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:31:21 2012 +0100
gallivm: drop special path in extract_broadcast with different sized vectors
Not needed, llvm can handle shuffles with different sized result vector just
fine. Should hopefully generate the same code in the end, but simpler IR.
commit 44da531119ffa07a421eaa041f63607cec88f6f8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 16 23:28:49 2012 +0100
llvmpipe: adapt interpolation for handling multiple quads at once
This is still WIP. There are actually two methods possible, and I'm not quite
sure which makes the most sense, so there's code for both for now:
1) the iterative method as used before (compute attrib values at upper left
corner of stamp and upper left corner of each quad initially).
It is improved to handle more than one quad at once, and also do some more vectorized
calculations initially for slightly better code - newer cpus have full throughput with
4 wide float vectors, hence don't try to code up a path which might be faster if there's
just one channel active per attribute.
2) just do straight interpolation for each pixel.
Method 2) is more work per quad, but less initially; if all quads are executed
it is significantly more work overall, though. But this might change with larger vector lengths.
This method would also be needed if we'd do some kind of active quad merging when
operating on multiple quads at once.
This path contains some hack to force llvm to generate better code, it is still far
from ideal though, still generates far too many unnecessary register spills/reloads.
Both methods should work with different sized vectors.
Not very well tested yet, still seems to work with four-wide vectors, need changes
elsewhere to be able to test with wider vectors.
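The two interpolation methods listed above can be compared with a scalar sketch. It assumes a linear attribute plane a(x, y) = a0 + dadx * x + dady * y and a 4x4 stamp made of four 2x2 quads in row-major order; the struct, the function names and the quad ordering are assumptions for illustration only.

```c
#include <assert.h>

/* Plane equation coefficients for one attribute channel (hypothetical). */
struct plane {
    float a0, dadx, dady;
};

/* Method 2: straight interpolation for each pixel. */
static float interp_direct(struct plane p, int x, int y)
{
    return p.a0 + p.dadx * (float)x + p.dady * (float)y;
}

/*
 * Method 1: iterative. Evaluate once at the upper left corner of the stamp,
 * step to the upper left corner of the quad, then add per-pixel offsets.
 */
static void interp_quad_iterative(struct plane p, int stamp_x, int stamp_y,
                                  int quad_index, float out[4])
{
    assert(quad_index >= 0 && quad_index < 4);

    float stamp_val = p.a0 + p.dadx * (float)stamp_x + p.dady * (float)stamp_y;

    int qx = (quad_index & 1) * 2;   /* quad offset inside the stamp */
    int qy = (quad_index >> 1) * 2;
    float quad_val = stamp_val + p.dadx * (float)qx + p.dady * (float)qy;

    for (int i = 0; i < 4; i++) {
        int px = i & 1;              /* pixel offset inside the quad */
        int py = i >> 1;
        out[i] = quad_val + p.dadx * (float)px + p.dady * (float)py;
    }
}
```

Both methods give the same values for exactly representable inputs; they differ in how much work is done up front versus per quad.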
commit be5d3e82e2fe14ad0a46529ab79f65bf2276cd28
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:59:37 2012 +0000
draw: Cleanup.
commit f85bc12c7fbacb3de2a94e88c6cd2d5ee0ec0e8d
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 16 20:43:30 2012 +0000
gallivm: More module compilation refactoring.
commit d76f093198f2a06a93b2204857e6fea5fd0b3ece
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Mar 15 21:29:11 2012 +0000
llvmpipe: Use gallivm_compile/free_function() in linear code.
Should have been done before.
commit 122e1adb613ce083ad739b153ced1cde61dfc8c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 13 14:47:10 2012 +0100
llvmpipe: generate partial pixel mask for multiple quads
still works with one quad, cannot be tested yet with more
At least for now always fixed order with multiple quads.
commit 4c4f15081d75ed585a01392cd2dcce0ad10e0ea8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 22:09:24 2012 +0100
llvmpipe: refactor state setup a bit
Refactor to make it easier to emit (and potentially later fetch in fs)
coefficients for multiple attributes at once.
Need to think more about how to make this actually happen however, the
problem is different attributes can have different interpolation modes,
requiring different handling in both setup and fs (though linear and
perspective handling is close).
commit 9363e49722ff47094d688a4be6f015a03fba9c79
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 19:23:23 2012 +0100
llvmpipe: vectorize tri offset calc
cuts number of instructions in quad-offset-factor from 107 to 75.
This code actually duplicated the (scalar) code calculating the determinant
except it used different vertex order (leading to different sign but it doesn't
matter) hence llvm could not have figured out it's the same (of course with
determinant vectorized in the other place that wouldn't have worked any longer
either).
Note this particular piece doesn't actually vectorize well, not many arithmetic
instructions left but tons of shuffle instructions...
Probably would need to work on n tris at a time for better vectorization.
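The remark about vertex order and sign can be checked with a small sketch: swapping two vertices of a triangle negates the 2D determinant of its edge vectors while leaving the magnitude unchanged. struct vec2 and tri_det are hypothetical names, not the llvmpipe setup code.

```c
#include <assert.h>

struct vec2 {
    float x, y;
};

/* Determinant of the edge vectors (b - a) and (c - a). */
static float tri_det(struct vec2 a, struct vec2 b, struct vec2 c)
{
    float ex = b.x - a.x, ey = b.y - a.y;
    float fx = c.x - a.x, fy = c.y - a.y;

    return ex * fy - ey * fx;
}
```

For the triangle (0,0), (4,0), (0,3), tri_det gives 12 in one order and -12 with the last two vertices swapped.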
commit 63169dcb9dd445c94605625bf86d85306e2b4297
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Mar 8 03:11:37 2012 +0100
llvmpipe: vectorize some scalar code in setup
reduces number of arithmetic instructions, and avoids loading
vector x,y values twice (once as scalars once as vectors).
Results in a reduction of instructions from 76 to 64 in fs setup for glxgears
(16%) on a cpu with sse41.
Since this code uses vec2 disguised as vec4, on old cpus which had physical
64bit sse units (pre-Core2) it probably is less of a win in practice (and if
you have no vectors you can only hope llvm eliminates the arithmetic for
unneeded elements).
commit 732ecb877f951ab89bf503ac5e35ab8d838b58a1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 7 00:32:24 2012 +0100
draw: fix clipping
bug introduced by 4822fea3f0440b5205e957cd303838c3b128419c broke
clipping pretty badly (verified with lineclip test)
commit ef5d90b86d624c152d200c7c4056f47c3c6d2688
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 23:38:59 2012 +0100
draw: don't store vertex header per attribute
storing the vertex header once per attribute is totally unnecessary.
Some quick look at the generated assembly says llvm in fact cannot optimize
away the additional stores (maybe due to potentially aliasing pointers
somewhere).
Plus, this makes the code cleaner and also allows using a vector "or"
instead of scalar ones.
commit 6b3a5a57b0b9850854cfbd7b586e4e50102dda71
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 6 19:11:01 2012 +0100
draw: do the per-vertex "boolean" clipmask "or" with vectors
no point extracting the values and doing it per component.
Doesn't help that much since we still extract the values elsewhere anyway.
commit 36519caf1af40e4480251cc79a2d527350b7c61f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Mar 2 22:27:01 2012 +0100
gallivm: fix lp_build_extract_broadcast with different sized vectors
Fix the obviously wrong argument, so it doesn't blow up.
commit 76d0ac3ad85066d6058486638013afd02b069c58
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Mar 2 12:16:23 2012 +0000
draw: Compile per module and not per function (WIP).
Enough to get gears w/ LLVM draw + softpipe to work on AVX doing:
GALLIUM_DRIVER=softpipe SOFTPIPE_USE_LLVM=yes glxgears
But still hackish -- will need to rethink and refactor this.
commit 78e32b247d2a7a771be9a1a07eb000d1e54ea8bd
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 12:01:05 2012 +0000
llvmpipe: Remove lp_state_setup_fallback.
Never used.
commit 6895d5e40d19b4972c361e8b83fdb7eecda3c225
Author: José Fonseca <jfonseca@vmware.com>
Date: Mon Feb 27 19:14:27 2012 +0000
llvmpipe: Don't emit EMMS on x86
We already take precautions to ensure that LLVM never emits MMX code.
commit 4822fea3f0440b5205e957cd303838c3b128419c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 29 15:58:19 2012 +0100
draw: modifications for larger vector sizes
We want to be able to use larger vectors especially for running the vertex
shader. With this patch we build soa vectors which might have a different
length than 4.
Note that aos structures really remain the same, only when aos structures
are converted to soa potentially different sized vectors are used.
Samplers probably don't work yet, didn't look at them.
Testing done:
glxgears works with both 128bit and 256bit vectors.
commit f4950fc1ea784680ab767d3dd0dce589f4e70603
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:57 2012 +0100
gallivm: override native vector width with LP_NATIVE_VECTOR_WIDTH env var for debug
commit 6ad6dbf0c92f3bf68ae54e5f2aca035d19b76e53
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 29 15:51:24 2012 +0100
draw: allocate storage with alignment according to native vector width
commit 7bf0e3e7c9bd2469ae7279cabf4c5229ae9880c1
Author: José Fonseca <jfonseca@vmware.com>
Date: Fri Feb 24 19:06:08 2012 +0000
gallivm: Fix comment grammar.
Was missing several words. Spotted by Roland.
commit b20f1b28eb890b2fa2de44a0399b9b6a0d453c52
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 19:22:09 2012 +0000
gallivm: Use MC-JIT on LLVM 3.1+ (i.e., SVN)
MC-JIT
Note: MC-JIT is still WIP. For this to work correctly it requires
LLVM changes which are not yet upstream.
commit b1af4dfcadfc241fd4023f4c3f823a1286d452c0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 20:03:15 2012 +0100
llvmpipe: use new lp_type_width() helper in lp_test_blend
commit 04e0a37e888237d4db2298f31973af459ef9c95f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 19:50:34 2012 +0100
llvmpipe: clean up lp_test_blend a little
Using variables just sized and aligned right makes it a bit more obvious
what's going on.
The test still only tests vector length 4.
For AoS anything else probably isn't going to work.
For SoA other lengths should work (at least with floats).
commit e61c393d3ec392ddee0a3da170e985fda885a823
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:48:30 2012 +0000
gallivm: Ensure vector width consistency.
Instead of assuming that everything is the max native size.
commit 330081ac7bc41c5754a92825e51456d231bf84dd
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:44:14 2012 +0000
draw: More simd vector width consistency fixes.
commit d90ca002753596269e37297e2e6c139b19f29f03
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 17:43:00 2012 +0000
gallivm: Remove unused lp_build_int32_vec4_type() helper.
commit cae23417824d75869c202aaf897808d73a2c1db0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Feb 23 17:32:16 2012 +0100
gallivm: use global variable for native vector width instead of define
We do not know the simd extensions (and hence the simd width we should use)
available at compile time.
At least for now keep a define for maximum vector width, since a global
variable obviously can't be used to adjust alignment of automatic stack
variables.
Leave the runtime-determined value at 128 for now in all cases.
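A minimal sketch of the split described above, with a compile-time maximum kept for sizing and aligning stack storage and a run-time value chosen once the CPU is known. All names here are hypothetical and are not the gallivm identifiers; the 256-bit maximum is an assumption for illustration.

```c
#include <assert.h>

/* Compile-time upper bound (assumed value): usable for sizing and
 * aligning automatic stack variables, which a global cannot do. */
#define SKETCH_MAX_VECTOR_WIDTH_BITS 256

/* Run-time value; starts at 128 as the commit message describes. */
static unsigned sketch_native_vector_width_bits = 128;

/* Hypothetical helper: record a detected SIMD width, clamped to the
 * compile-time bound so stack buffers sized from it stay large enough. */
static unsigned
sketch_set_native_vector_width(unsigned detected_bits)
{
   assert(detected_bits >= 32);
   if (detected_bits > SKETCH_MAX_VECTOR_WIDTH_BITS)
      detected_bits = SKETCH_MAX_VECTOR_WIDTH_BITS;
   sketch_native_vector_width_bits = detected_bits;
   return sketch_native_vector_width_bits;
}

/* Number of 32-bit float lanes in a vector of the current width. */
static unsigned
sketch_native_float_lanes(void)
{
   return sketch_native_vector_width_bits / 32;
}
```

Code that builds SoA vectors would ask for the lane count at run time, while any on-stack scratch array would still be declared with the compile-time maximum.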
commit 51270ace6349acc2c294fc6f34c025c707be538a
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 15:41:02 2012 +0000
gallivm: Add a hunk inadvertently lost when rebasing.
commit bf256df9cfdd0236637a455cbaece949b1253e98
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:24:23 2012 +0000
llvmpipe: Use consistent vector width in depth/stencil test.
commit 5543b0901677146662c44be2cfba655fd55da94b
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 14:19:59 2012 +0000
draw: Use a consistent vector register width.
Instead of 4x32 sometimes, LP_NATIVE_VECTOR_WIDTH other times.
commit eada8bbd22a3a61f549f32fe2a7e408222e5c824
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 12:08:04 2012 +0000
gallivm: Remove garbage collection.
MC-JIT will require one compilation per module (as opposed to one
compilation per function), therefore no state will be shared,
eliminating the need to do garbage collection.
commit 556697ea0ed72e0641851e4fbbbb862c470fd7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 10:33:41 2012 +0000
gallivm: Move all native target initialization to lp_set_target_options().
commit c518e8f3f2649d5dc265403511fab4bcbe2cc5c8
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:52:32 2012 +0000
llvmpipe: Create one gallivm instance for each test.
commit 90f10af8920ec6be6f2b1e7365cfc477a0cb111d
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:48:08 2012 +0000
gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().
Brittle, complex, and unnecessary. Just use a function pointer constant.
commit 98fde550b33401e3fe006af59db4db628bcbf476
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:21:26 2012 +0000
gallivm: Add a lp_build_const_func_pointer() helper.
To be reused in all places where we want to call C code.
commit 6cfedadb62c2ce5af8d75969bc95a607f3ece118
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 09:44:41 2012 +0000
gallivm: Cleanup/simplify lp_build_const_string_variable.
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
commit db1d4018c0f1fa682a9da93c032977659adfb68c
Author: José Fonseca <jfonseca@vmware.com>
Date: Thu Feb 23 11:52:17 2012 +0000
gallivm: Set NoFramePointerElimNonLeaf to true where supported.
commit 088614164aa915baaa5044fede728aa898483183
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Feb 22 19:38:47 2012 +0100
llvmpipe: pass in/out pointers rather than scalar floats in lp_bld_arit
we don't want llvm to potentially optimize away the vectors (though it doesn't
seem to currently), plus we want to be able to handle in/out vectors of arbitrary
length.
commit 3f5c4e04af8a7592fdffa54938a277c34ae76b51
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:22:55 2012 +0100
gallivm: fix lp_build_sqrt() for vector length 1
since we optimize away vectors with length 1 need to emit intrinsic
without vector type.
commit 79d94e5f93ed8ba6757b97e2026722ea31d32c06
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 17:00:46 2012 +0000
llvmpipe: Remove lp_test_round.
commit 81f41b5aeb3f4126e06453cfc78990086b85b78d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Feb 21 23:56:24 2012 +0100
llvmpipe: subsume lp_test_round into lp_test_arit
Much simpler, and since the arguments aren't passed as 128bit values can run
on any arch.
This also uses the float instead of the double versions of the c functions
(which probably was the intention anyway).
In contrast to lp_test_round the output is much less verbose however.
Tested vector width of 32 to 512 bits - all pass except 32 (length 1) which
crashes in lp_build_sqrt() due to wrong type.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
commit 945b338b421defbd274481d8c4f7e0910fd0e7eb
Author: José Fonseca <jfonseca@vmware.com>
Date: Wed Feb 22 09:55:03 2012 +0000
gallivm: Centralize the function compilation logic.
This simplifies a lot of code.
Also doing this in a central place will make it easier to carry out the
changes necessary to use MC-JIT in the future.
gallivm: Fix typo in explicit derivative shuffle.
Trivial.
draw: make DEBUG_STORE work again
adapt to lp_build_printf() interface changes
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: get rid of vecnf_from_scalar()
just use lp_build_broadcast directly (cannot assign a name but don't really
need it, vecnf_from_scalar() was producing much uglier IR due to using
repeated insertelement instead of insertelement+shuffle).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
llvmpipe: fix typo in complex interpolation code
Fixes position interpolation when using complex mode
(piglit fp-fragment-position and similar)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
draw: fix clipvertex/position storing again
This appears to be the result of a bad merge.
Fixes piglit tests relying on clipping, like a lot of the interpolation tests.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Fix explicit derivative manipulation.
The same counter variable was being used in two nested loops. Use more
meaningful variable names for the counters to fix and avoid this.
gallivm: Prevent buffer overflow in repeat wrap mode for NPOT.
Based on Roland's patch, discussion, and review.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
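A minimal sketch of one way an out-of-bounds index can arise with repeat wrapping on a non-power-of-two size, and a clamp that guards against it. This is scalar C for illustration only; the helper name is hypothetical and it is not the IR that gallivm emits. It assumes the coordinate fits in a long.

```c
#include <assert.h>

/* Hypothetical helper: map a normalized coordinate to a texel index
 * under repeat wrapping, for any size (power of two or not). */
static int
sketch_wrap_repeat_npot(float s, int size)
{
   float f;
   int texel;

   assert(size > 0);

   /* Fractional part, nominally in [0, 1). */
   f = s - (float)(long)s;
   if (f < 0.0f)
      f += 1.0f;

   /* For a tiny negative s the addition above rounds to exactly 1.0,
    * so f * size can land on size; clamp to stay within the row. */
   texel = (int)(f * (float)size);
   if (texel > size - 1)
      texel = size - 1;
   if (texel < 0)
      texel = 0;
   return texel;
}
```

With power-of-two sizes the same wrap is usually done with a bit mask, which cannot overflow; the clamp only matters for the NPOT path.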
gallivm: Fix dims for TGSI_TEXTURE_1D in emit_tex.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: Fix explicit volume texture derivatives.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gallivm: fix 1d shadow texture sampling
The r coordinate is always used, hence we need 3 coords, not two
(the second one is unused).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallivm: Enable AVX support without MCJIT, where available.
For now, this just enables AVX on Windows for testing. If the code is
stable then we might consider preferring the old JIT wherever possible.
No change elsewhere.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The vertex element state isn't in registers any more, so
remove that old code. That fixes a memory corruption with
the blend state and gets eglgears partially working.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Some applications call eglCreateWindowSurface with an
EGLNativeWindowType parameter of zero. This causes a SEGV
instead of the normal error handling, such as returning EGL_NO_SURFACE.
Signed-off-by: Elvis Lee <kwangwoong.lee@lge.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
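A minimal sketch of the kind of guard described above: reject a null native window before it is ever dereferenced and report an error instead. The types and names are hypothetical stand-ins, not the EGL or Mesa API, and which error code the real patch reports is not stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL error state and surface object. */
enum sketch_error { SKETCH_SUCCESS, SKETCH_BAD_NATIVE_WINDOW };

struct sketch_surface {
   void *native_window;
};

static enum sketch_error sketch_last_error = SKETCH_SUCCESS;
static struct sketch_surface sketch_surface_storage;

/* Returns NULL (playing the role of EGL_NO_SURFACE) for a null window
 * rather than passing it on to code that would dereference it. */
static struct sketch_surface *
sketch_create_window_surface(void *native_window)
{
   if (native_window == NULL) {
      sketch_last_error = SKETCH_BAD_NATIVE_WINDOW;
      return NULL;
   }

   sketch_surface_storage.native_window = native_window;
   sketch_last_error = SKETCH_SUCCESS;
   return &sketch_surface_storage;
}
```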
a112ca5d rather crassly smashed all the compiler flags together into AM_CFLAGS.
Separate them out the way they were before, putting pre-processor flags into
AM_CPPFLAGS, so assembly source gets preprocessed with the correct pre-processor
flags as well.
Also, remove unneeded CFLAGS from AM_CFLAGS, and CXXFLAGS from AM_CXXFLAGS
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Brian Paul <brianp@vmware.com>
I suck at resolving merge conflicts and broke the build in a5a34b1.
This patch adds the missing field intel_mipmap_tree::wraps_etc1.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Enable it for all hardware.
No current hardware supports ETC1, so this patch implements it by
translating the ETC1 data to RGBX data during the call to
glCompressedTexImage2D(). For details, see the doxygen for
intel_mipmap_tree::wraps_etc1.
Passes the Piglit test spec/OES_compressed_ETC1_RGB8_texture/miptree and
the ETC1 test in the GLES2 conformance suite.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function _mesa_etc1_unpack_rgba8888. It is intended to be used by
glCompressedTexSubImage2D to decode ETC1 textures into RGBA.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of util_etc1_rgb8_unpack_rgba_unorm8 into a new function
that can be shared between gallium and dri drivers,
texcompress_etc_tmp.h:etc1_unpack_rgba8888.
CC: Chia-I <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
lp_delete_setup_variants() used to be called in garbage collection,
but this no longer exists hence the setup shaders never got freed.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When we don't intend to texture from or render to a __DRIimage we
use __DRI_IMAGE_FORMAT_NONE. In that case, we just create the __DRIimage
to reference the underlying buffer, and will create usable __DRIimages
from it using createSubImage later.
If we try to use _mesa_get_format_bytes() on MESA_FORMAT_NONE in
a debug build, we hit an assertion, so let's not do that.
Commit 68e04cc6 was tested using automake-1.11. Unfortunately, automake-1.12
made a "slightly backward-incompatible change" in the use of yacc with C++, and
for a .yy file, the generated header file is now named .hh, not .h
To work with both, write our own rule for running yacc, which generates a
header file named .h, rather than using automake's rule.
Also, remove things from BUILD_SOURCES which don't need to be there
Also, update EXCLUDE rules in doxygen/glsl.doxy, for change of generated files
from .cpp -> .cc, and glsl_lexer.h has never existed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Commit defadf2b1 erroneously tries to make gallium drivers link with libdricore
as a static library, not a shared library
Also, change uses of DRI_LIB_DEPS in gallium driver Makefiles to
GALLIUM_DRI_LIB_DEPS, so the libraries added are used when linking the gallium
driver
Also, fix the path to the libdricore.so symlink, it's made in LIB_DIR, not in
the libdricore directory
Also repair quoting of dricore settings of DRI_LIB_DEPS and GALLIUM_DRI_LIB_DEPS
variables so VERSION is interpolated in configure but TOP and LIB_DIR are
interpolated later (where they are known, but VERSION isn't)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
- Use LLVM limits when LLVM is being used, instead of TGSI limits
- Provide draw_get_shader_param_no_llvm for when llvm is never used (softpipe)
- Eliminate several of the hacks around draw shader caps in several drivers
Unfortunately the hack for PIPE_MAX_VERTEX_SAMPLERS is still necessary.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The libmesa convenience library is linked with the libglsl convenience
library. libOsmesa is linked with libmesa, and also directly with libglsl.
When using libtool, this gives rise to duplicate symbol errors.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake,
so remove the AC_SUBST'ed GLAPI_ASM_SOURCES and instead use some AM_CONDITIONALS
to choose which asm sources are used
* Change GLAPI_LIB to point to the .la file in other Makefile.am files, and make a link
to the .a file for the convenience of other Makefiles which have not yet been converted
to automake
v2:
- Use AM_CPPFLAGS for cleaner build output
- EXTRA_SOURCES is not needed
- Remove libglapi.a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Now mesa/drivers/dri is converted to automake, we want to update DRI_LIB_DEPS
so that we link with the libmesa or libdricore libtool library, as appropriate.
However, this is complicated by the fact that gallium/targets is not (yet)
converted, so we can't share the DRI_LIB_DEPS autoconf variable with that anymore.
Add an additional autoconf variable GALLIUM_DRI_LIB_DEPS, which is now used in
gallium/targets/Makefile.dri, to link with the libdricore or libmesa native library.
v2: libdricore$VERSION.a needs to be libdricore$(VERSION).a
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
* "configure substitutions are not allowed in _SOURCES variables" in automake, so instead of
MESA_ASM_FILES, use some AM_CONDITIONALS to choose which architecture's asm sources are used
in libmesa_la_SOURCES. (Can't remove MESA_ASM_FILES autoconf variable as it's still used in
sources.mak)
* Update to link with the .la file in other Makefile.am files, and make a link to the
.a file for the convenience of other Makefiles which have not yet been converted to automake
v2: Remove stray -static from LDFLAGS
v3: Remove .a compatibility link on clean
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
Automake can't handle having both clip.S and clip.c, even though they have different paths
"src/mesa/Makefile.am: object `clip.lo' created by `$(SRCDIR)/sparc/clip.S' and `$(SRCDIR)/main/clip.c'"
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
v2: Use AM_V_GEN to silence generated code rules. Add BUILT_SOURCES to CLEANFILES
v3:
- Fix an accidental // in a path
- Use automake make rules for lex/yacc rather than writing our own
- Update .gitignore appropriately
- Build a libglcpp convenience library rather than awkwardly including
the files in libglsl and delegating the generation
- Remove libglsl.a compatibility link on clean
v4:
- Automake's rules for lex/yacc make .cc if source is .ll or .yy, and apparently we
must use those extensions "because of scons", so update everywhere glsl_parser.cpp
-> glsl_parser.cc and glsl_lexer.cpp -> glsl_lexer.cc. This fixes 'make tarballs'
and building with dricore enabled.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This also currently fixes the installation of libOSmesa.
v2: Remove old Makefile, libOSmesa is now versioned, fix typos
v3: Keep config substitution alphabetized
v4: Update .gitignore
v5: Libraries will be in the builddir, not the srcdir.
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
This was not implemented, because the spec was changed just recently.
Everything has been in place already.
Gallium has PIPE_FORMAT_B5G6R5_UNORM, while Mesa has MESA_FORMAT_RGB565.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The whole reason I avoided this was because it might operate on a
brw_vertex_program or a brw_fragment_program. However, that isn't a
problem: all we need is the gl_program base type.
This avoids awkwardly passing the loop counter 'i' as a parameter,
simplifies both callers, and also plumbs prog in place for future use.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If alpha-testing is enabled, we need to send alpha down the pipeline
even if nr_color_buffers == 0. However, tracking whether alpha-testing
is enabled in the WM program key is expensive: it causes us to compile
multiple specializations of the same shader, using program cache space.
This patch removes the check for alpha-testing, and simply emits alpha
whenever nr_color_buffers == 0. We believe this will also be necessary
for alpha-to-coverage, and it should add minimal overhead to an uncommon
case. Saving the recompiles should more than make up the difference.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we only did this pre-Gen6, and used pwrite on Gen6+.
In one workload, this cuts a significant amount of overhead.
v2: Simplify the function based on Eric's suggestions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We rely on proper IEEE 754 behavior in too many places for this.
See also commit 2fdbbeca43 with equivalent
change for autoconf.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without that, people with buggy apps that looked at just the server
string for GLX_ARB_create_context would call this function that just
threw an error when you tried to make a context. Google shows plenty
of complaints about this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function assumes that lp_build_context::type is a vector type,
which is not true for r600 or radeonsi.
This fixes an assertion failure using glamor 2D accel.
It had many problems:
- The shadow comparison was done post-filtering.
- It required state-dependent recompiles whenever the comparison
function changed.
- It didn't even work: many cases hit assertion failures.
- I never implemented it for the VS.
The new lowering pass which converts textureGrad to textureLod by
computing the LOD value works much better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Intel hardware doesn't natively support textureGrad with shadow
comparisons. So we need to generate code to handle it somehow.
Based on the equations of page 205 of the OpenGL 3.0 specification,
it's possible to compute the LOD value that would be selected given the
gradient values. Then, we can simply convert the TXD to a TXL.
Currently, this passes 34/46 of oglconform's shadow-grad subtests;
four cubemap tests are regressed. We should investigate this in the
future.
v2: Apply abs() to the scalar case (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
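For reference, the scale-factor equations that the message points to can be restated as below. This is a restatement of the ideal formulas in the OpenGL specification, not a transcription of the lowering pass, whose emitted code may differ. With textureGrad, the partial derivatives of (s, t, r) are the gradients supplied by the shader, and w_t, h_t, d_t are the texture dimensions.

```latex
\begin{align*}
u &= w_t \, s, \qquad v = h_t \, t, \qquad w = d_t \, r \\
\rho &= \max\left\{
  \sqrt{\left(\frac{\partial u}{\partial x}\right)^2
      + \left(\frac{\partial v}{\partial x}\right)^2
      + \left(\frac{\partial w}{\partial x}\right)^2},\;
  \sqrt{\left(\frac{\partial u}{\partial y}\right)^2
      + \left(\frac{\partial v}{\partial y}\right)^2
      + \left(\frac{\partial w}{\partial y}\right)^2}
\right\} \\
\lambda_{\mathrm{base}} &= \log_2 \rho
\end{align*}
```

For a one-dimensional coordinate the square roots reduce to absolute values, i.e. rho = max(|du/dx|, |du/dy|), which is consistent with the v2 note about applying abs() in the scalar case.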
This swizzles away unwanted components, while preserving the order of
the ones that remain.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
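A minimal sketch of what "swizzling away" components while preserving order means: given a mask of wanted components, the n-th output component is the n-th set bit of the mask. The helper name and the mask encoding (bit 0 = x through bit 3 = w) are assumptions for illustration, not the interface added by the patch.

```c
#include <assert.h>

/* Hypothetical helper: return the index (0 = x ... 3 = w) of the n-th
 * component kept by a 4-bit mask, or -1 if fewer than n + 1 are kept.
 * Walking the bits in ascending order keeps x/y/z/w ordering intact. */
static int
sketch_nth_kept_component(unsigned mask, unsigned n)
{
   unsigned comp;
   unsigned seen = 0;

   assert((mask & ~0xfu) == 0);

   for (comp = 0; comp < 4; comp++) {
      if (mask & (1u << comp)) {
         if (seen == n)
            return (int)comp;
         seen++;
      }
   }
   return -1;
}
```

For example, a mask keeping only y and w yields the swizzle ".yw": output component 0 reads y and output component 1 reads w.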
I needed to compute logs and square roots in a patch I was working on,
and wanted to use the convenient interface. We already have a similar
constructor for binops; adding one for unops seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I ran into this while trying to create a TXS query, which doesn't have a
coordinate. Since it didn't get initialized to NULL, a bunch of
visitors tried to access it and crashed.
Most of the time, this won't be a problem, but it's just a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only case a depth buffer can be set as a color buffer is when flushing.
That wasn't always the case, but now this code isn't required anymore.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
- maintain a mask of which mipmap levels are dirty (instead of one big flag)
- only flush what was requested at a given point and not the whole resource
(most often only one level and one layer has to be flushed)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
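A minimal sketch of per-level dirty tracking with a bit mask, as opposed to one flag for the whole resource. The function names are hypothetical and the sketch covers mipmap levels only (the driver also narrows the flush by layer); it assumes at most 32 levels.

```c
#include <assert.h>
#include <stdint.h>

/* Bit N of the mask set => mipmap level N has unflushed depth data. */

/* Hypothetical helper: mark one level dirty after rendering to it. */
static uint32_t
sketch_mark_level_dirty(uint32_t dirty_mask, unsigned level)
{
   assert(level < 32);
   return dirty_mask | (UINT32_C(1) << level);
}

/* Hypothetical helper: of the levels first..last being requested,
 * return only those that actually need a flush. */
static uint32_t
sketch_levels_to_flush(uint32_t dirty_mask, unsigned first, unsigned last)
{
   uint32_t requested = 0;
   unsigned level;

   assert(first <= last && last < 32);
   for (level = first; level <= last; level++)
      requested |= UINT32_C(1) << level;
   return dirty_mask & requested;
}

/* Hypothetical helper: clear the bits for levels that were flushed. */
static uint32_t
sketch_clear_flushed(uint32_t dirty_mask, uint32_t flushed)
{
   return dirty_mask & ~flushed;
}
```

Levels outside the requested range keep their dirty bits, so they are flushed later only if something actually samples them.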
we can just update the state when decompressing, there's no need to add
additional info into the DSA state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
to remove some overhead from draw_vbo. This is a derived state.
BTW, I've got no idea how compute interacts with 3D here, but it should
use cb_misc_state, so that 3D and compute don't conflict.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Because u_blit couldn't sample a 1D, 3D, CUBE and ARRAY texture, we created
a 2D texture holding a copy of one slice of the source texture (even for 1D).
Let's just do it right.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch updates the blorp engine to properly handle the case where
the surface being textured from uses Gen7's CMS MSAA layout. The
following changes were necessary:
- Before reading color values from the surface, we need to read from
the MCS buffer using the ld_mcs sampler message. This is done by
the mcs_fetch() function, and the result is stored in the mcs_data
register. This only needs to be done once per pixel, since the MCS
value is shared between all samples belonging to a pixel.
- When reading color values from the surface, we need to use the
ld2dms sampler message instead of the ld2dss message, and we need to
provide the value read from the MCS buffer as an argument.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When a buffer using Gen7's CMS MSAA layout is bound to a texture or a
render target, the SURFACE_STATE structure needs to point to the MCS
buffer and to indicate its pitch. This patch updates the functions
that emit SURFACE_STATE to handle CMS layout properly.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously the DWORD used to control the CMS MSAA layout was just a
pad value, because we didn't use it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
To implement Gen7's CMS MSAA layout, we need an extra buffer, the MCS
(Multisample Control Surface) buffer. This patch introduces code for
allocating and deallocating the buffer, and storing a pointer to it in
the intel_mipmap_tree struct.
No functional change, since the CMS layout is not enabled yet.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 1 Part 1, p112:
There are three types of multisampled surface layouts designated
as follows:
- IMS Interleaved Multisampled Surface
- CMS Compressed Multisampled Surface
- UMS Uncompressed Multisampled Surface
Previously, the i965 driver only used IMS and UMS formats, and
distinguished between them using the boolean
intel_mipmap_tree::msaa_is_interleaved. To facilitate adding support
for the CMS format, this patch replaces that boolean (and other
booleans derived from it) with an enum
INTEL_MSAA_LAYOUT_{IMS,CMS,UMS}. It also updates the terminology used
in comments throughout the driver to match the IMS/CMS/UMS terminology
used in the PRM. CMS layout is not yet used.
The enum has a fourth possible value, INTEL_MSAA_LAYOUT_NONE, which is
used for non-multisampled surfaces.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
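A minimal sketch of how such an enum can stand in for the old boolean. The four enumerator names come from the message above; their order and values, and the two helper functions, are assumptions for illustration and not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerator names as given in the commit message; order is assumed. */
enum intel_msaa_layout {
   INTEL_MSAA_LAYOUT_NONE,  /* non-multisampled surface */
   INTEL_MSAA_LAYOUT_IMS,   /* interleaved multisampled surface */
   INTEL_MSAA_LAYOUT_UMS,   /* uncompressed multisampled surface */
   INTEL_MSAA_LAYOUT_CMS    /* compressed multisampled surface */
};

/* Hypothetical helper: what msaa_is_interleaved used to answer. */
static bool
sketch_layout_is_interleaved(enum intel_msaa_layout layout)
{
   return layout == INTEL_MSAA_LAYOUT_IMS;
}

/* Hypothetical helper: only the compressed layout needs an MCS buffer. */
static bool
sketch_layout_needs_mcs(enum intel_msaa_layout layout)
{
   return layout == INTEL_MSAA_LAYOUT_CMS;
}
```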
On Gen6, MSAA buffers always use an interleaved layout and non-MSAA
buffers always use a non-interleaved layout, so it is not strictly
necessary to keep track of the layout of the texture and render target
surfaces in the blorp program key. However, it is cleaner to do so,
since (a) it makes the blorp compiler less dependent on implicit
knowledge about how the GPU pipeline is configured, and (b) it paves
the way for implementing compressed multisampled surfaces in Gen7.
This patch won't cause any redundant compiles, because the layout of
the texture and render target surfaces depends on other parameters
that are already in the blorp program key.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't generate public entrypoints for GLES extensions, so move the
GL_NV_draw_buffers definition from ARB_draw_buffers.xml to es_EXT.xml.
When the extension is defined in ARB_draw_buffers.xml, we end up with a
public entry point for it, but no prototype, which gives an error when
compiled with --disable-asm and --disable-shared-glapi.
Instead, just move the GLES extension to es_EXT.xml so this doesn't happen.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This lets us specify an offset into the bo where the miptree starts,
which will let us set up a texture for a single plane in a planar buffer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The additions in version 5 enable creating EGLImages for different planes
of a YUV buffer. createImageFromName is still used to create the containing
__DRIimage, and createSubImage can then be used on that __DRIimage to create
__DRIimages that correspond to the y, u, and v planes (__DRI_IMAGE_FORMAT_R8)
or the uv planes (__DRI_IMAGE_FORMAT_RG88) for formats such as NV12 where
the u and v components are interleaved. Packed formats such as YUYV don't
require any special treatment; we just sample those as a regular
ARGB texture.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
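A minimal sketch of the per-plane geometry a sub-image would describe for an NV12 buffer: a full-resolution one-byte Y plane followed by a half-resolution two-byte interleaved UV plane. The struct and function are hypothetical and are not the __DRIimage interface; the sketch assumes even dimensions, a shared byte pitch, and a UV plane that starts directly after the Y plane.

```c
#include <assert.h>

/* Hypothetical description of one plane inside a larger buffer. */
struct sketch_plane {
   unsigned width;           /* in pixels of this plane */
   unsigned height;
   unsigned pitch;           /* bytes per row */
   unsigned offset;          /* byte offset from the start of the buffer */
   unsigned bytes_per_pixel; /* 1 for an R8 plane, 2 for an RG88 plane */
};

/* Plane 0 is Y (R8-like), plane 1 is interleaved UV (RG88-like). */
static struct sketch_plane
sketch_nv12_plane(unsigned width, unsigned height, unsigned pitch,
                  unsigned plane)
{
   struct sketch_plane p;

   assert(plane < 2);
   assert(width % 2 == 0 && height % 2 == 0);
   assert(pitch >= width);

   if (plane == 0) {
      p.width = width;
      p.height = height;
      p.offset = 0;
      p.bytes_per_pixel = 1;
   } else {
      p.width = width / 2;
      p.height = height / 2;
      p.offset = pitch * height;   /* UV data follows the Y plane */
      p.bytes_per_pixel = 2;
   }
   p.pitch = pitch;
   return p;
}
```

Each plane can then be sampled as an ordinary single- or two-channel texture, with the offset selecting where in the buffer object it starts.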
The code for growing the memory pool (which is used for storing all of
the global buffers) wasn't working. There seem to be two separate issues
with the memory pool code. The first was the way it was growing the pool.
When the memory pool needed more space, it would:
1. Copy the data from the memory pool's backing texture to system memory.
2. Delete the memory pool's texture
3. Create a bigger backing texture for the memory pool.
4. Copy the data from system memory into the bigger texture.
The copy operations didn't seem to be working, and I suspect that since
they were using fragment shaders to do the copy, that there might have
been a problem with the mixing of compute and 3D state.
The other issue is that the size of 1D textures is limited, and I was
having trouble getting 2D textures to work.
I think these problems will be easier to solve once more code is shared
between 3D and compute, which is why I decided to disable it for now
rather than continue searching for a fix.
The original strategy for handling floating point loads, which was to
lower (f32 load) to (f32 bitcast (i32 load)) wasn't really working. The
main problem was that the DAG legalizer couldn't handle replacing a node
with two results (load) with a node with only one result (bitcast).
It didn't change performance on Lightsmark or Nexuiz, which both used
DYNAMIC_DRAW buffers, but it was killing performance (40% CPU wasted pwriting
buffers) on a closed-source app we're looking at.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the infrastructure required for this extension. There is no
xserver support and no driver support yet. Drivers can enable this by
advertising DRI2 version 4 and accepting the
__DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag and the
__DRI_CTX_ATTRIB_RESET_STRATEGY attribute in create context.
Some additional Mesa infrastructure is needed before drivers can do
this. The GL_ARB_robustness spec, which all Mesa drivers already
advertise, requires:
"If the behavior is LOSE_CONTEXT_ON_RESET_ARB, a graphics reset
will result in the loss of all context state, requiring the
recreation of all associated objects."
It is necessary to land this infrastructure now so that the related
infrastructure can land in the xserver. The xserver has very long
release schedules, and the remaining Mesa parts should land long, long
before the next xserver merge window opens.
v2: Expose robustness as a DRI2 extension rather than bumping
__DRI_DRI2_VERSION.
v3: Add a comment explaining why dri2->base.version >= 3 is also
required for GLX_ARB_create_context_robustness.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows revising the dri_interface.h separately from adding driver
support.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We neglected to list the deprecation model/forward compatible context
support.
inverse() has been done for a while.
None of us know what "highp change" means; GLSL 1.30 already added the
ability to recognize precision keywords, and it doesn't look like 1.40
has any new requirements there (precision keywords still have no meaning).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Use r600_resource_texture::flushed_depth_texture for GPU access, and
allocate it in the VRAM. For transfers we'll allocate texture in the GTT
and store it in the r600_transfer::staging.
Improves performance when flushed depth texture is frequently used by the
GPU, e.g. in Lightsmark (~30%)
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
With fixes and updates from Ben Widawsky and comments from Paul Berry.
v2: Use drm_intel_gem_context_destroy to destroy hardware context;
remove useless initialization of hw_ctx, both suggested by Eric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Paul Berry <stereotype441@gmail.com>
This doesn't do anything with the uniform block declarations yet, so
usage of those uniforms finds them to be undeclared.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've been trying to derive from this for UBO support, and the slightly
obfuscated types were putting me over the edge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The got_one variable was set iff one of the bits in flags.i was set.
v2: Fix incorrect dropping of the ARB_conservative_depth warning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
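A minimal sketch of why a separate got_one variable is redundant when the
flags live in a union with an integer view. The union layout and field
names here are hypothetical, not the actual qualifier struct.

```c
#include <assert.h>

/* Hypothetical flags union: individual bits plus an integer view. */
union sketch_flags {
    struct {
        unsigned invariant:1;
        unsigned constant:1;
        unsigned centroid:1;
    } q;
    unsigned i;
};

/* True iff at least one flag bit is set; replaces a hand-maintained
 * "got_one" boolean. */
static int any_flag_set(union sketch_flags flags)
{
    return flags.i != 0;
}
```

This relies on clearing the whole union through .i before setting any
individual bit, so that unused bits are known to be zero.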
This function is used when dispatching compute shader in order to avoid
mixing compute and 3D registers in the context's dirty list. This
allows the compute code to reuse 3D functions like evergreen_cb, which
return a struct r600_pipe_state and still have control over when and how
the register writes are emitted.
The start_compute_cs atom initializes some config and context registers
to the values needed for running compute shaders. When a compute shader
is dispatched, this atom is emitted after the start_cs_cmd atom, which
initializes registers that are common to both 3D and compute.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Some packets require the shader type bit (bit 1) to be set when
used for compute shaders. The pkt_flag will be initialized to
RADEON_CP_PACKET3_COMPUTE_MODE for any struct r600_command_buffer used
for dispatching compute shaders and it will be or'd against the result of
the PKT3 macro when adding a new packet to a struct r600_command_buffer.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
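The pkt_flag idea can be sketched as follows. Everything here is a
simplified stand-in: SKETCH_PKT3 is not the real PKT3 macro,
SKETCH_COMPUTE_MODE only mirrors the "bit 1" description above, and
struct sketch_cmd_buffer is a hypothetical reduction of
struct r600_command_buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RADEON_CP_PACKET3_COMPUTE_MODE: shader type bit (bit 1). */
#define SKETCH_COMPUTE_MODE (1u << 1)

/* Simplified, assumed packet header encoding (not the real PKT3 macro). */
#define SKETCH_PKT3(op, count) \
    ((3u << 30) | (((uint32_t)(count) & 0x3fffu) << 16) | \
     (((uint32_t)(op) & 0xffu) << 8))

struct sketch_cmd_buffer {
    uint32_t buf[16];
    unsigned num_dw;
    uint32_t pkt_flag;   /* 0 for 3D, SKETCH_COMPUTE_MODE for compute */
};

static void sketch_cb_init(struct sketch_cmd_buffer *cb, int for_compute)
{
    cb->num_dw = 0;
    cb->pkt_flag = for_compute ? SKETCH_COMPUTE_MODE : 0;
}

/* Every packet header added to the buffer is OR'd with pkt_flag. */
static void sketch_cb_add_packet(struct sketch_cmd_buffer *cb,
                                 unsigned op, unsigned count)
{
    assert(cb->num_dw < 16);
    cb->buf[cb->num_dw++] = SKETCH_PKT3(op, count) | cb->pkt_flag;
}
```

Setting the flag once at init time means individual call sites do not need
to know whether they are building a compute or a 3D command buffer.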
For copy propagation, we've dropped the use of a GRF in favor of a
(probably later) use of a different GRF. This definitely requires
invalidating intervals.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since live intervals are based on ip, removing an instruction trashes
the intervals unless we were to go do some surgery. These happen to
usually remove a use of a grf, so it's time to recalculate, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 release branch.
This has less impact than for the FS (4k savings), because it was partially
done already, but makes things more consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We factor out all the EGL book-keeping into dri2_create_image() and
simplify the wayland case by using dupImage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We have the same switch and allocation code in two places.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit cbffaf20e9.
Use the PRIx64 macro in the fprintf() call instead, as suggested
by Dylan Noblesmith.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
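For illustration, this is the usual way to print a 64-bit value portably
with PRIx64. The helper name format_hex64 is hypothetical and the sketch
uses snprintf rather than the fprintf() call from the commit so the result
can be inspected.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value as hex without assuming whether uint64_t is
 * "unsigned long" or "unsigned long long" on this platform. */
static int format_hex64(char *out, size_t out_size, uint64_t value)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "0x%" PRIx64, value);
}
```

PRIx64 expands to the right length modifier for the platform, which avoids
format warnings on both 32-bit and 64-bit builds.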
ROUND and TRUNC are implemented with one function to reduce code duplication.
Note: ROUND isn't actually used yet, but probably will be soon.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Converting CMP to SLT+LRP didn't work when src2 or src3 was Inf/NaN.
That's the case for GLSL sqrt(0). sqrt(0) actually happens in many
piglit auto-generated tests that use the distance() function.
v2: remove debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
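The Inf/NaN problem with the SLT+LRP lowering is easy to reproduce in
scalar C. This is a sketch of the arithmetic only, not the llvmpipe code;
the function names are hypothetical and IEEE-754 float semantics are
assumed.

```c
#include <assert.h>

/* Direct select: what CMP is supposed to compute. */
static float cmp_select(float c, float a, float b)
{
    return c < 0.0f ? a : b;
}

/* The SLT+LRP lowering: t is 1.0 or 0.0, then a linear blend. */
static float cmp_slt_lrp(float c, float a, float b)
{
    float t = c < 0.0f ? 1.0f : 0.0f;
    return t * a + (1.0f - t) * b;
}

/* NaN is the only value that compares unequal to itself. */
static int is_nan(float x)
{
    return x != x;
}
```

With c >= 0 and a = Inf, the blend computes 0 * Inf, which is NaN, so the
unselected operand poisons the result even though a real select would
simply return b.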
Was previously implemented with FLOOR.
Fixes quite a few piglit tests of float->int conversion, integer
division, etc.
v2: clean up left over debug/devel code, per Jose
Reviewed-by: José Fonseca <jfonseca@vmware.com>
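Assuming the operation in question is truncation toward zero, the sketch
below shows why implementing it with FLOOR gives wrong results for
negative inputs. The helper names are hypothetical and inputs are assumed
to fit in an int.

```c
#include <assert.h>

/* Round toward negative infinity. */
static int floor_to_int(float x)
{
    int i;
    assert(x > -2147483648.0f && x < 2147483648.0f);
    i = (int)x;                       /* C casts truncate toward zero */
    return (x < (float)i) ? i - 1 : i;
}

/* Round toward zero, as float->int conversion requires. */
static int trunc_to_int(float x)
{
    assert(x > -2147483648.0f && x < 2147483648.0f);
    return (int)x;
}
```

The two agree for non-negative values and differ by one for negative
non-integers, which is the case that float->int conversion and integer
division tests exercise.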
If the 'dst' register is the same as the 'pass' register we'll generate
invalid code. Use a temporary register in that case.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Redo this commit, and remove the inclusion of gl2ext.h
from src/mapi/glapi/glapi_priv.h. The include was added in
8f3be33985 to fix a missing prototype for
glDrawBuffersNV and others, but it's not possible to include both
glext.h and gl2ext.h from the same file.
I don't see the missing prototype here (with or without shared glapi)
so I'm just removing the offending #include.
Also, since we're redoing this, update to the most recent gl2ext.h.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
That old bug was hidden by the clipper always interpolating in 3d space
no matter what it should have been doing. Now that the interpolation
has been fixed, the bug shows up.
Fixes fdo 51364.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Calling glGenerateMipmap could overwrite vertex buffer state, leading
to incorrect rendering or crashes depending on the Gallium driver.
This was happening on WebGL Conformance test texture-size.
Before 784dd51198 this was covered up
by redundant vertex buffer validation.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This reverts commit 8818b88748.
I get a lot of errors like this one:
In file included from ../../../src/mapi/glapi/glapi_priv.h:49:0,
from glapi_dispatch.c:40:
../../../include/GLES2/gl2ext.h:1074:28: error: redefinition of typedef ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’
../../../include/GL/glext.h:10237:25: note: previous declaration of ‘PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC’ was here
This is with a clean build (with git clean -fdX).
I don't get the errors on my other machine. I didn't investigate why,
a wild guess is that this depends on the version of gcc.
This is a big win for savage2, hon and yofrankie. 62 new programs for
savage2/hon get 16-wide mode, along with one for humus demos and two
for tropics. Even a few shaders from tropics see reductions of 15% or
more.
total instructions in shared programs: 216536 -> 207353 (-4.24%)
instructions in affected programs: 123941 -> 114758 (-7.41%)
In benchmarking Tropics, only a .040% +/- 034% performance improvement
was observed (n=90). Rather disappointing, but I was primarily
motivated to do this patch by a regression in the number of 16-wide
shaders compiled after a GRF texturing on IVB patch I'm working on.
Hopefully this helps avoid that regression.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shaves a few instructions off of a ton of programs. For 12
shaders from tropics and sanctuary, it's enough reduction in register
pressure to get 16-wide mode. 7 shaders from heroes of newerth and
savage2 are hurt by about 1.1%, where copy propagation of negates ends
up preventing coalescing, but we could regain that by doing dataflow
analysis in our copy propagation.
No significant performance difference in tropics (n=11)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The meta-ops _mesa_meta_Clear() and _mesa_meta_glsl_Clear() need to
ignore the state of GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
and GL_SAMPLE_COVERAGE_INVERT when clearing multisampled buffers. The
easiest way to accomplish this is to disable GL_MULTISAMPLE during the
clear meta-ops.
Note: this patch also causes GL_MULTISAMPLE to be disabled during
_mesa_meta_GenerateMipmap() and _mesa_meta_GetTexImage() (since those
two meta-ops use MESA_META_ALL). Arguably this isn't strictly
necessary, since those meta-ops use their own non-MSAA fbo's, but it
shouldn't do any harm.
Fixes Piglit tests "EXT_framebuffer_multisample/clear {2,4}
{color,stencil}" on i965.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):
"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to compute centroid varyings correctly, the fragment shader
needs to be able to load the current pixel/sample mask into a flag
register. This patch adds an opcode to the fragment shader back-end
to do this; the opcode gets translated into the instruction
mov(1) f0<1>UW g1.14<0,1,0>UW { align1 WE_all }
Since this instruction clobbers f0, instruction scheduling has to
treat it the same as instructions that have a conditional modifier.
Reviewed-by: Eric Anholt <eric@anholt.net>
When querying GL_PRIMITIVES_GENERATED, if primitive restart
is also used, then take the software primitive restart
path so GL_PRIMITIVES_GENERATED is returned correctly.
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is also updated
since it is also affected by the same issue.
As noted in brw_primitive_restart.c, with further work we
should be able to move this situation back to a hardware
handled path.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit d73f6375f5 fixed the cause of the Piglit failure with
ARB_color_buffer_float fragment clamp modes. Now that it's fixed,
there's no reason to leave snorm format rendering disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0c005bd7 intended to make ir_loop_jump::mode public, but also
accidentally added a new pointer to the enclosing loop. Furthermore, it
tried to initialize the new field by adding "this->loop = loop;" to the
constructor, but since there is no loop parameter, this only initialized
the field to itself---so it will likely be a garbage pointer.
A lot of code, such as lower_jumps, allocates new loop jumps without
setting this field appropriately, so any uses would probably just crash.
Thankfully, there were none, so we can just delete the field.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51574
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
DrawPixels uses the MESA_META_CLAMP_FRAGMENT_COLOR flag to save/restore
the fragment color clamp mode. This is unnecessary since it never
alters it. It's also harmful: when the clamp mode is GL_FIXED_ONLY,
setting this flag causes _mesa_meta_begin to force it to GL_FALSE,
breaking clamping on SNORM formats.
DrawPixels should use the user-specified clamp mode and not change it.
Fixes Piglit's spec/ARB_color_buffer_float/GL_RGBA8_SNORM-drawpixels
test on i965/Sandybridge (with SNORM render targets re-enabled).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add "-f $(srcdir)/gl_API.xml" to the arguments of all
the scripts that by default look for gl_API.xml in the
working directory when run with no arguments, and prepend
$(srcdir) to those scripts that are already using an
explicit -f argument.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
tgsi_ureg was recently enhanced to support local temporaries, and as a
result temps are declared individually.
This change avoids many TEMP register declarations on common shaders.
(And fixes a performance regression due to mismatches against
performance-sensitive shaders.)
Reviewed-by: Brian Paul <brianp@vmware.com>
The templated copy constructor doesn't prevent the compiler from
emitting a default copy constructor, which leads to inconsistent
memory handling and was reported to cause segfaults when doing event
manipulation.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
The function internalizer pass marks non-kernel functions as internal,
which enables optimizations like function inlining and global dead-code
elimination.
v2:
- Pass vector arguments by const reference
Removed u_half.py used to generate the table for previous method.
The previous implementation of float to half conversion was faulty for
denormals and NaNs and would require extra logic to fix,
thus making the speedup of using tables irrelevant.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
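For reference, a table-free float to half conversion that handles
denormals and NaNs can look like the sketch below. This is not the code
from the commit; it is one common bit-manipulation approach, assuming
IEEE-754 single precision and round-to-nearest-even. Both function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Helper: build a float from its raw bit pattern. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint16_t float_to_half(float f)
{
    uint32_t bits, sign, e8, mant, half, shift, rem, halfway;
    int exp;

    memcpy(&bits, &f, sizeof bits);
    sign = (bits >> 16) & 0x8000u;
    e8 = (bits >> 23) & 0xffu;
    mant = bits & 0x7fffffu;

    /* Inf stays Inf; NaN becomes a quiet NaN (payload is dropped). */
    if (e8 == 0xffu)
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

    exp = (int)e8 - 127 + 15;          /* rebias the exponent for half */
    if (exp >= 31)
        return (uint16_t)(sign | 0x7c00u);   /* overflow to Inf */

    if (exp <= 0) {
        /* Result is a half denormal or zero. */
        if (exp < -10)
            return (uint16_t)sign;     /* below half of the smallest denormal */
        mant |= 0x800000u;             /* make the implicit 1 explicit */
        shift = (uint32_t)(14 - exp);
        half = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half & 1u)))
            half++;                    /* may carry into the smallest normal */
        return (uint16_t)(sign | half);
    }

    /* Normal number: keep the top 10 mantissa bits and round. */
    half = sign | ((uint32_t)exp << 10) | (mant >> 13);
    rem = mant & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                        /* carry may bump the exponent */
    return (uint16_t)half;
}
```

For example, 1.0f maps to 0x3c00 and 2^-24 (bit pattern 0x33800000) maps
to the smallest half denormal, 0x0001.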
Some parameters need to be checked only once.
check_valid_to_render needs to be called only once.
The validate function is based on the one for DrawElements.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a cleanup for ARB_transform_feedback3, where
GL_MAX_TRANSFORM_FEEDBACK_BUFFERS is introduced for interleaved attribs and
has the same meaning as GL_MAX_.._SEPARATE_ATTRIBS for separate attribs.
Also, the maximum number of TFB buffers is reduced from 32 to 4, which makes
this patch useful even without the extension.
I don't know of any hardware which can do more than 4.
Reviewed-by: Brian Paul <brianp@vmware.com>
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
For some reason regular gcc on Linux didn't catch these but the mingw
compiler did (generated errors, not warnings).
v2: include the changes in src/mapi/ too
Fixes the es2 build with gcc.
Note: in glext.h the prototypes for glShaderSource() and glShaderSourceARB()
disagree: only the former has the extra const qualifier.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Set the step_rate value when drawing to implement
ARB_instanced_arrays for gen >= 4.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we were counting gl_FrontFacing, gl_FragCoord and gl_PointCoord
against the limit of varying variables. This prevented some valid shaders
from linking.
The other potential solution to this is to have the driver advertise
more varying vars or set the GLSLSkipStrictMaxVaryingLimitCheck flag.
But the above-mentioned variables aren't conventional varying attributes
so it doesn't seem right to count them.
Reviewed-by: Eric Anholt <eric@anholt.net>
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we don't have them in hw we emulate them in the shader. Although not
recommended by the spec it is legit.
As a side effect we also get GL 2.1. I think this is as far as we can take
the i915.
The most recent commit adds support for comments and macro expansion
on #line directives. Add testing to verify the new features.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL specification requires that #line directives be interpreted
after macro expansion. Our existing implementation of #line macros in
the lexer prevents conformance on this point.
Moving the handling of #line from the lexer to the parser gives us the
macro expansion we need. An additional benefit is that the
preprocessor also now supports comments on the same line as #line
directives.
Finally, the preprocessor now emits the (fully-macro-expanded) #line
directives into the output. This allows the full GLSL compiler to also
see and interpret these directives so it can also generate correct
line numbers in error messages.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is currently used only in the expansion of #if lines,
but we will soon be using it more generally (for the expansion of
#line directives as well). Give it a more general name
(_glcpp_parser_expand_and_lex_from) and some more documentation.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b823b99ec0 switched from using
functions such as ralloc_asprintf and ralloc_strcat to
ralloc_asprintf_rewrite_tail. This change maintains the string's
length as a parameter that is updated by the ralloc functions (rather
than recomputing it with strlen over and over).
However, the change failed to update two locations (glcpp_error and
glcpp_warning), with the result that the string's length wasn't
updated by these calls. Then, subsequent calls to other
ralloc_asprintf_rewrite_tail would overwrite the text appended by
glcpp_error.
This commit fixes the two missing updates, and restores line numbers
to the output of glcpp error messages, (as noticed by a glcpp unit
test case that has been failing since the above-mentioned commit).
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
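The failure mode is easy to reproduce with any append helper that tracks
the string length externally. The sketch below is a plain-C stand-in, not
the ralloc API; append_tail is a hypothetical function built on vsnprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at offset *len and advance *len, so callers
 * never need to recompute the length with strlen(). */
static void append_tail(char *buf, size_t cap, size_t *len,
                        const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(*len < cap);
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);

    if (n > 0) {
        size_t room = cap - *len - 1;   /* space excluding the NUL */
        *len += ((size_t)n < room) ? (size_t)n : room;
    }
}
```

If one caller appends without updating the shared length, the next append
starts at the stale offset and overwrites the earlier text, which matches
the lost line numbers described above.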
A strict reading of the GLSL specification would have this be an
error, but we've received reports from users who expect the
preprocessor to interpret undefined macros as 0. This is the standard
behavior of the preprocessor for C, and according to these user
reports is also the behavior of other OpenGL implementations.
So here's one of those cases where we can make our users happier by
ignoring the specification. And it's hard to imagine users who really,
really want to see an error for this case.
The two affected test cases are updated to reflect the new behavior.
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
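For comparison, this is the C preprocessor behavior the message refers to:
an identifier that is not defined as a macro evaluates to 0 inside #if.
The macro name and function names below are made up for the example, and
the macro is deliberately never defined.

```c
#include <assert.h>

/* SKETCH_UNDEFINED_MACRO is intentionally not defined anywhere. */

static int undefined_macro_is_true(void)
{
#if SKETCH_UNDEFINED_MACRO
    return 1;
#else
    return 0;   /* taken: the undefined macro evaluates to 0 */
#endif
}

static int undefined_macro_equals_zero(void)
{
#if SKETCH_UNDEFINED_MACRO == 0
    return 1;   /* taken for the same reason */
#else
    return 0;
#endif
}
```

A C compiler accepts this silently unless a warning such as -Wundef is
requested.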
DUAL_EXPORT can be enabled on r6xx/r7xx when all CBs use 16-bit export
and there is no depth/stencil export.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit export
mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS
shouldn't export depth/stencil.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In some cases TGSI shader has more color outputs than the number of CBs,
so it seems we need to limit the number of color exports. This requires
different shader variants depending on the nr_cbufs, but on the other hand
we are doing less exports, which are very costly.
v2: fix various piglit regressions
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Shader variants are stored in the list, the key for lookup is based on the
states that require different hw shaders - currently it's rctx->two_side (all
gpus) and rctx->nr_cbufs (evergreen/cayman, when writes_all property is set).
v2:
- use simple list instead of keymap as suggested by Marek on irc
- call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
(r600_shader_select isn't used for vertex shaders currently)
v3:
- fix call to r600_adjust_gprs - do it after updating current shader
Improves performance for some apps, e.g. FlightGear -
see https://bugs.freedesktop.org/show_bug.cgi?id=50360
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
As with the previous commit for softpipe.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
These all return zero. Add a debug_printf() to catch the default case so
we don't accidentally mishandle something important in the future.
v2: remove 'default' case to get compile-time warning
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is actually required for GL_ARB_framebuffer_object, but the state
tracker doesn't currently check it.
Direct3D 9 allows mixed format color buffers with some restrictions.
Setting this allows Unigine Heaven 2.5 and 3.0 to run. Tested both on
GL and D3D hosts.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The type is the destination type (i.e. float vector) and not the
source type. Fixes piglit fs-{in,de}crement-uint.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
i965 hardware needs to be informed of situations in which it's
possible for pixels (or samples) to be discarded for reasons other
than depth/stencil testing (e.g. due to an explicit "discard" in the
fragment shader). One of these situations is when
GL_ALPHA_TO_COVERAGE is enabled, since that can cause samples to be
discarded by the color calculator when the pixel's alpha value is less
than 1.0.
Without this patch, GL_ALPHA_TO_COVERAGE does not take effect on depth
buffers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables the multisampling parameters
GL_SAMPLE_ALPHA_TO_COVERAGE and GL_SAMPLE_ALPHA_TO_ONE, which allow
the fragment shader's alpha output to be converted into a sample
coverage mask and ignored for blending. i965 supports these
parameters through the BLEND_STATE structure.
The GL spec allows, but does not require, the implementation to dither
the conversion from alpha to a sample coverage mask, so that alpha
values that aren't a multiple of 1/num_samples result in the correct
proportion of samples being lit. A bit exists in the BLEND_STATE
structure to enable this functionality, but according to the hardware
docs it must be disabled on Sandy Bridge (see the Sandy Bridge PRM,
Vol2, Part1, p379: AlphaToCoverage Dither Enable). So it is enabled
for Gen7 only.
Fixes piglit tests
"EXT_framebuffer_multisample/sample-alpha-to-{coverage,one} {2,4}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch enables glSampleCoverage() functionality, which allows the
client program to specify that only a portion of the samples be lit up
when performing multisampled rendering. i965 supports
glSampleCoverage() through the 3DSTATE_SAMPLE_MASK command packet,
which allows the driver to specify a bitfield indicating which samples
to light up.
Fixes piglit tests "EXT_framebuffer_multisample/sample-coverage {2,4}
{inverted,non-inverted}".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
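One plausible way to turn the glSampleCoverage() value into a sample
bitfield is sketched below. The rounding policy and the choice of which
samples to light are assumptions, and sample_coverage_mask is a
hypothetical helper, not the driver function.

```c
#include <assert.h>

/* Convert a coverage fraction in [0, 1] into a bitfield with the low
 * round(value * num_samples) bits set, optionally inverted. */
static unsigned sample_coverage_mask(float value, int invert,
                                     unsigned num_samples)
{
    unsigned all, lit, mask;

    assert(num_samples >= 1 && num_samples <= 16);
    assert(value >= 0.0f && value <= 1.0f);

    all = (1u << num_samples) - 1u;
    lit = (unsigned)(value * (float)num_samples + 0.5f);
    mask = (1u << lit) - 1u;
    return invert ? (mask ^ all) : mask;
}
```

With 4 samples and a coverage value of 0.5, this lights samples 0 and 1
(mask 0x3), or samples 2 and 3 (mask 0xc) when inverted.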
Fixes gles2conform GL.equal.equal_bvec2_frag.
This fixes brw_fs_visitor's translation of ir_unop_f2b. It used CMP to
convert the float to one of 0 or ~0. However, the convention in the
compiler is that true is represented by 1, not ~0. This patch adds an AND
to convert ~0 to 1.
By inspection, a similar problem existed with ir_unop_i2b, with a similar
fix.
[v2 kayden]: eliminate extra temporary register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49621
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
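In scalar C terms, the fix amounts to masking the comparison result down
to a single bit. This is only an analogy for the generated instructions;
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What the CMP alone produces: all ones for true, zero for false. */
static uint32_t f2b_cmp_only(float f)
{
    return f != 0.0f ? UINT32_C(0xffffffff) : UINT32_C(0);
}

/* CMP followed by AND 1: true becomes exactly 1, matching the
 * compiler's convention for booleans. */
static uint32_t f2b(float f)
{
    return f2b_cmp_only(f) & UINT32_C(1);
}
```

Without the AND, comparing two "true" values for equality against the
constant 1 fails, which is what the equal_bvec2 test caught.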
This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.
Reviewed-by: Eric Anholt <eric@anholt.net>
To save time, we only instruct the clip stage of the pipeline to
compute noperspective barycentric coordinates if those coordinates are
needed by the fragment shader. Previously, we would determine whether
the coordinates were needed by seeing whether the fragment shader used
the BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode.
However, with MSAA, it's possible that the fragment shader might use
BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future,
when we support ARB_sample_shading, it might use
BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC.
This patch modifies the upload_clip_state() functions to check for all
three possible noperspective interpolation modes.
Reviewed-by: Eric Anholt <eric@anholt.net>
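The check described above can be sketched as a bitmask test over the set
of interpolation modes the fragment shader uses. The enum below is a
hypothetical stand-in for the BRW_WM_* barycentric mode values, whose real
numbering is not shown here.

```c
#include <assert.h>

/* Hypothetical mode numbering; each mode is one bit in a bitfield. */
enum sketch_interp_mode {
    SKETCH_PERSPECTIVE_PIXEL,
    SKETCH_PERSPECTIVE_CENTROID,
    SKETCH_PERSPECTIVE_SAMPLE,
    SKETCH_NONPERSPECTIVE_PIXEL,
    SKETCH_NONPERSPECTIVE_CENTROID,
    SKETCH_NONPERSPECTIVE_SAMPLE
};

#define SKETCH_ALL_NONPERSPECTIVE_BITS \
    ((1u << SKETCH_NONPERSPECTIVE_PIXEL) | \
     (1u << SKETCH_NONPERSPECTIVE_CENTROID) | \
     (1u << SKETCH_NONPERSPECTIVE_SAMPLE))

/* Old behavior: only the pixel-center noperspective mode counted. */
static int old_needs_noperspective(unsigned modes)
{
    return (modes & (1u << SKETCH_NONPERSPECTIVE_PIXEL)) != 0;
}

/* New behavior: any of the three noperspective modes counts. */
static int needs_noperspective(unsigned modes)
{
    return (modes & SKETCH_ALL_NONPERSPECTIVE_BITS) != 0;
}
```

A shader that only uses noperspective centroid interpolation is missed by
the old check and caught by the new one.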
This bitfield tells the back-ends which of a fragment shader's inputs
require centroid interpolation. It is only set for GLSL fragment
shaders, since assembly fragment shaders don't support centroid
interpolation.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was only no-oping the clear() function, not actual triangle
rasterization. Move the no_rast field from lp_context down into
lp_rasterizer so it's accessible where it's needed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes this build failure on Solaris.
Compiling build/sunos-debug/glsl/glcpp/glcpp-lex.c ...
"src/glsl/glcpp/glcpp-lex.l", line 30: cannot find include file: "glcpp-parse.h"
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
$CLANG_RESOURCE_DIR is the directory that contains all resources
needed by clang to compile programs. When clover uses clang to
compile kernels it needs to specify a resource dir, so that clang
can find its internal headers (e.g. stddef.h).
clang defines $CLANG_RESOURCE_DIR as $CLANG_LIBDIR/clang/$CLANG_VERSION
This patch adds the --with-clang-libdir option in order to accommodate
clang installs to non-standard locations, and it also adds a check
to the configure script to verify that $CLANG_RESOURCE_DIR/include
contains the necessary header files.
On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.
However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.
This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.
Fixes Piglit test "fbo-deriv".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
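A scalar sketch of the sub-span derivative computation, including the
negation for FBOs. The storage order of the four pixels and the base sign
of dFdy() are assumptions made for the example; the function names are
hypothetical.

```c
#include <assert.h>

/* Assumed sub-span layout:
 *   v[0] = first row, left     v[1] = first row, right
 *   v[2] = second row, left    v[3] = second row, right
 */

/* dFdx: difference between the right and left halves. */
static float subspan_dfdx(const float v[4])
{
    return v[1] - v[0];
}

/* dFdy: difference between the two rows, negated when rendering to an
 * FBO because its origin is at the upper left instead of the lower left. */
static float subspan_dfdy(const float v[4], int render_to_fbo)
{
    float d = v[0] - v[2];   /* assumed sign for window-system buffers */
    assert(render_to_fbo == 0 || render_to_fbo == 1);
    return render_to_fbo ? -d : d;
}
```

Because the sign depends on render_to_fbo, that flag has to be part of the
program key, which is why the shader is recompiled when switching between
FBO and non-FBO rendering.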
It's not optimal, but it's better than the register pressure scheduler
that was previously being used. The VLIW scheduler currently ignores
all the complicated instruction group restrictions and just tries to
fill the instruction groups with as many instructions as possible.
Though, it does know enough not to put two trans only instructions in
the same group.
We are able to ignore the instruction group restrictions in the LLVM
backend, because the finalizer in r600_asm.c will fix any illegal
instruction groups the backend generates.
Enabling the VLIW scheduler improved the run time for a sha1 compute
shader by about 50%. I'm not sure what the impact will be for graphics
shaders. I tested Lightsmark with the VLIW scheduler enabled and the
framerate was about the same, but it might help apps that use really
big shaders.
The rest of the TFB implementation remains in transformfeedback.c, and
this will be shared with UBOs.
v2: Move the size/offset checks shared with UBOs to common code as
well. (Kenneth's review)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix a typo spotted by Eric Anholt.
v3: Fix missing "GL" on types, fix style, fix Studly_Caps extension name,
drop commented code duplicated with GL3x.xml [anholt]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our intention is still that it's not abi stable, so make the package
version number get included in the library name. Now you can parallel
install dricore-using drivers from multiple mesa versions. We can put
it into lib now that we're following library versioning rules
(assuming that ABIs don't change within a single Mesa point release).
LD_LIBRARY_PATH still doesn't work with a non-/, non-/usr prefix
because libtool uses rpath instead of runpath for nonstandard
prefixes.
The weird versioning of the libGL where the package version was sort
of expressed as a big integer is dropped. libtool didn't like the 0
prefix, and it didn't really make sense anyway -- if you interpret it
as an integer version number, old Mesa 071200 was bigger than current
Mesa 08100. Instead, just bump the minor version and drop the
patchlevel.
Except for the deleted linux-cell target, these were just the target
cc/cflags. The only usage was for gen_matypes, which wants the
target's structure packing, not the host, anyway.
Every place that uses ASM_FLAGS already uses DEFINES. Not including
it in DEFINES is just a way to screw up potential users, as I've done
several times while working on the build system.
Even pre-automake, we rely on gmake features for pattern
substitutions, and replacing those with reams more make code is not
interesting. This will let us turn the old Makefiles using pattern
substitutions into automake without spewing warnings.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
1) We need to insert a barrier between consecutive transform feedback calls.
2) VBO cache needs to be flushed when TFB output is used as VBO draw input.
Fixes Piglit test EXT_transform_feedback/immediate-reuse.
Thanks to Christoph Bumiller for pointing out bugs in previous versions
of this patch.
gl_ClipDistance needs special treatment in form of lowering pass
which transforms gl_ClipDistance representation from float[] to
vec4[]. There are 2 implementations - at glsl linker level (enabled
by LowerClipDistance option) and at glsl_to_tgsi level (enabled
unconditionally for gallium drivers). The second implementation is
incomplete - it does not take into account transform feedback (see
commit 642e5b413e "mesa: Fix transform
feedback of unsubscripted gl_ClipDistance array" for details).
There are 2 possible fixes:
- adding transform feedback support into glsl_to_tgsi version
- ripping gl_ClipDistance support from glsl_to_tgsi and enabling
gl_ClipDistance lowering on glsl linker side
This patch implements the 2nd option. All it does is:
- reverts most of the commit 59be691638
"st/mesa: add support for gl_ClipDistance"
- changes LowerClipDistance to true
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{2,3,4,5,6,7,8}]-no-subscript" at least on nv50
and evergreen cards.
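As a rough illustration of what a float[] to vec4[] lowering implies for indexing, the sketch below maps a float array element to a vec4 slot and component. It assumes the elements are packed in order, four per vec4. The function name lowered_location is hypothetical and does not come from the lowering pass itself.

```python
def lowered_location(i):
    """Map element i of a float[] array to (vec4 index, component name).

    Assumption: elements are packed in order, four floats per vec4.
    """
    return i // 4, "xyzw"[i % 4]

# Elements 0..7 of the float[] view land in two vec4 slots.
for i in range(8):
    print(i, lowered_location(i))
```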
From the GL 3.0 spec (p.116):
"Multisample rasterization is enabled or disabled by calling
Enable or Disable with the symbolic constant MULTISAMPLE."
Elsewhere in the spec, where multisample rasterization is described
(sections 3.4.3, 3.5.4, and 3.6.6), the following text is consistently
used:
"If MULTISAMPLE is enabled, and the value of SAMPLE_BUFFERS is
one, then..."
So, in other words, disabling GL_MULTISAMPLE should prevent
multisample rasterization from occurring, even if the draw framebuffer
is multisampled. This patch implements that behaviour by setting the
WM and SF stage's "multisample rasterization mode" to
MSRAST_ON_PATTERN only when the draw framebuffer is multisampled *and*
GL_MULTISAMPLE is enabled.
Fixes piglit test spec/EXT_framebuffer_multisample/enable-flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware limitations, MSAA is unsupported on Gen6 for formats
containing >64 bits of data per pixel. From the Sandy Bridge PRM,
vol4 part1, p72 ("Surface Format"):
If Number of Multisamples is set to a value other than
MULTISAMPLECOUNT_1, this field cannot be set to the following
formats:
- any format with greater than 64 bits per element
- any compressed texture format (BC*)
- any YCRCB* format
Gen7 has a similar, but less stringent limitation: formats with >64
bits of data per pixel only support 4x MSAA.
This patch causes the unsupported formats to report
GL_FRAMEBUFFER_UNSUPPORTED.
Fixes piglit "multisample-formats" tests on Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sandy Bridge and later don't use this field, so there's no point in
setting it. It can only cause harmful state-based recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The system array values concept doesn't really work because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID. So this patch does two things:
- kill the array, have emit_fetch_system_value directly pick the
values it needs (only gl_InstanceID for now, as the previous code)
- correctly handle the expected type in emit_fetch_system_value
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This includes:
- picking up correctly which attributes are flatshaded and which are
noperspective
- copying the flatshaded attributes when needed, including the
non-built-in ones
- correctly interpolating the noperspective attributes in screen-space
rather than in a 3d-correct fashion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A z or stencil texture should not be created with the z/stencil
flags for surface creation, as it is intended to be bound
as a texture.
v2: remove broken code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Solaris Studio C compiler does not support anonymous structs and
anonymous unions.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea here is to rewrite comparisons like 2 >= x as x <= 2; we want
to simply exchange arguments, not negate the condition. If equality was
part of the original comparison, it should remain part of the swapped
version.
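The difference between exchanging arguments and negating the condition can be sketched as follows. This is only an illustration of the rule described above; the table and function names (SWAPPED, swap_comparison, evaluate) are hypothetical and are not the compiler's own.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# Operator to use after the operands trade places. Equality is preserved:
# ">=" becomes "<=", not "<" (which would be the negation of ">=").
SWAPPED = {"<": ">", "<=": ">=", ">": "<", ">=": "<=", "==": "==", "!=": "!="}

def swap_comparison(lhs, op, rhs):
    # Exchange the operands and mirror the operator.
    return rhs, SWAPPED[op], lhs

def evaluate(lhs, op, rhs):
    return OPS[op](lhs, rhs)

print(swap_comparison(2, ">=", "x"))
```

Evaluating both forms for a range of values gives the same result for every operator, which would not hold if the condition were negated instead.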
This is the true cause of bug #50298. It didn't manifest itself on
Sandybridge because we embed the conditional modifier in the IF
instruction rather than emitting a CMP. All other platforms use CMP.
It also didn't manifest itself on the master branch because commit
be5f27a84d ("glsl: Refine the loop instruction counting.") papered over
the problem.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50298
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a bug where a sampler view was using stale texture/resource
data when the texture was modified through a surface (render to texture).
Bumping the texture and layer ages triggers sampler view revalidation.
Fixes piglit fbo-blit failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This lets us select the front buffer for reading under GLES2.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extra condition checks the API not the version of the API, so rename
to reflect that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is failing sometimes, probably because TargetData keeps a structure layout
cache, which can become bogus, ever since the InvalidateStructLayoutInfo API
was removed in LLVM r135245.
This change merely makes the problem easier to diagnose (an assertion
failure instead of a random crash).
instead of failing to allocate a renderbuffer.
This also fixes piglit/get-renderbuffer-internalformat with non-renderable
formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows drivers not to do any allocation in AllocStorage if the storage
cannot be allocated because of an unsupported internalformat + samples combo.
The little ugliness is that AllocStorage is expected to return TRUE in this
case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires the latest streamout kernel patches.
Streamout is disabled by default on r7xx, so this patch is safe for regular
users.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Note: for the moment TGSI_OPCODE_F2U is implemented using
lp_build_itrunc() (the same function used to implement
TGSI_OPCODE_F2I). In the long run, we should create an
lp_build_utrunc() function to do the proper conversion. But this
should allow us to limp along with mostly correct behaviour for now.
Previously, we performed conversions from float->uint by a two step
process: float->int->uint. However, on platforms that use saturating
conversions (e.g. i965), this didn't work, because if the source value
was larger than the maximum representable int (0x7fffffff), then
converting it to an int would clamp it to 0x7fffffff.
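The clamping problem can be modelled with a small sketch. It assumes 32-bit types and finite inputs, and the helper names (sat_f2i, sat_f2u, two_step_f2u) are made up for illustration; they are not the hardware or IR opcodes.

```python
INT_MIN = -0x80000000
INT_MAX = 0x7FFFFFFF
UINT_MAX = 0xFFFFFFFF

def sat_f2i(x):
    # Saturating float -> int32 conversion (truncate, then clamp).
    return max(INT_MIN, min(INT_MAX, int(x)))

def sat_f2u(x):
    # Saturating float -> uint32 conversion (truncate, then clamp).
    return max(0, min(UINT_MAX, int(x)))

def two_step_f2u(x):
    # Old scheme: float -> int -> uint. The int step clamps first.
    return sat_f2i(x) & UINT_MAX

big = 3000000000.0  # larger than 0x7fffffff, still a valid uint32
print(hex(two_step_f2u(big)), hex(sat_f2u(big)))
```

In this model the two-step path returns 0x7fffffff for any input above the int range, while the direct conversion keeps the value.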
This patch just adds the new opcode; further patches will adapt
optimization passes and back-ends to use it, and then finally the
ast_to_hir logic will be modified to emit the new opcode.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies blorp blits (which are used for MSAA) to properly
account for clipping of source coordinates. Previously, if we
detected the possibility of source clipping, we would fall back to the
blit meta-op, which doesn't support MSAA and is very slow for depth
and stencil buffers.
Fixes piglit tests
"EXT_framebuffer_multisample/clip-and-scissor-blit" on i965/Gen6+.
Also substantially speeds up the Humble Bundle V game "Psychonauts" on
Gen6+ (without this patch, the game's depth buffer blits use the slow
blit meta-op).
Reviewed-by: Carl Worth <cworth@cworth.org>
This allows submitting work to the compute-only
rings on cayman+.
v2: rebased on current master and actually make use
of the new flag in evergreen_compute.c
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When drawing a depth image the fragment shader also needs to emit the
current raster color.
The new piglit drawpix-z test exercises this.
NOTE: This is a candidate for the 8.0 branch.
This patch updates .gitignore files to account for the new build
artifacts introduced by the following commits:
ae376f0 glx/tests: Rename test as glx-test
8fecdcc mesa/tests: Add tests for _mesa_lookup_enum_by_{name,nr} functions
a29ad2b mesa/tests: Add tests for the generated dispatch table
Haiku targets the Pentium or higher processor.
To ensure compatibility we can do march 586 and
mtune 686. Mesa will still use sse however if
the cpu supports it (and the stack is properly
aligned). These flags only affect the internal
compiler optimizations.
Previously, rbug_*.c would fail to compile with incomplete prototype
errors when make was run from the command line on my machine. My IDE
always built fine, and still does after this patch (Netbeans 7.1.2).
Most of the includes from files in gallium/auxiliary/rbug/* were
assuming an rbug/ subdirectory, while the headers are actually in the
same directory as the .c files.
The build error was also previously a problem for me on Ubuntu 11.10
and Mint 12.
Fixes build for the following configuration: ./autogen.sh
--enable-debug --enable-texture-float --with-gallium-drivers=r600
--with-dri-drivers=radeon --enable-r600-llvm-compiler
Signed-off-by: Brian Paul <brianp@vmware.com>
In single precision, 1.5707963 becomes 1.5707962513 which is too
small. However, 1.5707964 becomes 1.5707963705 which is just right.
The value 1.5707964 is already used in asin.ir.
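The rounding can be checked by round-tripping each literal through a 32-bit float. This is a small sketch using Python's struct module; the helper name to_f32 is an assumption for illustration.

```python
import math
import struct

def to_f32(x: float) -> float:
    # Round a Python float (double) to the nearest IEEE 754 single.
    return struct.unpack("<f", struct.pack("<f", x))[0]

low = to_f32(1.5707963)
high = to_f32(1.5707964)

print(f"{low:.10f}")   # value stored for 1.5707963
print(f"{high:.10f}")  # value stored for 1.5707964
print(high == to_f32(math.pi / 2))
```

The second literal rounds to the same single-precision value as pi/2 itself, whereas the first lands one representable step below it.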
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is no GLX protocol for these functions. Open-source Linux
drivers have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for this function. Open-source Linux drivers
have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for this function in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.1 and ARB_uniform_buffer_object. I only added
them to 3.1 because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.3, ARB_texture_swizzle, and
EXT_texture_swizzle (with different names). I only added them to 3.3
because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Determines whether it's a basis vector, i.e., a vector with one element
equal to 1 and all other elements equal to 0.
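A minimal sketch of such a check, assuming the vector is given as a plain sequence of numbers. The function name is_basis_vector is hypothetical and the code is not taken from the patch.

```python
def is_basis_vector(v):
    # Exactly one element equals 1 and every other element equals 0.
    ones = sum(1 for c in v if c == 1)
    others_zero = all(c == 0 or c == 1 for c in v)
    return ones == 1 and others_zero

print(is_basis_vector([0, 1, 0, 0]))   # one 1, rest 0
print(is_basis_vector([1, 1, 0, 0]))   # two 1s
print(is_basis_vector([0, 0, 0, 0]))   # no 1 at all
```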
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a value was replaced, the new key was strdup'd and leaked.
To fix this, we modify the hash table implementation to return
whether the value was replaced and free() the (now useless)
duplicate string.
When we have multiple shared contexts, and one of them is
long-running, this will lead to never freeing those resources
since they are shared. Instead, free them right away on context
destruction since we know the other context isn't using them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 8.0 branch.
From the GL_NV_primitive_restart spec:
"PrimitiveRestartIndexNV is not compiled into display lists, but is
executed immediately."
Prior to this patch, calls to glPrimitiveRestartIndex would hit the noop
dispatch stub.
+2 oglconforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the GL_ARB_copy_buffer spec:
"An INVALID_VALUE error is generated if any of readoffset, writeoffset,
or size are negative [...]"
Fixes oglconform's copybuffer/negative.CNNegativeValues test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings appear to occur with newer automake (probably 1.12).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These allow one to mangle the library names, without also mangling the
symbol names, to make them distinct from other GL libraries on the
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Because these classes are used entirely from their own source files
and not from separate DSOs, the linker gets to produce massively less
code. This cuts about 13k of text in the libdricore case. In the
non-libdricore case, the additional linkage information allows the
compiler to inline some code, so libglsl.a size actually increases by
about 300 bytes.
For a dricore build, improves shader_runner runtime on
glsl-fs-copy-propagation-texcoords-1 by 0.21% +/- 0.03% (n=353574,
outliers removed). No statistically significant difference with n=322
on glslparsertest on a yofrankie shader intended to test compiler
performance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we have just one library of "all of Mesa core" instead of both
libdricore and libglsl that drivers link against.
I did this change in a sort of nonrecursive make fashion: the
generated files are still produced in the non-automake build, like the
rest of dricore, but the GLSL files are stuffed into libdricore
without building a convenience library in src/glsl (even though we
could now). This would make a bit more sense if glsl was just another
dir under src/mesa, because right now I had to contort the prefix
variable name to look another ../ level up.
This is part of a series to fix our build issues in the automake case
by hooking up the automatic Makefile regeneration support. The
extract_git_sha1 is moved into src/mesa/Makefile so that we get
correct dependency generation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tried to update all the old Makefiles that included the default
config to be sure they had a default target if they didn't previously
have one, since this new all target will always point at it. Almost
everything had one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
More of the files are now autogenerated, which broke the build. This
patch adds generation of the missing files. It also changes the existing
make rules so that the files are created as part of the local source
tree rather than in the intermediate directory, which caused several
problems.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This patch fixes a copy/paste error and masking of depth/stencil (stencil
is in the top 8 bits), and makes glean/readPixSanity happy.
Both the stencil and the depth buffer piglit test also pass if
glClear(DEPTH | STENCIL) is executed instead of
glClear(DEPTH)/glClear(STENCIL).
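For readers unfamiliar with the layout, here is a sketch of masking a packed depth/stencil word with stencil in the top 8 bits, as stated above. That depth occupies the low 24 bits is an assumption, and all names below are hypothetical rather than the driver's.

```python
DEPTH_MASK = 0x00FFFFFF    # assumed: depth in the low 24 bits
STENCIL_MASK = 0xFF000000  # stencil in the top 8 bits
STENCIL_SHIFT = 24

def pack(depth24, stencil8):
    return ((stencil8 << STENCIL_SHIFT) & STENCIL_MASK) | (depth24 & DEPTH_MASK)

def unpack(word):
    return word & DEPTH_MASK, (word & STENCIL_MASK) >> STENCIL_SHIFT

def clear_depth_only(word, depth24):
    # Keep the stencil bits and replace only the depth bits.
    return (word & STENCIL_MASK) | (depth24 & DEPTH_MASK)

def clear_stencil_only(word, stencil8):
    # Keep the depth bits and replace only the stencil bits.
    return (word & DEPTH_MASK) | ((stencil8 << STENCIL_SHIFT) & STENCIL_MASK)

word = pack(0x123456, 0xAB)
print(hex(word), unpack(clear_depth_only(word, 0xFFFFFF)))
```

Swapping the two masks reproduces the kind of copy/paste mistake described: a depth-only clear would then overwrite the stencil bits.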
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Christopher Egert <cme3000@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
remove archaic .cvsignore
*.pyo is already in toplevel .gitignore
*.pyc is already in toplevel .gitignore
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, blits using the "blorp" mechanism only worked for 8-bit
RGBA color buffers, 24-bit depth buffers, and 8 bit stencil buffers.
This was not enough, because the blorp mechanism must be used for
blitting whenever MSAA is in use. This patch allows all formats to be
used, provided the source and destination formats match.
So far I have confirmed that the following formats work properly with
MSAA:
- GL_RGB
- GL_RGBA
- GL_ALPHA
- GL_ALPHA4
- GL_ALPHA8
- GL_R3_G3_B2
- GL_RGB4
- GL_RGB5
- GL_RGB8
- GL_RGB10
- GL_RGB12
- GL_RGB16
- GL_RGBA2
- GL_RGBA4
- GL_RGB5_A1
- GL_RGBA8
- GL_RGB10_A2
- GL_RGBA12
- GL_RGBA16
Fixes piglit tests "EXT_framebuffer_multisample/formats {2,4}" on
Sandy Bridge and Ivy Bridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the blorp engine only supported RGBA8 color buffers and
24-bit depth buffers. This patch adds support for any color buffer
format that is supported as a render target, and for 16-bit and 32-bit
depth buffers.
This required threading the brw_context struct through into
brw_blorp_surface_info::set() so that it can consult the
brw->render_target_format array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though brw_blorp_surface_info is derived from brw_blorp_mip_info,
this function doesn't need to be virtual, because it is never accessed
through a base class pointer. Making the function non-virtual will
allow it to take additional parameters in the brw_blorp_surface_info
case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves the responsibility for deciding on the format of the
source and destination surfaces from the
gen{6,7}_blorp_emit_surface_state() functions to
brw_blorp_surface_info::set(), which is shared between Gen6 and Gen7.
This will make it possible to add support for more surface formats
without code duplication.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGSI doesn't need an opcode, since registers are untyped (but beware
once doubles come into the scene). Mesa IR doesn't handle native
integers, so trying to handle them there is worthless, the case
entries are only added for warning reasons.
It was only tested with softpipe, since llvmpipe doesn't support glsl
1.3 yet.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That adds support for activating the extension. It doesn't actually
*do* anything yet, of course.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the issues section of the GL_ARB_texture_compression_rgtc extension:
15) What should glGetTexLevelParameter return for
GL_TEXTURE_GREEN_SIZE and GL_TEXTURE_BLUE_SIZE for the RGTC1
formats? What should glGetTexLevelParameter return for
GL_TEXTURE_BLUE_SIZE for the RGTC2 formats?
RESOLVED: Zero bits.
These formats always return 0.0 for these respective components
and have no bits devoted to these components.
Returning 8 bits for red size of RGTC1 and the red and green
sizes of RGTC2 makes sense because that's the maximum potential
precision for the uncompressed texels.
Thus, we need to return 8 bits for GL_TEXTURE_RED_SIZE on all RGTC formats
and 8 bits for GL_TEXTURE_GREEN_SIZE on RGTC2 formats. BLUE should be 0.
Fixes oglconform/rgtc/advanced.texture_fetch.tex_param.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
While ~loop_state() is already freeing the loop_variable_state objects
via ralloc_free(this->mem_ctx), the ~loop_variable_state() destructor
was never getting called, so the hash table inside loop_variable_state
was never getting destroyed.
Fixes a memory leak in any shader with loops.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The functions for handling 1D, 2D and 3D texture images were nearly
identical. This folds them all together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't remove this pass yet, because we need it to convert AMDIL
registers in BRANCH* instructions, but we don't need it for
instruction conversion any more.
OpenGL allows you to declare user-defined fragment shader outputs with
less than four components:
out ivec2 color;
This makes sense if you're rendering to an RG format render target.
Previously, we assumed that all color outputs had four components (like
the built-in gl_FragColor/gl_FragData variables). This caused us to
call emit_color_write for invalid indices, incrementing the output
virtual GRF's reg_offset beyond the size of the register.
This caused cascading failures: split_virtual_grfs would allocate new
size-1 registers based on the virtual GRF size, but then proceed to
rewrite the out-of-bounds accesses assuming that it had allocated enough
new (contiguously numbered) registers. This resulted in instructions
that accessed size-1 GRFs with register numbers beyond
virtual_grf_next (i.e. registers that were never allocated).
Finally, this manifested as live variable analysis and instruction
scheduling accessing their temporary array with an out of bounds index
(as they're all sized based on virtual_grf_next), and the program would
segfault.
It looks like the hardware's Render Target Write message requires you to
send four components, even for RT formats such as RG or RGB. This patch
continues to use all four MRFs, but doesn't bother to fill any data for
the last few, which should be unused.
+2 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 4650aea7a5 fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+18 piglits on Sandybridge.
NOTE: This and 4650aea7a5 are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit f41ecade7b fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+15 piglits on Sandybridge.
NOTE: This and f41ecade7b are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't saved/restored by _mesa_meta_begin, so we need to do it
manually (like we do for the read/draw framebuffers). Additionally,
we neglected to re-bind before the glRenderbufferStorage call.
+13 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
DeleteBuffer needs to unbind from these binding points as well, based on
the same rationale as the previous patch.
+51 oglconforms (together with the last patch).
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_lookup_bufferobj returns NULL for 0, which caused us to say
"there's no such buffer object" and raise an error, rather than
correctly binding the shared NullBufferObj.
Now you can unbind your buffers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the GL 3.1 spec, section 2.9 ("Buffer Objects"):
"If a buffer object is deleted while it is bound, all bindings to that
object in the current context (i.e. in the thread that called
DeleteBuffers) are reset to zero."
The code already checked for a number of cases, but neglected these
newer binding points.
+21 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were incorrectly assuming that the coordinate's dimensionality is
equal to the gradient's dimensionality. For array types, the coordinate
has one more component.
Fixes 12 subcases of oglconform's glsl-bif-tex-grad test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, if you pass --with-egl-platforms=x11 but xcb-dri2 isn't available
we just silently fail and disable building the EGL DRI2 driver.
This commit cleans up the EGL platform checking and fails if a selected
platform can't find its required dependencies.
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit a07cf3397e added support for TBOs
on Gen7, but missed Gen6.
Passes piglit -t texture_buffer and oglconform's buffermapping
basic.read.texture tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to Table 6.17 in the GL 2.1 specification, DEPTH_TEXTURE_MODE,
TEXTURE_COMPARE_MODE, and TEXTURE_COMPARE_FUNC need to be restored on
glPopAttrib(GL_TEXTURE_BIT).
Makes a number of oglconform tests happier.
v2: Make restoration conditional on the ARB_shadow and ARB_depth_texture
extensions, as suggested by Brian. I'm not sure that any
implementations still remain that don't support those, but why not?
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the libdricore directory build with --enable-32-bit on an x86_64 system.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The VTX_READ instructions were using the ADDRParam ComplexPattern which
allows a load instruction's offset to be a register, but VTX_READ
instructions can only handle an immediate offset.
Also, the load_param pattern fragment had an erroneous return true;
statement that was causing it to match the wrong load instructions.
Tungsten Graphics has not existed for several years, and the majority of
ongoing development and support is done by Intel. I chose to include
"Open Source Technology Center" to distinguish it from, say, the closed
source Windows OpenGL driver.
The one downside to this patch is that applications that pattern match
against "Intel" may start applying workarounds meant for the Windows
driver. However, it does seem like the right thing to do.
This does change oglconform behavior.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These look like debug messages from the switch-statement development.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tom Stellard:
- Updated for gallium interface changes
- Fixed a few bugs:
+ Set the loop counter
+ Calculate the correct number of pipes
- Added hooks into the LLVM compiler
v2:
-Separate IR type and LLVM triple
-Do the OpenCL C->LLVM IR and linking steps for all PIPE_SHADER_IR
types.
v3:
- Coding style fixes
- Removed compatibility code for LLVM < 3.1
- Split build_module_llvm() into three functions:
compile(), link(), and build_module_llvm()
v4:
- Use struct pipe_compute_program
v5:
- Don't malloc memory for struct pipe_llvm_program
v6:
- Fix serialization of llvm bytecode
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This structure is used as a header that precedes LLVM bytecode programs
that are passed to the drivers.
v2:
- s/pipe_compute_program/pipe_llvm_program/
v3:
- Rename to struct pipe_llvm_program_header
- Drop the char * prog member
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is for the llvm code that can't use extended initializers.
v2:
- Use const references for vector arguments
- Move constructor defs before data members
- Initialize all values in the default constructors
v3:
- Fix typo
A device now has two functions for getting information about the IR
it needs to return.
ir_format() => returns the preferred IR
ir_target() => returns the triple for the target that is understood by
clang/llvm.
v2:
- renamed ir_target() to ir_format()
- renamed llvm_triple() to ir_target()
v3:
- Remove unnecessary include
- Do proper conversion from std::vector<char> to std::string
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: Tom Stellard
- Update CAP description
v3: Tom Stellard
- TGSI targets should pass an empty string for this CAP.
v4: Tom Stellard
- TGSI targets can ignore this CAP.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
TEX instructions can't do saturation. Do the TEX into a temp reg w/out
saturation, then do a MOV_SAT.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Some distributions (like Arch Linux) make /usr/bin/python Python 3,
rather than Python 2. Since compare_ir uses /usr/bin/env python,
such systems will fail to run optimization-test, causing 'make check' to
always fail.
Automake's TESTS_ENVIRONMENT variable provides a mechanism to run
programs or set environment variables in the test environment.
Ideally, I think we would want to use AM_TESTS_ENVIRONMENT, since
TESTS_ENVIRONMENT is supposed to be user-overridable. However, it isn't
supported using the default/serial test runner.
Fixes 'make check' on Arch Linux and Gentoo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
I started writing unit tests for a new piece of code, and discovered
they all failed due to a bug in ralloc. Clearly it needs a test suite.
v2: Rename to 'ralloc-test' and fix copyright date. (idr review)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If an object is allocated out of the NULL context, info->parent will be
NULL. Using the PTR_FROM_HEADER macro would be incorrect: it would say
that ralloc_parent(ralloc_context(NULL)) == sizeof(ralloc_header).
Fixes the new "null_parent" unit test.
NOTE: This is a candidate for the 7.9, 7.10, 7.11, and 8.0 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Discovered while running the Khronos conformance test suite and
receiving "implementation error: meta program compile failed."
This bug was recently introduced by the i965 clear patch set and would
only be detected while using the ES2 API and only on gen6+ hardware.
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is performed in a subdirectory to avoid needing to convert all of
src/mesa/Makefile in one go.
I can now cherry-pick a commit containing glapi XML changes, do "(cd
src/mapi/glapi/gen && make) && make", and get a working driver.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to do the minimal change for libdricore conversion to
automake, I need to put its Makefile.am in a subdirectory. Automake
gets whiny/broken if you use GNU make features like "addprefix" or
"$(FILES:%=../%)" to munge your *_SOURCES. So, use a plain old
variable to be able to substitute in that "../"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*_SOURCES is reserved for files lists for particular automake targets.
Also, "-" in the variable names is not allowed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This variable won't be set when called from non-automake makefiles,
but it cleans up shared-glapi's output.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa already always depends on python to build. The checked in
changes are not reviewed (because any trivial change rewrites the
world). We also have been pushing commits between xml change and
regen where at-build-time xml-generated code disagrees with committed
xml-generated code. And worst of all, sometimes we ("I") check in
*stale* xml-generated code.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
commit 87f12bb2d9 tried to fix rb->mt
being NULL, but changed this case incorrectly.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now model loading user sgpr values with LLVM IR load instructions that
use the USER_SGPR address space.
The definition of the sgpr parameter to the use_sgpr() helper function
in radeonsi_shader.c has changed so that you can pass raw sgpr values
rather than having to divide the sgpr value you want to use by the dword
width of the type you want to load.
This function was causing compile errors in the tablegen'd code for
some intrinsic definitions. I don't think we really need this function,
so I'm removing the function body just as a temporary solution. I'll
look into removing the entire AMDILIntrinsicInfo class later.
v2: use a define for the maximum sample count
v3: also test odd sample counts (r300 supports MS3)
While multisample renderbuffers are supported by mesa, MS visuals
are not, so we need a way to tell dri/st not to advertise them even
if the gallium driver does support multisampled surfaces.
Otherwise applications selecting these non-functional visuals would
run into trouble ...
Reviewed-by: Brian Paul <brianp@vmware.com>
The code which scans the index buffer for restart indexes wasn't adding
the index buffer offset so we were always starting at offset=0. The
offset is usually zero so it wasn't noticed before.
Fixes a failure in the piglit primitive-restart test when testing
vertex data + index data in a single VBO.
NOTE: This is a candidate for the 8.0 branch.
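A rough sketch of the idea, assuming 16-bit indices and a hypothetical
count_restarts() helper (neither the name nor the signature is taken from
the Mesa source): the scan has to start at the index buffer offset, not at
byte 0 of the buffer object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count restart indexes in a draw whose indices begin offset_bytes into
 * the buffer. Hypothetical helper for illustration only. */
size_t count_restarts(const uint8_t *buffer, size_t offset_bytes,
                      size_t count, uint16_t restart_index)
{
   size_t i, n = 0;

   for (i = 0; i < count; i++) {
      uint16_t idx;
      /* memcpy avoids alignment assumptions about the offset. */
      memcpy(&idx, buffer + offset_bytes + i * sizeof(idx), sizeof(idx));
      if (idx == restart_index)
         n++;
   }
   return n;
}
```

When vertex data and index data share one VBO, the offset is non-zero and
scanning from byte 0 would inspect vertex data instead of indices.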
Basic 4x MSAA support now works on Gen7. This patch enables it.
As with Gen6, MSAA support is still fairly preliminary. In
particular, the following are not yet supported:
- 8x oversampling (Gen7 has hardware support for this, but we do not
yet expose it).
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, the blending necessary to blit an MSAA surface to a non-MSAA
surface could be accomplished with a single texturing operation. On
Gen7, the WM program must fetch each sample and blend them together
manually. From the Bspec (Shared Functions/Messages/Initiating
Message/Message Types/sample):
[DevIVB+]:Number of Multisamples on the associated surface must be
MULTISAMPLECOUNT_1.
This patch implements the manual blend operation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since blorp uses color textures and render targets to do all its work
(even when blitting stencil and depth data), it always has to
configure the Gen7 GPU to use the new "sliced" MSAA layout. However,
when blitting stencil or depth data, the actual MSAA layout is
interleaved (as in Gen6). Therefore, blorp has to do extra coordinate
transformation work to account for the interleaving manually.
This patch causes blorp to perform the necessary extra coordinate
transformations.
It also modifies the blorp SURFACE_STATE setup code for Gen7, so that
it does not try to correct the surface width and height to account for
MSAA, since "sliced" MSAA layout doesn't affect the surface width or
height.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constraints come into play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Starting in Gen7, there are two possible layouts for MSAA surfaces:
- Interleaved, in which additional samples are accommodated by scaling
up the width and height of the surface. This is the only layout
available in Gen6. On Gen7 it is used for depth and stencil
surfaces only.
- Sliced, in which the surface is stored as a 2D array, with array
slice n containing all pixel data for sample n. On Gen7 this layout
is used for color surfaces.
The "Sliced" layout has an additional requirement: it must be used in
ARYSPC_LOD0 mode, which means that the surface doesn't leave any extra
room between array slices for miplevels other than 0.
This patch modifies the surface allocation functions to use the
correct layout when allocating MSAA surfaces in Gen7, and to set the
array offsets properly when using ARYSPC_LOD0 mode. It also modifies
the code that populates SURFACE_STATE structures to ensure that
ARYSPC_LOD0 mode is selected in the appropriate circumstances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
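The layout rules above can be summarized in a small sketch. The enum, the
function names, and the parameters are hypothetical and do not mirror the
actual intel_mipmap_tree code; they only restate the rules in this message.

```c
#include <stdbool.h>

/* Hypothetical names for the two MSAA layouts described above. */
enum msaa_layout {
   MSAA_LAYOUT_NONE,
   MSAA_LAYOUT_INTERLEAVED,
   MSAA_LAYOUT_SLICED
};

/* Gen6 only has the interleaved layout; Gen7 uses sliced for color
 * surfaces and interleaved for depth and stencil surfaces. */
enum msaa_layout choose_msaa_layout(int gen, bool is_depth_or_stencil,
                                    unsigned num_samples)
{
   if (num_samples <= 1)
      return MSAA_LAYOUT_NONE;
   if (gen >= 7 && !is_depth_or_stencil)
      return MSAA_LAYOUT_SLICED;
   return MSAA_LAYOUT_INTERLEAVED;
}

/* In the sliced layout, array slice n holds all pixel data for sample n,
 * so a non-array surface needs one slice per sample. */
unsigned msaa_array_slices(enum msaa_layout layout, unsigned num_samples)
{
   return layout == MSAA_LAYOUT_SLICED ? num_samples : 1;
}
```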
Gen7 support for blorp (blits using the render path) now works for
non-MSAA purposes. This patch enables it.
Since blorp operations re-use the logic for HiZ ops, this required
adding a case to the switch statement in gen7_blorp_emit_wm_config(),
to allow for the case where no HiZ op is being performed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, texel fetch is always accomplished using the SAMPLE_LD
message, which accepts arguments (u, v, r, lod, si). On Gen7, there
are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
arguments (si, u, v).
*Technically, there are other texel fetch messages, but they are used
for "compressed" MSAA surfaces, which we don't yet support.
This patch adds the proper message types and argument orderings for
Gen7.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
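As an illustration, the message types and argument orders listed above can
be written down as a lookup table. The struct, enum, and function names are
hypothetical; only the message names and argument orders come from this
commit message.

```c
#include <stdbool.h>
#include <stddef.h>

enum fetch_arg { ARG_U, ARG_V, ARG_R, ARG_LOD, ARG_SI };

struct fetch_msg {
   const char *name;
   size_t num_args;
   enum fetch_arg args[5];
};

/* Gen6: SAMPLE_LD takes (u, v, r, lod, si). */
static const struct fetch_msg gen6_ld = {
   "SAMPLE_LD", 5, { ARG_U, ARG_V, ARG_R, ARG_LOD, ARG_SI }
};
/* Gen7, non-MSAA: SAMPLE_LD takes (u, lod, v). */
static const struct fetch_msg gen7_ld = {
   "SAMPLE_LD", 3, { ARG_U, ARG_LOD, ARG_V }
};
/* Gen7, MSAA: SAMPLE_LD2DSS takes (si, u, v). */
static const struct fetch_msg gen7_ld2dss = {
   "SAMPLE_LD2DSS", 3, { ARG_SI, ARG_U, ARG_V }
};

/* Hypothetical selector: pick the message for a generation and surface. */
const struct fetch_msg *texel_fetch_msg(int gen, bool msaa)
{
   if (gen < 7)
      return &gen6_ld;
   return msaa ? &gen7_ld2dss : &gen7_ld;
}
```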
Gen7 hardware requires us to enable at least one WM dispatch mode,
even if there is no program being dispatched to. When this code was
only used for HiZ operations (which don't use a WM program), we used
32-pixel dispatch, because it didn't matter. But blit programs are
compiled for 16-pixel dispatch. So just enable 16-wide dispatch
unconditionally.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Enable 16-wide dispatch unconditionally rather than add the
unnecessary complication of using 32-wide dispatch when there is no WM
program.
On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.
This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We know from previous bug fixes (commits
c25e5300cb and
b2ace06cbb) that texture border color
doesn't work if the dynamic state upper bound is set to 0. Although
the blorp engine doesn't make use of texture borders, it seems like we
ought to err on the safe side and set this value properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch separates out the portions of gen6_blorp_emit_batch_head()
that emit 3DSTATE_MULTISAMPLE, 3DSTATE_SAMPLE_MASK, and
STATE_BASE_ADDRESS. This paves the way for making the blorp code work
on Gen7, where additional command packets
(3DSTATE_PUSH_CONSTANT_ALLOC_VS and 3DSTATE_PUSH_CONSTANT_ALLOC_PS)
need to be emitted before 3DSTATE_MULTISAMPLE.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies the "blorp" WM program so that it can be run in
MSDISPMODE_PERSAMPLE (which means that every single sample of a
multisampled render target is dispatched to the WM program, not just
every pixel).
Previously we were using the ugly hack of configuring multisampled
destination surfaces as single-sampled, and generating sample indices
other than zero by swizzling the pixel coordinates in the WM program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the function brw_blorp_blit_program::texel_fetch()
to emit the SI (sample index) argument to the SAMPLE_LD message when
reading from a sample index other than zero.
Previously we were using the ugly hack of configuring multisampled
source surfaces as single-sampled, and accessing sample indices other
than zero by swizzling the texture coordinates in the WM program.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).
This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).
It will also allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 MSAA buffers (and Gen7 MSAA depth/stencil buffers) interleave
MSAA samples in a complex pattern that repeats every 2x2 pixel block.
Therefore, when allocating an MSAA buffer, we need to make sure to
allocate an integer number of 2x2 blocks; if we don't, then some of
the samples in the last row and column will be cut off.
Fixes piglit tests "EXT_framebuffer_multisample/unaligned-blit {2,4}
color msaa" on i965/Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
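A minimal sketch of the size computation, assuming 4x MSAA and assuming that
the interleaved layout doubles both width and height (the scale factor is an
assumption, not stated in this message). The function names are hypothetical.

```c
/* Round value up to a multiple of alignment (a power of two). */
unsigned align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Size in samples of an interleaved 4x MSAA buffer. The pixel size is
 * first rounded up to whole 2x2 blocks so that no samples in the last
 * row or column are cut off, then scaled (assumed factor: 2 per axis). */
void interleaved_4x_size(unsigned width, unsigned height,
                         unsigned *out_width, unsigned *out_height)
{
   *out_width = align_up(width, 2) * 2;
   *out_height = align_up(height, 2) * 2;
}
```

For a 5x3 pixel buffer this allocates 12x8 rather than 10x6, which covers
the partial 2x2 blocks at the right and bottom edges.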
Unless the -ldflags parameter is passed before $(LDFLAGS), MKLIB will in
some cases receive flags that it does not understand.
These might be -m64, -m32 or similar.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Thomas Gstädtner <thomas@gstaedtner.net>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch gets the FreeBSD SCons build working again. The build still
fails though.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to return immediately after inserting instructions that require
S_WAITCNT so that the parent class' custom inserter won't try to insert
them again.
Fix uninitialized scalar variable defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We should just set the bits of functionality that we support; the
GL/ES1/ES2 flags in extensions.c will take care of advertising the
appropriate extensions for the current API.
This enables the GL_EXT_texture_compression_dxt1 extension on ES1/ES2
when libtxc_dxtn is installed or the force_s3tc driconf option is set.
The main extension code set this up properly, but the ES-specific code
failed to do so.
Otherwise, the extension strings reported by es1_info, es2_info, and
glxinfo all remain the same.
This patch manually disables the ARB_framebuffer_object bit on ES
to preserve the behavior of 1c0f5d8324.
v2: Rebase, fix the i915 Makefile, and unconditionally set the
OES_draw_texture bit as core Mesa will only apply it to ES1 now.
Tested-by: Daniel Charles <daniel.charles@intel.com> [v1]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.
The VBO module's software handling of primitive restart is
used as a fall back.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering which components of a variable were killed by an
assignment, constant propagation would previously just use the write
mask of the assignment. This worked if the LHS of the assignment was
simple, e.g.:
v.xy = ...; // (assign (xy) (var_ref v) ...)
But it did the wrong thing if the LHS of the assignment involved an
array indexing operator, since in this case the write mask is always
(x):
v[i] = ...; // (assign (x) (deref_array (var_ref v) (var_ref i)) ...)
In general, we can't predict which vector component will be selected
by array indexing, so the only safe thing to do in this case is to
kill the entire variable.
Fixes piglit tests {fs,vs}-vector-indexing-kills-all-channels.shader_test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
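The rule can be sketched as follows. The function name and parameters are
hypothetical and do not correspond to the actual constant propagation code;
the sketch only restates which components an assignment kills.

```c
#include <stdbool.h>

/* Return the mask of vector components killed by an assignment.
 * write_mask is the assignment's write mask, lhs_is_array_deref says
 * whether the LHS indexes into the vector, and num_components is the
 * size of the variable being written (1 to 4). */
unsigned killed_components(unsigned write_mask, bool lhs_is_array_deref,
                           unsigned num_components)
{
   if (lhs_is_array_deref) {
      /* The write mask is always (x) here and says nothing about which
       * component v[i] selects, so kill the entire variable. */
      return (1u << num_components) - 1;
   }
   return write_mask;
}
```

For `v.xy = ...` on a vec4 this returns 0x3, while `v[i] = ...` returns 0xf
even though its write mask is 0x1.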
Now that the linker handles initializers of samplers just like any
other uniform, a bunch of this annoying code is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The linker may have set initial values for uniforms. Propagate these
values to the driver's backing storage when it is first associated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.
v3: Minor comment change based on feedback from Ken.
Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add support for gen6, don't turn it on if blending is
disabled (fixes GPU hang), and note it in docs/GL3.txt.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver needed this as well for hardware setup, so instead of
duplicating the logic, just save it off.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While it doesn't have the same warning in the simulator as in gen7,
let's emit it out of paranoia. We wouldn't want our resolves of some
previous clear to get clamped to some current clamping value.
Suggested-by: pretty much everyone
When doing fast clears, a fulsim warning said that the batch was being
emitted without the viewport set up. While the fast clear pass I was
looking at doesn't use the clear value, the later resolves which also
didn't set up the viewport would trigger the same. It's not obvious
from the error message whether it meant "fast clear value gets clamped
to something you haven't defined" or "fast clear value doesn't get
clamped, and I saw it was out of the current (uninitialized) range,
and you probably wanted it clamped to that (uninitialized) range". Be
paranoid and assume the first case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having this enum separate caused us to need a bunch of helper
functions to translate to the op to be executed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL clear path doesn't need any buffer presence checks, since
those are already handled in the normal drawing path code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Our understanding is that the 3D engine is supposed to be faster
anyway. We used to have more overhead in our tri clear path than we
do today, which would have led to this choice. But given that we
almost always see a depth clear along with a color clear, the path was
hardly exercised anyway.
Also, the color mask logic was broken in the presence of
GL_EXT_draw_buffers2's per-buffer colormask.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generated .aub file would be unannotated, making further data analysis
difficult.
This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.
The new annotation mechanism requires DRM version 2.4.34.
Reviewed-by: Eric Anholt <eric@anholt.net>
When we are generating an AUB dump, we make a final call to
aub_dump_bmp() as the context is being destroyed, to ensure that any
rendering performed before the application exits can be seen during a
simulation run. However, we were doing this before flushing the batch
buffer; as a result simulation runs would not always see the effect of
all rendering commands.
This patch flushes the batch buffer just before making the final call
to aub_dump_bmp(), to ensure that all rendering is properly captured
in the final bitmap.
This is a long-standing problem that recently surfaced with the change
to enable perspective correct color interpolation.
A fix for all possible formats is left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
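To make the two ranges concrete, here is a sketch for 8-bit values. It
assumes the usual OpenGL normalization rules (divide by 255 for unsigned,
divide by 127 and clamp to -1 for signed); the function names are
hypothetical and this is not the lp_build_conv code.

```c
#include <stdint.h>

/* Unsigned normalized: maps [0, 255] to [0.0, 1.0]. */
float unorm8_to_float(uint8_t v)
{
   return (float) v / 255.0f;
}

/* Signed normalized: maps [-128, 127] to [-1.0, 1.0]. -128 would land
 * slightly below -1.0, so it is clamped. */
float snorm8_to_float(int8_t v)
{
   float f = (float) v / 127.0f;
   return f < -1.0f ? -1.0f : f;
}
```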
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This patch removes two Clang warnings in GLU:
The first one seems to be an actual bug in mapdesc.cc: Clang complains
that sizeof(dest) will return the size of REAL*[MAXCOORDS], instead of
the intended REAL[MAXCOORDS][MAXCOORDS]. The second one is just
cosmetic because Clang doesn't like extra parentheses.
NOTE: This is a candidate for the 8.0 branch
Reviewed-by: Brian Paul <brianp@vmware.com>
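The first warning comes from a general C and C++ pitfall: an array function
parameter decays to a pointer, so sizeof on it no longer gives the size of
the whole array. A minimal sketch of a safe form follows; the clear_matrix()
function, the REAL typedef, and the MAXCOORDS value are assumptions for
illustration and are not copied from mapdesc.cc.

```c
#include <string.h>

#define MAXCOORDS 5   /* assumed value, for illustration */
typedef float REAL;   /* assumed typedef, for illustration */

void clear_matrix(REAL dest[MAXCOORDS][MAXCOORDS])
{
   /* dest is really a pointer to REAL[MAXCOORDS] here, so sizeof(dest)
    * would be the size of a pointer. Spell out the full array type. */
   memset(dest, 0, sizeof(REAL[MAXCOORDS][MAXCOORDS]));
}
```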
Fixes another case of sampler views being created by one context,
shared by another, then deleted by the first, leaving a dangling
pipe context pointer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Use it where performance matters more and the exact method of float->int
conversion/rounding isn't terribly important. There should be no net change
here since F_TO_I() is the new name of the old IROUND() function.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The different implementations of IROUND() behaved differently and in
the case of fistp, depended on the current x86 FPU rounding mode.
This caused some tests like piglit roundmode-pixelstore and
roundmode-getintegerv to fail on 32-bit x86 but pass on 64-bit x86.
Now IROUND() always rounds to the nearest integer (away from zero).
The new F_TO_I function converts a float to an int by whatever means
is fastest. We'll use this where we're more concerned with performance
and not too worried about how the conversion is done.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
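A portable way to get the rounding behavior described above (nearest
integer, halfway cases away from zero) is sketched below. The function name
is hypothetical, and the sketch assumes the input fits in an int.

```c
/* Round to the nearest integer, with halfway cases rounded away from
 * zero. The cast truncates toward zero, so adding or subtracting 0.5
 * first gives the desired result regardless of the FPU rounding mode. */
int iround_away(float f)
{
   return (int) ((f >= 0.0f) ? (f + 0.5f) : (f - 0.5f));
}
```

For example, 2.5 rounds to 3 and -2.5 rounds to -3.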
The IROUND converted all arguments to 0 or 1. That's not what we wanted.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For zero-stride vertex arrays, the svga driver copies the value into
the constant value and uses that value in the shader. The recent
gallium-userbuf changes caused a regression in this. An example
symptom was per-primitive glColor3f() calls getting ignored.
Where we copied the vertex value from the vertex buffer to the
constant buffer we neglected to take into account the
pipe_vertex_buffer::buffer_offset field. Adding that value to the
source offset fixes the problem. Actually, it looks like we should
have been doing this all along, but it never was an issue before for
some reason.
If the MESA_GLSL env var contains "errors", GLSL compilation and
link errors will be reported to stderr.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Piglit tests for fragment shaders pass; those for vertex shaders fail. The
actual failure seems to be in the interpolators, and not the
textureSize query.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jose.r.fonseca@gmail.com>
Fixes a bunch of piglit tests related to flat interpolation of floats.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
The VBO module now can handle primitive restart in software
if required. Therefore this support is no longer required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the PIPE_CAP_PRIMITIVE_RESTART screen param is not set, then enable
PrimitiveRestartInSoftware to enable software primitive restart
support in the VBO module.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When PrimitiveRestartInSoftware is set, the VBO module will handle
primitive restart scenarios before calling the vbo->draw_prims
drawing function.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If set, then the VBO module will handle all primitive
restart scenarios before calling the driver draw_prims.
Software primitive restart support is disabled by default.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vbo_sw_primitive_restart implements primitive restart in software
by splitting primitive draws apart.
This is based on similar support in mesa/state_tracker/st_draw.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
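The splitting approach can be sketched roughly as below. The struct and
function names are hypothetical and are not the vbo_sw_primitive_restart
implementation; the sketch assumes 32-bit indices already mapped in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* One sub-draw: a run of indices that contains no restart index. */
struct sub_draw {
   size_t start;
   size_t count;
};

/* Split an index list at every restart index. Writes at most max_out
 * sub-draws to out and returns how many were written. Empty runs (for
 * example between two adjacent restart indexes) are skipped. */
size_t split_at_restarts(const uint32_t *indices, size_t count,
                         uint32_t restart_index,
                         struct sub_draw *out, size_t max_out)
{
   size_t n = 0, run_start = 0, i;

   for (i = 0; i <= count; i++) {
      if (i == count || indices[i] == restart_index) {
         if (i > run_start && n < max_out) {
            out[n].start = run_start;
            out[n].count = i - run_start;
            n++;
         }
         run_start = i + 1;
      }
   }
   return n;
}
```

Each resulting sub-draw could then be handed to the driver's draw function
as an ordinary draw without primitive restart.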
It's an implied argument, and I don't think being explicit about it
helps.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment quotes spec saying that only scalar integers are allowed,
but we only checked for integer.
Fixes piglit switch-expression-const-ivec2.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Total instructions: 261582 -> 261316
135/2147 programs affected (6.3%)
36752 -> 36486 instructions in affected programs (0.7% reduction)
This excludes a tropics shader that now gets 16-wide mode and throws
off the numbers. 5 shaders are hurt: two extra MOVs in 4 tropics
shaders, apparently because we don't split register names according
to independent webs, and one gstreamer shader where it looks like
try_rewrite_rhs_to_dst() is falling on its face.
This should also help avoid a regression in VSes from idr's ARB
programs to GLSL work.
By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.
Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)
There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing. This
should be fixed by doing better at dead code elimination.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
No functional change. This patch replaces the
brw_blorp_params::exec() method with a global function
brw_blorp_exec() that performs the operation described by the params
data structure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
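The sample count quantization in the first v2 item might look roughly like
this. The function name, the supported-count list, and the behavior when no
supported count is large enough are assumptions for illustration.

```c
#include <stddef.h>

/* Return the smallest supported sample count that is >= requested.
 * supported must be sorted in ascending order. A request of 0 means
 * "no multisampling". Returning 0 when nothing qualifies is an
 * assumption; an empty list therefore disables MSAA entirely. */
unsigned quantize_num_samples(unsigned requested,
                              const unsigned *supported, size_t n)
{
   size_t i;

   if (requested == 0)
      return 0;
   for (i = 0; i < n; i++) {
      if (supported[i] >= requested)
         return supported[i];
   }
   return 0;
}
```

With a supported list of just { 4 }, a request for 1, 2, or 3 samples yields
4, so a later query of GL_SAMPLES reports the count actually in use.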
This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).
Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.
In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.
v2:
- Add the code that configures the WM program to
gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch splits up the gen6_blorp_exec and gen7_blorp_exec
functions, which were very long, into simple component functions.
With a few exceptions, there is one function per state packet.
This will allow blit functionality to be added without significantly
complicating the code.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Rename the functions gen{6,7}_emit_wm_disable() to
gen{6,7}_emit_wm_config() (since the WM is not actually disabled
during HiZ ops; it simply doesn't have a program). Also, on gen7,
split out the configuration of 3DSTATE_PS to a separate function
gen7_emit_ps_config().
This patch groups together the parameters used by the HiZ functions
into a new data structure, brw_hiz_resolve_params, rather than passing
each parameter individually between the HiZ functions. This data
structure is a subclass of brw_blorp_params, which represents the
parameters of a general-purpose blit or resolve operation. A future
patch will add another subclass for blits.
In addition, this patch generalizes the (width, height) parameters to
a full rect (x0, y0, x1, y1), since blitting operations will need to
be able to operate on arbitrary rectangles. Also, it renames several
of the HiZ functions to reflect the expanded role they will serve.
v2: Rename brw_hiz_resolve_params to brw_hiz_op_params. Move
gen{6,7}_blorp_exec() functions back into gen{6,7}_blorp.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When drawing to a FBO, the viewport wasn't always set correctly. It
was fine in the usual case of the viewport dims matching the surface
dims but broken otherwise. In particular, this was happening because
the viewport scale is negative for FBO rendering.
The piglit fbo-viewport test exercises this.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We can save one instruction by lowering it to:
SUB_INT tmp, 0, src
MAX_INT dst, src, tmp
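A minimal C sketch of why this two-instruction sequence yields the integer absolute value. The function names are hypothetical stand-ins for the SUB_INT and MAX_INT instructions, and the sketch assumes the usual two's-complement conversion back to int32_t:

```c
#include <stdint.h>

/* Hypothetical stand-in for MAX_INT: signed integer maximum. */
int32_t max_int(int32_t a, int32_t b)
{
   return a > b ? a : b;
}

/* abs(src) lowered as:
 *    SUB_INT tmp, 0, src
 *    MAX_INT dst, src, tmp
 * The subtraction is done on unsigned values so that it wraps instead of
 * overflowing a signed int (relevant only for INT32_MIN).
 */
int32_t abs_lowered(int32_t src)
{
   int32_t tmp = (int32_t)(0u - (uint32_t)src);
   return max_int(src, tmp);
}
```

Whichever of src and -src is non-negative wins the MAX, so no compare or select is needed.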
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds .gitignore files to ignore the makefiles generated by
the gallium pipe loader and the clover OpenCL state tracker.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be convenient when I want to comment out optimization code
to see the raw program being optimized, but more importantly will let
the interference check be used during optimization.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).
shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When I had a bug causing the backend to never finish optimizing, it
also sent me deep into swap. This avoids extra memory allocation per
trip through optimization, and thus may reduce the peak memory
allocation of the driver even in the success case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)
This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%. It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.
v2: Add && !inst->conditional_mod to the "skip some instructions" check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets you omit some ampersands and is more idiomatic C++. Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows us to calculate the triangle's area using fixed point;
previously it was calculated in floating point space. It was possible
that a triangle which had negative area in floating point space had
a positive area in fixed point space.
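A rough sketch of the idea, not the actual rasterizer code: snap the vertices to fixed point first, then compute the signed area from the snapped integers so that its sign agrees with the coordinates actually rasterized. The names and the 8 bits of subpixel precision are assumptions made for illustration:

```c
#include <stdint.h>

#define SUBPIXEL_BITS 8   /* assumed precision, not taken from the commit */

/* Snap a float coordinate to fixed point, rounding to nearest. */
int32_t to_fixed(float v)
{
   float scaled = v * (float)(1 << SUBPIXEL_BITS);
   return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Twice the signed triangle area, in fixed-point units squared, computed
 * entirely from the snapped integer coordinates. */
int64_t tri_area_fixed(const float v0[2], const float v1[2],
                       const float v2[2])
{
   int64_t x0 = to_fixed(v0[0]), y0 = to_fixed(v0[1]);
   int64_t x1 = to_fixed(v1[0]), y1 = to_fixed(v1[1]);
   int64_t x2 = to_fixed(v2[0]), y2 = to_fixed(v2[1]);

   return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}
```

Because the products are formed from integers in 64 bits, the result cannot pick up a rounding error that flips its sign relative to the snapped vertices.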
Fixes fdo 40920.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
As pointed out by Marek, if we have only one cb, we may as well add this
single register write here rather than adding it in the draw loop.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vertex and index buffers are never used by hardware, only by Draw.
SWTCL chipsets usually have very little memory, so this might help
with stability and reliability.
Instead of having to hack the code to enable these debugging options,
set them through the MESA_DEBUG env var.
Reviewed-by: Eric Anholt <eric@anholt.net>
This flag has been around for a while but it wasn't actually used anywhere.
Now, setting this flag causes a glFlush() to be issued after each
drawing call (including glBegin/End, glDrawElements, glDrawArrays,
glDrawPixels, glCopyPixels and glBitmap).
This was being done in the _mesa_Flush/Finish() calls but if there
was an internal call to _mesa_flush/finish() the FLUSH_VERTICES()
wouldn't happen. Looks like only the intel and radeon drivers made
such calls in MakeCurrent().
When glColorMaterial() is used to latch glColor commands to a material
attribute, glMaterial calls to change that material should become no-ops.
This failed to work properly when the glMaterial call was inside a
display list.
This removes the Material function from the vbo_attrib_tmp.h template
file. We have separate/different implementations for the "save" and
"exec" cases now.
NOTE: This is a candidate for the 8.0 branch.
_mesa_material_bitmask() will record a GL error and return 0 if
face or mode are illegal. Return early in that case.
NOTE: This is a candidate for the 8.0 branch.
Some code relies on the existence of an invalid texture target. It seems
safer to bring it back than to deal with unintended consequences.
This partially reverts commit a4ebb04214.
Reviewed-by: Brian Paul <brianp@vmware.com>
Contains the following patches squashed in:
commit 9fff1dc0875f7c9591550fa3ebbe1ba7a18483fa
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:20:03 2012 +0100
configure.ac: Build gallium loader when OpenCL is enabled
commit 542111cb02957418c6a285cb6ef2924e49adc66e
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:30:29 2012 +0100
configure.ac: Add sw/null to GALLIUM_WINSYS_DIRS for gallium loader
commit 876f8de46062dde76b6075be3b6628f969b16648
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Thu Feb 9 11:26:05 2012 -0500
configure.ac: Require gcc > 4.6.0 for clover
commit 99049d50fa3d9a23297ae658189c19c89dca1766
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:32:06 2012 +0100
configure.ac: Require Gallium drm loader when gallium loader is enabled
No longer silently exclude this when building OpenCL drivers
for nouveau and r600.
Add a test program that tries to exercise some of the language
features commonly used by compute programs at the Gallium API level:
- Correctness of the values returned by the grid parameters.
- Proper functioning of resource LOADs and STOREs.
- Subroutine calls.
- Argument passing to the compute parameter through the INPUT
memory space.
- Mapping of buffer objects to the GLOBAL memory space.
- Proper functioning of the PRIVATE and LOCAL memory spaces.
- Texture sampling and constant buffers.
- Support for multiple kernels in the same program.
- Indirect resource indexing.
- Formatted resource loads and stores (i.e. with channel conversion
and scaling) using several different formats.
- Proper functioning of work-group barriers.
- Atomicity and semantics of the atomic opcodes.
As of now all of them seem to pass on my nvA8.
It simplifies things slightly, and besides, it makes it possible to
execute the trivial tests on a hardware device instead of being
limited to software rendering.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This target generates pipe driver modules intended to be consumed by
auxiliary/pipe-loader. Most of it was taken from the "gbm" target --
the duplicated code will be replaced with references to this target in
a future commit.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The goal is to have a uniform interface to create winsys and
pipe_screen instances for any driver, exposing the device enumeration
capabilities that might be supported by the operating system (for now
there's a "drm" back-end using udev and a "sw" back-end that always
returns the same built-in devices).
The typical use case of this library will be:
>
> struct pipe_loader_device devs[n];
> struct pipe_screen *screen;
>
> pipe_loader_probe(&devs, n);
> [pick some device from the array...]
>
> screen = pipe_loader_create_screen(dev, library_search_path);
> [do something with screen...]
>
> screen->destroy(screen);
> pipe_loader_release(&devs, n);
>
A part of the code was taken from targets/gbm/pipe_loader.c, which
will be removed and replaced with calls into this library by a future
commit.
Add a shader cap for specifying the preferred shader representation.
Right now the only supported value is TGSI; other enum values will be
added as they are needed.
This is mainly to accommodate AMD's LLVM compiler back-end by letting
it bypass the TGSI representation for compute programs. Other drivers
will keep using the common TGSI instruction set.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This change will be useful to implement function parameter passing on
top of TGSI. As we don't have a proper stack, a register-based
calling convention will be used instead, which isn't necessarily a bad
thing given that GPUs often have plenty of registers to spare.
Using the same register space for local temporaries and
inter-procedural communication caused some inefficiencies, because in
some cases the register allocator would lose the freedom to merge
temporary values together into the same physical register, leading to
suboptimal register (and sometimes, as a side effect, instruction)
usage.
The LOCAL declaration modifier specifies that the value isn't intended
for parameter passing and as a result the compiler doesn't have to
give any guarantees of it being preserved across function boundaries.
Ignoring the LOCAL flag doesn't change the semantics of a valid
program in any way, because local variables are just supposed to get a
more relaxed treatment. IOW, this should be a backwards-compatible
change.
Normal resource access (e.g. the LOAD TGSI opcode) is supposed to
perform a series of conversions to turn the texture data as it's found
in memory into the target data type.
In compute programs it's often the case that we only want to access
the raw bits as they're stored in some buffer object, and any kind of
channel conversion and scaling is harmful or inefficient, especially
in implementations that lack proper hardware support to take care of
it -- in those cases the conversion has to be implemented in software
and it's likely to result in a performance hit even if the pipe_buffer
and declaration data types are set up in a way that would just pass
the data through.
Add a declaration flag that marks a resource as typeless. No channel
conversion will be performed in that case, and the X coordinate of the
address vector will be interpreted in byte units instead of elements
for obvious reasons.
This is similar to D3D11's ByteAddressBuffer, and will be used to
implement OpenCL's constant arguments. The remaining four compute
memory spaces can also be understood as raw resources.
This texture type was already referred to by the documentation but it
was never defined. Define it as 0 to match the pipe_texture_target
enumeration values.
Move Interpolate, Centroid and CylindricalWrap from tgsi_declaration
to a separate token -- they only make sense for FS inputs and we need
room for other flags in the top-level declaration token.
This commit splits the current concept of resource into "sampler
views" and "shader resources":
"Sampler views" are textures or buffers that are bound to a given
shader stage and can be read from in conjunction with a sampler
object. They are analogous to OpenGL texture objects or Direct3D
SRVs.
"Shader resources" are textures or buffers that can be read and
written from a shader. There's no support for floating point
coordinates, address wrap modes or filtering, and, unlike sampler
views, shader resources are global for the whole graphics pipeline.
They are analogous to OpenGL image objects (as in
ARB_shader_image_load_store) or Direct3D UAVs.
Most hardware is likely to implement shader resources and sampler
views as separate objects, so, having the distinction at the API level
simplifies things slightly for the driver.
This patch introduces the SVIEW register file with a declaration token
and syntax analogous to the already existing RES register file. After
this change, the SAMPLE_* opcodes no longer accept a resource as
input, but rather a SVIEW object. To preserve the functionality of
reading from a sampler view with integer coordinates, the
SAMPLE_I(_MS) opcodes are introduced which are similar to LOAD(_MS)
but take a SVIEW register instead of a RES register as argument.
Define an interface that exposes the minimal functionality required to
implement some of the popular compute APIs. This commit adds entry
points to set the grid layout and other state required to keep track
of the usual address spaces employed in compute APIs, to bind a
compute program, and execute it on the device.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch renames the gen6_hiz.h and gen7_hiz.h files to correspond
to the renames of the corresponding .cpp files (see previous commit).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch converts the files gen6_hiz.c and gen7_hiz.c to C++, in
preparation for expanding the HiZ code to support arbitrary blits.
The new files are called gen6_blorp.cpp and gen7_blorp.cpp to reflect
the expanded role that this code will serve--"blorp" stands for "BLit
Or Resolve Pass".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previous to this patch, gen6_hiz.c contained two implicit type casts
from void * to a non-void pointer type. This is allowed in C but
not in C++. This patch makes the type casts explicit, so that
gen6_hiz.c can be converted into a C++ file.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In C++, if a struct is defined inside another struct, or its name is
first seen inside a struct or function, the struct is nested inside
the namespace of the struct or function it appears in. In C, all
structs are visible from toplevel.
This patch explicitly moves the declarations of intel_batchbuffer to
toplevel, so that it does not get nested inside a namespace when
header files are included from C++.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
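As an illustration of the kind of declaration meant here, a header shared by C and C++ sources typically wraps its prototypes in a linkage guard. The function below is hypothetical and only serves to show the pattern:

```c
#include <stdint.h>

/* Contents of a hypothetical header included from both .c and .cpp files. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical C function that C++ code needs to call.  Without the guard,
 * a C++ caller would reference a mangled symbol that the C object file
 * does not define. */
uint32_t example_align_to_8(uint32_t value);

#ifdef __cplusplus
}
#endif

/* Definition, as it would appear in a .c file. */
uint32_t example_align_to_8(uint32_t value)
{
   return (value + 7u) & ~7u;
}
```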
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Fix wrong cube/3D texture layout for the trailing levels whose width or
height is smaller than the align unit.
From 965 B-spec http://intellinuxgraphics.org/VOL_1_graphics_core.pdf at
page 135:
All of the LOD=0 q-planes are stacked vertically, then below that,
the LOD=1 qplanes are stacked two-wide, then the LOD=2 qplanes are
stacked four-wide below that, and so on.
Thus we should always increase pack_x_nr, which means the pitch of LODn
may be greater than the pitch of LOD0. So we should update mt->total_width
when needed.
This would fix the following webgl test case on all gen4 platforms:
conformance/textures/texture-size-cube-maps.html
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This points to the object with the function body, allowing us to map
from a built-in prototype to the actual body with IR code to execute.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- copy_masked_offset copies part of a constant into another,
assign-like.
- copy_offset copies a constant into (a subset of) another,
funcall-return like.
These methods are to be used to trace through assignments and function
calls when computing a constant expression.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
The method is used to get a reference to an ir_constant * within the
context of evaluating an assignment when calculating a
constant_expression_value.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
We were looping over all the vector components, but only dealing with
the first one. This was masked by the fact that constant expression
handling on built-ins went through custom code for the lessThan()
/function/ rather than the ir_binop_less expression operator.
NOTE: This is a candidate for all release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The vbo module recomputes its states if _NEW_ARRAY is set, so it shouldn't use
the same flag to notify the driver. Since we've run out of bits in NewState
and NewState is for core Mesa anyway, we need to find another way.
This patch is the first to start decoupling the state flags meant only
for core Mesa and those only for drivers.
The idea is to have two flag sets:
- gl_context::NewState - used by core Mesa only
- gl_context::NewDriverState - used by drivers only (the flags are defined
by the driver and opaque to core Mesa)
It makes perfect sense to use NewState|=_NEW_ARRAY to notify the vbo module
that the user changed vertex arrays, and the vbo module in turn sets
a driver-specific flag to notify the driver that it should update its vertex
array bindings.
The driver decides which bits of NewDriverState should be set and stores them
in gl_context::DriverFlags. Then, core Mesa can do this:
ctx->NewDriverState |= ctx->DriverFlags.NewArray;
This patch implements this behavior and adapts st/mesa.
DriverFlags.NewArray is set to ST_NEW_VERTEX_ARRAYS.
Core Mesa only sets NewDriverState. It's the driver's responsibility to read
it whenever it wants and reset it to 0.
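The flow described above can be sketched roughly as follows. The structures, function names and bit values are simplified stand-ins chosen for illustration, not the real Mesa definitions:

```c
#include <stdint.h>

/* Simplified stand-ins for gl_context and its DriverFlags member. */
struct driver_flags {
   uint32_t NewArray;        /* chosen by the driver, opaque to core */
};

struct context {
   uint32_t NewState;        /* core Mesa only */
   uint32_t NewDriverState;  /* drivers only */
   struct driver_flags DriverFlags;
};

#define CORE_NEW_ARRAY           (1u << 0)  /* stand-in for _NEW_ARRAY */
#define DRIVER_NEW_VERTEX_ARRAYS (1u << 5)  /* stand-in for ST_NEW_VERTEX_ARRAYS */

/* Driver side: pick which NewDriverState bit means "arrays changed". */
void driver_init(struct context *ctx)
{
   ctx->NewState = 0;
   ctx->NewDriverState = 0;
   ctx->DriverFlags.NewArray = DRIVER_NEW_VERTEX_ARRAYS;
}

/* Core side: the user changed vertex arrays. */
void user_changed_arrays(struct context *ctx)
{
   ctx->NewState |= CORE_NEW_ARRAY;
}

/* vbo side: react to the core flag and forward a driver-defined bit.
 * Clearing the core bit here is a simplification of the sketch. */
void vbo_update(struct context *ctx)
{
   if (ctx->NewState & CORE_NEW_ARRAY) {
      ctx->NewDriverState |= ctx->DriverFlags.NewArray;
      ctx->NewState &= ~CORE_NEW_ARRAY;
   }
}

/* Driver side: read the flags whenever convenient, then reset them.
 * Returns nonzero if the vertex array bindings need updating. */
int driver_validate(struct context *ctx)
{
   int dirty = (ctx->NewDriverState & DRIVER_NEW_VERTEX_ARRAYS) != 0;
   ctx->NewDriverState = 0;
   return dirty;
}
```

Core code never interprets the bit it ORs in; it only copies whatever the driver stored in DriverFlags.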
Reviewed-by: Brian Paul <brianp@vmware.com>
In the future we'd like to treat vertex arrays as a state and
not as a parameter to the draw function. This is the first step
towards that goal. Part of the goal is to avoid array re-validation
for every draw call.
This commit adds:
const struct gl_client_array **gl_context::Array::_DrawArrays.
The pointer is changed in:
* vbo_draw_method
* vbo_rebase_prims - unused by gallium
* vbo_split_prims - unused by gallium
* st_RasterPos
Reviewed-by: Brian Paul <brianp@vmware.com>
Replacing "float equal to 1.0f" with "int not equal to 0".
This should help for further optimization of boolean computations.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We're using float as default type, so basically for every instruction that
wants other types for dst/src operands we need to perform the bitcast
to/from default float. Currently the bitcast produces a no-op MOV instruction,
which will be eliminated later.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In i965 Gen7, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
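For illustration, "smashing the 3 LSB's" amounts to the masking below. The helper names are hypothetical; the second function shows the part of the offset that is thrown away, which is why rendering ends up misplaced whenever the true offset is not a multiple of 8:

```c
#include <stdint.h>

/* Force an offset down to a multiple of 8 by clearing its low 3 bits. */
uint32_t smash_low_3_bits(uint32_t offset)
{
   return offset & ~7u;
}

/* The portion of the requested offset that is lost by doing so. */
uint32_t lost_by_smashing(uint32_t offset)
{
   return offset & 7u;
}
```

A miplevel aligned only to a multiple of 4, such as an offset of 12, is therefore rendered at offset 8.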
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In i965 Gen6, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
(Note that we have only ever observed this GPU hang on Gen6 when HiZ
is enabled, so another possible stopgap would be to disable HiZ).
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When the user attaches a texture to one of the depth/stencil
attachment points (GL_STENCIL_ATTACHMENT or GL_DEPTH_ATTACHMENT), we
check to see if the same texture is also attached to the other
attachment point, and if so, we re-use the existing texture
attachment. This is necessary to ensure that if the user later
queries what is attached to GL_DEPTH_STENCIL_ATTACHMENT, they will not
receive an error.
If, however, the user attaches buffers to the two different attachment
points using different parameters (e.g. a different miplevel), then we
can't re-use the existing texture attachment, because it is pointing
to the wrong part of the texture. This might occur as a transitory
condition if, for example, the user attached miplevel zero of a
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT, rendered to
it, and then later attempted to attach miplevel one of the same
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT.
This patch causes Mesa to check that GL_STENCIL_ATTACHMENT and
GL_DEPTH_ATTACHMENT use the same attachment parameters before
attempting to share the texture attachment.
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 depth_stencil_shared"
and "texturing/depthstencil-render-miplevels 1024
stencil_depth_shared".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to a miplevel other than 0 within a color, depth,
stencil, or HiZ buffer, we need to tell the GPU to render to an offset
within the buffer, so that the data is written into the correct
miplevel. We do this using a coarse offset (in pages), and a fine
adjustment (the so-called "tile_x" and "tile_y" values, which are
measured in pixels).
We have always computed the coarse offset and fine adjustment using
intel_renderbuffer_tile_offsets() function. This worked fine for
color and combined depth/stencil buffers, but failed to work properly
when HiZ and separate stencil were in use. It failed to work because
there is only one set of fine adjustment controls shared by the HiZ,
depth, and stencil buffers, so we need to choose tile_x and tile_y
values that are compatible with the tiling of all three buffers, and
then compute separate coarse offsets for each buffer.
This patch fixes the HiZ and separate stencil case by replacing the
call to intel_renderbuffer_tile_offsets() with calls to two functions:
intel_region_get_tile_masks(), which determines how much of the
adjustment can be performed using offsets and how much can be
performed using tile_x and tile_y, and
intel_region_get_aligned_offset(), which computes the coarse offset.
intel_region_get_tile_offsets() is still used for color renderbuffers,
so to avoid code duplication, I've re-worked it to use
intel_region_get_tile_masks() and intel_region_get_aligned_offset().
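A simplified sketch of the split between the coarse offset and the fine adjustment, for a single X-tiled buffer. This is not the driver code: the struct and function names are hypothetical, and it assumes X tiles of 512 bytes by 8 rows (one 4096-byte page), a power-of-two cpp, and a pitch that is a whole number of tiles:

```c
#include <stdint.h>

struct split_offset {
   uint32_t coarse_bytes;  /* page-aligned byte offset into the buffer */
   uint32_t tile_x;        /* fine adjustment within the tile, in pixels */
   uint32_t tile_y;
};

/* Split the pixel position (x, y) of a miplevel into a coarse offset and
 * a fine (tile_x, tile_y) adjustment for an X-tiled surface. */
struct split_offset split_x_tiled(uint32_t x, uint32_t y,
                                  uint32_t cpp, uint32_t pitch_bytes)
{
   uint32_t tile_w = 512 / cpp;     /* pixels per tile row */
   uint32_t mask_x = tile_w - 1;    /* bits handled by tile_x */
   uint32_t mask_y = 8 - 1;         /* bits handled by tile_y */
   struct split_offset s;

   s.tile_x = x & mask_x;
   s.tile_y = y & mask_y;

   /* What remains is tile-aligned, so it maps to whole pages. */
   uint32_t aligned_x = x & ~mask_x;
   uint32_t aligned_y = y & ~mask_y;
   s.coarse_bytes = aligned_y * pitch_bytes + (aligned_x / tile_w) * 4096;

   return s;
}
```

When several buffers share one set of tile_x/tile_y controls, the masks of all of them would have to be combined before splitting, and a separate coarse offset computed per buffer.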
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 X" where X is one of
(depth, depth_and_stencil, depth_stencil_single_binding, depth_x,
depth_x_and_stencil, stencil, stencil_and_depth, stencil_and_depth_x).
On i965 Gen7, the variants of
"texturing/depthstencil-render-miplevels" that contain a stencil
buffer still fail, due to another problem: Gen7 seems to ignore the 3
LSB's of the tile_y adjustment (and possibly also tile_x).
v2: Removed spurious comments. Added assertions to check
preconditions of intel_region_get_aligned_offset().
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes ARB_framebuffer_object from the GLES1 and GLES2
extension lists in intel_extensions_es.c.
Fixes a crash in the Android browser on Ice Cream Sandwich.
The Android browser crashed because it did the following, which is legal
in GLES2 but not in ARB_framebuffer_object.
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// render render render...
glDeleteFramebuffers(1, &fb);
// go do other stuff...
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// This bind unexpectedly failed, and the app panics.
The semantics of glBindFramebuffer specified by ARB_framebuffer_object (a
desktop GL extension) and GLES2 specs are incompatible. The ideal solution
to fix this is to create separate API entry points for glBindFramebuffer,
one for GL and the other for GLES2. But, until that work is complete,
disabling ARB_framebuffer_object in GLES2 contexts safely fixes the problem.
Likewise, the semantics of glBindFramebuffer in ARB_framebuffer_object and
of glBindFramebufferOES in OES_framebuffer_object (a GLES1 extension) are
incompatible. Even though the functions have different names, the semantic
difference still results in a bug because both API calls are implemented
by a single function, _mesa_BindFramebufferEXT, which handles the semantic
difference incorrectly. Again, disabling ARB_framebuffer_object in GLES1
contexts safely fixes this problem.
According to the ARB_framebuffer_object spec, the extension is an
amalgamation of
EXT_framebuffer_object
EXT_framebuffer_blit
EXT_packed_depth_stencil
EXT_framebuffer_multisample
By disabling this extension, however, no functionality is removed from
GLES1 and GLES2 contexts because 1) the first three extensions are
explicitly enabled in Intel's ES extension lists and 2) no functionality
of the last extension is exposed in an ES context.
Note: This is a candidate for the 8.0 branch.
See-also: http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg21006.html
CC: Charles Johnson <charles.f.johnson@intel.com>
CC: Sean Kelley <sean.v.kelley@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When primitive restart is enabled, and glArrayElement is called
with the restart index value, then call glPrimitiveRestartNV.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The signal.h include was missed in the commit
bc16c73407 which leads to broken
compilations under Linux.
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
When doing the var->assigned change in
f2475ca424, I overzealously indented the
second block of code into the "if (var)" test. Revert these blocks to
the way they were before, just taking advantage of "var" to avoid
re-calling variable_referenced().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49066
I only considered var->assigned for FragColor and FragData, but
ignored when it was false for out vars. Fixes piglit
write-gl_FragColor-and-not-user-output.frag
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49068
It seems silly that GL lets you allocate these given that they're
framebuffer attachment incomplete, but the webgl conformance tests
actually go looking to see if the getters on 0-width/height
depth/stencil renderbuffers return good values. By failing out here,
they all got smashed to 0, which turned out to be correct for all the
getters they tested except for GL_RENDERBUFFER_INTERNAL_FORMAT. Now,
by succeeding but not making a miptree, that one also returns the
expected value.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The index is also used for GL_ARB_blend_func_extended. Cloning in
i965 was dropping a non-ARB_explicit_attrib_location index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When glTexImage or glCopyTexImage is called with internalFormat being a
generic compressed format (like GL_COMPRESSED_RGB) we need to do the same
error checks as for specific compressed formats. In particular, check if
the texture target is compatible with the format. None of the texture
compression formats we support so far work with GL_TEXTURE_1D, for example.
See also https://bugs.freedesktop.org/show_bug.cgi?id=49124
NOTE: This is a candidate for the 8.0 branch.
Otherwise it fails like so:
CC egl_dri2.lo
In file included from egl_dri2.h:40:0,
from egl_dri2.c:42:
../../../../../../src/egl/wayland/wayland-drm/wayland-drm.h:8:41:
fatal error: wayland-drm-server-protocol.h: No such file or directory
compilation terminated.
This new gbm entry point allows writing data into a gbm bo. The bo has
to be created with the GBM_BO_USE_WRITE flag, and it's only required to
work for GBM_BO_USE_CURSOR_64X64 bos.
The gbm API is designed to be the glue layer between EGL and KMS, but there
was never a mechanism to initialize a buffer suitable for use with KMS
hw cursors. The hw cursor bo is typically not compatible with anything EGL
can render to, and thus there's no way to get data into such a bo.
gbm_bo_write() fills that gap while staying out of the efficient
cpu->gpu pixel transfer business.
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
At this point, in order for OpenCL to work correctly with r600g, OpenCL
specific intrinsics need to be defined in the LLVM tree. So, we need
to check for these intrinsics in the LLVM include directory to make sure
not to re-define them.
This is a pseudo instruction that enables the LLVM backend to encode
instructions and pass it through r600_bytecode_build()
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
This moves the alpha test control to derived state and disables alpha
testing for integer fbs.
fbo-blending test in piglit gets further when we do this (not a pass
but less fail).
v2: drop the fb_sx_alpha_test_control
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
Allows the creation of const aos masks which have the mask swizzled
to match the correct format.
Updated existing mask creation code to use the swizzled version where
necessary (tgsi register masks and llvmpipe aos blending).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With this feature enabled, the LLVM backend will dump the MachineInstrs
prior to emitting code. The mesa env variable R600_DUMP_SHADERS will enable
this feature in the backend.
Fix uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't delete MASK_WRITE instructions from the program, because this
will cause instructions being masked by MASK_WRITE to be marked dead and
then deleted in the dce pass.
Enabled MESA_FORMAT_RGBX8888_REV for RGBX. Android software
requires RGBX8888 format to be supported for software rendering.
That requires EGL to be capable of generating images from this
format.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add new format __DRI_IMAGE_FORMAT_XBGR8888 to __DRI_IMAGE.
HAL_PIXEL_FORMAT_RGBX_8888 now maps to __DRI_IMAGE_FORMAT_XBGR8888.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Only images created with intel_create_image() had the field properly
set. Set it also on intel_dup_image(), intel_create_image_from_name()
and intel_create_image_from_renderbuffer().
GL_ARB_texture_storage says:
The commands eglBindTexImage, wglBindTexImageARB, glXBindTexImageEXT or
EGLImageTargetTexture2DOES are not permitted on an immutable-format
texture.
They will generate the following errors:
- EGLImageTargetTexture2DOES: INVALID_OPERATION
- eglBindTexImage: EGL_BAD_MATCH
- wglBindTexImage: ERROR_INVALID_OPERATION
- glXBindTexImageEXT: BadMatch
Fixing the EGL and GLX cases requires extending the DRI interface,
since setTexBuffer2 doesn't currently return any error information.
Reviewed-by: Brian Paul <brianp@vmware.com>
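A minimal sketch of the GL-side check described above, assuming a
hypothetical sketch_texture struct with an "immutable" flag. The struct,
the function name and the SKETCH_ constants are illustrative only and are
not Mesa's actual types. 0x0502 is the standard value of
GL_INVALID_OPERATION.

```c
#include <stdbool.h>

#define SKETCH_GL_NO_ERROR          0
#define SKETCH_GL_INVALID_OPERATION 0x0502

/* Hypothetical texture object: only the flag we care about. */
struct sketch_texture {
    bool immutable;   /* set once storage was allocated immutably */
};

/* Returns the GL error an EGLImageTargetTexture2DOES-style entry point
 * would raise for this texture, or SKETCH_GL_NO_ERROR if it may proceed. */
unsigned sketch_check_image_target(const struct sketch_texture *tex)
{
    if (tex->immutable)
        return SKETCH_GL_INVALID_OPERATION;
    return SKETCH_GL_NO_ERROR;
}
```

The EGL and GLX entry points would need an equivalent check that reports
EGL_BAD_MATCH or BadMatch, which is why the message says the DRI interface
has to grow an error return first.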
Adapted drivers: i915, llvmpipe, r300, r600, radeonsi, softpipe.
User index buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This reduces CPU overhead in st_draw_vbo and removes a lot of unnecessary code
in that function which was required only to comply with the gallium interface,
but wasn't really useful.
Adapted drivers: i915, llvmpipe, r300, softpipe.
No changes required in: r600, radeonsi.
User vertex buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This is required for any serious constant buffer support.
Constant buffer offsets on ATI and NVIDIA DX10 and DX11 GPUs must be
a multiple of 256.
In OpenGL, this can be queried via GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.
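A small sketch of what the 256-byte requirement means for a caller
placing data in a constant buffer. The helper name align_offset is an
assumption for illustration; the alignment would come from the value
queried via GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.

```c
#include <stdint.h>

/* Round offset up to the next multiple of alignment.
 * alignment must be a power of two (256 in the case above). */
uint32_t align_offset(uint32_t offset, uint32_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, a block that would naturally start at byte 257 has to be
placed at byte 512 when the alignment is 256.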
As noted in commit be4e46b21a,
this was missing before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A little analysis shows that the worst-case value for "nr" is 17:
- base_mrf = 2 ... 2
- header present (say gen == 5) ... 4
- aa_dest_stencil_reg (stencil test) ... 5
- SIMD16 mode: += 4 * reg_width ... 13
- source_depth_to_render_target ... 15
- dest_depth_reg ... 17
This resulted in us setting base_mrf to 2 and mlen to 15. In other
words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also,
the instruction scheduler data structures use arrays of size 16, so this
would cause us to access them out of bounds.
While the debugger system routine may need m0 and m1, we don't use it
today, so the simplest solution is just to move base_mrf back to 1.
That way, our worst case message fits in m1..m15, which is legal.
An alternative would be to fail on SIMD16 in this case, but that seems
a bit unfortunate if there's no real need to reserve m0 and m1.
Fixes new piglit test shaders/depth-test-and-write on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
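The arithmetic above can be sketched as follows. The function names are
hypothetical and the register counts are taken directly from the list in
the message (reg_width is 2 in SIMD16 mode).

```c
#include <stdbool.h>

/* Index of the last MRF touched by the worst-case framebuffer write
 * message, for a given base_mrf. */
int worst_case_last_mrf(int base_mrf)
{
    const int reg_width = 2;      /* SIMD16 */
    int nr = base_mrf;

    nr += 2;                      /* header present (say gen == 5) */
    nr += 1;                      /* aa_dest_stencil_reg */
    nr += 4 * reg_width;          /* SIMD16 color payload */
    nr += 2;                      /* source depth to render target */
    nr += 2;                      /* dest_depth_reg */

    int mlen = nr - base_mrf;     /* message length in registers */
    return base_mrf + mlen - 1;   /* last register used */
}

/* Pre-Gen6 hardware only has m0..m15. */
bool fits_pre_gen6(int base_mrf)
{
    return worst_case_last_mrf(base_mrf) <= 15;
}
```

With base_mrf = 2 the message ends at m16, which does not exist; with
base_mrf = 1 it ends at m15.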
This ensures that the allocas are at the top of the function body; otherwise
LLVM will not eliminate them, causing stack misalignment on 32-bit.
Reviewed-by: James Benton <jbenton@vmware.com>
This is taken from the ogl-math project, with Inverse renamed to adj
(since it's not actually the inverse), transposed, and our types
plugged in. There are potential CSE opportunities in this code
(particularly for hardware with RCP but not DIV), but we should be
doing CSE anyway, so don't hand-optimize.
Fixes piglit inverse tests.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This takes advantage of the builtin compiler to generate IR into a
string, the same way we read GLSL for function prototypes for our
profiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I keep getting lost in the Makefile trying to figure out what to edit
to work on builtin_compiler or glsl_compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
This couldn't be split because it would break bisecting.
Summary:
* r300g,r600g: stop using u_vbuf
* r300g,r600g: also report that the FIXED vertex type is unsupported
* u_vbuf: refactor for use in the state tracker
* cso: wire up u_vbuf with cso_context
* st/mesa: conditionally install u_vbuf
This adds the ability to initialize u_vbuf_caps before creating u_vbuf itself.
It will be useful for determining if u_vbuf should be used or not.
Also adapt r300g and r600g.
This fixes an assertion failure since:
commit 81afdd20f3
vbo: don't check twice whether it's valid to render
FLUSH_CURRENT may set _NEW_CURRENT_ATTRIB.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the source region for a glCopyPixels is completely outside the
source buffer bounds, no-op the copy. Fixes a failed assertion.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
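A sketch of the kind of early-out test this implies, assuming a source
buffer whose valid pixels span [0, buf_width) x [0, buf_height). The
function name and parameters are illustrative, not the actual Mesa code.

```c
#include <stdbool.h>

/* True when the rectangle (x, y, width, height) has no overlap at all
 * with a buffer of buf_width x buf_height pixels. */
bool region_outside_buffer(int x, int y, int width, int height,
                           int buf_width, int buf_height)
{
    return x >= buf_width || y >= buf_height ||
           x + width <= 0 || y + height <= 0;
}
```

A caller would return without copying anything when this is true, rather
than clipping the region down to an empty or negative size.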
The LLVM backend can now be enabled for r600g by using the
--enable-r600-llvm-compiler configure flag. If you configure with this
flag, you can still use the default compiler by setting the environment
variable R600_USE_LLVM=0
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise HAVE_LLVM won't be included in the $(DEFINES) variable for
Automake generated Makefiles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This shaves 2k off the final dri.so, and removes lots of pointless
NULL, 0 passing.
Most likely pointless - but it looked nicer to me.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The __glapi_gentable_set_remaining_noop() routine treats the _glapi_struct
as an array of _glapi_get_dispatch_table_size() pointers, so we have to
allocate _glapi_get_dispatch_table_size()*sizeof(void*) bytes rather
than sizeof(struct _glapi_struct) bytes.
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
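A sketch of the sizing rule, using a stand-in table type. The names
dispatch_entry, noop_entry and alloc_dispatch_table are hypothetical; in
the real code the entry count would come from
_glapi_get_dispatch_table_size().

```c
#include <stdlib.h>

typedef void (*dispatch_entry)(void);

/* Stand-in no-op entry point. */
void noop_entry(void)
{
}

/* Allocate room for n_entries pointers (not sizeof(some struct)) and
 * point every slot at the no-op, since the table is walked as a flat
 * array of that many pointers. */
dispatch_entry *alloc_dispatch_table(size_t n_entries)
{
    dispatch_entry *table = malloc(n_entries * sizeof(*table));
    if (!table)
        return NULL;

    for (size_t i = 0; i < n_entries; i++)
        table[i] = noop_entry;

    return table;
}
```

Sizing the allocation from the entry count keeps the fill loop in bounds
even when the runtime table is larger than the struct known at compile
time.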
Alexandre Demers sent me some cayman results with no major problems.
I'll rip out the env var in a week or so.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Full piglit run on my rv610 with no regressions.
This only leaves cayman, however my cayman is resisting my attempt
to get through a full piglit run.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on rv740 and confirmed no regressions.
We don't get GL3 on r700 due to transform feedback being busted still.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This cap is used by u_blitter to decide if it can use integers
in vertex data.
fixes some crashes with glsl130 in piglit
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on my SUMO machine and I see no regressions.
Lots of things to fix (skip->fail), but hey maybe we can fix them
if we can see them.
I'll try and work my way across r600,700,cayman sometime if nobody
else gets to them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The field wasn't actually used before and it's not used now either.
But this is a more logical place for it and will hopefully allow
doing smarter draw/array validation (per array object) in the future.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Our previous live interval analysis just said that anything in a loop
was live for the whole loop. If you had to spill a reg in a loop,
then we would consider the unspilled value live across the loop too,
so you never made progress by spilling. Eventually it would consider
everything in the loop unspillable and fail out.
With the new analysis, things completely defined and used inside the
loop won't be marked live across the loop, so even if you
spill/unspill something that used to be live across the loop, you
reduce register pressure. But you usually don't even have to spill
any more, since our intervals are smaller than before.
This fixes an assertion failure when trying to compile the shader for the
"glyphy" text rasterizer and piglit glsl-fs-unroll-explosion.
Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing
more shaders to be compiled in 16-wide mode.
This takes the fs_inst list generated by the visitor, and generates a
list of basic blocks with edges between them. This is a building
block for data-flow analysis.
We were checking for these at link time previously, which is not as
early as mandated, and would actually fail to detect conflicting
writes if dead code removal removed some writes.
Fixes failures in piglit
glsl-*/compiler/fragment-outputs/write-gl_Frag*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used for some compile-and-link-time error checking, where
currently we've been doing error checking only at link time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This runs optimization-test and produces the usual automake test
output, which may be interesting to automated build systems.
This doesn't convert the tests to be individually exposed to the
automake runner, because automake doesn't like wildcards (due to being
nonportable in make, not that we care).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the reason the declaration member existed in the reference
visitor, but I didn't copy the code from structure splitting that
avoided setting it.
This wasn't currently a problem, because we don't allow splitting of
in/out variables. But that would be nice to change some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was carried over from structure splitting, without thinking about
whether the name still made sense in this context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes encoding of VOP3 shader instructions.
The shift was wrong for source registers 2 and 3, and the resulting value was
only 32 bits, so the shift in SICodeEmitter::VOPPostEncode() didn't work as
intended.
If a non-default array object was bound at context destruction time
we'd try to unreference the array object after it was already deleted
in _mesa_free_varray_data(). Now do the unref first.
Fixes a regression from commit 86f53e6d6b.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's not nice when you have several variables pointing to the same array
and you wanna ask your editor "where is this used" and you only get an answer
for one of the four currval, legacy_currval, generic_currval, mat_currval,
which is quite useless, because you never see the whole picture.
Let's get rid of the additional pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It's already done in _mesa_validate_Draw* and it's not needed to do it again
unless I am missing something.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is a frequently-updated state and _NEW_ARRAY already causes revalidation
of the vbo module. It's kinda counter-productive to recompute arrays
in the vbo module if _NEW_ARRAY is set and then set _NEW_ARRAY again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This moves the RebindArrays flag into the vbo module, consolidates the code,
and adds missing vbo_draw_method calls.
Also with this change, the vertex arrays are not needlessly recalculated twice.
The issue with the old code was:
- If recalculate_input_bindings updates vp_varying_inputs, _NEW_ARRAY is set.
- _mesa_update_state is called and the vp_varying_inputs change causes
regeneration of the fixed-function shaders, which also sets _NEW_PROGRAM.
- The occurrence of either _NEW_ARRAY or _NEW_PROGRAM sets
the recalculate_inputs flag to TRUE again.
- The new code sets the flag to FALSE after the second _mesa_update_state,
because there can't possibly be any change which would require recalculating
the arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Vinson reported that we failed to initialize this, which would lead to
all kinds of crashes if we actually used it. Since we don't use it,
we may as well just delete the broken code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change uses the array object factory for gl_array_objects. This
prevents crashes when deriving from gl_array_object.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
We don't normally clear immediately after drawing something. But as it
was, the drawing would incorrectly appear after the clear.
Fixes piglit clear-varray-2.0 failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Deletes a lot of pointless duplication, as well as some run-time effort.
Conveniently, GLSL 1.40 no longer needs a .vert variant, since it
doesn't define any built-ins specific to the vertex shader stage.
ARB_texture_rectangle and OES_EGL_image_external also only need a single
profile, since the .vert and .frag variants were identical.
I didn't bother with EXT_texture_array and OES_texture_3D because
they're so tiny that the savings would be minuscule.
Cuts the generated builtin_function.cpp from 1.7MB to 1.0MB (41%).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The built-in subsystem uses "profiles," or GLSL shaders containing
prototypes for all built-ins supported within a particular language
version (or extension) and shader stage.
Since profiles were stage-specific, we had to cut and paste almost all
the prototypes between (e.g.) 110.vert and 110.frag. Naturally, this
led to sundry cut and paste bugs, where someone fixed an issue in .frag
but neglected to update .vert, or vice-versa. Geometry shaders would
have only made this worse.
This patch introduces support for a new '.glsl' profile suffix which
contains prototypes common to all shader stages. The existing '.frag'
and '.vert' profiles need only contain the few stage-specific built-ins.
Not only does this remove duplication, it makes built-in setup slightly
faster: we don't need to re-read the common prototypes and function
bodies for both the vertex and fragment shader stage.
Internally, this was trivial. We already create a list of gl_shader
objects to search through for built-ins: one for the core language
version/stage, and additional shaders for any extensions in use. This
patch simply adds another shader to the list: core/common, core/stage,
and extensions.
The next patch will update the profiles to remove the duplication.
It's separated out purely to make review easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Accelerates a few glReadPixels cases for WebGL.
See https://bugs.freedesktop.org/show_bug.cgi?id=48545
v2: Per Jose, use bit twiddling for the swizzle case instead of ubyte
arrays (it's about 44% faster).
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
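For illustration, a red/blue swap on packed 32-bit pixels done with masks
and shifts instead of per-byte array accesses. This is a sketch of the
general technique, not the code from the patch; the function names and
the assumption that the swizzle is a red/blue swap are mine.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap byte 0 and byte 2 of a packed 32-bit pixel, leaving bytes 1 and 3
 * (green and alpha in a BGRA/RGBA layout) untouched. */
uint32_t swap_red_blue(uint32_t pixel)
{
    return (pixel & 0xff00ff00u) |
           ((pixel & 0x000000ffu) << 16) |
           ((pixel >> 16) & 0x000000ffu);
}

/* Convert one row of pixels; dst and src may be the same buffer. */
void swap_red_blue_row(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = swap_red_blue(src[i]);
}
```

Working on whole 32-bit words avoids four separate byte loads and stores
per pixel, which is the likely source of the speedup mentioned in v2.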
The GLSL 1.30 -> 4.10 specs all erroneously say "vec2" for a few
overloads of textureProjGradOffset, while most overloads and all other
texturing functions use ivec types.
The GLSL 4.20 specification corrects these to "ivec2", but doesn't
mention this as being a conscious change in behavior. Nor does the
ARB_shading_language_420pack extension. So presumably it was a typo.
At any rate, our builtin functions all use ivec already, so the fact
that these prototypes use plain vecs will only lead to applications
dying in a fire when trying to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 4ec449a6ed.
I meant to not push this one. Review found that a link error is not
mandated: it should link, but you get undefined rendering if you rely
on a missing stage.
page 42/55 section 2.11 "Vertex Shaders":
"If the program object has no vertex shader, or no program object
is currently in use, the results of vertex shader execution are
undefined."
(and similar for page 160/173 section 3.9 "Fragment Shaders" for FS,
and page 45/58 section 2.11.2 "Program Objects" for program being 0)
It turns out the commit was broken anyway, because it was missing a
"goto done", so linkstatus got smashed back to true later and the
error just showed up as a warning in the infolog.
All I know of that needs finishing in Mesa is to enable the extension
in a GL3.1 core context on i965 -- we're not going to expose it in
non-3.1 core contexts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the new piglit texelFetch() tests on these. Note that the rest
of the new functions are not tested (same as the non-2DRect versions
of most of them).
The non-integer versions were already reserved in 1.30, but apparently
these were forgotten.
Fixes piglit glsl-1.40/compiler/reserved/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents this error with Automake 1.9:
src/gallium/drivers/Makefile.am: C objects in subdir but
`AM_PROG_CC_C_O' not in `configure.ac'
autoreconf: automake failed with exit status: 1
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radeonsi and r600 have duplicate symbols, so it's not possible to
statically link both. Remove the newcomer, radeonsi, until duplicate
symbols are fixed.
Most things that work on Fermi should work on Kepler too.
There are a few performance optimizations left to do, like better
placement of texture barriers and adding scheduling data to the
shader instructions (without them, a thread group will be masked
for 32 cycles after each single instruction issue).
Edit: Don't do it for the main function of (graphics) shaders,
its inputs and outputs always go through TGSI_FILE_INPUT/OUTPUT.
This prevents all TEMPs from counting as live out and reduces
register pressure.
The point is to keep an independent dictionary for each function.
The array that was being used as dictionary has been converted into a
"bimap" for two different reasons: first, because having an almost
empty instance of an array with as many entries as registers there are
in the program, once for every function, would be wasteful, and
second, because we want to be able to map Value pointers back to
locations at some point.
The reason is that several passes (regalloc, function argument
binding, inlining) are going to require the callees of a function to
be processed before the caller.
Instruction attributes like WriteALUResult and ALUResultCompare
were being discarded during some of the local transformations.
This fixes the following piglit tests:
glsl1-inequality (vec2, pass)
loopfunc
fs-any-bvec2-using-if
fs-op-ne-bvec2-bvec2-using-if
fs-op-ne-ivec2-ivec2-using-if
fs-op-ne-mat2-mat2-using-if
fs-op-ne-vec2-vec2-using-if
fs-op-ne-mat2x3-mat2x3-using-if
fs-op-ne-mat2x4-mat2x4-using-if
https://bugs.freedesktop.org/show_bug.cgi?id=45921
NOTE: This is a candidate for the stable branches.
The loop registers weren't being cleared, so any shader that was
executed after a shader containing loops was at risk of having a loop
randomly inserted into it.
This fixes over one hundred piglit tests, although these tests
only failed during full piglit runs and would pass if
run individually. The exact number of piglit tests that this patch
fixes will vary depending on the version of piglit and the order the
tests are run.
NOTE: This is a candidate for the stable branches.
This lets us significantly shorten p->instructions->push_tail(ir), and
will be used in a few more places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we can fold a bunch of our expression setup in ff_fragment_shader
into single-line, parseable commits.
v2: Make it actually work. I wasn't setting num_components in the
mask structure, and not setting up a mask structure is way easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having to explicitly dereference is irritating and bloats the code,
when the compiler can detect and do the right thing.
v2: Use a little shim class to produce the automatic dereference
generation at compile time as opposed to runtime, while also
allowing compile-time type checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C++ constructors with placement new, while functional, are
extremely verbose, leading to generation of simple GLSL IR expressions
like (a * b + c * d) expanding to many lines of code and using lots of
temporary variables. By creating a new ir_builder.h that puts simple
generators in our namespace and taking advantage of ralloc_parent(),
we can generate much more compact code, at a minor runtime cost.
v2: Replace ir_instruction usage with just ir_rvalue.
v3: Drop remaining missed as_rvalue() in v2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The primary motivation for this rewrite was to have a maintainable driver
going forward, as nvfx was quite horrible in a lot of ways.
The driver is heavily based on the design of the nv50/nvc0 3d drivers we
already have, and uses the same common buffer/fence code. It also passes
a HEAP more piglit tests than nvfx did, supports a couple more features,
and a few more to come still probably.
The CPU footprint of this driver is far far less than nvfx, and translates
into far greater framerates in a lot of applications (unless you're using
a CPU that's way way newer than the GPUs of these generations....)
Basically, we once again have a maintained driver for these chipsets \o/
Feel free to report bugs now!
This driver hasn't been maintained properly for a very long time, and for
many very good reasons. It's horrible.
A new driver supporting these chipsets will appear with the commits that
port vieux/nv50/nvc0 to libdrm_nouveau-2.0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if
you use any other graph object in the meantime they'll forget their state
and spew a lovely METHOD_CNT error at you when you try to draw.
The pre-newlib driver has a flush_notify() hook which does this state
re-emit, and a number of random workarounds like extra flushes and state
dirtying after various operations to solve this issue.
I'm taking a slightly different approach to things instead, which has the
nice side-effect of removing the divergent code-paths for ttri/mtri, the
flush/dirty workarounds and the need for flush_notify. Also gives a few
FPS boost in OA, yay.
This is just a function to tell if a certain blend mode requires dual sources.
v2: move to inlines as per Brian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
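A sketch of what such a helper can look like. The enum below is
hypothetical and only mirrors the GL blend factors; the four SRC1 factors
are the ones that read the second fragment shader output.

```c
#include <stdbool.h>

/* Hypothetical blend factor enum, for illustration only. */
enum sketch_blend_factor {
    SKETCH_BF_ZERO,
    SKETCH_BF_ONE,
    SKETCH_BF_SRC_COLOR,
    SKETCH_BF_SRC_ALPHA,
    SKETCH_BF_DST_ALPHA,
    SKETCH_BF_SRC1_COLOR,
    SKETCH_BF_INV_SRC1_COLOR,
    SKETCH_BF_SRC1_ALPHA,
    SKETCH_BF_INV_SRC1_ALPHA
};

/* True when the factor needs the second (index 1) color source. */
bool sketch_blend_factor_is_dual_src(enum sketch_blend_factor factor)
{
    switch (factor) {
    case SKETCH_BF_SRC1_COLOR:
    case SKETCH_BF_INV_SRC1_COLOR:
    case SKETCH_BF_SRC1_ALPHA:
    case SKETCH_BF_INV_SRC1_ALPHA:
        return true;
    default:
        return false;
    }
}
```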
This adds the blend mode mapping, it also uses the var->index in the
glsl to tgsi convertor - this is the other half of my using 4 in the GLSL
compiler.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds index support to the GLSL compiler.
I'm not 100% sure of my approach here, especially without knowing how output ordering
happens wrt location, index pairs, in the "mark" function.
Since current hw doesn't ever have a location > 0 with an index > 0,
we don't have to work out if the output ordering the hw requires is
location, index, location, index or location, location, index, index.
But we have no hw to know, so punt on it for now.
v2: index requires layout - catch and error
setup explicit index properly.
v3: drop idx_offset stuff, assume index follow location
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add implementations of the two API functions,
Add a new string-to-uint mapping for index bindings
Add the blending mode validation for SRC1 + SRC_ALPHA_SATURATE
Add get for MAX_DUAL_SOURCE_DRAW_BUFFERS
v2:
Add check in valid_to_render to address case in spec ERRORS.
v3:
Add index to ir.h so this patch compiles on its own
fixup comment
v4: fixup Brian's comments
The GLSL patch will setup the indices.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit adds initial support for acceleration
on SI chips. egltri is starting to work.
The SI/R600 llvm backend is currently included in mesa
but that may change in the future.
The plan is to write a single gallium driver and
use gallium to support X acceleration.
This commit contains patches from:
Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer <michel.daenzer@amd.com>
Alex Deucher <alexander.deucher@amd.com>
Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following commits were squashed in:
======================================================================
radeonsi: Remove unused winsys pointer
This was removed from r600g in commit:
commit 96d882939d
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Feb 17 01:49:49 2012 +0100
gallium: remove unused winsys pointers in pipe_screen and pipe_context
A winsys is already a private object of a driver.
======================================================================
radeonsi: Copy color clamping CAPs from r600
Not sure if the values of these CAPS are correct for radeonsi, but the
same changes were made to r600g in commit:
commit bc1c836938
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Jan 23 03:11:17 2012 +0100
st/mesa: do vertex and fragment color clamping in shaders
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
======================================================================
radeonsi: Remove PIPE_CAP_OUTPUT_READ
This CAP was dropped in commit:
commit 04e3240087
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Feb 23 23:44:36 2012 +0100
gallium: remove PIPE_SHADER_CAP_OUTPUT_READ
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to work around in the first place.
======================================================================
radeonsi: Add missing parameters to rws->buffer_get_tiling() call
This was changed in commit:
commit c0c979eebc
Author: Jerome Glisse <jglisse@redhat.com>
Date: Mon Jan 30 17:22:13 2012 -0500
r600g: add support for common surface allocator for tiling v13
Tiled surfaces have all kinds of alignment constraints that need to
be met. Instead of having all this code duplicated between ddx and
mesa, use common code in libdrm_radeon. This also ensures that both
ddx and mesa compute those alignments in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
======================================================================
radeonsi: Remove PIPE_TRANSFER_MAP_PERMANENTLY
This was removed in commit:
commit 62f44f670b
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Mar 5 13:45:00 2012 +0100
Revert "gallium: add flag PIPE_TRANSFER_MAP_PERMANENTLY"
This reverts commit 0950086376.
It was decided to refactor the transfer API instead of adding workarounds
to address the performance issues.
======================================================================
radeonsi: Handle PIPE_VIDEO_CAP_PREFERED_FORMAT.
Reintroduced in commit 9d9afcb5ba.
======================================================================
radeonsi: nuke the fallback for vertex and fragment color clamping
Ported from r600g commit c2b800cf38.
======================================================================
radeonsi: don't expose transform_feedback2 without kernel support
Ported from r600g commit 15146fd1bc.
======================================================================
radeonsi: Handle PIPE_CAP_GLSL_FEATURE_LEVEL.
Ported from r600g part of commit 171be75522.
======================================================================
radeonsi: set minimum point size to 1.0 for non-sprite non-aa points.
Ported from r600g commit f183cc9ce3.
======================================================================
radeonsi: rework and consolidate stencilref state setting.
Ported from r600g commit a2361946e7.
======================================================================
radeonsi: cleanup setting DB_SHADER_CONTROL.
Ported from r600g commit 3d061caaed.
======================================================================
radeonsi: Get rid of register masks.
Ported from r600g commits
3d061caaed13b646ff40754f8ebe73f3d4983c5b..9344ab382a1765c1a7c2560e771485edf4954fe2.
======================================================================
radeonsi: get rid of r600_context_reg.
Ported from r600g commits
9344ab382a1765c1a7c2560e771485edf4954fe2..bed20f02a771f43e1c5092254705701c228cfa7f.
======================================================================
radeonsi: Fix regression from 'Get rid of register masks'.
======================================================================
radeonsi: optimize r600_resource_va.
Ported from r600g commit 669d8766ff.
======================================================================
radeonsi: remove u8,u16,u32,u64 types.
Ported from r600g commit 78293b99b2.
======================================================================
radeonsi: merge r600_context with r600_pipe_context.
Ported from r600g commit e4340c1908.
======================================================================
radeonsi: Miscellaneous context cleanups.
Ported from r600g commits
e4340c1908a6a3b09e1a15d5195f6da7d00494d0..621e0db71c5ddcb379171064a4f720c9cf01e888.
======================================================================
radeonsi: add a new simple API for state emission.
Ported from r600g commits
621e0db71c5ddcb379171064a4f720c9cf01e888..f661405637bba32c2cfbeecf6e2e56e414e9521e.
======================================================================
radeonsi: Also remove sbu_flags member of struct r600_reg.
Requires using sid.h instead of r600d.h for the new CP_COHER_CNTL definitions,
so some code needs to be disabled for now.
======================================================================
radeonsi: Miscellaneous simplifications.
Ported from r600g commits 38bf276348 and
b0337b679a.
======================================================================
radeonsi: Handle PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION.
Ported from commit 8b4f7b0672.
======================================================================
radeonsi: Use a fake reloc to sleep for fences.
Ported from r600g commit 8cd03b933c.
======================================================================
radeonsi: adapt to get_query_result interface change.
Ported from r600g commit 4445e170be.
clang warns on these:
stroker.c:626:19: warning: implicit conversion from enumeration
type 'VGPathCommand' to different enumeration type 'VGPathSegment'
[-Wconversion]
No change in the underlying value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
brw_wm_surface_state.c:330:30: warning: initializer overrides prior
initialization of this subobject [-Winitializer-overrides]
[MESA_FORMAT_Z24_S8] = 0,
^
brw_wm_surface_state.c:326:30: note: previous initialization is here
[MESA_FORMAT_Z24_S8] = 0,
^
No functionality change, since the array is declared static so
it was zero-initialized by default.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
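As an illustration of why dropping the duplicate entry is harmless, here is a minimal sketch of a table built with designated initializers. The enum and table names are hypothetical and are not the driver's real ones; the point is only that elements left out of the initializer read back as zero.

```c
/* Hypothetical format enum standing in for the driver's real format list. */
enum fmt { FMT_RGBA8, FMT_RGB565, FMT_Z24_S8, FMT_COUNT };

/* Only formats with a nonzero value need an entry; every element that is
 * not named in the initializer is zero-initialized by the language. */
static const int render_target_supported[FMT_COUNT] = {
   [FMT_RGBA8]  = 1,
   [FMT_RGB565] = 1,
   /* FMT_Z24_S8 is intentionally absent, so it reads back as 0. */
};

int is_render_target_supported(enum fmt f)
{
   return render_target_supported[f];
}
```

Calling is_render_target_supported(FMT_Z24_S8) returns 0 even though the table never mentions that format.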
Silences a clang warning:
format_pack.c:2546:30: warning: implicit conversion from 'int' to
'GLubyte' (aka 'unsigned char') changes value from 65535 to 255
[-Wconstant-conversion]
d[i] = d[i] ? 0xffff : 0x0;
~ ^~~~~~
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
egl_st.c:57:50: warning: field precision should have type 'int',
but argument has type 'size_t' (aka 'unsigned long') [-Wformat]
ret = util_snprintf(path, sizeof(path), "%.*s/%s" UTIL_DL_EXT,
~~^~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
C still treats array arguments exactly like pointer arguments.
By sheer coincidence, this still worked fine on 64-bit
machines where 2 * sizeof(float) == sizeof(void*), but not
on 32-bit.
Noticed by clang:
text.c:76:51: warning: sizeof on array function parameter will
return size of 'const VGfloat *' (aka 'const float *') instead of
'const VGfloat [2]' [-Wsizeof-array-argument]
memcpy(glyph->glyph_origin, glyphOrigin, sizeof(glyphOrigin));
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
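The pitfall described above can be sketched as follows. The struct and function names are hypothetical; the sketch only shows one common way to avoid taking sizeof of an array parameter, namely sizing the copy from the destination array instead.

```c
#include <string.h>

/* Hypothetical stand-in for the glyph structure mentioned above. */
struct glyph {
   float origin[2];
};

/* 'src' is written as an array, but C adjusts the parameter to
 * 'const float *', so sizeof(src) would be the size of a pointer.
 * Sizing the copy from the destination member avoids the problem. */
void glyph_set_origin(struct glyph *g, const float src[2])
{
   memcpy(g->origin, src, sizeof(g->origin));
}
```

With this form the copy is always 2 * sizeof(float) bytes, regardless of the pointer size on the target.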
Noticed by clang:
eglimage.c:48:28: warning: argument to 'sizeof' in 'memset' call is
the same expression as the destination; did you mean to dereference
it? [-Wsizeof-pointer-memaccess]
memset(attrs, 0, sizeof(attrs));
~~~~~ ^~~~~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Most of the 256 values in the 'generic_to_slot' table were supposed to
be initialized with the default value 0xff, but were left at zero
(from CALLOC_STRUCT()) instead.
Noticed by clang:
u_linkage.h:60:31: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination;
did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess]
memset(table, 0xff, sizeof(table));
~~~~~ ^~~~~
Also fix a signed/unsigned comparison and a comment typo here.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
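A minimal sketch of the corrected initialization, assuming a 256-entry byte table as described above. The function name and the NUM_GENERIC constant are hypothetical; the real helper may be shaped differently.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GENERIC 256   /* assumed table size, taken from the text above */

/* 'table' is a pointer here, so sizeof(table) would only cover the first
 * few bytes. The length is spelled out explicitly instead. */
void init_generic_to_slot(uint8_t *table)
{
   memset(table, 0xff, NUM_GENERIC * sizeof(table[0]));
}
```

After the call, every entry holds the default value 0xff, not just the first sizeof(void *) bytes.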
container_of() can legally return anything, even invalid addresses
that cause segfaults, when 'sample' is an uninitialized pointer.
Bug exposed by clang.
NOTE: This is a candidate for the 8.0 branch.
Fix uninitialized scalar field defect reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 272bc48976 removed the damage implementation for the
wl_buffer_interface because that has been removed from git master of
Wayland. However this breaks building with the 0.85 branch of Wayland
because it would end up initialising the struct incorrectly.
For the time being it's quite convenient for some compositors to track
the 0.85 branch of Wayland because the protocol is stable but they
will also want to track the master branch of Mesa so that they can use
the gbm surface changes.
This patch adds a compile-time check for the version of Wayland so
that it can work with either Wayland master or the 0.85 branch.
krh: Edited to also account for API changes in 6802eaa68, which
removes the timestamp argument from wl_resource_destroy().
The upstream of gtest has decided that the intended usage model is for
projects to import the source and use it, which is reflected in their
recent removal of the gtest-config tool.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
By making a bool fs_reg only have a defined low bit (matching CMP
output), instead of being a full 0 or 1 value, we reduce the ANDs
generated in logic chains like:
if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
discard;
My concern originally when writing this code was that we would end up
generating unnecessary ANDs on bool uniforms, so I put the ANDs right
at the point of doing the CMPs that otherwise set only the low bit.
However, in order to use a bool, we're generating some instruction
anyway (e.g. moving it so as to produce a condition code update), and
those instructions can often be turned into an AND at that point. It
turns out in the shaders I have on hand, none of them regress in
instruction count:
Total instructions: 262649 -> 262545
39/2148 programs affected (1.8%)
14253 -> 14149 instructions in affected programs (0.7% reduction)
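The idea of carrying booleans with only a defined low bit can be sketched in plain C. This is an illustration, not the compiler backend's code: the helpers below are hypothetical and deliberately fill the upper bits with garbage to model a comparison result whose low bit is the only meaningful one.

```c
#include <stdint.h>

/* Upper bits are deliberately garbage to model "only the low bit is
 * defined"; the low bit of this constant is 0. */
#define GARBAGE_HIGH_BITS 0xabcd1230u

static uint32_t cmp_lt(float a, float b)
{
   return GARBAGE_HIGH_BITS | (uint32_t)(a < b);
}

static uint32_t cmp_gt(float a, float b)
{
   return GARBAGE_HIGH_BITS | (uint32_t)(a > b);
}

/* The ORs operate on the raw results; a single AND at the point of use
 * extracts the boolean, instead of one AND per comparison. */
int should_discard(float x, float y, float texwidth)
{
   uint32_t r = cmp_lt(x, 0.0f) | cmp_gt(x, texwidth) |
                cmp_lt(y, 0.0f) | cmp_gt(y, 1.0f);
   return (int)(r & 1u);
}
```

In this sketch four comparisons need one masking operation rather than four, which mirrors the reduction in ANDs reported above.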
This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.
Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)
v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
This should fit in well with our lower_mat_op_to_vec code: now, in
addition to having expressions on each column of a matrix, we also
split the columns to separate variables so they can be tracked
individually by the copy propagation, dead code, and other passes.
This optimizes out some more code generation in unigine and gstreamer
shaders.
Total instructions: 269342 -> 269270
14/2148 programs affected (0.7%)
2226 -> 2154 instructions in affected programs (3.2% reduction)
I've had this code lying around almost done for a long time. The

idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences. While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.
While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
The Android build was broken by
commit ca760181b4
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Fri Mar 16 12:55:40 2012 -0400
shared-glapi: Convert to automake
The offending change was that it redefined the filepaths in sources.mak
like this:
- FOO_FILES := bar.c
+ FOO_FILES := $(TOP)/src/mapi/mapi/bar.c
This broke the build because source filepaths in Android makefiles must be
relative to the makefile.
Ideally, this could be fixed by reverting the change in sources.mak and
making shared-glapi's Makefile.am use $(addprefix $(TOP)/src/mapi/mapi,
$(FOO_FILES)). However, automake doesn't understand builtin GNU make
functions, such as addprefix. So, it seems that automake and Android can
no longer share sources.mak.
Fix the build by duplicating the source lists from sources.mak into
Android.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Keep a reference to any newly allocated aux buffers to avoid
re-allocating for every st_framebuffer_validate() (i.e. leaking).
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.
Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height. However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.
Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values. To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Clarify comments about the DDX.
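The wasted memory mentioned above can be checked with a little arithmetic. The sketch below is an assumption-laden model, not the driver's allocation code: the function names are hypothetical, and rounding the halved height up is a guess.

```c
#include <stddef.h>

static size_t align_up(size_t v, size_t a)
{
   return (v + a - 1) / a * a;
}

/* Old scheme (as described above): pitch doubled, height halved, and the
 * halved "faux" height aligned to 64. Rounding the halved height up is an
 * assumption of this sketch. */
size_t old_stencil_alloc_bytes(size_t pitch, size_t height)
{
   return (2 * pitch) * align_up((height + 1) / 2, 64);
}

/* New scheme: true pitch and true height, with the true height aligned
 * to 64. The 2x pitch is applied only when programming the hardware. */
size_t new_stencil_alloc_bytes(size_t pitch, size_t height)
{
   return pitch * align_up(height, 64);
}
```

For a pitch of 128 bytes and a height of 130 rows, the old model allocates 256 * 128 = 32768 bytes while the new one allocates 128 * 192 = 24576 bytes.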
This is a related fix for the Wayland change:
commit 83685c506e76212ae4e5cb722205d98d3b0603b9
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Mon Mar 26 16:33:24 2012 -0400
Remove wl_buffer.damage and simplify shm implementation
Apparently, this should also fix a memory leak. When wl_buffer.damage
was removed from Wayland and Mesa was not fixed, wl_buffer.destroy ended
up in the (empty) damage function instead of calling
wl_resource_destroy().
Spotted during build as:
CC wayland-drm-protocol.lo
wayland-drm.c:80:2: warning: initialization from incompatible pointer type
wayland-drm.c:82:1: warning: excess elements in struct initializer
wayland-drm.c:82:1: warning: (near initialization for 'drm_buffer_interface')
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Fixes uninitialized member defects reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was hacked in in one place for EGL image stuff, but the right
thing to do was just to provide the mapping from the mesa format to
the native hardware format, which includes render target support.
This turns out to be required for GL_ARB_texture_buffer_object, which
sees data in this layout.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out this field *is* used, and it's the stride between samples
from the buffer. Discovered during TBO debugging.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a function full of unused mappings from the GLenum to
datatype/comps, but that wasn't all the information a driver would
want, which includes the other fields that a gl_format has. Given
that all the texture buffer formats were represented in gl_format,
just use that as our description.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have to skip some work that wants to look at texture images, since
buffer textures don't have any of that complexity.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that should be needed is that it exists. Fixes segfaults on first
_mesa_update_context() with a samplerBuffer-using shader active but
without a particular buffer texture enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix texelFetch(sampler2DRect) and textureSize(samplerBuffer)
generation to not reference a LOD at the same time because it's easier
than not fixing it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplerBuffer type will be undefined in !glsl 1.40, and the
keyword is marked as reserved. The [iu]samplerBuffer types are not
marked as reserved pre-1.40, so they don't have separate tokens and
fall through to normal type handling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're supposed to just immediately call it. Fixes piglit
GL_ARB_texture_buffer_object/dlist
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is set correctly in gl.spec, but was missed in Mesa. As a
result, only one of the two was hooked up in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have lexer recognition of a bunch of our types based on the
handling. This code was mapping those recognized tokens to an enum
and then to a string of their name. Just drop the enums and provide
the string directly in the parser.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nothing actually relied on them being mutable, and there was at least
one cast which discarded const qualifiers. The next patch would have
introduced many more.
Casting away const qualifiers should be avoided if at all possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In "release" builds, Mesa would print this message if the MESA_DEBUG
variable was set. Make it so for debug builds as well.
I build debug builds all the time, but I'm not debugging this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While ir_to_mesa contains code that attempts to support functions, I
honestly doubt it's been tested and have little confidence that it
works.
The comment in visit(ir_function *ir) doesn't inspire confidence:
/* Ignore function bodies other than main() -- we shouldn't see calls to
* them since they should all be inlined before we get to ir_to_mesa.
*/
Furthermore, hardware drivers such as i915, i965, and (AFAICT) r200
don't support the BGNSUB/ENDSUB/CAL opcodes anyway. Only swrast does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When the SPRITE_POINT_ENABLE bit is set, the texture coordinates are
replaced. This is only needed after a call such as
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE).
Moreover, we currently handle varying inputs as texture coordinates,
so we must be careful to set this bit only when it is needed;
otherwise the values of varying inputs are overwritten and come out
wrong.
Thus we set the SPRITE_POINT_ENABLE bit only when all enabled texture
coordinate units need CoordReplace. Otherwise a fallback is needed to
make sure the rendering is right.
Since the bit is now set up in i915_update_sprite_point_enable(),
the related code in i915Enable is no longer needed.
This patch _really_ fixes the webglc point-size.html test case and
does not regress the piglit point-sprite and glean pointSprite
test cases.
NOTE: This is a candidate for stable release branches.
v2: fall back just when all enabled tex coord units need to do
CoordReplace (Eric)
v3: move the sprite point validation code to I915InvalidateState (Eric)
v4: update the sprite point enable bit based on _NEW_PROGRAM, too;
add related _NEW-state comments to show what state is being used (Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
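The decision rule described above can be sketched as a small function. All names here are hypothetical, and the treatment of the "no unit requests replacement" case (bit simply left clear) is an assumption of this sketch, not something the message states.

```c
#include <stdbool.h>

#define MAX_TEX_COORD_UNITS 8   /* hypothetical unit count */

enum sprite_decision { SPRITE_DISABLED, SPRITE_ENABLED, SPRITE_FALLBACK };

struct point_sprite_state {
   bool unit_enabled[MAX_TEX_COORD_UNITS];
   bool coord_replace[MAX_TEX_COORD_UNITS];
};

/* Set the bit only if every enabled unit wants coordinate replacement;
 * a mix of replacing and non-replacing units needs a fallback. */
enum sprite_decision
decide_sprite_point_enable(const struct point_sprite_state *s)
{
   int enabled = 0, replaced = 0;

   for (int i = 0; i < MAX_TEX_COORD_UNITS; i++) {
      if (!s->unit_enabled[i])
         continue;
      enabled++;
      if (s->coord_replace[i])
         replaced++;
   }

   if (replaced == 0)
      return SPRITE_DISABLED;   /* assumption: nothing to replace */
   if (replaced == enabled)
      return SPRITE_ENABLED;
   return SPRITE_FALLBACK;
}
```

The mixed case is the one that would otherwise corrupt varying inputs, since the hardware bit applies to all units at once.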
Fix 'set but not used' warnings: the gl_version, gl_versions_profiles and
glx_extensions variables are used only when HAVE_XCB_GLX_CREATE_CONTEXT
is defined, so the warnings appear when that macro isn't defined.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fixes clang error:
tgsi/tgsi_dump.c:72:12: error: no member named '__printf_chk' in 'struct dump_ctx'
ctx->printf( ctx, "%u", e );
~~~ ^
/usr/include/bits/stdio2.h:109:3: note: expanded from macro 'printf'
__printf_chk (__USE_FORTIFY_LEVEL - 1, __VA_ARGS__)
^
Idea stolen from:
http://www.mail-archive.com/pld-cvs-commit@lists.pld-linux.org/msg210998.html
Reviewed-by: Brian Paul <brianp@vmware.com>
Add the maximum base vertex offset to max_index for computing the
buffer size. Fixes a failed assertion in the u_upload_mgr.c code with
the VMware svga driver.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=48141
v2: incorporate Marek's suggestions.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Return 0 for features we don't support. Added debug_printf()
warnings when we fail to handle a new PIPE_CAP_x case. That will
alert us to interface changes in the future. We don't want to
just ignore new PIPE_CAPs and possibly miss something important.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
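The pattern described above might look like the following sketch. The enum values are hypothetical placeholders for real PIPE_CAP_x names, and fprintf stands in for the driver's debug_printf().

```c
#include <stdio.h>

/* Hypothetical capability list; the real enum is much longer. */
enum example_cap {
   CAP_NPOT_TEXTURES,
   CAP_TWO_SIDED_STENCIL,
   CAP_TEXTURE_BARRIER,
   CAP_SOMETHING_NEW
};

int example_get_param(enum example_cap cap)
{
   switch (cap) {
   case CAP_NPOT_TEXTURES:
   case CAP_TWO_SIDED_STENCIL:
      return 1;
   case CAP_TEXTURE_BARRIER:
      return 0;   /* known capability, explicitly unsupported */
   default:
      /* A capability this driver has never heard of: report it so the
       * interface change is noticed, then claim no support. */
      fprintf(stderr, "unexpected cap query %d\n", (int)cap);
      return 0;
   }
}
```

Listing unsupported capabilities explicitly keeps the warning reserved for genuinely new enum values.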
Before, we weren't clamping the vertex colors produced by ARB vertex
programs. This could result in some rendering being too bright (in
ETQW, for example).
Also add cases for PIPE_CAP_VERTEX_COLOR_CLAMPED and
PIPE_CAP_FRAGMENT_COLOR_CLAMPED with comments to be complete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We already program all the sampler state correctly, we just didn't give
the GPU a pointer to it for the VS stage. Thus, any texturing other
than texelFetch() wouldn't work.
Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's
glsl-bif-tex subtests.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested with lp_test_arit (100% pass) and with piglit: all log tests
pass, but some pow tests still fail.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This reduces CPU overhead a little.
The idea is to translate pipe vertex buffers directly into the CS
and not using any intermediate representations.
Framerate in Torcs:
before: 32.2
after: 34.6
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
st/mesa doesn't allow src_offset to be greater than stride and the maximum
stride r600 supports is 2047.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Limits the maximum number of loop iterations in a TGSI shader to
prevent infinite loops. Any iteration in any loop counts towards this
limit.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
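A shared iteration budget of this kind can be sketched as below. The context struct and function names are hypothetical, and the nested infinite loops merely stand in for shader loops that would never exit on their own.

```c
/* Hypothetical execution context holding one budget for the whole shader. */
struct exec_ctx {
   unsigned iterations_left;
};

/* Returns 1 if another iteration may run, consuming one unit of budget. */
static int loop_may_continue(struct exec_ctx *ctx)
{
   if (ctx->iterations_left == 0)
      return 0;
   ctx->iterations_left--;
   return 1;
}

/* Two nested "infinite" loops; both draw from the same budget, so the
 * total number of iterations executed is bounded. */
unsigned run_nested_loops(struct exec_ctx *ctx)
{
   unsigned executed = 0;

   for (;;) {
      if (!loop_may_continue(ctx))
         break;
      executed++;
      for (;;) {
         if (!loop_may_continue(ctx))
            break;
         executed++;
      }
   }
   return executed;
}
```

Because the counter is shared, an inner loop cannot reset or bypass the limit imposed on the shader as a whole.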
Fixes Coverity resource leak defects.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Variables have types, expression trees have types, but statements don't.
Rather than have a nonsensical field that stays NULL in the base class,
just move it to where it makes sense.
Fix up a few places that lazily used ir_instruction even though they
actually knew the particular subclass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, set_callee() performed some assertions about the type of the
ir_call; protecting the bare pointer ensured these checks would be run.
However, ir_call no longer has a type, so the getter and setter methods
don't actually do anything useful. Remove them in favor of accessing
callee directly, as is done with most other fields in our IR.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aside from ir_call, our IR is cleanly split into two classes:
- Statements (typeless; used for side effects, control flow)
- Values (deeply nestable, pure, typed expression trees)
Unfortunately, ir_call confused all this:
- For void functions, we placed ir_call directly in the instruction
stream, treating it as an untyped statement. Yet, it was a subclass
of ir_rvalue, and no other ir_rvalue could be used in this way.
- For functions with a return value, ir_call could be placed in
arbitrary expression trees. While this fit naturally with the source
language, it meant that expressions might not be pure, making it
difficult to transform and optimize them. To combat this, we always
emitted ir_call directly in the RHS of an ir_assignment, only using
a temporary variable in expression trees. Many passes relied on this
assumption; the acos and atan built-ins violated it.
This patch makes ir_call a statement (ir_instruction) rather than a
value (ir_rvalue). Non-void calls now take a ir_dereference of a
variable, and store the return value there---effectively a call and
assignment rolled into one. They cannot be embedded in expressions.
All expression trees are now pure, without exception.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the time, we just want to read an ir_dereference, so there's no
need to have these in separate functions. However, the next patch will
want to read an ir_dereference_variable directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When translating a call from AST to HIR, we need to decide whether it
can be evaluated to a constant before emitting any code (namely, the
temporary declaration, assignment, and call.)
Soon, ir_call will become a statement taking a dereference of where to
store the return value, rather than an rvalue to be used on the RHS of
an assignment. It will be more convenient to try evaluation before
creating a call. ir_function_signature seems like a reasonable place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, ir_call can be used as either a statement (for void
functions) or a value (for non-void functions). This is rather awkward,
as it's the only class that can be used in both forms.
A number of places use ir_call::get_error_instruction() to construct a
generic value of error_type. If ir_call is to become a statement, it
can no longer serve this purpose.
Unfortunately, none of our classes are particularly well suited for
this, and creating a new one would be rather aggrandizing. So, this
patch introduces ir_rvalue::error_value(), a static method that creates
an instance of the base class, ir_rvalue. This has the nice property
that you can't accidentally try and access uninitialized fields (as it
doesn't have any). The downside is that the base class is no longer
abstract.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
generate_call() and ast_function_expression::hir() both tried to verify
that 'out' and 'inout' parameters used l-values. Irritatingly, it
turned out that this was not redundant; both checks caught -some- cases.
This patch combines the two into a single "complete" function that does
all the parameter mode checking. It also adds a comment clarifying why
AST-level checking is necessary in the first place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We used to have one big function, match_signature_by_name, which found
a matching signature, performed out-parameter conversions, and generated
the ir_call. As the code for matching against built-in functions became
more complicated, I split it internally, creating generate_call().
However, I left the same awkward interface. This patch splits it into
three functions:
1. match_signature_by_name()
This now takes a name, a list of parameters, the symbol table, and
returns an ir_function_signature. Simple and one purpose: matching.
2. no_matching_function_error()
Generate the "no matching function" error and list of prototypes.
This was complex enough that I felt it deserved its own function.
3. generate_call()
Do the out-parameter conversion and generate the ir_call. This
could probably use more splitting.
The caller now has a more natural workflow: find a matching signature,
then either generate an error or a call.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
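The three-step workflow listed above can be sketched in C. The function names follow the text, but the signatures, the tiny symbol table, and the resolve_call() caller are simplified, hypothetical stand-ins for the compiler's real C++ code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct signature {
   const char *name;
   int num_params;
};

/* Hypothetical symbol table. */
static const struct signature symbol_table[] = {
   { "sin", 1 }, { "pow", 2 }, { "clamp", 3 }
};

/* 1. Matching only: return the signature, or NULL if none fits. */
const struct signature *match_signature_by_name(const char *name, int num_params)
{
   for (size_t i = 0; i < sizeof(symbol_table) / sizeof(symbol_table[0]); i++) {
      if (strcmp(symbol_table[i].name, name) == 0 &&
          symbol_table[i].num_params == num_params)
         return &symbol_table[i];
   }
   return NULL;
}

/* 2. Error reporting only. */
void no_matching_function_error(const char *name, char *buf, size_t len)
{
   snprintf(buf, len, "no matching function for call to `%s'", name);
}

/* 3. Call generation only. */
void generate_call(const struct signature *sig, char *buf, size_t len)
{
   snprintf(buf, len, "(call %s)", sig->name);
}

/* The caller: find a signature, then emit either an error or a call. */
int resolve_call(const char *name, int num_params, char *buf, size_t len)
{
   const struct signature *sig = match_signature_by_name(name, num_params);

   if (sig == NULL) {
      no_matching_function_error(name, buf, len);
      return 0;
   }
   generate_call(sig, buf, len);
   return 1;
}
```

Each helper has a single job, so the caller reads as a straight line from lookup to result.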
Function calls may have side effects that alter variables used inside
the loop. In the fragment shader, they may even terminate the shader.
This means our analysis about loop-constant or induction variables may
be completely wrong.
In general it's impossible to determine whether they actually do or not
(due to the halting problem), so we'd need to perform conservative
static analysis. For now, it's not worth the complexity: most functions
will be inlined, at which point we can unroll them successfully.
Fixes Piglit tests:
- shaders/glsl-fs-unroll-out-param
- shaders/glsl-fs-unroll-side-effect
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
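The conservative check described above amounts to refusing to analyze any loop whose body contains a call. A minimal sketch, using a hypothetical flat list of instruction kinds in place of the real IR:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds; the real IR is a tree of C++ objects. */
enum ir_kind { IR_ASSIGNMENT, IR_IF, IR_LOOP_BREAK, IR_CALL };

/* A call may modify loop variables or terminate the shader, so any call
 * in the body makes the loop ineligible for analysis and unrolling. */
bool loop_can_be_analyzed(const enum ir_kind *body, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (body[i] == IR_CALL)
         return false;
   }
   return true;
}
```

Once a function has been inlined, its body no longer appears as a call, and the loop becomes eligible again.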
Certain applications don't call SwapBuffers before exiting. Yet, we'd
really like to see a bitmap containing the final rendered image even if
they choose never to present it.
In particular, Piglit tests (at least with -auto -fbo) fall into this
category. Many of them failed to dump any images at all.
Dumping one final image at context destruction time seems to work.
We may wish to pursue a more elegant solution later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a Coverity resource leak defect.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These can be used to implement EXT_texture_swizzle without baking
state-dependent swizzle instructions into the shader and forcing
recompiles.
For now, just set them to pass-through mode, so everything continues to
work as it did on Ivybridge. We can optimize this later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We only need one sample, since we don't support multisampling yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Apparently this needs to be the same as in 3DSTATE_WM.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Getting HiZ working means updating all the state packets for resolves
and clears. It's not worth doing until we get the basics working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, these all return 0, as I don't yet want to enable Haswell
support. Eventually they will be filled in with proper PCI IDs.
Also add an is_haswell field similar to is_g4x to make it easy to
distinguish Gen7 and Gen7.5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec ISA volume's "Accumulator Register" section:
"[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is
explicit source or destination operand."
Fixes piglit tests:
- fs-multiply-const-ivec4
- fs-multiply-const-uvec4
- fs-multiply-ivec4-const
- fs-multiply-uvec4-const
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the cryptic void* parameter with a union.
(based on union r600_query_result)
Users of this can still pass uint64* in it, but that cannot work for every
query type, obviously. Most importantly, the code now documents what should
be expected from get_query_result.
This also adds pipe_query_data_pipeline_statistics as per the D3D11 docs.
v2: fix indentation, add comments and use the doxygen style
Reviewed-by: Brian Paul <brianp@vmware.com>
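Replacing a void* out-parameter with a union could look roughly like the sketch below. The type, field, and function names are hypothetical and simplified, and the returned values are dummy data; the real union and its members may differ.

```c
#include <stdbool.h>
#include <stdint.h>

enum example_query_type {
   QUERY_SAMPLES_PASSED,
   QUERY_ANY_SAMPLES_PASSED,
   QUERY_PIPELINE_STATS
};

struct example_pipeline_stats {
   uint64_t vs_invocations;
   uint64_t ps_invocations;
};

/* One member per kind of result, so the declaration documents what each
 * query type produces. */
union example_query_result {
   bool b;                               /* boolean queries */
   uint64_t u64;                         /* counter queries */
   struct example_pipeline_stats stats;  /* statistics queries */
};

void example_get_query_result(enum example_query_type type,
                              union example_query_result *result)
{
   switch (type) {
   case QUERY_SAMPLES_PASSED:
      result->u64 = 1234;   /* dummy value */
      break;
   case QUERY_ANY_SAMPLES_PASSED:
      result->b = true;     /* dummy value */
      break;
   case QUERY_PIPELINE_STATS:
      result->stats.vs_invocations = 3;   /* dummy values */
      result->stats.ps_invocations = 5;
      break;
   }
}
```

A caller that only passes a uint64_t pointer works for counter queries, but would be too small for the statistics case, which is the limitation the message points out.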
This option allows targets to link against the LLVM shared library
instead of the static libs. With LLVM 2.9, this saves ~11 MB for each of
the r300 target libraries.
Pass a dri2_loader extension to the dri driver when gbm creates the dri
screen. The implementation jumps through pointers in the gbm device
so that an EGL on GBM implementation can provide the real implementations.
The idea here is to be able to create an egl window surface from a
gbm_surface. This avoids the need for the surfaceless extension and
lets the EGL platform handle buffer allocation, while keeping the user
in charge of somehow presenting the buffers (using kms page flipping,
for example).
gbm_surface_lock_front_buffer() locks a surface's front buffer and
returns a gbm bo representing it. This bo should later be returned
to the gbm surface using gbm_surface_release_buffer().
The function that counts the number of TGSI immediates also needs to
emit the immediates. This fixes assorted failures when using polygon
stipple with fragment shaders that have their own immediates.
NOTE: This is a candidate for the 8.0 branch.
They aren't winsys of their own;
they just help in dealing with them.
v2: add some more comments in vl_winsys.h
Signed-off-by: Christian König <deathsimple@vodafone.de>
"Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This is a regression introduced by commit cdcfd5, which forgot to
increase the map_refcount for a successfully mapped region, leaving
the map_refcount unbalanced.
This fixes the regression found in the following two webglc test cases
on the Pineview platform:
texture-npot.html
gl-max-texture-dimensions.html
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
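A balanced map/unmap pair can be sketched as follows. The struct and function names are hypothetical, and the "mapping" is just a pointer copy; the point is that every successful map, including the first, increments the count that unmap later decrements.

```c
#include <stddef.h>

/* Hypothetical region with a reference-counted mapping. */
struct example_region {
   void *storage;        /* backing memory, may be NULL */
   void *map;            /* current mapping, NULL when unmapped */
   int map_refcount;
};

void *example_region_map(struct example_region *r)
{
   if (r->map_refcount == 0) {
      r->map = r->storage;   /* stand-in for the real mapping call */
      if (r->map == NULL)
         return NULL;        /* failed map: count stays untouched */
   }
   r->map_refcount++;        /* count every successful map */
   return r->map;
}

void example_region_unmap(struct example_region *r)
{
   if (r->map_refcount > 0 && --r->map_refcount == 0)
      r->map = NULL;
}
```

If the increment were skipped on the first successful map, the first unmap would already drop the mapping while other users still held it.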
createDrawable may return NULL; we should check for that, or it will
cause a segmentation fault.
[minor-indent-issue-fixed-by: Yuanhan Liu]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
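The shape of such a check, sketched with hypothetical types and helper names (the real createDrawable lives behind a driver interface and has a different signature):

```c
#include <stdbool.h>
#include <stddef.h>

struct example_drawable {
   int id;
};

typedef struct example_drawable *(*create_drawable_fn)(int id);

/* Two hypothetical creators: one that always fails, one that succeeds. */
struct example_drawable *create_always_fails(int id)
{
   (void)id;
   return NULL;
}

struct example_drawable *create_static(int id)
{
   static struct example_drawable d;
   d.id = id;
   return &d;
}

/* Check the creator's result before touching it; report failure to the
 * caller instead of dereferencing NULL. */
bool example_bind_drawable(create_drawable_fn create, int id,
                           struct example_drawable **out)
{
   struct example_drawable *d = create(id);

   if (d == NULL)
      return false;
   *out = d;
   return true;
}
```

The caller can then turn a false return into a proper error rather than crashing.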
This extension just permits GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_ROWS
and GL_UNPACK_SKIP_PIXELS to be passed to glPixelStore on GLES2 so it
is trivial to implement.
Also fixes the usage of GL_IMPLEMENTATION_COLOR_READ_FORMAT_OES,
which may be set to a BGRA format e.g. for a MESA_FORMAT_ARGB8888 fb.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
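To show what the three unpack parameters control, here is a sketch of how the byte offset of the first pixel read from client memory can be computed. The function name and parameter list are hypothetical, and this is the usual OpenGL addressing rule as I understand it, not code from this patch.

```c
#include <stddef.h>

static size_t align_up(size_t v, size_t a)
{
   return (v + a - 1) / a * a;
}

/* row_length == 0 means "use the image width", as with
 * GL_UNPACK_ROW_LENGTH. 'alignment' models GL_UNPACK_ALIGNMENT. */
size_t example_unpack_offset(size_t row_length, size_t width,
                             size_t skip_rows, size_t skip_pixels,
                             size_t bytes_per_pixel, size_t alignment)
{
   size_t pixels_per_row = row_length ? row_length : width;
   size_t stride = align_up(pixels_per_row * bytes_per_pixel, alignment);

   return skip_rows * stride + skip_pixels * bytes_per_pixel;
}
```

For example, with a row length of 10 RGBA8 pixels, skipping 2 rows and 3 pixels gives 2 * 40 + 3 * 4 = 92 bytes, which lets an application upload a sub-rectangle of a larger client-side image.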
The extension is already exposed for GLES1, but the APIspec
doesn't allow the usage of GL_BGRA_EXT in glTex(Sub)Image2D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed this was missing when writing the "glapi: sort ARB extensions
by number" commit, which at least shows it was effective.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed it was missing based on the lack of a descriptive enum
name from this bug's error message:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This moves two enums out of GL3x.xml. Though since this and
GL_ARB_texture_compression_rgtc are both strict subsets of GL3,
both extensions should have had all their enums in that file
to begin with, not just two of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
And add comments to fill in for extensions that aren't there.
Noticed the comment about "ARB extensions sorted by extension number"
didn't extend to the <xi:include> directives when it became clear
GL_ARB_texture_rg was missing, going by the error message seen here:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This makes it easier to notice in the future if an extension is missing
when it shouldn't be.
Reviewed-by: Brian Paul <brianp@vmware.com>
A later error prints this properly; fix this case to do the same.
v2: remove attribute as per Ian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the xml file covering ARB_blend_func_extended.
v2: fix SRC1_ALPHA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This also seems like a bad idea. There were too many instances for me
to thoroughly scan the code as I did with the last two patches, but a
quick scan indicated that most callers newly allocate a variable,
dereference it, or NULL-check. In some cases, it wasn't clear that the
value would be non-NULL, but they didn't check for error_type either.
At any rate, not checking for this is a bug, and assertions will trigger
it earlier and more reliably than returning error_type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The constructor currently returns a ir_dereference_variable of error
type when provided NULL, but that's about to change in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_record() constructor
seems like a bad idea. Currently, if provided NULL, it returns a
partially constructed value of error type. However, none of the callers
are prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already dereference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
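The idea of asserting instead of returning an error-typed value can be sketched in plain C as follows. This is only an illustration: the struct names and make_deref_record() are hypothetical stand-ins, not the actual GLSL IR classes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the IR types; not the real Mesa classes. */
struct ir_value {
   int type;
};

struct ir_deref_record {
   struct ir_value *record;
   const char *field;
};

/* Create a record dereference. A NULL value is a caller bug, so fail
 * loudly here instead of returning a half-built object of error type. */
static struct ir_deref_record *
make_deref_record(struct ir_value *value, const char *field)
{
   assert(value != NULL);

   struct ir_deref_record *d = malloc(sizeof(*d));
   if (d == NULL)
      return NULL;
   d->record = value;
   d->field = field;
   return d;
}
```

With this shape, a bad caller trips the assertion at the point of construction rather than crashing later on a partially constructed value.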
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_array() constructor seems
like a bad idea. Currently, if provided NULL, it returns a partially
constructed value of error type. However, none of the callers are
prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already dereference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
So if anything goes wrong we won't display a random image.
v2: flush before using the surface with the decoder.
Signed-off-by: Christian König <deathsimple@vodafone.de>
If you ran g-s in 16-bpp we'd do a bunch of memory corruption.
Now it just misrenders for some other reasons.
Applies to stable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ir_validate.cpp: In member function ‘virtual ir_visitor_status ir_validate::visit_leave(ir_swizzle*)’:
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::x’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::y’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::z’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::w’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
valgrind complained about an uninitialised value being used in
glsl_parser_extras.cpp, and this was the one it was giving out about.
Just initialise the value in the fakectx.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For some reason, when I configure --with-dri-drivers="", the src/mesa/drivers/dri
Makefile tries to call the am--refresh target in the toplevel Makefile.
We don't have one, and I'm not sure what it should look like.
This makes things continue on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
piglit glx-tfp segfaulted on llvmpipe when run against a 16-bit radeon screen.
It now fails instead of segfaulting, which is much prettier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When a GL LD_PRELOAD library like apitrace was used,
glXGetProcAddress() would return the preload's symbols instead of
libGL's symbol, leading to infinite recursion when the returned
function was called. This didn't hit apitrace on most apps because
few apps call glXGetProcAddress() on the global functions.
The -Bsymbolic, which was present in mklib before automake conversion,
causes the glxcmds.c:GLX_functions table to be resolved at link time,
so that LD_PRELOADs don't affect it any more.
Fixes crashes when running wine under apitrace.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
The default was 32 for the EmitNoLoops=0 case. This allows the oZone3D
soft shadows test to work properly with the vmware driver. Jose reported
that SM3 supports up to 255 loop iterations.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of the hard-coded value of 32. Note that MaxUnrollIterations
defaults to 32 so there's no net change. But the gallium state tracker
can override this.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves Unigine Tropics performance at 1024x768 by 2.06236% +/-
0.50272% (n=11).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset
the buffer object when its streaming wraps. Don't penalize it by
flushing the batch at the wrap point, just allocate a new BO and get
to using it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library
is linked.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
The force-enable option is dropped, now that the hardware we were
concerned about has HiZ on by default. Now, instead of doing
INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.
v2: Disable separate stencil on gen6 when HIZ is turned off.
(previously, this had to be done manually in addition).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a debug option during gen6 transform feedback bringup (and a
similar one existed during gen4 bringup). However, it looks like
we're done with that, and we don't anticipate it being used again,
either for geometry shaders or transform feedback.
Suggested by: Kenneth Graunke <kenneth@whitecape.org>
This was added in the i915/i965 merge from the i915 driver, but I
don't recall it ever being used since then.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you want to test the graphics driver, you want to test it under the
conditions that users will see, not some set of additional fallbacks.
If you want to test swrast, run the swrast driver (or no_rast=true)
instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To avoid redundancies, this patch also removes .deps, .libs, and *.la
from .gitignore files in subdirectories.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As Eric pointed out, we know the cube faces are square at this point
so we only need to test the texture widths for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The max size was 16Kx16K so a 4 byte/pixel, six-sided cube would require
6 GBytes of memory. If mipmapped, 8 GB. Reduce the max size to 4K to
make the total size more reasonable.
Fixes a crash with the new piglit max-texture-size test.
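The memory figures above can be checked with a small calculation. This is a sketch only; cube_map_bytes() is a hypothetical helper and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by a cube map with square faces of 'size' texels.
 * If 'mipmapped' is nonzero, every level down to 1x1 is included. */
static uint64_t
cube_map_bytes(uint32_t size, uint32_t bytes_per_texel, int mipmapped)
{
   assert(size > 0);

   uint64_t total = 0;
   for (;;) {
      /* Six faces per level. */
      total += 6ull * size * size * bytes_per_texel;
      if (!mipmapped || size == 1)
         break;
      size /= 2;
   }
   return total;
}
```

For a 16384-texel face at 4 bytes per texel this gives exactly 6 GiB without mipmaps and just under 8 GiB with a full mip chain; a 4096-texel face needs 384 MiB without mipmaps.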
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Per the spec, only nearest filtering is supported for integer textures.
Otherwise, the texture is incomplete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of gl_texture_object::_Complete there are now two fields:
_BaseComplete and _MipmapComplete. The former indicates whether the base
texture level is valid. The latter indicates whether the whole mipmap is
valid.
With sampler objects, a single texture can appear to be both complete and
incomplete at the same time. See the GL_ARB_sampler_objects spec for more
details. To implement this we now check if the texture is complete with
respect to a sampler state.
Another benefit of this is we no longer need to invalidate a texture's
completeness state when we change the minification/magnification filters
with glTexParameter().
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Merge the mipmap level checking code that was separate cases for 1D,
2D, 3D and CUBE before.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the simple MaxLevel < BaseLevel test earlier to be closer to where
we error-check BaseLevel. Also, use the local baseLevel var in more places.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make the no-change case faster, as we do for the other object-reference
functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We want to start emitting an INVALID_OPERATION from here for transform
feedback. Note that this forced dlist.c to almost not use this
function, since it wants different behavior during dlist compile.
Just pull the non-TF, non-GS test out for compile, because:
1) TF doesn't matter in that case because there's no drawing.
2) I don't think we're going to see GSes and display lists in the same
context, if we don't do GL_ARB_compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a build problem where EGL links to libgbm.la, which encodes
a relative path to its libglapi.so dependency. The relative path
breaks when the linker tries to resolve it from src/egl/main instead
of src/gbm. Typically we silently fall back to the system
libglapi.so, which is wrong and breaks when there isn't one.
Moral of the story: don't mix mklib and libtool.
Although some hardware supports NPOT cubemaps, we don't seem to know
the right layout for them. Thus it seems we need to fall back on
other platforms as well.
See the inline comments in the code for more detailed info.
v2: give more detailed info about why we need the fallback for other
platforms as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46666
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If we failed to allocate a memory resource for the texture we'd crash
when we tried to map it. Now we propagate the NULL back up to the
texstore code and generate GL_OUT_OF_MEMORY.
Fixes a crash with the upcoming piglit max-texture-size test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
From the GLSL 1.30 spec:
The discard keyword is only allowed within fragment shaders. It
can be used within a fragment shader to abandon the operation on
the current fragment. This keyword causes the fragment to be
discarded and no updates to any buffers will occur. Control flow
exits the shader, and subsequent implicit or explicit derivatives
are undefined when this control flow is non-uniform (meaning
different fragments within the primitive take different control
paths).
v2: Don't emit the final HALT if no other HALTs were emitted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
By setting lod to 0 in the builtin function implementation, we avoid
needing to update all the visitors to ignore LOD in this case, when
the hardware drivers actually want to ask for LOD 0 for rectangular
textures.
Fixes piglit spec/GLSL-1.40/textureSize-*Rect.
v2: Change style of looking for substrings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the one builtin function claimed to be dropped due to the
ARB_compatibility split.
Fixes piglit spec/GLSL-1.40/compiler/ftransform.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the process slightly more debuggable, though it would be
nice if the build just failed immediately instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly this is a matter of removing variables that have been moved to
the compatibility profile. There's one addition: gl_InstanceID is
present in the core now.
This fixes the new piglit tests for GLSL 1.40 builtin variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids extra if statements in the common case of just comparing
two expressions that don't involve assignments or function calls,
along with simplifying the handling of constant expressions. Reduces
i965 instructions generated in unigine tropics and sanctuary,
yofrankie, warsow, gstreamer shaders, and the weston compositor.
shader-db results:
Total instructions: 213052 -> 212752
38/1246 programs affected (3.0%)
14309 -> 14009 instructions in affected programs (2.1% reduction)
The error was removed in:
commit 719909698c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Tue Oct 18 16:01:49 2011 -0700
mesa: Rewrite the way uniforms are tracked and handled
The GL_ARB_robustness spec doesn't say the implementation
should truncate the output, so just return after setting
the required error like it did before the above commit.
Also fixup an old comment and add an assert.
NOTE: This is a candidate for the 8.0 branch.
Handle the special case of glFramebufferTextureLayer() for which we pass
teximage = 0 internally in framebuffer_texture(). This patch makes the failing
piglit tests fbo-array and fbo-depth-array pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47126
V4: Removed the duplicated code.
Note: This is a candidate for the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the soon-to-be-removed _DD_NEW_SEPARATE_SPECULAR flag.
Note: there's a similar composite _MESA_NEW_NEED_EYE_COORDS flag set already.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just use the corresponding _NEW_x flags instead. The _DD_NEW_x flags
will be removed in a following patch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The computed stencil.clear and depth.clear values aren't used anywhere.
Those fields have been removed too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Set the close-on-exec flag when opening dri character devices, so they
will be closed on exec, freeing any resources allocated.
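Setting the flag can be sketched with the standard POSIX calls as below. open_cloexec() is a hypothetical helper name; the actual patch may set the flag differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a device node and mark the descriptor close-on-exec.
 * Returns the descriptor, or -1 on failure. */
static int
open_cloexec(const char *path)
{
   assert(path != NULL);

   int fd = open(path, O_RDWR);
   if (fd < 0)
      return -1;

   /* Preserve any existing descriptor flags and add FD_CLOEXEC. */
   int flags = fcntl(fd, F_GETFD);
   if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
      close(fd);
      return -1;
   }
   return fd;
}
```

On systems that provide it, passing O_CLOEXEC to open() sets the flag atomically and avoids the window between open() and fcntl().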
Signed-off-by: David Fries <David@Fries.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This issue might recur on other OSes. If so then it might be better
to remove the C-preprocessor magic, and use fully qualified defines
instead.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This state is needed for deciding whether or not to log
application messages with IDs that haven't been specifically
passed to glDebugMessageControlARB yet.
State for each individual ID number ever passed to
glDebugMessageControlARB (per-context) still needs to be added.
Unfortunately, Unigine Heaven 3.0 still needs this.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
min_index/max_index are merely conservative guesses, so we can't
make buffer overflow detection based on their values.
Tested-by: Jakob Bornecrantz <jakob@vmware.com>
There are several cases in which we need to explicitly "rebase" colors
(ex: set G=B=0) when getting GL_LUMINANCE textures:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
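The "rebase" step for the luminance case can be sketched as follows. rebase_to_luminance() is a hypothetical name, and this sketch leaves alpha untouched, which may differ from what the real code does.

```c
#include <assert.h>
#include <stddef.h>

/* Force RGBA texels to luminance form: keep R, zero G and B.
 * Alpha is left as-is in this sketch. */
static void
rebase_to_luminance(float rgba[][4], size_t count)
{
   assert(rgba != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      rgba[i][1] = 0.0f;
      rgba[i][2] = 0.0f;
   }
}
```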
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=46679
Also fixes the new piglit getteximage-luminance test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Based on a patch submitted by Vic Lee. The other part of his patch
which checked the fs pointer wasn't needed.
This fixes a crash when clear() is called before any VS or FS is set.
But this can only happen when the driver is used without the Mesa
state tracker.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets xine working with VDPAU.
v2: some minor bugfixes.
v3: create the resource with the subsampled
format to avoid tiling problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
These will be used by glReadPixels() and glGetTexImage() to fix issues
with reading GL_LUMINANCE and other formats.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we were only counting top-level instructions. But if we have
an assignment of a giant expression tree (such as the ones eventually
generated by glsl-fs-unroll), we were counting the same as an
assignment of a variable deref.
glsl-fs-unroll-explosion now fails in a reasonable amount of time on
i965 because the unrolling didn't go ridiculously far.
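The difference between counting top-level instructions and counting every node of an expression tree can be illustrated with a small sketch. struct expr and count_nodes() are hypothetical; the real IR visitor is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPERANDS 4

/* Hypothetical expression node with up to MAX_OPERANDS children. */
struct expr {
   unsigned num_operands;
   const struct expr *operands[MAX_OPERANDS];
};

/* Count this node plus everything below it, so a giant expression
 * tree weighs more than a plain variable dereference. */
static unsigned
count_nodes(const struct expr *e)
{
   if (e == NULL)
      return 0;
   assert(e->num_operands <= MAX_OPERANDS);

   unsigned n = 1;
   for (unsigned i = 0; i < e->num_operands; i++)
      n += count_nodes(e->operands[i]);
   return n;
}
```

A single assignment whose right-hand side is (a + b) * a counts as five nodes here, whereas a top-level count would see one instruction.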
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I will use SX_MISC instead.
This reverts commit 734792e83f.
Conflicts:
src/gallium/drivers/r600/evergreen_hw_context.c
src/gallium/drivers/r600/evergreen_state.c
src/gallium/drivers/r600/r600_hw_context.c
src/gallium/drivers/r600/r600_pipe.h
The draw module calls back into the driver and sets certain parts
of the state to whatever it needs. Unfortunately, unless you
get the ordering of calls to draw just right, you'll end up
resetting your own driver state. That's what was happening to us:
the draw module would under certain conditions reset our own driver
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the libGLU.so.* build when a system libGL.so is not present
since it is relying on the lib/ to build against until it gets
converted to automake.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
That is by making the dri extension variables static in gbm_dri.c.
The image_lookup_extension is provided by egl_dri2 when using x11 or wayland
platforms; when using the drm platform, gbm_dri has a wrapper for it.
Both use the same variable name, image_lookup_extension.
Since -fvisibility=hidden was (probably by mistake) removed when converting to
automake, the "image_lookup_extension" symbol from egl_dri2.c became exported
in libEGL.so, so "image_lookup_extension" from gbm_dri.c was ignored.
This resulted in calling incorrect callbacks.
We can't make the image_lookup_extension static in egl_dri2.c right now,
since it's used across multiple files.
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?id=58099
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the texture is a 1D array, don't remove the border pixel from the
height. Similarly for 2D array textures and the depth direction.
Simplify the function by assuming the border is always one pixel.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch adds support for the gl_PointCoord builtin variable on
gen4 and gen5 (ILK) platforms.
Unlike gen6+, we don't have hardware support for gl_PointCoord, meaning
the hardware will not calculate the interpolation coefficients for you.
Instead, you have to handle it yourself in the SF shader stage.
Unfortunately, gl_PointCoord is an FS rather than a VS builtin variable, so
it's not included in the c.vue_map generated in the VS stage. Thus the current
code isn't aware of this attribute. To handle it correctly, we
need to add it to c.vue_map manually so the SF shader generates the needed
interpolation coefficients for the FS shader. The SF stage has its own copy of
vue_map, so I think it's safe to do this manually.
Since handling gl_PointCoord on gen4 and gen5 platforms is a
little special, I added a lot of comments and hope I didn't overdo it ;)
v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds clipdistance support like the non-llvm draw paths,
if we have a clip distance we compare with it instead of doing
the dot4.
We also have to put the have_clipvertex bit into the emitted
vertex header.
Fixes vs-clip-distance-all-planes-enabled, vs-clip-distance-const-reject,
vs-clip-distance-enables, vs-clip-distance-implicitly-sized,
vs-clip-distance-in-param, vs-clip-distance-uint-index.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the rest of the piglit clipvertex tests.
v2: fixup comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We incorrectly setup clipmask for gl_ClipVertex, this fixes the clipmask
setup.
v2: fix comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
fix comment
This is just a simple text file containing a list of goals for gallivm/llvmpipe
and some info on what is required to get there along with some info on who
is looking at things.
v2: add EXT_texture_array.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_max_texture_levels() is also used to test valid texture target
in _mesa_GetTexLevelParameteriv(). GL_TEXTURE_CUBE_MAP is not allowed
as texture target in glGetTexLevelParameter(). So, this should throw a
GL_INVALID_ENUM error.
A few other functions which use _mesa_max_texture_levels(), like
getcompressedteximage_error_check() and getteximage_error_check(),
also don't accept GL_TEXTURE_CUBE_MAP.
The above fix makes the piglit fbo-cubemap test fail. This is because of
an incorrect texture target passed to _mesa_max_texture_levels() in
framebuffer_texture(). Fix that as well.
Note: This is a candidate for the stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms that require that all symbols are resolved when the
library is linked."
If I had a dollar for every time I wrote this patch, I'd have about
$10 :-)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's even a comment in the code containing the right swizzling
computations!
Previously this has not been noticed because we need to manually
enable swizzling on snb/ivb (kernel 3.4 will do that) and we
don't use the separate stencil on ilk (where the bios enables
swizzling). This fixes
piglit ./bin/fbo-stencil readpixels GL_DEPTH32F_STENCIL8 -auto
on recent drm-intel-next kernels.
Also remove the comment about ivb, it's stale now.
Swizzling detection is done by allocating a temporary x-tiled
buffer object. Unfortunately kernels before v3.2 lie on snb/ivb
because they claim that swizzling is enabled, but it isn't. The
kernel commit that fixes this for backport to pre-v3.2 is
commit acc83eb5a1e0ae7dbbf89ca2a1a943ade224bb84
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 12 20:49:16 2011 +0200
drm/i915: fix swizzling on gen6+
But if the kernel doesn't lie, this now works on swizzling and
not swizzling machines.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the previously used wl_display_destroy.
wl_display_destroy was provided by wayland-client.so and
wayland-server.so; to resolve that conflict it's renamed client-side.
Otherwise streamout with rasterizer discard will make the kernel upset
if the state tracker doesn't set a depth-stencil-alpha state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Unused by the current stack and APIs, therefore untestable.
It was used to facilitate the transition to integers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
For polygons, we have been using face culling with success, but that doesn't
work for points and lines.
Setting the point size and line width to 0 fixes it.
Also improve it even more by setting SCREEN_SCISSOR to a zero area.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement it right using STRMOUT_CONFIG.RAST_STREAM. This fixes rasterizer
discard with points and lines.
This also adds another derived state. It's a combination of rasterizer discard
and streamout enable.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We must use VPORT_SCISSOR, because that's the only one we can use for multiple
scissor rectangles in ARB_viewport_array.
R700 can use the VPORT_SCISSOR_ENABLE bit, but R600 doesn't have that and must
emit an 8192x8192 rectangle if scissor is disabled.
This commit also cleans up magic numbers in create_rs_state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
VPORT_SCISSOR is the OpenGL scissor. How do I know? Because there are
16 of them just like GL4.1 has multiple scissor rectangles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Also use XXX in the other ones, because it's the most used word for that
purpose in Mesa.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Timer queries should be able to measure the time spent in u_blitter as well.
Queries are split into two groups: the timer ones and the others (streamout,
occlusion), because we should only suspend non-timer queries for u_blitter,
and later if the non-timer queries are suspended, the context flush should
only suspend and resume the timer queries.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
And rename or inline functions where appropriate.
There is no reason to keep this stuff in r600_hw_context.c.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The current code would ignore the point size specified by the gl_PointSize
builtin variable in the vertex shader on Pineview. This patch fixes
that.
This patch fixes the following issues on Pineview:
webglc: https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/rendering/point-size.html
piglit: glsl-vs-point-size
NOTE: This is a candidate for stable release branches.
v2: pick Eric's nice tip for fixing this issue in hardware rendering.
v3: the last arg of EMIT_ATTR specifies the size in _bytes_. (Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the egl_gallium.so driver build when no system libEGL.so is
present, since it's relying on the lib/ to build against until it gets
converted to automake.
We were looking at the size of batch.map for how big the batchbuffer
was, but on 865 we just use a single-page batchbuffer due to hardware
limits.
v2: Removed check for sizeof map < bo->size, since that's always false.
[change by anholt]
NOTE: This is a candidate for release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
In order to prevent an overflow of the batch buffer when emitting
triangles, we need to limit the initial primitive to fit within the
current batch. To do so, we need to measure the remaining space and thence
compute the maximum number of vertices that fit into that space.
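The computation described above can be sketched as follows. max_triangle_vertices() and its parameters are hypothetical; the driver's actual bookkeeping of batch space is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of vertices that fits in 'space_bytes', rounded down to
 * whole triangles (three vertices each). */
static size_t
max_triangle_vertices(size_t space_bytes, size_t vertex_bytes)
{
   assert(vertex_bytes > 0);

   size_t verts = space_bytes / vertex_bytes;
   return verts - (verts % 3);
}
```

With 4096 bytes remaining and 32-byte vertices, 128 vertices would fit, so the first primitive is limited to 126 vertices (42 whole triangles).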
Reported-by: Kurt Roeckx <kurt@roeckx.be>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
The hardware, like i915, uses inclusive bounds on min and max for
the drawing rectangle, but we were providing an exclusive number.
The number of bits used by the hardware only covers this value going
up to the maximum size, so when we programmed 2048 as the maximum
inclusive X, it saw a maximum X of 0 and clipped all rendering. This
caused rendering failures in gnome-shell.
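The wrap-around can be reproduced with a small sketch. The 11-bit field width is an assumption inferred from 2048 being seen as 0; pack_coord() and inclusive_max() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field width: 11 bits, enough for coordinates 0..2047. */
#define DRAW_RECT_BITS 11u
#define DRAW_RECT_MASK ((1u << DRAW_RECT_BITS) - 1u)

/* Value the hardware sees once the coordinate is truncated to the field. */
static uint32_t
pack_coord(uint32_t v)
{
   return v & DRAW_RECT_MASK;
}

/* Inclusive maximum for a drawable 'width' texels wide. */
static uint32_t
inclusive_max(uint32_t width)
{
   assert(width > 0);
   return pack_coord(width - 1);
}
```

Under that assumption, programming the exclusive value 2048 truncates to 0, while the inclusive value for a 2048-wide drawable is 2047 and fits in the field.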
Fixes piglit fbo-maxsize.
v2: dropped changes to the blitter, which does use an exclusive x2, y2.
[change by anholt]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45558
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
Michel pointed out that my assumption of a global
index namespace is incorrect and breaks r300g.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should speed things up a bit, but also shows
some bugs with the kernel implementation.
v2: require xcb-dri2 version 1.8
Signed-off-by: Christian König <deathsimple@vodafone.de>
While the ARB_map_buffer_range extension spec says nothing about these
queries -- they were added in GL 3.0 --, it seems like this could be an
error in the extension spec. This is one of the extensions, like
ARB_framebuffer_object, that "back ports" OpenGL 3.0 functionality to
previous versions. These extensions are supposed to provide identical
functionality to OpenGL 3.0. The other cases of mismatches have been
determined to be bugs in the extension specs.
And tools like apitrace rely on such queries to function properly.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We currently don't support gl_PrimitiveID, and I believe asking the
hardware to generate it results in vertex cache invalidations.
This could result in slowdowns for applications that use gl_InstanceID,
which would be counter-productive. Just turn it off for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
visit(ir_variable *) sets dst_reg::writemask to the appropriate channel
for system values. Unfortunately, visit(ir_dereference_variable *) then
calls swizzle_for_size, which for a float, sets the swizzle to .x.
This works for gl_VertexID, since we store it in the .x component (see
brw_draw_upload.c:732 - VID), but fails for gl_InstanceID (IID) since we
store it in the .y channel.
To fix this, avoid calling swizzle_for_size on ir_var_system_values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Originally ARB_draw_instanced only specified the ARB-decorated name.
Since no vendor actually implemented that behavior and some apps use
the undecorated name, the extension now specifies that both names are
available.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When you called them in a display list compile before, you would just
end up calling through NULL.
Fixes piglit GL_ARB_draw_instanced/dlist.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kernels prior to 271d81b84171d84723357ae6d172ec16b0d8139c (March 2011)
don't support relocations outside of the target buffer object. Rather
than guarding this with an I915_PARAM_HAS_RELAXED_DELTA check, just
smash the bound to 0xfffff001 like we do on Ironlake.
This effectively gives us no upper bound check, just like we did prior
to commit 271d81b84171d84723357ae6d172ec16b0d8139c.
Daniel Vetter would also like to mention that this relies on the guard
page at the end of the GTT.
NOTE: This is a candidate for release branches.
Fixes a regression since 271d81b84171d84723357ae6d172ec16b0d8139c.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46766
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The drivers/ walk-through-subdirs makefile is converted as well so I
didn't need to keep EGL_DRIVERS_DIRS along with the per-driver
HAVE_EGL_DRIVER_WHATEVER.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The default case code was set up in a separate way, while this makes
it more normal. I wanted to add code to the explicit x11 platform and
default x11 platform cases in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All users of the shine table outside of the tnl module
are gone. Move the implementation into the tnl module and
prefix the public functions with _tnl.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are now only used in the tnl lighting stage, where
they are validated through the tnl driver function NotifyMaterialChange
called in tnl/t_vb_light.c, we can now omit calling
_mesa_validate_all_lighting_tables (which only validates the shine tables)
in main/light.c.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Use direct computation of pow for computing the shininess
in _tnl_RasterPos. Since the _tnl_RasterPos function is still
used by plenty of drivers that only need the shine table for
_tnl_RasterPos but do not make use of swtnl computations, this
enables pushing down the shine table computation and validation
into the tnl module, which will happen in a followup change.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are implicitly invalidated by having
a different shininess value than the current one, we can
omit the explicit invalidation of the shine table.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This lets us use the resource_copy_region() path when blitting from
R8G8B8A8 to R8G8B8x8, for example.
v2: be smarter when src_format==dst_format
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assertions of the form assert(a && b) should be written as separate assertions
so that you can actually tell which part is false when there's a failure.
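A minimal sketch of the difference, using a hypothetical check_range()
helper that is not part of Mesa:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
static int
check_range(const int *data, size_t count)
{
   /* Combined form: a failure reports the whole expression and does not
    * say which operand was false.
    *
    *    assert(data && count > 0);
    *
    * Split form: the failing line names the exact condition.
    */
   assert(data != NULL);
   assert(count > 0);

   return data[count - 1];
}
```

With the split form, the file and line in the abort message point at
exactly one condition.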
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Move structs, enums, etc so they're in more logical order. In particular,
the shader and transform feedback-related structs/enums were pretty
scattered around.
After biasing we need to clamp to be sure we don't exceed the number of
levels in the mipmap. This fixes an assertion at svga_sampler_view.c:70
v2: simplify the biasing, clamping code per Jose's suggestion.
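As a rough illustration of bias-then-clamp (biased_level() and its
parameters are hypothetical names, not the svga driver's code):

```c
#include <assert.h>

/* Hypothetical helper: apply a bias to a base mipmap level, then clamp
 * the result to the valid range [0, num_levels - 1]. */
static int
biased_level(int base_level, int bias, int num_levels)
{
   int level;

   assert(num_levels > 0);

   level = base_level + bias;
   if (level < 0)
      level = 0;
   if (level > num_levels - 1)
      level = num_levels - 1;

   return level;
}
```

Clamping after the bias is what keeps the level inside the mipmap;
clamping only the unbiased level would not.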
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to allocate new space every time to avoid blocking on the last
HiZ op completing. There are two easy ways to do this:
brw_state_batch() and intel_upload_data(). brw_state_batch() is
simpler and avoids another buffer allocation.
Improves Unigine Tropics performance 0.376416% +/- 0.148722% (n=7).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ralloc string appending functions were originally intended for
simple, non-hot-path uses like printing to an info log.
Cuts Unigine Tropics load time by around 20% (6 seconds).
v2: Avoid strlen() on every newline, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
Both callers of rewrite_tail immediately compute the new total string
length by adding the (known) length of the existing string plus the
length of the newly appended text. Unfortunately, callers generally
won't know the length of the new text, as it's printf-formatted.
Since ralloc already computes this length, it makes sense to add it in
and save the caller the effort. This simplifies both existing callers,
but more importantly, will allow for cheap-appending in the next commit.
v2: The link_uniforms code needs both the old and new length.
Apply the obvious fix (which sadly makes it less of a cleanup).
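The idea can be sketched in plain C as follows. append_at() is a
hypothetical stand-in built on realloc(), not ralloc's implementation;
it only shows how the helper can advance the caller's length itself:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical rewrite_tail-style helper. Writes formatted text at offset
 * *start of *str, then advances *start by the length of the text written,
 * so the caller never needs strlen(). */
static bool
append_at(char **str, size_t *start, const char *fmt, ...)
{
   va_list args;
   int new_len;
   char *buf;

   assert(str != NULL);
   assert(start != NULL);

   /* First pass: measure the formatted text. */
   va_start(args, fmt);
   new_len = vsnprintf(NULL, 0, fmt, args);
   va_end(args);
   if (new_len < 0)
      return false;

   buf = realloc(*str, *start + (size_t) new_len + 1);
   if (buf == NULL)
      return false;

   /* Second pass: format directly at the known tail. */
   va_start(args, fmt);
   vsnprintf(buf + *start, (size_t) new_len + 1, fmt, args);
   va_end(args);

   *str = buf;
   *start += (size_t) new_len;
   return true;
}
```

A caller that keeps the length around can append repeatedly without
rescanning the existing string.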
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.
I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.
With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Backends usually advertise a SVGA3D_DEVCAP_MAX_POINT_SIZE between 63 and
256, so a hardcoded max point size of 80 is often incorrect.
This limitation does not apply for anti-aliased points (as they are done
via draw module) but we still advertise the same limit for both, because
all other pipe drivers do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa has a fast path for the generic fallback when using glReadPixels
for RGBA data which uses memcpy. However it was really difficult to
hit this case because it would not be used if any transferOps are
enabled. Any type apart from floating point or non-normalized integer
types (so any of the common types) would force enabling clamping so
the fast path could not be used. This patch makes it ignore clamping
when determining whether to use the fast path if the data type of the
buffer is an unsigned normalized type because in that case clamping
will not have any effect anyway.
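A sketch of the decision, assuming a simplified transfer-op bitmask (the
XFER_* names and can_use_memcpy_path() are hypothetical, not Mesa's):

```c
#include <stdbool.h>

/* Hypothetical flag bits, standing in for a transfer-op bitmask. */
#define XFER_CLAMP_BIT  0x1u
#define XFER_SCALE_BIT  0x2u

static bool
can_use_memcpy_path(unsigned transfer_ops, bool src_is_unorm)
{
   /* Values read from an unsigned normalized buffer are already in
    * [0, 1], so a requested clamp cannot change them and is ignored. */
   if (src_is_unorm)
      transfer_ops &= ~XFER_CLAMP_BIT;

   /* Any remaining op means the data must be processed, not copied. */
   return transfer_ops == 0;
}
```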
https://bugs.freedesktop.org/show_bug.cgi?id=46631
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
postpone unreferences until end of function, as the ones in use will
get naturally dereferenced.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Can't see any reason this wouldn't be better off as an inline.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some backends may advertise more temps than SVGA3D_TEMPREG_MAX, but the
driver is hardwired to only support up to the value defined by
SVGA3D_TEMPREG_MAX, so clamp to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
KILP instruction inside IF blocks were being lowered to an unconditional
KIL. Since r300 doesn't support branching, when the IF's were lowered
to conditional moves, the KIL would always be executed. This is not a
problem with the mesa state tracker, because the GLSL compiler handles
lowering IF's, but this bug was appearing in the VDPAU state tracker,
which does not use the GLSL compiler.
Note: This is a candidate for the stable branches.
The xvmc state tracker is completely separate and
doesn't share code or anything else with the
xorg state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This patch allows the Mac OS X SCons build to complete. The assembly
sources contain pseudo-ops that are not supported on Mac OS X.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were inverting the meaning of the stencil op flags: in svga/d3d
the normal incr/decr wraps and the SAT ops clamp.
This fixes piglit failures (at least stencil-twoside and stencil-wrap).
We should backport this everywhere we can.
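For reference, the two behaviours on an 8-bit stencil value look roughly
like this (function names are hypothetical and only illustrate wrap
versus saturate):

```c
#include <stdint.h>

/* Wrapping ops: 255 + 1 becomes 0 and 0 - 1 becomes 255. */
static uint8_t stencil_incr_wrap(uint8_t s) { return (uint8_t) (s + 1u); }
static uint8_t stencil_decr_wrap(uint8_t s) { return (uint8_t) (s - 1u); }

/* Saturating (SAT) ops: the value sticks at the ends of the range. */
static uint8_t
stencil_incr_sat(uint8_t s)
{
   return s == UINT8_MAX ? s : (uint8_t) (s + 1u);
}

static uint8_t
stencil_decr_sat(uint8_t s)
{
   return s == 0 ? s : (uint8_t) (s - 1u);
}
```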
Reviewed-by: Brian Paul <brianp@vmware.com>
Two of the switch cases used PIPE_FORMAT_ tokens instead of SVGA3D_ tokens.
As it happens, the token values are equal for these formats so there's no
net change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We always mapped the query buffer in begin_query, causing stalls
if the buffer was busy.
This commit reworks it such that the query buffer is only mapped
in get_query_result as it's supposed to be.
The query buffer is no longer treated as a ring buffer. Instead, the results
are just appended and when the buffer is full, we create a new one. One query
can have more than one query buffer, though that's a very rare case.
Begin_query releases all query buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
From http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt:
Accepted by the <cap> parameter of Enable, Disable and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv
and GetDoublev:
TEXTURE_CUBE_MAP_SEAMLESS 0x884F
This caused a change in enums.c, which is manually built from the .xml
files.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The linked list of memory allocations was not protected by a mutex.
This led to sporadic failures with multi-threaded apps.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, instead of immediately freeing deleted surfaces, hang onto
them in a cache to do quick re-allocation. This helps when surfaces
are frequently destroyed and then reallocated a bit later.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
There was a SVGA_HOST_SURFACE_CACHE_BYTES symbol, but it was never
used.
Now when we go to add a newly deleted surface to the cache we check
if the cache size would be exceeded. If so, try to free the least
recently "unused" surfaces until the cache is smaller. If we can't
do that, simply don't cache the newly deleted surface. The alternative
involves flushing and waiting and we don't want to do that.
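The size check can be sketched like this. The struct and cache_add() are
hypothetical, and the sketch has no notion of surfaces that are still
busy, so the only "can't free enough" case it models is a surface larger
than the whole budget:

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8   /* hypothetical slot count */

struct surf_cache {
   size_t max_bytes;            /* byte budget for the cache */
   size_t total_bytes;          /* bytes currently cached */
   size_t count;                /* entries in use */
   size_t sizes[CACHE_SLOTS];   /* index 0 is the oldest entry */
};

/* Returns true if the surface was cached, false if it was not. */
static bool
cache_add(struct surf_cache *c, size_t bytes)
{
   /* A surface larger than the whole budget is simply not cached. */
   if (bytes > c->max_bytes)
      return false;

   /* Evict the oldest entries until the new surface fits. */
   while (c->count > 0 &&
          (c->total_bytes + bytes > c->max_bytes ||
           c->count == CACHE_SLOTS)) {
      c->total_bytes -= c->sizes[0];
      for (size_t i = 1; i < c->count; i++)
         c->sizes[i - 1] = c->sizes[i];
      c->count--;
   }

   c->sizes[c->count++] = bytes;
   c->total_bytes += bytes;
   return true;
}
```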
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, if shader translation failed for any reason we'd keep trying
to translate the shader over and over again during state validation.
The dummy fragment shader emits solid red so that might be a visual
clue that translation is failing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The assertion recently added in dst_register() was invalid because that
function is also (surprisingly) used to declare constant registers.
Move the assertion to the callers where we're really creating temp
registers and add some code to prevent emitting invalid temp register
indexes for release builds.
Also, update the comment for get_temp(). It didn't return -1 if it
ran out of registers and none of the callers checked for that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And assert on the register index in dst_register(). The dest can
only be an output or temp reg and there's more of the latter.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit 980f6f1 (mesa: move gl_texture_image::Width/Height/DepthScale
fields to swrast) moved the initialization of the Width, Height, and
DepthScale fields to _swrast_alloc_texture_image_buffer(). However,
i915 doesn't call this function because it performs its own buffer
allocation. As a result, the Width, Height, and DepthScale fields
weren't getting initialized properly, and some operations requiring
swrast would fail.
This patch ensures that Width, Height, and DepthScale are properly
initialized by separating the code that sets them into a new function,
_swrast_init_texture_image(), which is called by
intel_alloc_texture_image_buffer() as well as
_swrast_alloc_texture_image_buffer(). It also moves the
initialization of _IsPowerOfTwo into this function.
Fixes piglit test fbo/fbo-cubemap on i915.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=41216
This is a candidate for the 8.0 branch.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GBM needs the buffer format in order to communicate with DRM and clients
for things like scanout.
So track the DRI format requested in the various back ends and use it to
return the DRI format back to GBM when requested. GBM will then map
this into the GBM surface type (which is in turn based on the DRM fb
format list).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These were rotting in an internal branch, but contain nothing confidential,
and would be much more useful if kept up-to-date with latest gallium
interface changes.
Several authors including Keith Whitwell, Zack Rusin, and Brian Paul.
In the gen6 GS case, we were under-counting and so other state would
get smashed. In the VS case, we were over-counting, so everything was
fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This was copy and paste from the VS where I had similar code. We're
only looking at things derived from BRW_NEW_VERTEX_PROGRAM in this
block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This is a state which is derived from other states and is actually the first
state which doesn't correspond to any gallium state.
There are two state flags:
bool occlusion_query_enabled
bool flush_depthstencil_enabled
Additional flags can be added later if needed, e.g. bool hiz_enabled.
The emit function will have to figure out the register values by itself.
It basically just emits the registers when the state changes.
This commit also adds a few helper functions for writing registers directly
into a command stream.
This is the first pure command buffer. It contains CS initialization
packets and emits invariant state (i.e. the registers which never or rarely
change).
The affected registers are removed from *_hw_context.c, so that both ways
of emitting commands can co-exist.
v2: emit context_control in cayman's start_cs too
Suggested by José.
We don't provide shader caching in CSO. Most of the time the api provides
object semantics for shaders anyway, and the cases where it doesn't
(eg mesa's internally-generated texenv programs), it will be up to
the state tracker to implement their own specialized caching.
Improves VS state change microbenchmark performance by 7.08729% +/-
1.22289% (n=10) on gen7, because we don't upload the 64 dwords of
unused binding table any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step toward making the samplers/binding tables reflect
sampler uniform mappings instead of embedding those in the programs.
No significant performance difference on the microbenchmark (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always say no. Improves VS state change microbenchmark performance
7.68747% +/- 1.40826% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, 640x480 nexuiz is running 0.169118%
+/- 0.0863696% faster (n=121). On a VS state change microbenchmark,
performance is increased 8.28645% +/- 0.460478% (n=52).
v2: Fix CACHE_NEW_VS comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces recomputation of state based on non-clipping-related
transform changes, and is a step toward removing VUE map
recomputation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For a 1D texture array, the border only applies to the width. For a 2D
texture array the border applies to the width and height but not the depth.
Such cases were not handled correctly in _mesa_init_teximage_fields().
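A sketch of the rule, using hypothetical types rather than Mesa's own
gl_texture_image fields:

```c
/* Hypothetical types for illustration; not Mesa's own structures. */
enum tex_kind { TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_3D };

struct tex_size { int width, height, depth; };

/* Strip the border from each dimension that actually carries one.
 * For array textures the last dimension counts layers and has no border. */
static struct tex_size
interior_size(enum tex_kind kind, struct tex_size total, int border)
{
   struct tex_size in = total;

   in.width -= 2 * border;        /* every target: border on width */
   if (kind != TEX_1D_ARRAY)
      in.height -= 2 * border;    /* 2D array and 3D: border on height */
   if (kind == TEX_3D)
      in.depth -= 2 * border;     /* only true 3D: border on depth */

   return in;
}
```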
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tracing function entry/exits is a bit pointless
when VDPAU_TRACE=1 does the same thing.
v2: use WARN instead of ERR for application problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
Like TGSI_OPCODE_ARL, destination should be an integer.
This fixes invalid LLVM IR on an internal state tracker (currently Mesa
never emits this opcode).
In the future consider making ADDR register also an integer-as-float array,
like all other register kinds, or simply replace ADDR & ARR/ARL with
integer temp and instructions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes this GCC warning.
native_drm.c:153:1: warning: ‘drm_display_authenticate’ defined but not
used [-Wunused-function]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Avoid setting dirty state flags when enabling or disabling a vertex
attribute array when there's no change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If you're resorting to the dummy shader, you've probably already turned
off SIMD16 mode. But if you didn't, it would die in a fire.
We could either fail to compile in SIMD16 mode...or just fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dummy FB write failed to specify EOT and a message length, causing
the GPU to hang. Now we can enjoy "everyone's favorite color" again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes this GCC warning.
mask.c: In function ‘mask_layer_fill’:
mask.c:387:12: warning: variable ‘alpha_color’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes these GCC warnings.
glx_api.c: In function ‘choose_visual’:
glx_api.c:678:8: warning: variable ‘trans_value’ set but not used
[-Wunused-but-set-variable]
glx_api.c:677:8: warning: variable ‘trans_type’ set but not used
[-Wunused-but-set-variable]
glx_api.c:663:8: warning: variable ‘min_ci’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that we have a index_range_invalid flag, we can just use that rather
than calling vbo_validated_drawrangeelements directly and returning.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This failed to take basevertex into account:
If basevertex < 0:
(end + basevertex) might actually be in-bounds while 'end' is not.
We would have clamped in this case when we probably shouldn't.
This could break application drawing.
If basevertex > 0:
'end' might be in-bounds while (end + basevertex) might not.
We would have failed to clamp in this place. There's a comment
indicating the TNL module depends on max_index being in-bounds;
if so, it would likely break horribly.
Rather than trying to clamp correctly in the face of basevertex, simply
delete the clamping code and indicate that we don't have a valid range.
This causes _tnl_vbo_draw_prims to use vbo_get_minmax_indices() to
compute the actual bounds, which is much safer.
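The two cases above can be checked with a small sketch.
last_vertex_in_bounds() is a hypothetical helper, not code from the vbo
module; it only shows that the bound has to be tested on
(end + basevertex) rather than on 'end':

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is the last vertex actually fetched inside a
 * buffer holding max_element vertices? Uses 64-bit math so that neither
 * a negative basevertex nor a huge 'end' can overflow. */
static bool
last_vertex_in_bounds(uint32_t end, int32_t basevertex, uint32_t max_element)
{
   int64_t last = (int64_t) end + basevertex;

   return last >= 0 && last < (int64_t) max_element;
}
```

With 100 vertices, end = 120 and basevertex = -50 is in bounds even
though 'end' alone is not, while end = 80 and basevertex = 50 is out of
bounds even though 'end' alone is fine.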
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The application supplied [start, end] range is merely a conservative
hint of the ranges of index values inside the index buffer. There is no
requirement that all vertices in the range [start, end] be referenced.
Passing an 'end' value larger than the maximum legal index is perfectly
acceptable; applications can legally pass 0xffffffff when they don't
have a tighter bound readily available.
Thus, the warning doesn't indicate a correctness issue; it could only
indicate a performance issue. However, it does not even do that.
glDrawRangeElements is designed to optimize non-VBO vertex data uploads
by providing an upper bound on the size of buffers a driver would need
to allocate. With VBOs, the data is already in an uploaded buffer, so
the range doesn't help.
The clincher is: we only know _MaxElement for VBOs. For user-space
arrays, we just set it to 2,000,000,000 (see mesa/main/varray.h:63.)
So we can only check this in the case where it is not useful.
Many applications, including the Unigine demos, currently trigger this
warning, which suggests the applications are buggy when they're actually
fine. Eliminating the warning should confuse users less while not
actually losing any benefit to application developers.
NOTE: This is a candidate for release branches.
Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There's a serious trap for drivers: RenderTexture() does not indicate
that the texture is currently bound to the draw buffer, despite
FinishRenderTexture() signaling that the texture is just now being
unbound from the draw buffer.
We were acting as if RenderTexture() *was* the start of rendering and
that we could make texturing incoherent with the current contents of
the renderbuffer. This caused intel oglconform sRGB
Mipmap.1D_textures to fail, because we got a call to TexImage() and
thus RenderTexture() on a texture bound to a framebuffer that wasn't
the draw buffer, so we skipped validating the new image into the
texture object used for rendering.
We can't (easily) make RenderTexture() indicate the start of drawing,
because both our driver and gallium are using it as the moment to set
up the renderbuffer wrapper used for things like MapRenderbuffer().
Instead, postpone the setup of the workaround render target miptree
until update_renderbuffer time, so that we no longer need to skip
validation of miptrees used as render targets. As a bonus, this
should make GL_NV_texture_barrier possible.
(This also fixes a regression in the gen4 small-mipmap rendering since
3b38b33c16, which switched set_draw_offset from image->mt to irb->mt
but didn't move the irb->mt replacement up before set_draw_offset).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44961
NOTE: This is a candidate for the 8.0 branch.
Infer from the operand the type of value to store.
MOV is untyped but we use the float store path.
v2: make MOV use float store path.
I've had to squash merge the ARL fix to be stored
as an integer in here to avoid regressions in a number
of piglit tests.
From now on ARL stores to an integer just like HW does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This infers the type of data required using the opcode,
and casts the input to the appropriate type.
So far this only handles non-indirect constant and temporaries.
v2: as per Jose suggestion, fetch immediates via floats
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are used inside the action handlers for the integer opcodes.
v2: use uint_bld/int_bld, drop higher level uint_bld.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now just pass the current context, but when we want to
store int or unsigned we need to pass those later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These two functions produce the src/dst types for an opcode.
MOV is special since it can be used to mov float->float and int->int,
so just return VOID.
v2: use a new enum for the opcode type as per Jose's suggestion.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the texture format is integer, the incoming user data must also be
integer (and similarly for non-integer textures).
NOTE: This is a candidate for the stable branches.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I recently discovered this text in the BSpec. It seems wise to comply,
though I haven't observed it to fix anything yet.
Fixes a regression in glean/fbo since 28cfa1fa21.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes (with the previous commit) piglit GL_ARB_multisample/pushpop.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the table of push/pop attributes, this one doesn't fall under
the enable group.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes build errors like
In file included from glapi_dispatch.c:91:
../../../src/mapi/glapi/glapitemp.h:4641: error: no previous prototype for
'glDrawBuffersNV'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lucas Stach <dev@lynxeye.de>
Similar to the previous commit. Also fix incorrect setting of the
sampler view's state after it's created. We need to specify the
first/last_level fields in the template instead.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rather than the one in st_texture_object. This sampler view really has
no connection to the one used for rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And remove needless & 0xff in _mesa_pack_uint_24_8_depth_stencil_row().
As suggested by José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, this function only handled 2D textures.
The fallback texture is used when we try to sample from an incomplete
texture object. GLSL says sampling an incomplete texture should return
(0,0,0,1).
v2: use a 1-texel texture image, per José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Added in _mesa_pack_uint_24_8_depth_stencil_row(). This could be hit
by something like glDrawPixels(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8)
into a MESA_FORMAT_Z32_FLOAT_X24S8 buffer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The st_renderbuffer_alloc_storage() function is used to allocate both
window-system buffers and user-created renderbuffers. The latter kind
are never directly displayed so don't set PIPE_BIND_DISPLAY_TARGET for
those surfaces.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit dc7f449d1a introduced a new method
for avoiding MOVs: try to rewrite the destination of the instruction
that produced the RHS so it writes into the LHS.
Unfortunately, this is not safe for swizzled texturing operations, as
they return a set of four contiguous registers. Consider the following:
(assign (x)
(var_ref vec_ctor_x)
(swiz x (tex vec4 (var_ref m_sampY) (var_ref m_cordY) 0 1 ())))
In this case, the source and destination registers are equal, since
reg_offset is 0 for both. Yet, this is only a partial move: the texture
operation generates four registers, and the LHS only covers one.
Fixes color distortion in XBMC when using GLSL shaders.
NOTE: This is a candidate for the 8.0 branch (with the previous commit).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44333
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain instructions write more than one register. Texturing, for
example, returns 4 registers. (We set rlen to 4 even for TXS and float
shadow sampling.) Some math functions return 2. Most return 1.
The next commit introduces a use of this function.
NOTE: This is a candidate for the 8.0 branch (dependency of a fix).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reallocate/resize decompress FBO only if texture image width/height is
greater than existing decompress FBO width/height.
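The grow-only policy amounts to a check along these lines (struct and
function names are hypothetical):

```c
#include <stdbool.h>

/* Hypothetical record of the current decompress FBO dimensions. */
struct fbo_size { int width, height; };

/* Reallocate only when the image does not fit in the existing FBO;
 * a smaller or equal image reuses the buffer that is already there. */
static bool
needs_realloc(struct fbo_size current, int image_width, int image_height)
{
   return image_width > current.width || image_height > current.height;
}
```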
This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A filter strength of zero or one doesn't make any
sense. Thanks to Andy Furniss for pointing this out.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The virtual address must follow the alignment requirement of the
tiled surface. The bo-from-handle case is not properly fixed; a proper
fix needs a bigger change. Work around that by enforcing 1M
alignment for those bos.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 2e5a1a2 (intel: Convert from GLboolean to 'bool' from
stdbool.h.) converted the "specoffset" local variable (in
intel_tris.c) from a GLboolean to a bool. However, GLboolean was the
wrong type for specoffset--it should have been a GLuint (to match the
declaration of specoffset in struct intel_context).
This patch changes specoffset to the proper type.
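The effect of the wrong type is easy to reproduce in isolation. The
function names below are hypothetical; the point is only that a C99 bool
collapses any nonzero value to 1, so an offset stored in it is lost:

```c
#include <stdbool.h>

/* Storing an offset in a bool: every nonzero value becomes 1. */
static unsigned
offset_via_bool(unsigned offset)
{
   bool b = offset;
   return b;
}

/* Storing it in an unsigned integer keeps the value. */
static unsigned
offset_via_uint(unsigned offset)
{
   unsigned u = offset;
   return u;
}
```

An 8-bit GLboolean (a typedef of unsigned char) would still have held a
small offset unchanged, which is presumably why the wrong type went
unnoticed until the conversion to bool.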
Fixes piglit test general/two-sided-lighting-separate-specular.
This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45917
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out the same messages work on gen7, we were just being paranoid.
Fixes the penumbra shadows mode of Lightsmark since the register
allocation fix.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We just abort later, but at least this should result in more
informative bug reports.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.
Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.
Note: this is a candidate for the stable branches.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the pixel store operations in decompress_texture_image().
decompress_texture_image() is used in glGetTexImage() for compressed
textures with unsigned, normalized values.
It also fixes the failures in intel oglconform pxstore-gettex due to
following sub test cases:
- Test all mipmaps with byte swapping enabled
- Test all small mipmaps with all allowable alignment values
- Test subimage packing for all mipmap levels
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40864
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
X86Target is a variable, and therefore isn't defined at compile time. So
LLVM_NATIVE_ARCH == X86Target
is translated into
0 == 0
and since X86 is first, we always pick it.
Therefore we replace the logic with PIPE_ARCH_*.
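The preprocessor behaviour can be reproduced standalone. All DEMO_* names
are hypothetical; DEMO_X86 and DEMO_ARM are deliberately not macros, to
stand in for identifiers that only exist as variables:

```c
/* DEMO_ARM and DEMO_X86 are NOT defined as macros anywhere. */
#define DEMO_NATIVE_ARCH DEMO_ARM

/* In #if, identifiers that are not macros are replaced by 0 after macro
 * expansion, so this test is evaluated as 0 == 0 and is always true. */
#if DEMO_NATIVE_ARCH == DEMO_X86
#define DEMO_PICKED_X86 1
#else
#define DEMO_PICKED_X86 0
#endif

static int
demo_picked_x86(void)
{
   return DEMO_PICKED_X86;
}
```

Even though DEMO_NATIVE_ARCH names the ARM target, the X86 branch is
taken, which matches the "we always pick it" behaviour described above.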
https://bugs.freedesktop.org/show_bug.cgi?id=45420
Fixes a regression from commit 660ed923de.
The basic idea is to look at the format of the dest renderbuffer and
choose either GLubyte or GLfloat for colors. The previous code used
_mesa_format_to_type_and_comps() which could return a bunch of types other
than ubyte/float.
Determine the datatype at renderbuffer mapping time to avoid frequent
calls to the format query functions.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45577
Fix build with llvm-3.1svn.
llvm-3.1svn r149918 changed BufferMemoryObject::getExtent and
BufferMemoryObject::readByte from const member functions to non-const
member functions in include/llvm/Support/MemoryObject.h.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ironlake appears to check our pointer against the General State Base
Address upper bound, rather than ignoring the zero bound as it ought.
Unfortunately, since we leave GSBA set to zero, there is no logical
upper bound. Set it to the maximum possible value, which should work
since our virtual addresses only go up to 2GB.
+94 piglits.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves nexuiz performance 0.65% +/- .10% (n=5) on my gen6, and .39%
+/- .11% (n=10) on gen7. No statistically significant performance
difference on warsow (n=5, but only one shader has MADs).
v2: Add support for MADs in 16-wide by using compression control.
v3: Don't generate MADs when it will force an immediate to be moved to a temp.
(it's not clear whether this is a win or not, but it should result in less
questionable change to codegen compared to v2).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
Our only instruction with a 3rd source so far was linterp, and that
value was never register-allocated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WGL_ARB_pixel_format establishes the existence of pixel formats which
are invisible to GDI.
However we still need to pass a valid pixelformat to GDI, so that
context creation/binding works.
The actual WGL_TYPE_RGBA_FLOAT_ARB implementation is from Brian Paul.
The mapping from TEXTURE_x_INDEX to GL_TEXTURE_x was broken in
alloc_proxy_textures() because the elements in the targets[] array
were in the wrong order.
This didn't actually cause any failures since we never really use the
proxy texture's Target field. But let's get it right.
NOTE: This is a candidate for the 8.0 branch.
Use the float tables instead. Pixel maps are seldom used so this
shouldn't be a big deal. Next, we can get rid of the gl_pixelmap::Map8
array.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's a mismatch in row strides for compressed textures between
what Driver.MapTextureImage() returns and what the software fetch-texel
functions use. Move it down a layer. The next step would be to fix
this in the fetch-texel functions.
Just use pow() instead. Spot lights aren't too common and fixed-function
lighting isn't as important as it used to be.
This saves 32KB per context. Each table was 4KB and there are 8 lights.
This is a shader-based median filter, generally used for noise
reduction. It could still use some improvements, but should usually
work out of the box.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The wm max threads is in the same dword as the dispatch enable. The
hardware gets super angry if you set max threads to 0, even if you
aren't dispatching threads.
Avoid unrolling loops that are either nested loops or
where the loop body times the unroll count is huge.
The change is far from being perfect but it extends the
loop unrolling decision heuristic by some additional
safeguard. In particular this cuts down compilation of
a shader precomputing atmospheric scattering integral
tables containing two nesting levels in a loop from
something way beyond some minutes (I never waited for
it to finish) to some fractions of a second.
This fixes piglit tests glsl-fs-unroll-explosion and
glsl-vs-unroll-explosion on r600g.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
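The loop unrolling safeguard described above can be sketched as a small
predicate. This is only an illustration under stated assumptions: the
function name, its parameters and the DEMO_MAX_UNROLLED_INSTRUCTIONS
threshold are hypothetical and are not taken from the GLSL compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on the size of the fully unrolled loop body. */
#define DEMO_MAX_UNROLLED_INSTRUCTIONS 4096u

/* Decide whether a loop is a reasonable candidate for unrolling. */
static bool demo_should_unroll(bool has_nested_loop,
                               uint32_t body_instructions,
                               uint32_t iterations)
{
    /* Nested loops are never unrolled. */
    if (has_nested_loop)
        return false;

    /* Use a 64-bit product so that huge counts cannot wrap around. */
    uint64_t unrolled = (uint64_t)body_instructions * iterations;
    return unrolled <= DEMO_MAX_UNROLLED_INSTRUCTIONS;
}
```

The 64-bit multiplication matters here: with 32-bit arithmetic a very large
body times a very large iteration count could wrap to a small number and
slip past the check.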
The width and height parameters of glTexImage2D() include the texture
image size plus 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage(), width and height should not exceed the
maximum supported size for the target texture type + 2 * border,
i.e. (1 << (ctx->Const.MaxTextureLevels - 1)) + 2 * border
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case:
max_values negative.textureSize.textureCube
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
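The proxy texture size limit described above can be sketched as follows.
The helper name and its plain int parameters are assumptions made for
illustration; the real check lives in _mesa_test_proxy_teximage() and reads
the limit from the context.

```c
#include <stdbool.h>

/* Returns true if a glTexImage2D() width or height (which already includes
 * 2 * border) fits within the limit for the given number of mipmap levels. */
static bool demo_proxy_size_ok(int size, int max_texture_levels, int border)
{
    /* The shift is parenthesized on purpose: in C, '+' binds more tightly
     * than '<<', so the border term must be added after shifting. */
    int max_size = (1 << (max_texture_levels - 1)) + 2 * border;
    return size <= max_size;
}
```

With 12 levels the base limit is 2048, so a size of 2050 is accepted only
when a 1-texel border accounts for the extra two texels.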
If we have no more enabled samplers and we've reset all the previously
used ones, no need to keep going around this loop.
(just moved some stuff around to clean it up a bit).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From what I can see we were taking the debug path all the time,
when we probably only want it when the debug path is enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't want our VBOs mapped when we're drawing. This change checks
if the vertex store VBO is mapped before we execute a list, unmaps it,
then remaps it after drawing. This situation pops up when building a
nested display list in GL_COMPILE_AND_EXECUTE mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
Something has gone wrong if swrast is requested but cannot be
loaded. The user really should be made aware of this, (and instructed
to set LIBGL_DEBUG for more details).
The wording of this error message is updated from "reverting to
indirect rendering" to the more objectively descriptive "failed to
load driver: swrast". The former wording makes assumptions about what
the calling code will decide to do next, rather than simply describing
what went wrong within the current function. The new wording is
consistent with the critical errors recently added for hardware
drivers that fail to load.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Something has gone wrong if we were asked to load a driver of a
specific name, but it failed to load for some reason. The user really
should be made aware of this, (and instructed to set LIBGL_DEBUG for
more details).
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Sometimes an error is so severe that we want to print it even when the
user hasn't specifically requested debugging by setting LIBGL_DEBUG.
Add a CriticalErrorMessageF macro to be used for this case. (The error
message can still be silenced with the existing LIBGL_DEBUG=quiet).
For critical error messages we also direct the user to set the
LIBGL_DEBUG environment variable for more details.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
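The difference between ordinary and critical error messages with respect to
LIBGL_DEBUG can be summarized by two predicates. This is a sketch of the
behaviour as described in these commit messages, not the macros themselves;
the function names are hypothetical, and the environment value is passed in
as a parameter (NULL meaning unset) so that the logic is easy to test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Ordinary errors: printed only when LIBGL_DEBUG is set and is not "quiet". */
static bool demo_prints_error(const char *libgl_debug)
{
    return libgl_debug != NULL && strstr(libgl_debug, "quiet") == NULL;
}

/* Critical errors: printed even when LIBGL_DEBUG is unset, but still
 * silenced by LIBGL_DEBUG=quiet. */
static bool demo_prints_critical(const char *libgl_debug)
{
    return libgl_debug == NULL || strstr(libgl_debug, "quiet") == NULL;
}
```

The only case where the two differ is the unset variable, which is exactly
the case the critical variant was added for.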
The description of ErrorMessageF was misleading in the case of
LIBGL_DEBUG being unset, (the previous comment could be understood to
mean the error should be printed, but the code does not print in this
case).
InfoMessageF previously had no comment at all.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The build was broken by the line below, added in commit 4f82fed4.
s_expression.cpp:26: #include <limits>
Mesa's half of the fix is to add 'external/astl/include' to the include
path. The other half of the fix requires implementing
numeric_limits<float>::infinity() in astl, for which I have patches
submitted upstream for review.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Outputs should be treated in the same way as
inputs and temporaries here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
If we had no vertex textures or samplers previously and we have none now,
don't bother doing the enables dance.
I was profiling nexuiz on noop and noticed these two functions in the
profile, this drops their usage from 0.86% to 0.03% and 0.23% to 0.03%
for texture and samplers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were doing saturate-based clamping on the [0,width] or [0,height]
coordinate, which meant only the first pixel was addressable.
Fixes piglit ARB_texture_rectangle/texwrap-RECT-bordercolor
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
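The rectangle texture clamping problem above comes down to applying a
[0,1] clamp to an unnormalized coordinate. The sketch below contrasts the
two; the helper names are hypothetical and the direct clamp shown is just
one way to express the intended range, not necessarily how the driver
generates the fixed code.

```c
/* Saturate: clamp to [0, 1], as used for normalized coordinates. */
static float demo_saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Clamp an unnormalized rectangle-texture coordinate to [0, size]. */
static float demo_clamp_rect_coord(float coord, float size)
{
    return coord < 0.0f ? 0.0f : (coord > size ? size : coord);
}
```

Saturating a coordinate such as 37.5 yields 1.0, so every lookup lands on
the first texel, whereas clamping to [0, size] leaves it untouched.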
We should be able to merge self-move instruction into the MRF move
anyway, and this simplifies things for the next commit.
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The HiZ op was implemented as a meta-op. This patch reimplements it by
emitting a special HiZ batch. This fixes several known bugs, and likely
a lot of undiscovered ones too.
==== Why the HiZ meta-op needed to die ====
The HiZ op was implemented as a meta-op, which caused lots of trouble. All
other meta-ops occur as a result of some GL call (for example, glClear and
glGenerateMipmap), but the HiZ meta-op was special. It was called in
places that Mesa (in particular, the vbo and swrast modules) did not
expect---and were not prepared for---state changes to occur (for example:
glDraw; glCallList; within glBegin/End blocks; and within
swrast_prepare_render as a result of intel_miptree_map).
In an attempt to work around these unexpected state changes, I added two
hooks in i965:
- A hook for glDraw, located in brw_predraw_resolve_buffers (which is
called in the glDraw path). This hook detected if a predraw resolve
meta-op had occurred, and would hackishly repropagate some GL state
if necessary. This ensured that the meta-op state changes would not
interfere with the vbo module's subsequent execution of glDraw.
- A hook for glBegin, implemented by brwPrepareExecBegin. This hook
resolved all buffers before entering
a glBegin/End block, thus preventing an infinitely recurring call to
vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to
flush its vertex queue in response to GL state changes.
Unfortunately, these hooks were not sufficient. The meta-op state changes
still interacted badly with glPopAttrib (as discovered in bug 44927) and
with swrast rendering (as discovered by debugging gen6's swrast fallback
for glBitmap). I expect there are more undiscovered bugs. Rather than play
whack-a-mole in a minefield, the sane approach is to replace the HiZ
meta-op with something safer.
==== How it was killed ====
This patch consists of several logical components:
1. Rewrite the HiZ op by replacing function gen6_resolve_slice with
gen6_hiz_exec and gen7_hiz_exec. The new functions do not call
a meta-op, but instead manually construct and emit a batch to "draw"
the HiZ op's rectangle primitive. The new functions alter no GL
state.
2. Add fields to brw_context::hiz for the new HiZ op.
3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable.
4. Kill all dead HiZ code:
- the function gen6_resolve_slice
- the dirty flag BRW_NEW_HIZ
- the dead fields in brw_context::hiz
- the state packet manipulation triggered by the now removed
brw_context::hiz::op
- the meta-op workaround in brw_predraw_resolve_buffers (discussed
above)
- the meta-op workaround brwPrepareExecBegin (discussed above)
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reported-by: xunx.fang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927
Reported-by: chao.a.chen@intel.com
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If size is small (such as 1),
pitch = ROUND_DOWN_TO(MIN2(size, (1 << 15) - 1), 4);
makes pitch = 0. Then
height = size / pitch;
causes a division-by-zero exception. If pitch is zero, set height to
1 and avoid the division.
This fixes piglit's bin/getteximage-formats test and glean's
bufferObject test.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44971
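The division-by-zero case above is easy to reproduce with the two
expressions quoted in the message. In this sketch DEMO_MIN2 and
DEMO_ROUND_DOWN_TO are stand-ins written for illustration (assumed to
behave like the MIN2 and ROUND_DOWN_TO macros named above), and the helper
function names are hypothetical.

```c
/* Stand-ins for the macros named in the commit message (assumed semantics). */
#define DEMO_MIN2(a, b) ((a) < (b) ? (a) : (b))
#define DEMO_ROUND_DOWN_TO(value, alignment) ((value) & ~((alignment) - 1u))

/* Pitch is capped at 2^15 - 1 and rounded down to a multiple of 4, so any
 * size below 4 produces a pitch of 0. */
static unsigned demo_pitch(unsigned size)
{
    return DEMO_ROUND_DOWN_TO(DEMO_MIN2(size, (1u << 15) - 1u), 4u);
}

/* Guarded height computation: avoid dividing by a zero pitch. */
static unsigned demo_height(unsigned size)
{
    unsigned pitch = demo_pitch(size);
    return pitch == 0 ? 1u : size / pitch;
}
```

For a size of 1 the pitch is 0 and the height falls back to 1; for a size
of 65536 the pitch is capped and rounded to 32764, giving a height of 2.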
There are cases where a buffer can be mapped while another buffer is
flushed. This can happen in the CopyPixels meta-op path for piglit's
fbo-mipmap-copypix. After some discussion with Eric, it seems this
assertion is no longer necessary, and it has always been too strict.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43328
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used by glReadPixels and glDrawPixels. Now those
functions do the corresponding error checks.
Signed-off-by: Brian Paul <brianp@vmware.com>
Basically the same story as the previous commit. But we were
already calling _mesa_source_buffer_exists() in ReadPixels().
Yeah, we were calling it twice.
Signed-off-by: Brian Paul <brianp@vmware.com>
The _mesa_error_check_format_type() function does two things: check
that format/type is legal and check that the destination (or source
buffer for glReadPixels) actually exists. Just move the relevant
parts of that into _mesa_DrawPixels().
We'll do a similar change in glReadPixels then get rid of the function
altogether.
Signed-off-by: Brian Paul <brianp@vmware.com>
This replaces the _mesa_is_legal_format_and_type() function.
According to the spec, some invalid format/type combinations to
glDrawPixels, glReadPixels and glTexImage should generate
GL_INVALID_ENUM but others should generate GL_INVALID_OPERATION.
With the old function we didn't make that distinction and generated
GL_INVALID_ENUM errors instead of GL_INVALID_OPERATION. The new
function returns one of those errors or GL_NO_ERROR.
This will also let us remove some redundant format/type checks in
a follow-on commit.
v2: add more checks for ARB_texture_rgb10_a2ui at the top of
_mesa_error_check_format_and_type() per Ian.
Signed-off-by: Brian Paul <brianp@vmware.com>
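The three-way result of the format/type check described above can be
sketched with a toy token set. Everything here is hypothetical: the DEMO_
enums stand in for GL tokens, and the single packed-type rule is a
simplification of the real tables in _mesa_error_check_format_and_type().

```c
/* Toy stand-ins for GL error codes and tokens (not real GL values). */
enum demo_error { DEMO_NO_ERROR, DEMO_INVALID_ENUM, DEMO_INVALID_OPERATION };
enum demo_format { DEMO_RGB, DEMO_RGBA, DEMO_FORMAT_COUNT };
enum demo_type {
    DEMO_UNSIGNED_BYTE,
    DEMO_UNSIGNED_INT_2_10_10_10_REV,
    DEMO_TYPE_COUNT
};

static enum demo_error demo_error_check_format_and_type(int format, int type)
{
    /* An unrecognized token is an enum error. */
    if (format < 0 || format >= DEMO_FORMAT_COUNT)
        return DEMO_INVALID_ENUM;
    if (type < 0 || type >= DEMO_TYPE_COUNT)
        return DEMO_INVALID_ENUM;

    /* Both tokens are known, but the pairing is illegal: in this toy rule
     * the packed 2_10_10_10 type needs a four-component format. */
    if (type == DEMO_UNSIGNED_INT_2_10_10_10_REV && format != DEMO_RGBA)
        return DEMO_INVALID_OPERATION;

    return DEMO_NO_ERROR;
}
```

Returning the error code instead of a boolean lets each caller report the
right error while sharing one table of rules.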
Tiled surfaces have all kinds of alignment constraints that need to
be met. Instead of duplicating all this code between the ddx and
mesa, use common code in libdrm_radeon. This also ensures that both
the ddx and mesa compute those alignments in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid using a random value, use color view
depth view to address different layers as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was cleared by
the kernel, always program stencil bo. On evergreen the depth
buffer bo needs to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimates those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Refine 80aa78142d "dri: make sure to build libdricommon.la"
so we don't build libdricommon if we aren't building a dri driver which needs it (i.e.
if we are just building swrast)
In particular, this restores the ability to build the swrast dri driver without having to
have a xf86drm.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
In check_index_bounds the comparison needs to be "greater than or equal"
since, contrary to its name, _MaxElement is the count of the array (this
matches similar code in vbo_exec_DrawRangeElementsBaseVertex).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
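The off-by-one in the check_index_bounds comparison above can be shown with
two tiny predicates. The function names and parameters are hypothetical;
element_count plays the role of _MaxElement, which holds the number of
elements rather than the largest valid index.

```c
#include <stdbool.h>

/* Strict comparison: misses the case max_index == element_count. */
static bool demo_out_of_bounds_strict(unsigned max_index, unsigned element_count)
{
    return max_index > element_count;
}

/* "Greater than or equal" comparison: valid indices are 0 .. count - 1. */
static bool demo_out_of_bounds(unsigned max_index, unsigned element_count)
{
    return max_index >= element_count;
}
```

With three elements, index 3 is already out of range, which only the
second predicate reports.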
Fixes the build of builtin_compiler on my 32-bit build where xcb-dri2
is in a custom prefix but the custom prefix flags weren't available.
It shouldn't have been in LIBS anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This checks for advertised LLC support by the GPU instead of relying on
the GPU generation for detection.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Rely on the libdrm HAS_LLC parameter to verify whether the hardware supports
it. In case the libdrm version does not support this check, fall back to the
older way of detecting it, which assumed that GPUs newer than GEN6 have it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Was previously being done in a state-tracker, but in a way which was
difficult for some drivers to optimize. Push down to this level and
make it the individual driver's responsibility.
FBOs differ from textures in a significant way. With textures, we can
strip the border and get correct rendering except when the application
fetches texels outside [0,1].
With an FBO, the pixel at (0,0) is in the border. The
ARB_framebuffer_object spec says:
"If the attached image is a texture image, then the window
coordinates (x[w], y[w]) correspond to the texel (i, j, k), from
figure 3.10 as follows:
i = (x[w] - b)
j = (y[w] - b)
k = (layer - b)
where <b> is the texture image's border width..."
Since the border doesn't exist, we can never render any pixels in the
correct location. Just mark these FBOs FRAMEBUFFER_UNSUPPORTED.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42336
Ever since xserver commit 531869448d07e00ae241120b59f3aaaa5709d59c,
the server no longer sends invalidate events to clients, unless they
have performed a GetBuffers request since the drawable was last
invalidated.
If the drawable gets invalidated immediately after the GetBuffers
request was processed by the X server, it's possible that Xlib
will process the invalidate event while waiting for the GetBuffers
reply. So the server, thinking the client knows that the buffers
are invalid, is waiting for another GetBuffers request before
sending any more invalidate events. The client, on the other hand,
believes the buffers to be valid, and thus is expecting to receive
another invalidate event before it has to send another GetBuffers
request. The end result is that the client never again sends
a GetBuffers request.
To avoid this problem, take a snapshot of the lastStamp before
doing GetBuffers, and retry if the snapshot and the current
lastStamp no longer match after the GetBuffers reply has been
processed.
Signed-off-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
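The snapshot-and-retry scheme for GetBuffers described above can be
modelled without an X server. This is a simulation under stated
assumptions: struct demo_drawable, its fields and all function names are
hypothetical, and the retry is written as a loop, which may differ from
how the actual patch structures it.

```c
/* Hypothetical model of a drawable's client-side state. */
struct demo_drawable {
    unsigned last_stamp;     /* bumped by every invalidate event */
    unsigned pending_events; /* invalidates that arrive while awaiting a reply */
    unsigned requests;       /* number of GetBuffers round trips made */
};

/* Simulated GetBuffers: an invalidate event may be processed while the
 * client is waiting for the reply, which bumps the stamp. */
static void demo_get_buffers(struct demo_drawable *d)
{
    d->requests++;
    if (d->pending_events > 0) {
        d->pending_events--;
        d->last_stamp++;
    }
}

/* Take a snapshot of the stamp before the request and retry until the
 * stamp is unchanged once the reply has been processed. */
static void demo_update_buffers(struct demo_drawable *d)
{
    unsigned snapshot;
    do {
        snapshot = d->last_stamp;
        demo_get_buffers(d);
    } while (snapshot != d->last_stamp);
}

/* Convenience wrapper: how many round trips are needed in total? */
static unsigned demo_round_trips(unsigned events_during_replies)
{
    struct demo_drawable d = { 0, events_during_replies, 0 };
    demo_update_buffers(&d);
    return d.requests;
}
```

Without the retry, a single invalidate arriving during the reply would
leave the client holding stale buffers with no further event to wake it up.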
The error message I chose matches gcc's error. Fixes piglit
switch-case-duplicated.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise, the upcoming error messages said the location was 0:0(0).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not quite spelled out in the spec text, but the grammar indicates
that only constant values are allowed as switch() case labels (and
only constant values make sense, anyway).
Fixes piglit glsl-1.30/compiler/switch-statement/switch-case-uniform-int.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This stuffs them all in a struct for sanity. Fixes piglit
glsl-1.30/execution/switch/fs-uniform-nested.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For all the extension entrypoints using the get_buffer() helper, they
wanted the same error handling. In some cases, the error was doing
the same error return whether target was a bad enum, or a user buffer
wasn't bound.
(Actually, GL_ARB_map_buffer_range doesn't specify the error for a zero
buffer being bound for MapBufferRange, though it does for
FlushMappedBufferRange. This appears to be an oversight).
Fixes piglit GL_ARB_copy_buffer/negative-bound-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Even though it should be safe to use them for one frame, better be sure.
Suggested by Michael Dänzer.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
This prevents a possible lapse of the depth buffer - the situation where
the app and pp have different depth buffers.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
In commit 6ecee54a9a a call to
talloc_reference was replaced with a call to talloc_steal. This was in
preparation for moving to ralloc which doesn't support reference
counting.
The justification for talloc_steal within token_list_append in that
commit is that the tokens are being copied already. But the copies are
shallow, so this does not work.
Fortunately, the lifetime of these tokens is easy to understand. A
token list for "replacements" is created and stored in a hash table
when a function-like macro is defined. This list will live until the
macro is #undefed (if ever).
Meanwhile, a shallow copy of the list is created when the macro is
used and the list expanded. This copy is short-lived, so is unsuitable
as a new parent.
So we can just let the original, longer-lived owner continue to own
the underlying objects and things will work.
This fixes bug #45082:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
This test case exposes a bug as described in this bug report:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Clearly, some memory is getting (incorrectly) freed on the first macro
invocation, leading to problems with the second macro invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The trick here is that flex always chooses the rule that matches the most
text. So with an input text of "two:" which we want to be lexed as an
IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
rule would match longer as a single token of "two:" which we don't want.
We prevent this by forcing the OTHER pattern to never match any
characters that appear in other constructs, (no letters, numbers, #,
_, whitespace, nor any punctuation that appear in CPP operators).
Fixes bug #44764:
GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
GL_RG_INTEGER only has two components, not three. I'll be surprised
if anyone ever tries to glReadPixels(..., GL_SHORT, GL_RG_INTEGER,
...). This was found by inspection.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With 0963990 the flag was only set when Bind created the object. In
all cases where ::ARBsemantics could be true, this path never
happened. Instead, add a _Used flag to track whether a VAO has ever
been bound. On the first Bind, set the _Used flag, and set the
ARBsemantics flag to the correct value.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45423
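The _Used flag approach above amounts to latching a value on the first
bind. The sketch below assumes, for illustration only, that the "correct
value" is determined by which bind entry point is called first; the struct
and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of a VAO. */
struct demo_vao {
    bool used;          /* has this object ever been bound? */
    bool arb_semantics; /* latched on the first bind */
};

/* On the first bind, record the semantics; later binds leave them alone. */
static void demo_bind_vao(struct demo_vao *vao, bool via_arb_entrypoint)
{
    if (!vao->used) {
        vao->used = true;
        vao->arb_semantics = via_arb_entrypoint;
    }
}

/* Helper: semantics in effect after two consecutive binds. */
static bool demo_semantics_after(bool first_is_arb, bool second_is_arb)
{
    struct demo_vao vao = { false, false };
    demo_bind_vao(&vao, first_is_arb);
    demo_bind_vao(&vao, second_is_arb);
    return vao.arb_semantics;
}
```

Tracking "ever bound" separately from "created by Bind" is what lets the
flag be set for objects that were generated ahead of time.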
This is a hack, and it will result in incorrect rendering. However,
it does eliminate spurious warnings in several piglit CopyPixels tests
that involve floating-point depth buffers.
The real solution is to add a zf field to SWspan to store float Z
values. When a float depth buffer is involved, swrast should also
populate the zf field. I'll consider this post-8.0 work.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the draw module avoids flushing, it may flush precisely when
binding a NULL shader, so care must be taken when restoring the original
fragment shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
When GLAPIENTRY is __stdcall (ie Windows), the stack is popped by the
callee making the number/type of arguments significant, therefore
using a generic no-op causes stack corruption for many entry-points.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The width and height parameters of glTexImage2D() include the texture
image size plus 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage(), width and height should not exceed the
maximum supported size for the target texture type,
i.e. 1 << (ctx->Const.MaxTextureLevels - 1)
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case: max_values
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In certain situations APIs will call pipe->clear, which doesn't
require a fragment shader, but then we'd try to verify the pipeline
and assume a fragment shader was always set. This led to a
crash when an API would just call simple clears before anything else.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The node_attrsz[] array is initially copied from the node->attrsz[]
array but some values get rewritten. Thereafter, we need to use the
node_attrsz[] values.
Fixes a bug when replaying a display list that uses generic vertex
array[16] (at least).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings were:
nv50_pc_regalloc.c: In function ‘pass_generate_phi_movs’:
nv50_pc_regalloc.c:423:41: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp: In member function ‘bool nv50_ir::MemoryOpt::replaceStFromSt(nv50_ir::Instruction*, nv50_ir::MemoryOpt::Record*)’:
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
And add some assertions to catch this sooner in debug builds.
This fixes a dangling texture object pointer bug hit via wglShareLists().
When we push the GL_TEXTURE_BIT state we may push references to the default
texture objects which are owned by the gl_shared_state object. We don't
want to accidentally delete that shared state while the attribute stack
references shared objects. So keep a reference to it.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This cleans up the reference counting of shared context state.
The next patch will use this to fix an actual bug.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also significantly improves the RV670 flush by using the CB1 flush
*always* and also DEST_BASE_0_ENA, which appears to magically fix some tests.
I am not entirely sure, but it's possible that RV670 flushing is fixed
completely.
v2: fix cayman by flushing texture cache instead of vertex cache
Thanks to Dave Airlie for testing Cayman.
Commit 99476561 (automake: src/glsl and src/glsl/glcpp) changed the
build system so that src/glsl/glsl_test is not built by default. This
inadvertently broke "make check", since the tests in
src/glsl/tests/lower_jumps (which are run by "make check") rely on
glsl_test.
This patch ensures that "make check" builds glsl_test before running
any tests.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes these GCC warnings.
osmesa.c: In function ‘osmesa_renderbuffer_storage’:
osmesa.c:417: warning: comparison is always false due to limited range of data type
osmesa.c:423: warning: comparison is always false due to limited range of data type
osmesa.c:431: warning: comparison is always false due to limited range of data type
osmesa.c:437: warning: comparison is always false due to limited range of data type
osmesa.c:447: warning: comparison is always false due to limited range of data type
osmesa.c:453: warning: comparison is always false due to limited range of data type
osmesa.c:463: warning: comparison is always false due to limited range of data type
osmesa.c:466: warning: comparison is always false due to limited range of data type
osmesa.c:476: warning: comparison is always false due to limited range of data type
osmesa.c:479: warning: comparison is always false due to limited range of data type
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Success was (tests-passed AND valgrind-tests-passed) but this meant that
if the valgrind tests weren't run it would be considered a failure.
The logic is now (tests-passed AND (!valgrind OR valgrind-tests-passed))
which lets us return success if the valgrind tests aren't run.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
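The change to the test-suite success condition above is a small piece of
boolean logic. The sketch below restates both versions as C predicates;
the function and parameter names are hypothetical, and the real logic
lives in the test script rather than in C.

```c
#include <stdbool.h>

/* Old rule: fails whenever the valgrind tests were not run, because
 * valgrind_passed is then false. */
static bool demo_old_success(bool tests_passed, bool valgrind_passed)
{
    return tests_passed && valgrind_passed;
}

/* New rule: the valgrind result only counts if valgrind was enabled. */
static bool demo_new_success(bool tests_passed, bool valgrind_enabled,
                             bool valgrind_passed)
{
    return tests_passed && (!valgrind_enabled || valgrind_passed);
}
```

The two rules agree whenever valgrind is enabled and differ only when it
is not run at all.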
Needed for automake. Using AC_PROG_PATH(bison/flex) causes automake to
fail to build .y and .l files.
It is up to the builder to use bison/flex instead of yacc/lex.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Exporting a publicly visible class with a generic name like
"variable_entry" via ir_variable_refcount.h is kind of mean.
Many IR transformers would like to define their own "variable_entry"
class. If they accidentally include this header, the compiler/linker
may get confused and try to instantiate the wrong variable_entry class,
leading to bizarre runtime crashes.
The hope is that renaming this one will allow .cpp files to safely
declare and use their own file-scope "variable_entry" classes.
This avoids crashes caused by converting src/glsl to automake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This uses point size clamping to force point size to a particular value,
making the vertex shader output irrelevant.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't set the other bits anywhere else except the other DSA states,
which are mutually-exclusive with this one.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the gl_PointSize transform feedback test.
Point size clamping should happen at the rasterizer stage,
i.e. after the vertex and geometry shaders and transform feedback.
Drivers are expected to do this by themselves.
Simplifies the general case code in the ubyte-valued texture format
functions. More consolidation to come in subsequent commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this being present works around a bug in Unigine
Sanctuary on i965 which previously resulted in bad rendering.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can be used to work around broken application behavior, like in
Unigine where it attempts to use texture arrays without declaring
either "#extension GL_EXT_texture_array : enable" or "#version 130".
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While typing out the new decode, I added a fallback mode for dumping
when we fail to re-map the BO after execution. This should get us a
minimal dump when trying to dump a batch that results in a GPU hang.
We were allocating registers into the MRF hack region, resulting in
sparkly rendering in a few of the scenes. We could do better
allocation by making an MRF class, having MRFs conflict with the
corresponding GRFs, and tracking the live intervals of the "MRF"s and
setting up the conflicts. But this is way easier for the moment.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After the removal of the dri driver link test, this should help avoid
the original problem that it was designed to catch: The warning about
a missing prototype due to typoing a function name scrolling by in the
Mesa build spew, and you not noticing until you try to run an
application and it falls back to swrast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The envvar works for R100 and R200 too, and the classic R300 driver
doesn't even exist anymore.
"RADEON_NO_TCL" is already mentioned in the code and is the same envvar
used for the R300g driver.
lp_bld_tgsi_soa.c has been adapted to use this new interface, but
lp_bld_tgsi_aos.c has only been partially adapted, since nothing in
gallium currently uses it.
v2:
- Rename lp_bld_tgsi_action.[ch] => lp_bld_tgsi_action.[ch]
- Initialize tgsi_info in lp_bld_tgsi_aos.c
- Fix copyright dates
Prior to commit 576161289d,
the parameter format was bpp, thus both 24bit and 32bit formats were
requested with format set to 32. Handle 24bit separately now.
Fixes RGBX formats in wayland platform for egl_dri2 (EGL_ALPHA_SIZE=0).
Note: This is a candidate for the 8.0 branch.
This just copies what the LUMINANCE_ALPHA bits do.
Fixes piglit tests on softpipe complaining about missing unpack.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs some of the MUL instructions spread across a full slot
of vectors.
It also no longer has RECIP_UINT; the recommendation is to replace it
with a U2F + RECIP_IEEE + MUL + F2U.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The warning is absolutely useless. It doesn't actually say that there are
uninitialized variables. It points out the fact that there are missing
initializers and that variables are initialized to zero implicitly, which is
exactly what we want and what we commonly make use of.
C90 and C99 require all unspecified variables in the initializer list to be set
to zero.
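As a small illustration of the rule cited above, here is a sketch of the
kind of partial initializer the warning flags. The struct and its field
names are made up for this example and are not Mesa types.

```c
#include <stdbool.h>

/* Hypothetical descriptor; the names are illustrative only. */
struct sampler_desc {
   int   wrap_s;
   int   wrap_t;
   float lod_bias;
   bool  normalized;
};

/* Only wrap_s is given a value in the initializer list. The language
 * guarantees that the remaining members (wrap_t, lod_bias, normalized)
 * are initialized to zero, which is the behaviour we rely on. */
struct sampler_desc make_default_desc(void)
{
   struct sampler_desc desc = { 1 };
   return desc;
}
```

The unnamed members come back as 0, 0.0f and false without any explicit
assignment.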
The check for ctx->API was unnecessary, because OES extensions are not exposed
in desktop GL.
Also require renderbuffer support for ARB_texture_rgb10_a2ui,
as per the spec.
Tested by comparing old and new glxinfo with softpipe and r600g.
v2: fix bugs
v3: rename need_only_one -> need_at_least_one
rename num_elements -> num_mappings
add comments
use const when appropriate
Reviewed-by: Brian Paul <brianp@vmware.com>
This change is not exactly equivalent (sometimes we checked for non-zero,
sometimes if >0 or >1), but the behavior shouldn't change, because all drivers
report 0 for unsupported CAPs.
Exposing CAP_STREAM_OUTPUT_PAUSE_RESUME without CAP_MAX_STREAM_OUTPUT_BUFFERS
is a driver bug and st/mesa does no checking if the latter is supported as
well. Drivers must report CAPs consistently.
v2: make the array const
v2: handle the cap in r300 and r600 as well
Additional info for r600g:
The env var R600_GLSL130=1 enables GLSL 1.3.
Along with R600_STREAMOUT=1, it enables full GL 3.
Fix an access to uninitialized memory pointed out by valgrind in
glsl_to_tgsi_visitor::simplify_cmp(void).
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Fix this GCC warning.
draw_pipe_clip.c: In function ‘interp’:
draw_pipe_clip.c:122:13: warning: variable ‘clip_dist’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When rendering to an FBO, rendering is inverted. At the same time, we
should also make sure the point sprite origin is inverted. Otherwise, we get
a result that is inverted relative to rendering to the default winsys FBO.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44613
NOTE: This is a candidate for stable release branches.
v2: add the similar logic to ivb, too (comments from Ian)
simplify the logic operation (comments from Brian)
v3: pick a better comment from Eric
use != for the logic instead of ^ (comments from Ian)
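A minimal sketch of the "!=" form of the logic, assuming the origin can be
modelled as a two-valued enum. The enum and function names are hypothetical
and do not correspond to the driver's state-upload code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two GL point sprite origin values. */
enum sprite_origin { ORIGIN_LOWER_LEFT, ORIGIN_UPPER_LEFT };

/* Returns the origin to program. Rendering to a user FBO flips Y relative
 * to the window system framebuffer, so the requested origin is flipped
 * exactly when render_to_fbo is true. Comparing two bools with != is the
 * logical XOR that v3 refers to. */
enum sprite_origin effective_sprite_origin(enum sprite_origin requested,
                                           bool render_to_fbo)
{
   bool upper_left = (requested == ORIGIN_UPPER_LEFT) != render_to_fbo;
   return upper_left ? ORIGIN_UPPER_LEFT : ORIGIN_LOWER_LEFT;
}
```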
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code quite a bit, consolidates some cases and
possibly catches more cases for the memcpy path.
More such changes will follow. Do just a few at a time to help bisect
any possible regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use memcpy in more situations. We can also remove
the checks for byte swapping that happen before the calls to
_mesa_format_matches_format_and_type().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In a recent commit,
commit 1c0f1dd42a
Author: Chad Versace <chad.versace@linux.intel.com>
swrast: Fix fixed-function fragment processing
I defined a new function, _swrast_fragment_program, but neglected
to #include s_fragprog.h for clients of that function.
Note: This is a candidate for the 8.0 branch.
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The evergreen+ CB no longer supports the following formats
compared to 6xx/7xx:
- COLOR_4_4
- COLOR_3_3_2
- COLOR_6_5_5
- COLOR_8_24_FLOAT
- COLOR_24_8_FLOAT
- COLOR_11_11_10
- COLOR_11_11_10_FLOAT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On i965, _mesa_ir_link_shader is never called. As a consequence, the
current fragment program (ctx->FragmentProgram->_Current) exists but is
invalid because it has no instructions. Yet swrast continued to attempt to
use the empty program.
To avoid using the empty program, this patch 1) defines a new function,
_swrast_use_fragment_program, which checks if the current fragment program
exists and differs from the fixed function fragment program, and, when
appropriate, 2) replaces checks of the form
if (ctx->FragmentProgram->_Current == NULL)
with
if (_swrast_use_fragment_program(ctx))
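A rough sketch of the check described in 1), using simplified stand-in
structs. The struct layouts and the fixed_function field are assumptions
made for this example and are not the real gl_context members.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a fragment program object. */
struct fragment_program {
   int num_instructions;
};

/* Simplified stand-in for the context state involved in the check. */
struct context {
   struct fragment_program *current;         /* models the _Current pointer */
   struct fragment_program *fixed_function;  /* hypothetical: the program
                                              * generated for fixed function */
};

/* True only when a current program exists and it is not the
 * fixed-function one, i.e. when swrast should really run a program. */
bool use_fragment_program(const struct context *ctx)
{
   return ctx->current != NULL && ctx->current != ctx->fixed_function;
}
```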
Fixes the following oglconform regressions on i965/gen6:
api-fogcoord(basic.allCases.log)
api-mtexcoord(basic.allCases.log)
api-seccolor(basic.allCases.log)
api-texcoord(basic.allCases.log)
blend-separate(basic.allCases)
colorsum(basic.allCases.log)
The tests were run with the GLXFBConfig:
    visual  x   bf lv rg d st  colorbuffer  sr ax dp st accumbuffer  ms  cav
  id dep cl sp  sz l  ci b ro  r  g  b  a F gb bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------------
0x021 24 tc  0  32  0 r  y .   8  8  8  8 .  .  0 24  8  0  0  0  0  0 0 None
(Note: I originally believed that the hunk in
_swrast_update_fragment_program was unnecessary. But it is required to fix
blend-separate.)
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Color clamping should be enabled in glGetTexImage if texture dataType is
GL_UNSIGNED_NORMALIZED and format is GL_LUMINANCE or GL_LUMINANCE_ALPHA
Fixes 2 Intel oglconform test cases: pxconv-gettex and pxtrans-gettex
https://bugs.freedesktop.org/show_bug.cgi?id=40864
NOTE: This is a candidate for the 8.0 branch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was losing bits of precision. Fixes (with the previous commits):
piglit EXT_texture_integer/getteximage-clamping
piglit EXT_texture_integer/getteximage-clamping GL_ARB_texture_rg
oglc advanced.mipmap.upload
Regresses oglc negative.typeFormatMismatch.teximage from fail to
abort, because it's been hitting texstore for a format/type combo that
shouldn't happen.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the core, we always treat spans of int/uint data as uint, so this
extract function was truncating storage of integer pixel data to an
int texture to (0, max_int) instead of (min_int, max_int). There is
probably missing code for handling truncation on conversion between
pixel formats, still, but this does improve things.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly fixes piglit EXT_texture_integer/getteximage-clamping. The
remaining failure involves precision loss on storing of int32 texture
data (something I knew was an issue, but wasn't trying to test).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This cut and paste is pretty awful. I'm tempted to do a lot of this
using preprocessor tricks for customizing the parameter type from a
template function, but that's just a different sort of hideous.
Fixes 8 Intel oglconform int-textures cases.
NOTE: This is a candidate for the 8.0 branch.
v2: Add alpha formats, too.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, when you asked for the _BaseFormat of an rb wrapping a
GL_RGB texture, you got GL_RGBA because that's what we were storing
the texture data as.
NOTE: This is a candidate for the 8.0 branch.
Most of this function was just calling
intel_renderbuffer_update_wrapper(), which was called immediately
afterwards in the only caller.
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit ARB_copy_buffer-overlap, on swrast, which previously
assertion failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A pure swrast-allocated buffer gets an irb of NULL, so we segfaulted
in the clear-accum test. Just look at the swrast renderbuffer pointer
for handling swrast rbs.
From the extension spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., RenderbufferStorageMultisampleEXT..."
Fixes piglit EXT_framebuffer_multisample/dlist.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed when handling a similar problem in EXT_framebuffer_multisample.
From the EXT_framebuffer_object spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., GenFramebuffersEXT, BindFramebufferEXT,
DeleteFramebuffersEXT, CheckFramebufferStatusEXT,
GenRenderbuffersEXT, BindRenderbufferEXT, DeleteRenderbuffersEXT,
RenderbufferStorageEXT, FramebufferTexture1DEXT,
FramebufferTexture2DEXT, FramebufferTexture3DEXT,
FramebufferRenderbufferEXT, GenerateMipmapEXT..."
Reviewed-by: Brian Paul <brianp@vmware.com>
Constants array is always assumed to be RGBA, which means we need to
swizzle the constant elements into place to match the AoS ordering
(e.g., BGRA) that was passed to lp_build_tgsi_aos().
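For illustration, the reordering amounts to the following scalar sketch.
The function and enum names are hypothetical; the real code builds LLVM IR
rather than shuffling floats on the CPU.

```c
/* Component indices into an RGBA-ordered constant. */
enum { CHAN_R = 0, CHAN_G = 1, CHAN_B = 2, CHAN_A = 3 };

/* Reorders one RGBA constant into the AoS channel order described by
 * swizzles, where swizzles[i] names the RGBA component that belongs in
 * slot i of the output. */
void swizzle_constant(const float rgba[4],
                      const unsigned char swizzles[4],
                      float out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = rgba[swizzles[i]];
}
```

With swizzles set to { CHAN_B, CHAN_G, CHAN_R, CHAN_A }, an RGBA constant
comes out in BGRA order.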
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Should avoid dangling pointer dereference with
glean --run results --overwrite --quick --tests texSwizzle
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
We just prefix the $CLANG environment variable in configure.ac with acv_mesa_
Found by: tinderbox
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was horribly broken and has cost everyone more time than we were
ever going to save using it. It might have been fixable, but the
problem it was originally trying to solve can be better solved with
-Werror=missing-prototypes and -Werror=implicit-function-declaration.
Also, it was always producing a big scary warning about how the link
test was non-portable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44928
Substantially increases performance in GLBenchmark PRO:
- 320x240 => 3.28x
- 1920x1080 => 1.47x
- 2560x1440 => 1.27x
The LD message ignores the sampler unit index and SAMPLER_STATE pointer,
instead relying on hard-wired default state. Thus, there's no need to
worry about running out of sampler units or providing SAMPLER_STATE;
this small patch should be all that's required.
NOTE: This is a candidate for release branches.
(It requires the preceding commit to compile.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_SAMPLE is full of complex workarounds for original Broadwater
hardware, and I'd rather avoid all that for my next Ivybridge patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function releases the buffer that contains user-space vertex data.
The buffer_offset field points into that buffer. So reset the
buffer_offset to zero when we release the buffer so that subsequent
draws don't inadvertently get a bad offset.
Fixes error messages / failed assertions (in the draw module's bounds/size
checking code) when running piglit's polygon-mode test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
-fvisibility=hidden was preventing them from being exported, which
combined with shared-glapi was causing undefined symbol errors at
runtime.
We don't want to make these functions part of the ABI, and given
how simple they are, we simply inline them.
From c998f732d42da5e962fe5da294493132c3e8dc5f Mon Sep 17 00:00:00 2001
From: Lucas Stach <dev@lynxeye.de>
Date: Tue, 24 Jan 2012 09:46:32 +0100
Subject: [PATCH] nvfx: fix nv3x fallout from state validation changes
Apparently nv3x needs some crude hacks to work properly. This
is clearly not the right fix, but it's the behaviour of the old
code and fixes regressions seen by users.
This has the drawback that when creating configure for
distribution, wayland needs to be available for the packager.
Also, the macros have the wayland prefix hardcoded, so
we can't copy them into mesa right now.
In bad applications like ipers, which do a lot of draw calls with
no state changes, this helps to greatly reduce time spent in prepare.
In ipers around 7% of CPU was spent in various prepare functions;
after this commit no prepare function shows up in the profile.
This commit also has the added benefit of now grouping all pipelined
drawing into a single draw call if the driver uses vbuf_render.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Previously, max_vs_entries was set to 128 for GT1, and 256 for GT2,
based on the PRM (see Vol2, part1, p28). However, Bspec section 1.6.5
indicates that the maximum number of VS entries is 256 for GT1.
No piglit regressions on GT1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When storing data in a buffer of type DYNAMIC_DRAW, we don't create a
drm_intel_bo for it; instead we store the data in system memory and
defer allocation of the GPU buffer until it is needed. Therefore, in
brw_update_sol_surface(), we can't just consult the "buffer" field of
the intel_buffer_object structure; we need to call
intel_bufferobj_buffer() to ensure that the deferred allocation
occurs.
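The deferred-allocation pattern can be sketched roughly as below. The
struct layouts and the bufferobj_buffer() name are simplified stand-ins
chosen for this example; plain malloc() takes the place of the real GPU
buffer allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GPU buffer object. */
struct gpu_bo {
   void  *storage;
   size_t size;
};

/* Hypothetical stand-in for the driver's buffer object wrapper. */
struct buffer_object {
   struct gpu_bo *buffer;      /* NULL until the GPU actually needs it */
   void          *sys_buffer;  /* data kept in system memory */
   size_t         size;
};

/* Accessor that performs the deferred allocation on first use. Reading
 * obj->buffer directly would return NULL for data that still lives only
 * in system memory. */
struct gpu_bo *bufferobj_buffer(struct buffer_object *obj)
{
   if (obj->buffer == NULL && obj->sys_buffer != NULL) {
      struct gpu_bo *bo = malloc(sizeof *bo);
      if (bo == NULL)
         return NULL;
      bo->storage = malloc(obj->size);
      if (bo->storage == NULL) {
         free(bo);
         return NULL;
      }
      memcpy(bo->storage, obj->sys_buffer, obj->size);
      bo->size = obj->size;
      obj->buffer = bo;
   }
   return obj->buffer;
}
```

Callers that go through the accessor always see an allocated buffer, while
callers that peek at the field may see NULL.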
This parallels a similar fix for gen7 (see commit ba6f4c9).
Fixes piglit test EXT_transform_feedback/buffer-usage on gen6.
This is a candidate for the 8.0 release branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
It always had the same value as ctx->Extensions.EXT_framebuffer_sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Strictly speaking, it's not legal to expose EXT_texture_integer without
EXT_gpu_shader4. It might be even dangerous (apps can assume EXT_gpu_shader4
is available without checking for it).
The check in compute_version is removed as well, because that's already
covered by GLSLVersion >= 130.
Reviewed-by: Brian Paul <brianp@vmware.com>
- use OR to combine bind flags
- combine both conditionals into one
- move the ARB_fbo enable where it belongs
Reviewed-by: Brian Paul <brianp@vmware.com>
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
A current incomplete framebuffer was incorrectly used as a
st_framebuffer. When accessing st_framebuffer children bad things happen:
e.g. st_framebuffer::iface was used to check whether it's an incomplete
fb, instead we need to compare st_framebuffer::Base against
mesa_get_incomplete_framebuffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44919
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes Intel oglconform negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing Intel oglconform
negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is unprepared for handling integer (particularly, the
baseFormat of the TexFormat comes out as GL_RGBA, not GL_RGBA_INTEGER,
so the direct call of Driver.ReadPixels crashes due to the int vs
non-int error checking not having happened). I'm frankly tempted to
convert this code to MapRenderbuffer/MapTexImage rather than doing it
as meta ops, now that we have that support.
Improves the remaining crash in Intel oglconform for int-textures to
just a rendering failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This turns aborts and crashes in Intel oglconform's int-textures into
just rendering failures. Clamping isn't handled yet.
v2: Add missing "break".
v3: Drop the int/uint distinction, since they don't need different clamping.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Similarly to how we handle this in texstore, we have to remap height
to depth so that we MapTextureImage each image layer individually.
Fixes part of Intel oglconform's int-textures advanced.fbo.rtt
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a step toward fixing Intel oglconform's
int-textures advanced.fbo.rtt.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This doesn't result in correct rendering -- GL requires that logic ops
work, while the hardware specs say it doesn't do them. I'm not sure
how we would want to handle this.
NOTE: This is a candidate for the 8.0 branch.
When we're actually rendering into a texture, map the texture image
instead of the corresponding renderbuffer. Before, we just copied
a pointer from the texture image to the renderbuffer. This change
will make the code usable by hardware drivers.
ctx->Driver.MapTexture() always points to _swrast_map_texture().
We're already reaching into swrast from t_vb_program.c anyway.
This will let us remove the ctx->Driver.Map/UnmapTexture() functions.
These are temporary, actually, but they'll make follow-on work easier to
implement in a step-by-step manner. Eventually the Map and RowStrideBytes
fields will go into a new swrast_renderbuffer type, but adding that type
now would involve touching a _lot_ of code that'll eventually be removed.
The fields marked as obsolete will go away completely at some point.
That field is only used by swrast code so there's no reason to mess
with it in the gallium state tracker.
This also lets us remove the unused st_format_data() type function and
related code.
When ARB VAOs are used, glPopClientAttrib does not resurrect a deleted
VAO or VBO. This difference between the two specs is, unfortunately,
not very well spelled out in the specs.
Fixes oglc vao(advanced.pushPop.deleteVAO) and
vao(advanced.pushPop.deleteVBO) tests.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There are more differences between Apple and ARB than just requiring
that all arrays be stored in VBOs. Additional uses will be added in
following commits.
Also, set the flag at Bind time instead of Gen time. The ARB_vao spec
specifies that behavior.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a hack to work around drivers such as i965 that:
- Set _MaintainTexEnvProgram to generate GLSL IR for
fixed-function fragment processing.
- Don't call _mesa_ir_link_shader to generate Mesa IR from the
GLSL IR.
- May use swrast to handle glDrawPixels.
Since _mesa_ir_link_shader is never called, there is no Mesa IR to
execute. Instead do regular fixed-function processing.
Even on platforms that don't need this, the software fixed-function
code is much faster than the software shader code.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44749
At least one place, the _mesa_need_secondary_color function in
state.h, uses this to make decisions. The next patch in this series
will add another dependency. Ideally, this field would go away and be
replaced by a flag or something.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When rowstride was negative, unsigned promotion caused a segfault here:
    if (rb->Format == MESA_FORMAT_S8) {
       const GLuint rowStride = rb->RowStride;
       for (i = 0; i < count; i++) {
          if (x[i] >= 0 && y[i] >= 0 && x[i] < w && y[i] < h) {
             stencil[i] = *(map + y[i] * rowStride + x[i]);   /* <-- segfault */
          }
       }
    }
Fixes segfault in oglconform
separatestencil-neu(NonPolygon.BothFacesBitmapCoreAPI),
though test still fails.
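To make the promotion problem concrete, here is a small sketch that
computes the pixel offset both ways. The function names are made up for
this example, and it assumes a 32-bit int, as on the platforms Mesa
normally targets.

```c
#include <stdint.h>

/* Offset of pixel (x, y) when the stride is held in an unsigned variable.
 * y * stride is evaluated in unsigned arithmetic, so a negative stride
 * wraps around to a huge positive offset. */
int64_t offset_with_unsigned_stride(int32_t x, int32_t y, int32_t row_stride)
{
   const uint32_t stride = (uint32_t) row_stride;
   return (int64_t) (y * stride + x);
}

/* Same computation with a signed stride: a negative stride gives the
 * intended negative offset. */
int64_t offset_with_signed_stride(int32_t x, int32_t y, int32_t row_stride)
{
   const int32_t stride = row_stride;
   return (int64_t) (y * stride + x);
}
```

For x = 3, y = 2 and a stride of -16, the signed version yields -29 while
the unsigned version yields 4294967267, which lands far outside the mapping
once it is added to a 64-bit pointer.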
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
i965 processes assignments of whole structures using
vec4_visitor::emit_block_move, a recursive function which visits each
element of a structure or array (to arbitrary nesting depth) and
copies it from the source to the destination. Then it increments the
source and destination register numbers so that further recursive
invocations will copy the rest of the structure. In addition, it sets
the swizzle field for the source register to an appropriate value of
swizzle_for_size(...) for the size of each element being copied, so
that later optimization passes won't be fooled into thinking that
unused vector elements are live.
This all works fine. However, emit_block_move also contains an
assertion to verify, before setting the swizzle field for the source
register, that the source register doesn't already contain a
nontrivial swizzle. The intention is to make sure that the caller of
emit_block_move hasn't already done some swizzling of the data before
the call, which emit_block_move would then counteract when it
overwrites the swizzle field. But the assertion is at the lowest
level of nesting of emit_block_move, which means that after the first
element is copied, instead of checking the swizzle field set by the
caller, it checks the swizzle field used when moving the previous
element. That means that if the structure contains elements of
different vector sizes (which therefore require different swizzles),
the assertion will erroneously fire.
This patch moves the assertion from emit_block_move to the calling
function, vec4_visitor::visit(ir_assignment *). Since the caller is
non-recursive, the assertion will only happen once, and won't be
fooled by emit_block_move's modification of the swizzle field.
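A toy model of the restructured code, to show why the check belongs in the
caller. Every type and encoding here is a simplification invented for this
sketch (a swizzle is reduced to "number of live components"), so it
mirrors only the control flow, not the real vec4_visitor.

```c
#include <assert.h>
#include <stddef.h>

#define SWIZZLE_NOOP 4   /* hypothetical encoding: all four components */

/* Toy type description: a leaf vector or an aggregate of fields. */
struct type_node {
   int vector_elements;             /* 1..4 for a leaf vector */
   size_t num_fields;               /* 0 for a leaf */
   const struct type_node *fields;  /* children when num_fields > 0 */
};

struct reg {
   int index;
   int swizzle;
};

/* Hypothetical stand-in for swizzle_for_size(). */
int swizzle_for_size(int size)
{
   return size;
}

/* Recursively "copies" one register per leaf and returns the number of
 * moves. There is no assertion on src->swizzle here: after the first
 * leaf it holds the previous leaf's swizzle, not the caller's. */
int emit_block_move(struct reg *dst, struct reg *src,
                    const struct type_node *type)
{
   int moves = 0;
   if (type->num_fields > 0) {
      for (size_t i = 0; i < type->num_fields; i++)
         moves += emit_block_move(dst, src, &type->fields[i]);
      return moves;
   }
   src->swizzle = swizzle_for_size(type->vector_elements);
   dst->index++;
   src->index++;
   return 1;
}

/* Non-recursive caller: the swizzle check runs exactly once, before
 * emit_block_move() starts overwriting the field. */
int visit_assignment(struct reg *dst, struct reg *src,
                     const struct type_node *type)
{
   assert(src->swizzle == SWIZZLE_NOOP);
   return emit_block_move(dst, src, type);
}
```

A struct holding a vec2 followed by a vec3 now copies cleanly, whereas an
assertion inside the leaf case would trip on the second field.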
This patch also reverts commit fe006a7 (i965/vs: Fix swizzle related
assertion), which attempted to fix the bug by making the assertion
more lenient, but only worked properly for structures, arrays, and
matrices in which each constituent vector is the same size.
This fixes the problem described in comment 9 of
https://bugs.freedesktop.org/show_bug.cgi?id=40865. Unfortunately, it
doesn't fix the whole bug, since the test in question is also failing
due to lack of register spilling support in the VS.
Fixes piglit test vs-assign-varied-struct. No piglit regressions on
Sandy Bridge.
This is a candidate for the 8.0 release branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40865#c9
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to a commit that did the same for the FS.
Shaves several more instructions off of the VS in Lightsmark, but no
statistically significant performance difference (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaves a few instructions off of the VS in Lightsmark, but no
statistically significant performance difference on gen7 (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AC_CHECK_LIB has this nasty behavior, like the cflags tests, of
automatically putting the tested value into the global LIBS on
success. This caused -lexpat to end up in LIBS, but without the
--with-expat dir, so my 32-bit build on a 64-bit system using expat from a
custom prefix could only find the system expat and fail to link on the
one current consumer of the LIBS variable: the dri driver test link.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While reading through the simulator, I found some interesting code that
looks like it checks the sampler default color pointer against the bound
set in STATE_BASE_ADDRESS. On failure, it appears to program it to the
base address itself.
So I decided to try programming a legitimate bound, and lo and behold,
border color worked.
+92 piglits on Sandybridge. Also fixes Lightsmark on Ivybridge.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we now always build shared glapi, this exposes the fact that libOSMesa was
underlinked when glapi was built shared.
Fix this by doing the same thing as drivers/X11/Makefile already does, ensuring
that the library is linked with the shared glapi library.
(I'm not clear why we link with both glapi.a and glapi.so, so this may be all wrong)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Refine "always build shared dricore" so we don't build it if we don't need
it because we aren't actually building any dri drivers because of --disable-driglx-direct
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looks insane, but it does appear we need a full slot per input/output.
This fixes another 180 or so piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adds all the easier low-hanging opcodes.
Fixes ~3000 piglit tests with GLSL1.30 enabled on cayman.
This just leaves the mul/div/mod ops to fix up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
"If set, forces degamma on XYZ if format is
FMT_8_8_8_8, FMT_BC1, FMT_BC2, or FMT_BC3"
Don't claim support for sRGB on any other formats.
This fixes glean texture_srgb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't pass the piglit test, but it seems to be a lot closer
than it was before. I need to track down if there is another problem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the changes for multiple kcache banks support, now we are assigning
final SRCx_SEL values for kcache access at the later stage, when building the
bytecode. So we need to take into account kcache banks to distinguish
the constants with the same address but different bank index.
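In other words, two constant references are the same only if both the bank
and the address match. A minimal sketch, with a made-up struct that is not
the r600g bytecode representation:

```c
#include <stdbool.h>

/* Hypothetical reference to a constant in the kcache. */
struct const_ref {
   unsigned kc_bank;  /* constant buffer (bank) index */
   unsigned addr;     /* address within that bank */
};

/* Comparing addr alone would merge constants from different banks. */
bool same_constant(const struct const_ref *a, const struct const_ref *b)
{
   return a->kc_bank == b->kc_bank && a->addr == b->addr;
}
```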
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Same fix as previously done by Dave Airlie for r600/r700
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Clip planes are uploaded as a constant buffer and used by the vertex
shader to produce corresponding clip distances for hw clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for multiple kcache banks (constant buffers).
Lock the required lines only.
Allow up to 4 kcache line sets in the alu clause by using ALU_EXTENDED on eg+.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix this GCC warning on non-debug builds.
glsl_types.cpp: In member function 'gl_texture_index
glsl_type::sampler_index() const':
glsl_types.cpp:157: warning: control reaches end of non-void function
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable it in the evergreen_context_draw if needed.
Same as already done in the r600_context_draw for r6xx/r7xx.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BURST_COUNT is clipped with ARRAY_SIZE, so set it to the max value
to avoid clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
libglapi.so, libGL.so, libGLESv2.so, libGLESv1_CM.so must all
come from the same version of Mesa or bad things may happen.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
When the framebuffer has separate depth and stencil buffers, and HiZ is
not enabled on the depth buffer, mark the framebuffer as unsupported. This
happens when trying to create a framebuffer with Z16/S8 because we haven't
enabled HiZ on Z16 yet.
Fixes gles2conform test stencil8.
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44948
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This loosens the format validation in glBlitFramebuffer. When blitting
depth bits, don't require an exact match between the depth formats; only
require that the two formats have the same number of depth bits and the
same depth datatype (float vs uint). Ditto for stencil.
Between S8_Z24 buffers, the EXT_framebuffer_blit spec allows
glBlitFramebuffer to blit the depth and stencil bits separately. So I see
no reason to prevent blitting the depth bits between X8_Z24 and S8_Z24 or
the stencil bits between S8 and S8_Z24. However, we of course don't want
to allow blitting from Z32 to Z32_FLOAT.
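The loosened rule can be sketched as below. The format_info struct and the
example format descriptions are assumptions made for illustration; the real
code queries Mesa's format tables instead. Stencil is compared by bit count
only in this sketch, since it has a single datatype here.

```c
#include <stdbool.h>

enum datatype { TYPE_NONE, TYPE_UNORM, TYPE_FLOAT };

/* Hypothetical per-format description. */
struct format_info {
   int           depth_bits;
   enum datatype depth_type;
   int           stencil_bits;
};

/* Depth may be blitted when the bit count and datatype agree, even if
 * the formats themselves differ (e.g. X8_Z24 vs. S8_Z24). */
bool depth_blit_compatible(const struct format_info *src,
                           const struct format_info *dst)
{
   return src->depth_bits == dst->depth_bits &&
          src->depth_type == dst->depth_type;
}

/* Stencil may be blitted when the stencil bit counts agree. */
bool stencil_blit_compatible(const struct format_info *src,
                             const struct format_info *dst)
{
   return src->stencil_bits == dst->stencil_bits;
}
```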
Fixes Piglit fbo/fbo-blit-d24s8 on Intel drivers with separate stencil
enabled.
The problem was that, on Intel drivers with separate stencil, the default
framebuffer has separate depth and stencil buffers with formats X8_Z24 and
S8. The test attempts to blit the depth bits from a S8_Z24 buffer into the
default framebuffer.
v2: Check that depth datatypes match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44665
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The nvc0 gallium driver is advertising 128 MAX_INTERLEAVED_COMPS
which made it always assert in the linker when TFB was used since
the Outputs array was smaller than that maximum.
v2: added assertions
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
So it appears R600s (except rv670) do AR handling differently, using a different
opcode. This patch fixes up r600g to work properly on r600.
This fixes ~100 piglit tests here (in GLSL1.30 mode) on rv610.
v3: add index_mode as per the docs.
This still fails any dst relative tests for some reason I can't quite see yet,
but it passes a lot more tests than without.
v4: add a nop after dst.rel. This could be improved using a second pass,
where we only insert nops if two instructions are sure to collide.
The docs say r600, rv610 and rv630 need this, and rv670, rs780 and rs880 do not;
we need AMD to confirm rv620 and rv635.
v5: add is_nop_inst.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use a bitmask approach to compute gl_array_object::_MaxElement.
To make this work correctly depending on the shader type actually used,
make use of the newly introduced typed bitmask getters.
With this change I gain about 5% draw time on some osgviewer examples.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Depending on the installed shader type, different arrays are used
from gl_array_object. Provide helper functions that compute
the bitmask of these arrays that are finally enabled for a given
shader type. They will be used in a followup change.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit ede60bc467 (glsl: Add isinf() and
isnan() builtins) uses "+INF" in the .ir file to represent infinity.
This worked on C99-compliant compilers, since the s-expression reader
uses strtod() to read numbers, and C99 requires strtod() to understand
"+INF". However, it didn't work on non-C99-compliant compilers such
as MSVC.
This patch modifies the s-expression reader to explicitly check for
"+INF" rather than relying on strtod() to support it.
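A minimal sketch of the idea, assuming a hypothetical helper named parse_float_token() (the real s-expression reader is structured differently). HUGE_VAL is used because it is available before C99; it is positive infinity on platforms with IEEE infinities.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one numeric token, handling "+INF" by hand
 * so the result does not depend on what the C library's strtod() accepts.
 */
static bool
parse_float_token(const char *tok, double *out)
{
   char *end;

   assert(tok && out);
   if (strcmp(tok, "+INF") == 0) {
      *out = HUGE_VAL;
      return true;
   }

   /* Everything else still goes through strtod(). */
   *out = strtod(tok, &end);
   return end != tok && *end == '\0';
}
```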
This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44767
Tested-by: Morgan Armand <morgan.devel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To fix failed assertions when calling glCopyBufferSubData().
svga_texture() asserts that the resource is a texture. Simply move the
calls to svga_texture() after the code that handles non-texture copies
so that we don't call it with non-texture resources.
Fixes glean bufferObject failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Two assignments to num_immediates were missing in
get_pixel_transfer_visitor() and get_bitmap_visitor().
The uninitialized value led to valgrind errors and crashes in some
cases.
Added new assertions to catch future problems in this area. Also
changed num_immediates to unsigned to avoid signed/unsigned
comparison warnings.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The default access flags for OpenGL ES (via GL_OES_map_buffer) and
desktop OpenGL are different. The code previously tried to handle
this, but the decision was made at compile time. Since the same
driver binary can be used for both OpenGL ES and desktop OpenGL, the
decision must be made at run-time.
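A sketch of making the choice at run time instead of with the preprocessor. The enum api_kind and default_buffer_access() are hypothetical stand-ins for the context's API field and the real code; the two enum values are the standard GL_WRITE_ONLY and GL_READ_WRITE values.

```c
#include <assert.h>

#define EX_GL_WRITE_ONLY 0x88B9   /* same value as GL_WRITE_ONLY_OES */
#define EX_GL_READ_WRITE 0x88BA

/* Hypothetical stand-in for the API recorded in the context. */
enum api_kind { API_DESKTOP_GL, API_GLES1, API_GLES2 };

/* Pick the default GL_BUFFER_ACCESS value from the API the context was
 * created for, not from how the driver binary was compiled.
 */
static unsigned
default_buffer_access(enum api_kind api)
{
   switch (api) {
   case API_GLES1:
   case API_GLES2:
      return EX_GL_WRITE_ONLY;
   case API_DESKTOP_GL:
      return EX_GL_READ_WRITE;
   }
   assert(!"unknown API");
   return EX_GL_READ_WRITE;
}
```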
This should fix bug #44433. It appears that the test case does
various map and unmap operations and inspects the state of the buffer
object around each. When it sees that GL_BUFFER_ACCESS does not match
its expectations, it fails.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44433
If we don't find an exact PIPE_FORMAT_x for a GL_(COMPRESSED)_RED/RG format,
try uncompressed formats. We were already doing this for the RGB(A) formats.
Fixes piglit arb_texture_compression-internal-format-query test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
msg_type moved by a bit, so the message type was being disassembled
incorrectly. In particular, render target writes were showing up as
"OWORD block write".
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Compared to sampler_gen5, simd_mode shifted by a bit and msg_type grew
by a bit. So we were printing slightly incorrect numbers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Both the VF and VS share space in the URB. First, the VF stores
attributes (shader inputs) there. The VS then reads the attributes,
executes, and reuses the space to store varyings (shader outputs).
Thus, we need to calculate the amount of URB space necessary for inputs,
outputs, and pick whichever is greater.
The old VS backend correctly did this (brw_vs_emit.c:408), but the new
VS backend only considered outputs.
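The sizing rule amounts to taking a maximum. A sketch with a hypothetical helper, where both sizes are in whatever unit the URB entry size is programmed in:

```c
#include <assert.h>

/* Hypothetical helper: the VF writes inputs into the entry and the VS
 * later reuses the same entry for outputs, so it must fit the larger one.
 */
static unsigned
vs_urb_entry_size(unsigned input_size, unsigned output_size)
{
   unsigned size = input_size > output_size ? input_size : output_size;

   assert(size >= input_size && size >= output_size);
   return size;
}
```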
Fixes vertex scrambling in GLBenchmark PRO on Ivybridge.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41318
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the following scenario:
- CreateContext C1
- MakeCurrent C1
- DestroyContext C1 (does not actually destroy the first context, postponed
until the next MakeCurrent)
- CreateContext C2
- MakeCurrent C2
MakeCurrent will call flush on a half destroyed context, leading to crashes.
Since the other paths (destroy and makecurrent) already flush the context,
there is no need to flush here, so we remove this useless flush front call.
This fixes GPU crashes with Chrome and gallium drivers.
v2: Don't flag the format as being HiZ ready (there's DRI2 handshake
pain to go through).
Fixes piglit gl-3.0-required-sized-texture-formats
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is required for Z16 support for texturing, which is the first
thing to have a horizontal alignment of 8. Renderbuffers don't need
it, since they're always set up as the only mip level, but do it for
completeness anyway.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This field is actually set up above.
NOTE: This is a candidate for the 8.0 branch, to avoid conflicts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I copy-and-pasted the thing I was allocating for as the context, so
the first time it would be NULL (root of a ralloc context) and they'd
chain off each other from then on.
NOTE: This is a candidate for the 8.0 branch.
The legal range for the device is apparently [-16.0, +15.0].
Limiting the range to [-15, +15] fixes piglit's lodbias test.
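For illustration, the clamp looks roughly like this (clamp_lod_bias() is a hypothetical name, and the conversion to the hardware's fixed-point encoding is left out):

```c
#include <assert.h>

/* Hypothetical helper: keep the LOD bias inside [-15, +15] before it is
 * converted to the hardware encoding.
 */
static float
clamp_lod_bias(float bias)
{
   const float limit = 15.0f;

   assert(bias == bias);   /* a NaN bias is not expected here */
   if (bias < -limit)
      return -limit;
   if (bias > limit)
      return limit;
   return bias;
}
```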
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The interaction between the mipmap lod min/max limits and the texture
base/max level limits is kind of tricky. Changing the base level
didn't work as expected before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This makes lod clamping more consistent with other drivers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the dd.h docs to indicate that GL_MAP_INVALIDATE_RANGE_BIT
can be used with GL_MAP_WRITE_BIT when mapping renderbuffers and
texture images.
Pass the flag when mapping texture images for glTexImage, glTexSubImage,
etc. It's up to drivers whether to actually make use of the flag.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To try to use less tex memory and maybe get better performance.
Spotted by Roland Scheidegger.
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i965 driver advertises GL_ARB_texture_float and GL_ARB_texture_rg
support but the ctx->TextureFormatSupported[] table entries for
MESA_FORMAT_R_FLOAT32 and MESA_FORMAT_RGBA_FLOAT32 are false on gen 4
hardware. So the case for GL_R32F would fail and we'd print an
implementation error.
This patch adds more Mesa tex format options for GL_R32F and other R/G
formats so we fall back to 16-bit formats when 32-bit formats aren't
available.
Eric made the same fix in commit 6216a5b4 for the non R/G formats.
v2: try 16-bit formats before 32-bit formats and try RG formats before
RGBA where possible.
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=44039
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This enables linear gradients if we need a linear,
it also sets the flat shade flag for color/constant interpolations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When I originally implemented the hack to use GRFs 111+ as fake MRFs, I
did so purely to avoid rewriting all the code that dealt with MRFs.
However, it turns out that a similar hack is actually required.
Newly discovered language in the BSpec indicates that SEND instructions
with EOT set "should" use g112-g127 as their source registers. Based on
assertions in the simulator, this is actually a requirement on certain
platforms.
Since we're faking MRFs already, we may as well use the officially
sanctioned range. My guess is that we avoided this issue because we
seldom use m0: URB writes in the new VS backend start at m1, and RT
writes in the new FS backend start at m2.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we no longer generate Mesa IR from GLSL IR, it's impossible to
use the old vertex shader backend for GLSL programs. There's simply no
Mesa IR to codegen from.
Any attempt to do so would result in immediate GPU hangs, presumably due
to the driver uploading an empty program with no EOT message.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to Table 6.8 (Page 348) in the OpenGL 3.0 specification,
glGetVertexAttribiv supports GL_VERTEX_ATTRIB_ARRAY_INTEGER.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the following OGLConform tests on gen5:
depth-stencil(misc.state_on.depth_int)
fbo_db_ARBfp(basic.OnlyDepthBuffDrawBufferRender)
The problem was that, if the depth buffer's Mesa format was X8_Z24, then
we emitted the hardware format D24_UNORM_X8. But, on gen5, D24_UNORM_S8
must be emitted.
This bug was introduced by:
commit d84a180417
Author: Eric Anholt <eric@anholt.net>
i965: Base HW depth format setup based on MESA_FORMAT, not bpp.
v2: Deref 'intel' directly. Move the branch for newer chipset to top.
Quote the PRM. As requested by Ken.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43408
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The original R600 requires the UNCACHED_FIRST_INST bit
to be set in the PS.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Note: this is a candidate for the stable branches.
With the conversion to automake in commit
e326480e4e, several additional build
artifacts are created:
src/mesa/drivers/dri/i965/.deps/
src/mesa/drivers/dri/i965/.libs/
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/i965/Makefile.in
src/mesa/drivers/dri/i965/i965_dri.la
src/mesa/drivers/dri/i965/i965_symbols_test
This patch adds all of these files to .gitignore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TestMipMaps() function in src/OGLconform/textureNPOT.c calls glTexImage2D()
with width = 0. Texture with zero size skips miptree allocation due to a
condition in function _mesa_store_teximage3d(). While calling glGetTexImage()
it results in assertion failure in intel_map_texture_image() due to null mt
pointer.
This patch fixes the issue by detecting the zero size texture early in
glGetTexImage and glGetCompressedTexImage functions. In such a case function
simply returns doing nothing.
Verified that below mentioned bug is fixed by this patch.
https://bugs.freedesktop.org/show_bug.cgi?id=42334
NOTE: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This does introduce a warning by the automake build system, that the
missing-symbols test build is non-portable. That's true -- Mac OS X
can't take something built as a loadable module and just link it as a
library. Of course, we aren't building this on OS X at all, so it
would be nice to be able to suppress it, but I haven't found a way.
Still, the build is going to be much quieter than we have ever had
before, so I think this is a fair tradeoff until we find a way to shut
that warning up.
v2: Put a link in /lib to avoid transition pains for people.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Nothing works if HiZ is enabled and the DDX is incapable of HiZ (that is,
the DDX version is < 2.16).
The problem is that the refactoring that eliminated
intel_renderbuffer::stencil_rb broke the recovery path in
intel_verify_dri2_has_hiz(). Specifically, it broke line
intel_context.c:1445, which allocates the region for
DRI_BUFFER_DEPTH_STENCIL. That allocation was creating a separate stencil
miptree, despite the buffer being a packed depthstencil buffer. Havoc
ensued.
This patch introduces a bool flag that prevents allocation of that stencil
miptree.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44103
Tested-by: Ian Romanick <idr@freedesktop.org>
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fix this GCC warning with non-LLVM builds.
sp_screen.c: In function ‘softpipe_get_shader_param’:
sp_screen.c:141:28: warning: unused variable ‘sp_screen’ [-Wunused-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Calling glXSwapBuffers with no bound context causes a segmentation
fault in function intelDRI2Flush. All GL calls should be
ignored after setting the current context to null, so the contents
of the framebuffer stay unchanged. But the driver should not seg fault.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44614
Reported-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Yi Sun <yi.sun@intel.com>
Fix this GCC warning.
lp_test_round.c: In function ‘test_round’:
lp_test_round.c:126:13: warning: variable ‘packed’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fix this GCC 4.6 warning with 64-bit builds.
u_debug_stack.c: In function ‘debug_backtrace_capture’:
u_debug_stack.c:45:17: warning: variable ‘frame_pointer’ set but not
used [-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i915 GPU can't do A8 dst, so we abuse GREEN8 buffers for that
purpose. However, things get hairy as we start to do blending,
because then GL_DST_*_ALPHA should be replaced with GL_DST_*_COLOR.
This is what we do here.
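The replacement can be sketched like this. The enum and fixup_blend_factor() are hypothetical and only model the factors relevant here; the driver's own blend state code is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of blend factors, for illustration only. */
enum blend_factor {
   BF_ZERO,
   BF_ONE,
   BF_SRC_ALPHA,
   BF_DST_ALPHA,
   BF_INV_DST_ALPHA,
   BF_DST_COLOR,
   BF_INV_DST_COLOR,
   BF_COUNT
};

/* When alpha really lives in a colour channel of the destination, any
 * factor that reads destination alpha must read destination colour.
 */
static enum blend_factor
fixup_blend_factor(enum blend_factor f, bool dst_alpha_in_color)
{
   assert(f < BF_COUNT);
   if (!dst_alpha_in_color)
      return f;

   switch (f) {
   case BF_DST_ALPHA:
      return BF_DST_COLOR;
   case BF_INV_DST_ALPHA:
      return BF_INV_DST_COLOR;
   default:
      return f;
   }
}
```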
Fixes piglit fbo-alpha.
v2: select the colors in the pixel shader
v3: fix rs state creation for pre-evergreen
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Create the video buffers in the format the driver prefers.
This temporarily creates problems with decoder-less VDPAU video playback.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Add a second extended constructor that takes plane
textures for the video buffer. Also provide a
function for texture templates.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This requires GLSL 1.30 enabled, which requires integer types enabled,
so don't bother doing an INT to FLT conversion on it.
We should probably remove the instance id flt->int conversion when
turning on native integers.
This passes the three piglit tests with GLSL 1.30 forced on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit rewrites a lot of the state_fb code to support
rendering to targets not aligned to 64 bytes.
This allows us to drop the render temporaries as unaligned
targets are the only use-case where they are really needed. The
temporaries code was used for a lot more things, but apparently
those also work without temps.
There is one regression in piglit fbo-clear-formats, but this will
be fixed with the use of real hardware clears and doesn't matter in
practice as no real application tries to scissor clear a 2x2 pixel
render target.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
Virtual address space puts userspace in charge of its GPU
address space. It's up to userspace to bind bos into the virtual
address space. The command stream can then be executed using the
IB_VM chunk.
This patch adds support for this configuration. It doesn't remove
the 64K ib size limit, though this limit can be extended up to
1M for the IB_VM chunk.
v2: fix rendering
v3: fix rendering when using index buffer
v4: make vm conditional on kernel support add basic va management
v5: catch the case when we already have va for a bo
v6: agd5f: update on top of ioctl changes
v7: agd5f: further ioctl updates
v8: indentation cleanup + fix non cayman
v9: rebase against latest mesa + improvement from Marek & Michel
v10: fix cut/paste bug
v11: don't rely on updated radeon_drm.h
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the comments precise. Explain why each branch is needed and correct.
Document the potential pitfall in the true-branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When using Mesa with a GLES API, calling _mesa_FramebufferRenderbuffer
with GL_DRAW_FRAMEBUFFER will report a 'user error' because
get_framebuffer_target validates that this enum from the framebuffer
blit extension is only used on GL. To work around it this patch makes
it use the GL_FRAMEBUFFER enum instead in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43418
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The gl_renderbuffer::Format field wasn't always set properly. This
didn't matter much in the past but with the recent swrast/renderbuffer
mapping changes, core Mesa will be directly touching OSMesa colorbuffers
so using the right MESA_FORMAT_x value is important.
Unfortunately, there aren't MESA_FORMATs for all the possible OSmesa
format/type combinations, such as GL_FLOAT / OSMESA_ARGB. If anyone
runs into these we can add new Mesa formats.
v2: add warnings for unsupported formats, fix ARGB_REV mix-up.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We always access pull constant buffers using the message types "OWord
Block Read" or "OWord Dual Block Read". According to the Sandy Bridge
PRM, Vol 4 Part 1, pages 214 and 218, when using these messages:
"the surface pitch is ignored, the surface is treated as a
1-dimensional surface. An element size (pitch) of 16 bytes is
used to determine the size of the buffer for out-of-bounds
checking if using the surface state model."
Previously we were setting the pitch for pull constant buffers to the
size of the whole constant buffer--this made no sense and would have
led to incorrect behavior if it were not for the fact that the pitch
is ignored.
For clarity, this patch sets the pitch for pull constant buffers to 16
bytes, consistent with the hardware's behavior.
v2: Clarify the meaning of the ignored values by writing them as (16 - 1).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9bdc44a528 (i965: Replace struct
with bit shifting for WM pull constant surfaces) accidentally
introduced off-by-one errors into the calculation of the surface
width, height, and depth. This patch restores the correct
computation.
The reason this wasn't noticed by Piglit tests is that the size of our
constant surfaces is always less than 2^20, therefore the off-by-one
error was causing the "depth" field of the surface to be set to all
1's. The hardware interpreted this as an extremely large surface, so
overflow checking was effectively disabled.
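A sketch of the intended computation, under the assumption that the element count minus one is split across the width, height and depth fields. The 7/13/7-bit split and all names below are assumptions for illustration and are not stated in this message. The point is that one is subtracted once from the whole count, not from each field.

```c
#include <assert.h>

/* Hypothetical container for the three size fields of a buffer surface. */
struct buffer_surface_size {
   unsigned width;
   unsigned height;
   unsigned depth;
};

static struct buffer_surface_size
encode_buffer_surface_size(unsigned num_elements)
{
   struct buffer_surface_size s;
   unsigned n;

   assert(num_elements >= 1);
   n = num_elements - 1;          /* subtract one from the whole count */
   s.width  = n & 0x7f;           /* assumed 7-bit field */
   s.height = (n >> 7) & 0x1fff;  /* assumed 13-bit field */
   s.depth  = (n >> 20) & 0x7f;   /* assumed 7-bit field */
   return s;
}
```

With this split, any surface smaller than 2^20 elements has a depth field of 0. Subtracting one from that field separately would wrap it to all 1's, which matches the symptom described above.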
No Piglit regressions on Sandy Bridge.
NOTE: This is a candidate for the 7.11 and 8.0 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flat SHADE_MODEL still overrides any non-flat interpolation
qualifier, but pulling that state out of the rasterizer cso
isn't really worth the effort, is it ?
NOTE: This is a candidate for the 8.0 branch.
This fixes accum buffer operations. The accumulation buffer is the
only malloc-based renderbuffer for the intel drivers.
v2: apply x/y offset to returned pointer
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit EXT_framebuffer_multisample/negative-copypixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/negative-copyteximage.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample-negative-readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/renderbuffer-samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Previously, we were saying that everything from the starting tile to
region width+height was part of the limits of our depthbuffer, even if
the tile was near the bottom of the depthbuffer. This meant that our
range was not clipping to buffer bounds if the start tile was anything
but the start of the buffer.
In bebc91f0f3, this was changed to
saying that we're just rendering to a region of the size of the
renderbuffer. This is great -- we get a range that should actually
match what we want. However, the hardware's range checking occurs
after the X/Y offset addition, so we were clipping out rendering to
small depth mip levels when an X/Y offset was present. Just add
tile_x/y to the width in that case -- the WM won't produce negative
x/y values pre-offset, so we just need to get the left/bottom sides of
the region to cover our buffer.
Fixes the following Piglit regressions on gen7:
spec/ARB_depth_buffer_float/fbo-clear-formats
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 8.0 branch.
The array holds GLuint values so remove the float cast.
Note, however, that to compute the average of four GLuints we really
want to do (a+b+c+d)/4 but that could overflow. This change doesn't
address that for now.
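One possible way to avoid the overflow, which is not part of this change, is to accumulate in a wider type. A sketch with a hypothetical helper, using uint32_t in place of GLuint:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average four 32-bit values without overflowing
 * by summing in 64 bits. The result is truncated toward zero.
 */
static uint32_t
average4_u32(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
   uint64_t sum = (uint64_t) a + b + c + d;

   assert(sum / 4 <= UINT32_MAX);   /* always true for four 32-bit inputs */
   return (uint32_t) (sum / 4);
}
```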
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In the first case, the newImage[] array contains GLuint values.
In the second case, the parameter type is GLuint, but the maxDepth
value is never used in this case (GL_FLOAT_32_UNSIGNED_INT_24_8_REV).
Pass ~0U just to be safe.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We include both imports.h and u_math.h in the state tracker. This
leads to multiple, conflicting definitions of ffs() with MSVC.
Use FFS_DEFINED to skip the ffs() in u_math.h.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Include Mesa headers before Gallium headers to avoid problems with
ffs() being defined in u_math.h and then again in imports.h
The next commit will add some #ifdefs to prevent multiple definitions
of ffs().
Call ffs() and ffsll() everywhere. Define our own ffs(), ffsll()
functions when the platform doesn't have them.
v2: remove #ifdef _WIN32, __IBMC__, __IBMCPP_ tests inside ffs()
implementation. The #else clause was recursive.
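For reference, a portable fallback can be written without calling ffs() at all, which rules out the kind of recursion mentioned in v2. The fallback_ names below are hypothetical and chosen so they cannot clash with a platform ffs(); this is a sketch, not the code added by this patch.

```c
#include <assert.h>
#include <limits.h>

/* Return the 1-based position of the lowest set bit, or 0 if none is set. */
static int
fallback_ffsll(long long i)
{
   unsigned long long v = (unsigned long long) i;
   int bit = 1;

   if (v == 0)
      return 0;

   while ((v & 1ull) == 0) {
      v >>= 1;
      bit++;
   }
   assert(bit <= (int) (sizeof(v) * CHAR_BIT));
   return bit;
}

static int
fallback_ffs(int i)
{
   /* Go through unsigned so negative values are not sign-extended. */
   return fallback_ffsll((long long) (unsigned int) i);
}
```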
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Alexander von Gluck <kallisti5@unixzen.com>
Introduce vbo_get_minmax_indices() function to handle the min/max index
computation for nr_prims(>= 1). The old code only computed the first
prim's min/max index; this would result in rendering errors if the user
called functions like glMultiDrawElements(). This patch fixes
this issue.
Since the nr_prims = 1 case is handled by passing 1 for the nr_prims parameter, I made
vbo_get_minmax_index() static.
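The difference between scanning one primitive and scanning all of them can be sketched as below. struct prim_range and get_minmax_indices() are hypothetical simplifications; the real function also deals with index buffer mapping and index types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified primitive: a range inside the index array. */
struct prim_range {
   size_t start;
   size_t count;
};

/* Compute the min/max index over every primitive, not just the first. */
static void
get_minmax_indices(const unsigned *indices,
                   const struct prim_range *prims, size_t nr_prims,
                   unsigned *min_index, unsigned *max_index)
{
   unsigned lo = ~0u, hi = 0;
   size_t p, i;

   assert(indices && prims && nr_prims >= 1);
   for (p = 0; p < nr_prims; p++) {
      for (i = 0; i < prims[p].count; i++) {
         unsigned idx = indices[prims[p].start + i];

         if (idx < lo)
            lo = idx;
         if (idx > hi)
            hi = idx;
      }
   }
   *min_index = lo;
   *max_index = hi;
}
```

Passing nr_prims = 1 reproduces the old single-primitive behaviour, which is why a separate entry point for that case is not needed.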
v2: per Roland's suggestion, put the indices address computation into
vbo_get_minmax_index() instead.
Also do combination if possible to reduce map/unmap count
v3: per Brian's suggestion, use a pointer for start_prim to avoid
structure copy per loop.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, do the uniform setting and input / output mapping directly in
brw_link_shader. Hurray for not generating Mesa IR! However, once
the i965 driver stops calling _mesa_ir_link_shader, UsesClipDistance
and UsesKill are no longer set.
Ideally gen6_upload_vs_push_constants should use the
gl_shader_program, but I don't see a way to propagate the information
there. The other alternative, since this is the only usage, is to
move gl_vertex_program::UsesClipDistance to brw_vertex_program.
The compile (and precompile) stages use UsesKill to determine the
cache key for the shader. This is then used to determine whether or
not to compile the shader. Calculating this data during compilation
is too late.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This previously enabled some optimizations in the fragment shader
(interpolation, etc.) if some input components were always 0.0 or
1.0. However, this data was generated by analyzing Mesa IR. The
next patch in this series removes generation of Mesa IR for GLSL
paths. When we detect that case, just set the used mask to ~0 and
circumvent the optimizations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It used to be done in ir_to_mesa, and that was kind of a bad place.
I didn't change st_glsl_to_tgsi because there is some strange stuff
happening in the code that generates glDrawPixels shaders. It looked
like this would break horribly if I touched anything.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Track the calculated data in gl_shader_program instead of the
individual assembly shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than looking at the settings in individual assembly programs,
look at the settings in the top-level uniform values. The old code
was flawed because examining each shader stage in isolation could
allow inconsistent usage across stages (e.g., bind unit 0 to a
sampler2D in the vertex shader and sampler1DShadow in the fragment
shader).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the fixed-function fragment shader was tracked as a
gl_program. This means that it shows up in the driver as a Mesa IR
program instead of as a GLSL IR program. If a driver doesn't generate
Mesa IR from the GLSL IR, that program is empty. If the program is
empty there is either no rendering or a GPU hang.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Poking directly at the backing resources works only by luck. Core
Mesa code should only know about the gl_uniform_storage structure.
Soon other code that looks at samplers will use the gl_uniform_storage
structures instead of the data in the gl_program.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like AC_PROG_SED was added in 2.59b, and wasn't in the
original 2.59. Presumably that's why, though
it could've been an oversight.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
The gen7_urb atom depends on CACHE_NEW_VS_PROG and CACHE_NEW_GS_PROG,
causing gen7_upload_urb() to be called when switching to a new VS
program.
In addition to partitioning the URB space between the VS and GS,
gen7_upload_urb() also allocated space for VS and PS push constants.
Unfortunately, this meant that whenever CACHE_NEW_VS was flagged, we'd
reallocate the space for the PS push constants. According to the BSpec,
after sending 3DSTATE_PUSH_CONSTANT_ALLOC_PS, we must reprogram
3DSTATE_CONSTANT_PS prior to the next 3DPRIMITIVE.
Since our URB allocation for push constants is entirely static, it makes
sense to split it out into its own atom that only subscribes to
BRW_NEW_CONTEXT. This avoids reallocating the space and trashing
constants.
Fixes a rendering artifact in Extreme Tuxracer, where instead of a snow
trail, you'd get a bright red streak (affectionately known as the
"bloody penguin bug").
This also explains why adding VS-related dirty bits to gen7_ps_state
made the problem disappear: it made 3DSTATE_CONSTANT_PS be emitted after
every 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Both dri2_create_context_attribs and drisw_create_context_attribs call
dri2_convert_glx_attribs, expecting it to fill in *api on success.
However, when num_attribs == 0, it was returning true without setting
*api, causing the caller to use an uninitialized value.
Tested-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
svga_sampler_view contains a pointer to a pipe_resource (base class of
svga_texture) and svga_texture contains a pointer to an svga_sampler_view.
This circular dependency prevented the objects from ever being freed when
they pointed to each other. Make the svga_sampler_view::texture pointer
a "weak reference" (no reference counting) to break the dependency.
This is safe to do because the pipe_resource/texture always has a longer
lifespan than the sampler view so when svga_sampler_view stops referencing
the texture, the texture's refcount never hits zero.
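
The idea can be sketched with two toy structs. This is a simplified
model, not the svga code: the sketch_* names are hypothetical and the
view here has no refcount of its own. The texture owns its cached view,
while the view's back-pointer takes no reference, so dropping the last
external reference to the texture frees both objects.

```c
#include <stdlib.h>

struct sketch_view;

struct sketch_texture {
   int refcount;
   struct sketch_view *cached_view;  /* strong: the texture owns its view */
};

struct sketch_view {
   struct sketch_texture *texture;   /* weak: no reference is taken */
};

/* Counts live objects so that a leak would be observable. */
int sketch_live_objects;

struct sketch_texture *sketch_texture_create(void)
{
   struct sketch_texture *tex = calloc(1, sizeof(*tex));
   if (!tex)
      return NULL;
   tex->refcount = 1;
   sketch_live_objects++;
   return tex;
}

/* Return the cached view, creating it on first use. */
struct sketch_view *sketch_texture_get_view(struct sketch_texture *tex)
{
   if (!tex->cached_view) {
      struct sketch_view *view = calloc(1, sizeof(*view));
      if (!view)
         return NULL;
      view->texture = tex;   /* weak pointer: tex->refcount is unchanged */
      tex->cached_view = view;
      sketch_live_objects++;
   }
   return tex->cached_view;
}

/* Drop one reference; free the texture and its view at zero. */
void sketch_texture_unref(struct sketch_texture *tex)
{
   if (--tex->refcount > 0)
      return;
   if (tex->cached_view) {
      free(tex->cached_view);
      sketch_live_objects--;
   }
   free(tex);
   sketch_live_objects--;
}
```

Had the view incremented tex->refcount, the count could never reach
zero while the texture still held the view, which is the leak the
commit describes.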
Fixes a memory leak seen with Google Earth and other apps.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We were naively emitting each component at a time, even if we were
emitting the same value to multiple channels. Improves on a codegen
regression from the old VS to the new VS on some unigine shaders
(because we emit constant vecs/matrices as immediates instead of
loading them as push constants, so we had over 4x the instructions for
using them).
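
A rough sketch of the grouping, assuming a four-channel immediate and a
hypothetical sketch_mov record (neither is taken from the backend):
channels holding the same value are merged into one write-masked move
instead of one move per component. Values are compared by bit pattern
here so that 0.0 and -0.0 stay distinct; that choice is an assumption of
this sketch.

```c
#include <string.h>

struct sketch_mov {
   unsigned writemask;  /* bit i set => channel i (x=0 .. w=3) is written */
   float value;
};

/* Fill out[] with one MOV per distinct value; returns the MOV count. */
unsigned sketch_group_immediates(const float v[4], struct sketch_mov out[4])
{
   unsigned done = 0;   /* channels already covered by an earlier MOV */
   unsigned count = 0;

   for (unsigned i = 0; i < 4; i++) {
      if (done & (1u << i))
         continue;

      unsigned mask = 0;
      for (unsigned j = i; j < 4; j++) {
         /* Bitwise comparison keeps 0.0 and -0.0 apart. */
         if (memcmp(&v[i], &v[j], sizeof(float)) == 0)
            mask |= 1u << j;
      }

      out[count].writemask = mask;
      out[count].value = v[i];
      count++;
      done |= mask;
   }
   return count;
}
```

For the vector (1.0, 1.0, 0.0, 1.0) this yields two moves rather than
four: one writing .xyw and one writing .z.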
shader-db results:
Total instructions: 58594 -> 58540
11/870 programs affected (1.3%)
765 -> 711 instructions in affected programs (7.1% reduction)
WGL_ARB_extensions_string states that wglGetExtensionsStringARB should
return NULL for invalid HDCs. And some applications rely on it.
Reviewed-By: "Keith Whitwell" <keithw@vmware.com>
It caused an X protocol error in some (rare) situations.
This is a follow-on to the previous commits which fixes a bug reported
by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is the same fix as the previous commit, except it's for the gallium
glx/xlib state tracker.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
as we do in Fake_glXChooseVisual(). This registers the MesaGLX
extension on the display so we can clean up buffers, etc. when
the display connection is closed.
Fixes a bug reported by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Prevent any state from carrying over to a new translation in cases
where we assume that data is still zero from initial calloc (these
would require us to do individual zeroing before translation which
would be more code).
When creating an EGLImage from a struct wl_buffer * this ensures
that we create an XRGB8888 image if the wayland buffer doesn't have an
alpha channel. To determine if a wl_buffer has a valid alpha channel
this patch adds an internal wayland_drm_buffer_has_alpha() function.
It's important to get the internal format for an EGLImage right so that
if a GL texture is later created from the image then the GL driver will
know if it should sample the alpha from the texture or flatten it to
a constant of 1.0.
This avoids needing fragment program workarounds in wayland compositors
to manually ignore the alpha component of textures created from wayland
buffers.
krh: Edited to use wl_buffer_get_format() instead of wl_buffer_has_alpha().
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
No functional change. In the function
__indirect_glAreTexturesResident(), the variable cmdlen is only used
if USE_XCB is not defined. This patch avoids a compile warning in the
event that USE_XCB is defined.
v2: just move cmdlen declaration inside the #else part.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, we didn't do the limit check for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS until the end of the
store_tfeedback_info() function, *after* storing all of the transform
feedback info in the gl_transform_feedback_info::Outputs array. This
meant that the limit check wouldn't prevent us from overflowing the
array and corrupting memory.
This patch moves the limit check to the top of tfeedback_decl::store()
so that there is no risk of overflowing the array. It also adds
assertions to verify that the checks for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS are sufficient to avoid
array overflow.
Note: strictly speaking this patch isn't necessary, since the maximum
possible number of varyings is MAX_VARYING (16), whereas the size of
the Outputs array is MAX_PROGRAM_OUTPUTS (64), so it's impossible to
have enough varyings to overflow the array. However it seems prudent
to do the limit check before the array access in case these limits
change in the future.
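
The ordering can be sketched as follows. The struct, the function and
SKETCH_MAX_OUTPUTS are hypothetical stand-ins, not the linker's real
types: the component limit is tested before anything is written, and an
assertion documents that the limit is expected to keep the index inside
the array.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_OUTPUTS 64   /* stands in for the Outputs array size */

struct sketch_tfeedback_info {
   unsigned num_outputs;
   unsigned outputs[SKETCH_MAX_OUTPUTS];  /* one entry per stored slot */
};

/* Store one varying; returns false if the component limit is exceeded. */
bool sketch_store(struct sketch_tfeedback_info *info, unsigned slot,
                  unsigned num_components, unsigned *components_so_far,
                  unsigned max_components)
{
   /* Limit check first: nothing is written when the request is too big. */
   if (*components_so_far + num_components > max_components)
      return false;

   /* The limit above is assumed sufficient to keep the index in range. */
   assert(info->num_outputs < SKETCH_MAX_OUTPUTS);

   info->outputs[info->num_outputs++] = slot;
   *components_so_far += num_components;
   return true;
}
```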
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), we need to handle transform feedback of gl_ClipDistance
specially, to account for the fact that the hardware represents it as
an array of vec4's rather than an array of floats.
The previous way this was accounted for (translating the request for
gl_ClipDistance[n] to a request for a component of
gl_ClipDistanceMESA[n/4]) doesn't work when performing transform
feedback on the whole unsubscripted array, because we need to keep
track of the size of the gl_ClipDistance array prior to the lowering
pass. So I replaced it with a boolean is_clip_distance_mesa, which
switches on the special logic that is needed to handle the lowered
version of gl_ClipDistance.
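
As an illustration of the index arithmetic only (the helper names are
hypothetical, not part of the linker): element n of the float array
lands in component n % 4 of vec4 number n / 4, and a float array of
size N occupies (N + 3) / 4 vec4 slots. The second helper shows why the
pre-lowering size matters: a 5-element array still spans two vec4s, but
only 5 floats should be streamed out.

```c
struct sketch_clip_location {
   unsigned vec4_index;  /* which element of the lowered vec4 array */
   unsigned component;   /* 0=x, 1=y, 2=z, 3=w */
};

/* Map gl_ClipDistance[n] to its place in the lowered vec4 array. */
struct sketch_clip_location sketch_lower_clip_distance_index(unsigned n)
{
   struct sketch_clip_location loc;
   loc.vec4_index = n / 4;
   loc.component = n % 4;
   return loc;
}

/* Number of vec4 slots needed for a float array of the given size. */
unsigned sketch_clip_distance_vec4_count(unsigned float_array_size)
{
   return (float_array_size + 3) / 4;
}
```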
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]-no-subscript".
Reviewed-by: Eric Anholt <eric@anholt.net>
The function tfeedback_decl::num_components() was not correctly
accounting for transform feedback of whole arrays and gl_ClipDistance.
The bug was hard to notice in tests, because it only affected the
checks for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS.
This patch fixes the computation, and adds an assertion to verify
num_components() even when MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS
and MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS are not exceeded.
The assertion requires keeping track of components_so_far in
tfeedback_decl::store(); this will be useful in a future patch to fix
non-multiple-of-4-sized gl_ClipDistance.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just fixes up the enables for native integers and EXT_texture_integer
support in st/mesa.
It also sets MaxClipPlanes to 8.
We should consider exposing caps for MCP vs MCD, but since core
mesa doesn't care yet maybe we can wait for now.
v2: use 32-bit formats as per Marek's mail.
v3: add calim's fix for INT_DIV_TO_MUL_RCP disabling.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't look like the GLSL compiler will produce sign op
for an unsigned anyways (seems insane anyways).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds integer version of SSG that GLSL 1.30 can produce.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This enables fragment clamping in softpipe. It passes more tests than
it did previously, with no regressions. There are still a couple of
failures in the SNORM types to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a number of texelFetch swizzle tests, and consolidates
the swizzle handling in a new function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for using the clipdistance instead of clip plane.
Passes all piglit clipdistance tests.
v2: fixup some comments from Brian in review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
softpipe always clipped using the position vector; however, for unclipped
vertices it stored the position in window coordinates. When position
and clipping are separated, we need to store the clip-space position and
the clip-space vertex clip, so we can interpolate both separately.
This means we have to take the clip space position and store it to use later.
This allows softpipe to pass all the clip-vertex piglit tests.
v2: fix llvm draw regression, the structure being passed into llvm needed
updating, remove some hardcoded ints that should have been enums while there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This required changing the system value semantics so that we store
a system value per vertex. Instance id is the only other system
value we currently support, so I spread it across the channels.
This passes the 3 vertexid-* piglit tests + lots of instanceid tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If draw isn't using llvm we can support vertex texture and integers.
These will be fixed up later, but for now allow this check to happen
at run-time.
v2: since 3e22c7a253 we can ask draw for a non-llvm
context. Just track whether we asked and set the vars accordingly. This
probably isn't perfect but should cover the cases we care about.
v3: use debug option, restructure to store in screen, as suggested by Jakob.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mesa shouldn't call into the drivers if there are no renderbuffers
bound to the attachments for the buffers to be cleared.
Fixes a number of the clearbuffer-* tests on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the test to allow cube/depth combinations on GL3
or EXT_gpu_shader4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We're not quite ready to actually support it in the implementation,
but at least this allows GL 3.0 API-reliant applications to hopefully
run successfully, though they won't get multisampling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The EXT_texture_array required only 64, but GL 3.0 required 256.
Since we're already exposing values that can get us way beyond our
ability to map the single object directly, go ahead and expose all the
way to hardware limits.
Tested with new piglit EXT_texture_array/maxlayers on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing the kill of the updated channels, then adding our copy
to the list of available stuff to copy. But if the copy was updating
its own source channels, we didn't notice, breaking this code:
R0.xyzw = arg0 + arg1;
R0.xyzw = R0.wwwx;
gl_FragColor.xyzw = clamp(R0.xyzw, 0.0, 1.0);
Fixes piglit glsl-copy-propagation-self-2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies all batches needed for HiZ. The batch length for
3DSTATE_HIER_DEPTH_BUFFER is also corrected from 4 to 3.
Performance +6.7% on Citybench.
num-frames: 400
resolution: 1918x1031
avg-hiz-off: 127.90 fps
avg-hiz-on: 136.50 fps
kernel: git://people.freedesktop.org/~anholt/linux.git branch=gen7-reset-sol sha=23360e4
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It is unwise to use a stencil region's size to determine its
renderbuffer's size, because at region creation we fudge the width and
height to accommodate interleaved rows. (See the comment for MESA_FORMAT_S8
in intel_miptree_create()). Most users of stencil_region->{width,height}
should be converted to use stencil_rb->{Width,Height}.
We have already done the replacement in several locations. This patch
continues the replacement in {brw,gen7}_emit_depthbuffer(). To make those
functions look consistent, I've also done the equivalent replacement for
the depth buffer.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It was named GEN6_WM_DEPTH_RESOLVE. Luckily, this caused no conflict,
because the value is identical for gen6 and gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't create clip outputs if no clip planes are enabled.
Move clip validation after program validation: we were calling
linkage validation in case the VP needed rebuilding before the
FP was validated.
The vertex program needs to be built first because when
ClipDistance is used we'll want to only enable those outputs that
are also written.
These initialization functions weren't initializing all the fields so
some had undefined values. The callers of these functions sometimes use
a structure assignment to initialize new objects from these templates
so we'd just propagate the undefined values. That made for some confusing
info when debugging, plus it could lead to bugs.
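
A minimal sketch of the usual remedy, assuming a hypothetical template
struct (the real structures and their fields differ): clear the whole
object first, then fill in the fields the initializer knows about, so a
later structure assignment only ever copies defined values.

```c
#include <string.h>

/* Hypothetical template; "usage" plays the role of a forgotten field. */
struct sketch_surface_template {
   unsigned format;
   unsigned width;
   unsigned height;
   unsigned usage;
};

void sketch_init_template(struct sketch_surface_template *t,
                          unsigned format, unsigned width, unsigned height)
{
   /* Zero everything first, including fields not set explicitly below. */
   memset(t, 0, sizeof(*t));
   t->format = format;
   t->width = width;
   t->height = height;
}
```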
v2: fix surf pointer mix-up: "&surf" -> "surf"
Jakob Bornecrantz <jakob@vmware.com>
This code isn't used anymore in preference for DRI2 client side swap buffers
throttling or throttling done inside the xa or xorg driver.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This piece of code has been here since the initial commit (c5c5cd71) and the
code that used instance_id_index was removed in (caede752) by José.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
So the targets can drop the sw_wrapper winsys when no sw driver is being used.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This replaces the current code with an implementation compatible with
the new gallium interface. I've left some of the remains of the interface
intact so llvmpipe keeps building correctly, and I'll take a look at fixing
llvmpipe up later.
v2: fixup as per Brian's review
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces an unspecified interpolation parameter that is only allowed for
color semantics, so a specified GLSL interpolation will override the ShadeModel
specified interpolation, but not vice-versa.
This fixes a lot of the interpolation tests in piglit.
v2: rename from unspecified to color
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could lead to incorrect code when fixed regs are involved.
Surprisingly, the increased freedom actually leads to lower
register usage in some cases. Still want to find a better way
to treat constraints though ...
Always set position to insert before the current instruction,
the previous behaviour led to confusion (bug in checkPredicate
for BBs with only a single conditional branch).
Conflicts:
src/gallium/auxiliary/tgsi/tgsi_strings.c
src/mesa/state_tracker/st_atom_clip.c
commit d919791f2742e913173d6b335128e7d4c63c0840
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 17:59:22 2012 +0100
d3d1x: adapt to new clip state
commit cfec82bca3fefcdefafca3f4555285ec1d1ae421
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:16:51 2012 +0100
gallium/docs: update for clip state changes
commit c02bfeb81ad9f62041a2285ea6373bbbd602912a
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:21:43 2012 +0100
tgsi: add TGSI_PROPERTY_PROHIBIT_UCPS
commit d4e0a785a6a23ad2f6819fd72e236acb9750028d
Author: Brian Paul <brianp@vmware.com>
Date: Thu Jan 5 08:30:00 2012 -0700
tgsi: consolidate TGSI string arrays in new tgsi_strings.h
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
commit c28584ce0d8c62bd92c8f140729d344f88a0b3cd
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 12:48:09 2012 +0100
gallium: extend user_clip_plane_enable to apply to clip distances
commit f1d5016c07f786229ed057effbe55fbfd160b019
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 02:39:09 2012 +0100
nvfx: adapt to new clip state
commit 6f6fa1c26bd19f797c1996731708e3569c9bfe24
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 01:41:39 2012 +0100
st/mesa: fix DrawPixels with GL_DEPTH_CLAMP
commit c86ad730aa1c017788ae88a55f54071bf222be12
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:51:30 2012 +0100
nv50: adapt to new clip state
commit 3a8ae6ac243bae5970729dc4057fe02d992543dc
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:32:36 2012 +0100
nvc0: adapt to new clip state
commit 6243a8246997f8d2fcc69ab741a2c2dea080ff11
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Dec 29 01:32:51 2011 +0100
draw: initalize pt.user.planes in draw_init
This fixes a crash in glean/fpexceptions.
commit e3056524b19b56d473f4faff84ffa0eb41497408
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:26:55 2011 +0100
svga: adapt to new clip state
commit c5bfa8b37d6d489271df457229081d6bbb51b4b7
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:11:51 2011 +0100
r600g: adapt to new clip state
commit f11890905362f62627c4a28a8255b76eb7de7df2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:10:26 2011 +0100
r300g: adapt to new clip state
commit e37465327c79a01112f15f6278d9accc5bf3103f
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:39:16 2011 +0100
draw: adapt to new clip state
This adds a regression in the LLVM clipping path. Can anybody see anything
wrong with the code? It works for every other case, just glean/fpexceptions
crashes when doing the "Infinite clip plane test".
commit b474d2b18c72d965eefae4e427c269cba5ce6ba2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:14:59 2011 +0100
u_blitter: don't save/set/restore clip state
commit 9dd240ea91f523a677af45e8d0adb9e661e28602
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:11:56 2011 +0100
gallium: don't cso_save/set/restore clip state
The enable bits are in the rasterizer state.
commit a4f7031179f5f4ad524b34b394214b984ac950f6
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:58:55 2011 +0100
gallium: default depth_clip to 1
depth_clip = !depth_clamp
commit fe21147a00ab90e549d63fe12ee4625c9c2ffcc3
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:14:19 2011 +0100
trace,util: update state logging to new clip state
Also dump the other missing flags.
commit 2a3b96e84ac872dcc5bc1de049fe76bb58d64b23
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 10:43:43 2011 +0100
st/mesa: adapt to new clip state
commit b7b656a42fca19d7c85267f42649a206a85a2c72
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Dec 17 15:45:19 2011 +0100
gallium: move state enable bits from clip_state to rasterizer_state
This brings the code in sync with gen6_sf_state.c; presumably the
mistake was a botched rebase on initial Ivybridge bring-up patches.
Found by diffing batch buffer dumps and noticing the random values.
Thanks to Eric for catching the obvious mistake.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In f3e9ccb3b, I renamed gen6_upload_wm_constants to
gen6_upload_wm_push_constants, but neglected to update this comment.
I don't think there ever was a gen7_prepare_wm_constants function; it
was probably a search and replace error. Of course, "prepare" functions
died a while back as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
CACHE_NEW_SAMPLER doesn't cover max_wm_threads, but it does cover
brw->sampler.count. BRW_NEW_PS_BINDING_TABLE is obvious, but it's
probably worth adding a comment anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dirty bit was already correctly in place, but there was no comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also, annotate the use of _NEW_POINT as long as we're adding a comment.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to a comment in gen6_sf_state, calls to get_attr_override need
both _NEW_PROGRAM and _NEW_LIGHT. Since Gen7 reuses the same function,
the same dirty bits should apply.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_CURBE_OFFSETS dirty bit is only flagged by the
brw_curbe_offsets state atom which is only used on Gen4-5.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_URB_FENCE dirty bit is only flagged by the
brw_recalculate_urb_fence state atom which isn't used on Gen6+.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This brings the dirty bits in line with the comments.
This does /not/ need to be cherry-picked to stable branches because the
access requiring _NEW_BUFFERS was added in master as part of HiZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The trick was to produce an assignment in the IR along the lines of:
(assign (xyzw) (var_ref R0) (swiz wwww (var_ref R0) ))
which occurs only rarely even in code that looks like it should do
this, because of the assignment temporaries generated in ast_to_hir.
From the IR above, this optimization pass would then propagate
references of R0 into R0.wwww (seems reasonable), but without this
patch, a later reference of R0.wwww would see R0 first, turning that
into R0.wwww.wwww, which triggered opt_swizzle_swizzle, and then we
looped back to this code to do it again. Avoid that by skipping over
the usual ir_rvalue visitor's ir_swizzle hook, so that we get
handle_rvalue() on the ir_swizzle itself, not its referenced value.
Looking at only the swizzle will always optimize away at least as much
as looking at the swizzle's referenced value.
We now still claim to propagate r0.w into r0.w, but at least we don't
trigger the loop.
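
To make the R0.wwww.wwww step concrete, here is a small sketch of how
two stacked swizzles collapse into one. The sketch_swizzle type and the
function are hypothetical and only model the channel arithmetic; they
are not the compiler's opt_swizzle_swizzle pass.

```c
/* Channel selectors: 0=x, 1=y, 2=z, 3=w. */
struct sketch_swizzle {
   unsigned char chan[4];
};

/* Collapse (value.inner).outer into a single equivalent swizzle. */
struct sketch_swizzle sketch_compose_swizzle(struct sketch_swizzle inner,
                                             struct sketch_swizzle outer)
{
   struct sketch_swizzle result;

   /* Channel i of the result reads whatever channel inner supplies at
    * the position outer selects. */
   for (unsigned i = 0; i < 4; i++)
      result.chan[i] = inner.chan[outer.chan[i]];
   return result;
}
```

Composing wwww with wwww gives wwww again, which matches the
observation above that the propagation makes no progress on that
expression.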
v2: Rewrite commit message (changes by anholt)
Fixes piglit glsl-copy-propagation-self-1
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=34006
The r300 driver requires LLVM when building, and other drivers that
depend on it for all TNL, like i915g, will be a lot slower without it.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The optimization was supposed to turn an attribute component that was
always 1.0 into a mov of 1.0. But by leaving the loop that this patch
removes outside of that test, we applied the projection correction to the 1.0 and
got some other value, breaking openarena once it was converted to
using the new compiler backend.
Originally this hunk was separate from the former loop to make the
generated instructions slightly better pipelined. We now have
automatic instruction scheduling to handle that, and the generated
instruction sequence looked the same to me after this change (except
for the bugfix).
Previous to this patch, if the client requested transform feedback
using a subscript, but the variable was not an array
(e.g. "gl_FrontColor[0]"), we would produce a bogus error message like
"Transform feedback varying gl_FrontColor[0] found, but it's an array
([] expected)".
Changed the error message to e.g. "Transform feedback varying
gl_FrontColor[0] requested, but gl_FrontColor is not an array."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were only comparing the number of depth and stencil bits but the
extension spec actually says the formats must match:
The error INVALID_OPERATION is generated if BlitFramebufferEXT is
called and <mask> includes DEPTH_BUFFER_BIT or STENCIL_BUFFER_BIT
and the source and destination depth or stencil buffer formats do
not match.
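
The difference between the old and new checks can be sketched like
this. The format enum and renderbuffer struct are hypothetical
placeholders: two packed depth/stencil layouts can agree on bit counts
and still be different formats, which only the second check rejects.

```c
#include <stdbool.h>

/* Hypothetical formats: same bit counts, different packing. */
enum sketch_format {
   SKETCH_FORMAT_Z24_S8,
   SKETCH_FORMAT_S8_Z24,
   SKETCH_FORMAT_Z32F_X24S8
};

struct sketch_renderbuffer {
   enum sketch_format format;
   unsigned depth_bits;
   unsigned stencil_bits;
};

/* Old-style check: compares bit counts only. */
bool sketch_bits_match(const struct sketch_renderbuffer *src,
                       const struct sketch_renderbuffer *dst)
{
   return src->depth_bits == dst->depth_bits &&
          src->stencil_bits == dst->stencil_bits;
}

/* Check following the quoted spec text: the formats must match. */
bool sketch_formats_match(const struct sketch_renderbuffer *src,
                          const struct sketch_renderbuffer *dst)
{
   return src->format == dst->format;
}
```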
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit acf82657f4 supposedly enabled
SIMD16 dispatch, but neglected to set the "16 Pixel Dispatch Enable"
bit, so nothing actually got enabled.
Furthermore, it neglected to set up the Dispatch GRF Start Register for
kernel 2, which is the SIMD16 program.
Increases performance in Nexuiz by ~15% at 800x600 (n=3).
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
$ dict invarient
No definitions found for "invarient", perhaps you mean:
gcide: Invariant
wn: invariant
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Replace target, level parameters with gl_texture_image.
Add gl_renderbuffer parameter to indicate source buffer for the copy.
This removes some redundant code in the drivers to find the source
renderbuffer and the destination texture image (which we already had
in _mesa_CopyTexSubImage).
Signed-off-by: Brian Paul <brianp@vmware.com>
We were comparing 32-bit Z buffer values against 16-bit fragment values.
Need to do scaling like for the 24-bit case.
Triangle Z testing was OK since it didn't hit this code path.
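
A sketch of why the scales must agree before comparing, under the
assumption that the buffer value is a 32-bit normalized depth and the
fragment value uses fewer bits. The function is hypothetical and the
shift is one possible way to bring both to the same range; it is not
claimed to be the exact change made in swrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GL_LESS-style test of a frag_bits-wide fragment Z against a 32-bit
 * buffer Z, after scaling the buffer value down to the fragment range. */
bool sketch_depth_less(uint32_t frag_z, unsigned frag_bits,
                       uint32_t buffer_z32)
{
   assert(frag_bits >= 1 && frag_bits <= 32);

   const uint32_t scaled = buffer_z32 >> (32 - frag_bits);
   return frag_z < scaled;
}
```

Without the scaling, a 16-bit fragment value such as 0x9000 compares as
less than a buffer value of 0x80000000 even though it represents a
larger depth.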
If the assertion was hit, it probably meant that we were unable to allocate
or map a vertex buffer. Instead of dying in a debug build, issue a warning
and continue.
We need to pass the pre-projection matrix clip planes into the driver,
instead of the post for the case we have a vertex shader that writes clip
vertex.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Translate signed/unsigned integers to corresponding uint/sint r32g32b32a32 types.
This fixes a bunch of piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Brian mentioned that mesa-demos/reflect was broken on softpipe
by my previous commit. The problem was that we were blindly translating
none to perspective, when color/pntc at least need it linear.
This is the final version that fixes the reflect regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The IR for mix(float, float, bool) was missing a write mask, causing the
IR reader to die horribly. Furthermore, I neglected to add any of the
new prototypes to the 1.30 profiles.
Fixes oglconform's glsl-bif-com advanced.mix test cases.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44477
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The various l-value errors this was designed to catch are now caught
by other means. Marking the temporaries as read-only now just
prevents sensible error messages from being generated. It's
0:0(0): error: function parameter 'out p' references the read-only variable '_post_incdec_tmp'
versus
0:13(5): error: function parameter 'out p' references a post-decrement operation
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were used by swrast to make a combined depth+stencil buffer look
like separate depth and stencil buffers. But that's no longer needed
after rewriting the depth/stencil code in swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions updated the gl_renderbuffer::_DepthBuffer and
_StencilBuffer fields. But those fields are no longer used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Everything about this that we have tests for works except for the
deprecated metaops. The conclusion we came to on IRC sounded like we
were OK with turning it on as long as core functionality works. The
remaining failures (copypixels, drawpixels) should just be a matter of
finishing the MapRenderbuffer for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes on i965:
ARB_depth_buffer_float/fbo-depthstencil-GL_DEPTH32F_STENCIL8-blit
ARB_depth_buffer_float/fbo-stencil-GL_DEPTH32F_STENCIL8-blit
Reviewed-by: Brian Paul <brianp@vmware.com>
We were converting our ubyte stencil value to a float. Just write it
as a uint, which overwrites the X24 part of X24S8 with 0 but shouldn't
matter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm so surprised that gcc didn't catch this that I feel like I must be
misreading. srcMap is what we initialize (along with dstMap) from
this map value right after this check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were meaning to do the same thing of memcpying rows, so just
write the code once.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm going to reuse this function from glBlitFramebuffer() handling,
which wants to do the same thing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The last major issue (intervening-read) is fixed, so let's turn this
on for real. The only other known issue is a hardware limitation for
tessellation with flat shading.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
We need the kernel to reset our pointers to 0 in between. Note that
the initialization of function pointer had to move to after
InitContext since we didn't have intel->gen set up yet.
Fixes piglit EXT_transform_feedback/immediate-reuse
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new kernel patch I submitted makes the interface opt-in, so all
batchbuffers aren't preceded by the 4 MI_LOAD_REGISTER_IMMs. This
requires the updated i915_drm.h present in libdrm 2.4.30.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing glsl_to_tgsi::remove_output_read pass did not work properly
when indirect addressing was involved; this commit replaces it with a
lowering pass that occurs before TGSI code generation.
Fixes varying-array related piglit tests.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is similar to Gallium's existing glsl_to_tgsi::remove_output_read
lowering pass, but done entirely inside the GLSL compiler.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes
draw-elements-base-vertex user_varrays
draw-elements-instanced-base-vertex user_varrays
for softpipe with no llvm support (DRAW_USE_LLVM=false)
I'm not sure if this is the correct answer, but these tests were showing
a max_index of 7, then trying to fetch up to 43, maybe it should be fixing
max_index earlier somewhere to take care of this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch silences these GCC warnings.
warning: unused variable 'texelBytes'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is not explicitly stated in the GL 3.0 spec that transform feedback
can be performed on a whole varying array (without supplying a
subscript). However, it seems clear from context that this was the
intent. Section 2.15 (TransformFeedback) says this:
When writing varying variables that are arrays, individual array
elements are written in order.
And section 2.20.3 (Shader Variables), says this, in the description
of GetTransformFeedbackVarying:
For the selected varying variable, its type is returned into
type. The size of the varying is returned into size. The value in
size is in units of the type returned in type.
If it were not possible to perform transform feedback on an
unsubscripted array, the returned size would always be 1.
This patch fixes the linker so that transform feedback on an
unsubscripted array is supported.
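
A sketch of the distinction the linker has to draw, assuming a
simplified parser: a requested name either carries a "[i]" subscript
(one element) or names the whole variable (all elements). The struct
and both functions are hypothetical and are not the tfeedback_decl
implementation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct sketch_tfeedback_decl {
   size_t name_len;       /* length of the variable name without "[i]" */
   bool is_subscripted;
   unsigned array_index;  /* meaningful only if is_subscripted */
};

/* Parse "name" or "name[i]"; returns false on malformed input. */
bool sketch_parse_varying(const char *input,
                          struct sketch_tfeedback_decl *decl)
{
   const char *bracket = strchr(input, '[');

   if (!bracket) {
      decl->name_len = strlen(input);
      decl->is_subscripted = false;
      decl->array_index = 0;
      return decl->name_len > 0;
   }

   /* Require a non-empty name and a digit right after '['. */
   if (bracket == input || bracket[1] < '0' || bracket[1] > '9')
      return false;

   char *end;
   const unsigned long index = strtoul(bracket + 1, &end, 10);
   if (end[0] != ']' || end[1] != '\0')
      return false;

   decl->name_len = (size_t)(bracket - input);
   decl->is_subscripted = true;
   decl->array_index = (unsigned)index;
   return true;
}

/* Size to report: 1 for a subscripted element, else the array length
 * (pass 1 as array_length for a non-array variable). */
unsigned sketch_reported_size(const struct sketch_tfeedback_decl *decl,
                              unsigned array_length)
{
   return decl->is_subscripted ? 1 : array_length;
}
```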
Fixes piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{4,8}]-no-subscript" and
"EXT_transform_feedback/output_type *[2]-no-subscript".
Note: on back-ends that set
gl_shader_compiler_options::LowerClipDistance (for example i965),
tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]" still fail. I hope to address this in
a later patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the addition of unit tests in commit
3ef3ba4d2e, several additional build
artifacts are created:
bin/depcomp
bin/missing
tests/Makefile
tests/Makefile.in
tests/glx/Makefile
tests/glx/Makefile.in
tests/glx/.deps/
tests/glx/.gitignore
This patch adds all of these files to .gitignore.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using
gl_transform_feedback_object::Buffers[i]->Name to service an indexed
get request for GL_TRANSFORM_FEEDBACK_BUFFER_BINDING. However, if no
buffer has been bound, gl_transform_feedback_object::Buffers[i] is
NULL, so this was causing a segfault.
This patch switches to using
gl_transform_feedback_object::BufferNames[i], which is equal to
gl_transform_feedback_object::Buffers[i]->Name if
gl_transform_feedback_object::Buffers[i] is not NULL, and 0 if it is
NULL.
Fixes piglit test "EXT_transform_feedback/get-buffer-state
indexed_binding".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
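The BufferNames fix described above can be illustrated with a minimal sketch. The struct below is a simplified stand-in for gl_transform_feedback_object, and MAX_FEEDBACK_BUFFERS, bind_buffer() and get_buffer_binding() are hypothetical names chosen for illustration, not Mesa's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FEEDBACK_BUFFERS 4   /* illustrative limit, not Mesa's constant */

struct buffer_object {
   unsigned Name;
};

/* Simplified stand-in for gl_transform_feedback_object. */
struct feedback_object {
   struct buffer_object *Buffers[MAX_FEEDBACK_BUFFERS];  /* NULL when unbound */
   unsigned BufferNames[MAX_FEEDBACK_BUFFERS];           /* 0 when unbound */
};

/* Keep the pointer and the cached name in sync on every bind. */
static void
bind_buffer(struct feedback_object *obj, unsigned index,
            struct buffer_object *buf)
{
   assert(index < MAX_FEEDBACK_BUFFERS);
   obj->Buffers[index] = buf;
   obj->BufferNames[index] = buf ? buf->Name : 0;
}

/* Safe even when nothing is bound: no pointer is dereferenced. */
static unsigned
get_buffer_binding(const struct feedback_object *obj, unsigned index)
{
   assert(index < MAX_FEEDBACK_BUFFERS);
   return obj->BufferNames[index];
}
```

Querying an unbound index returns 0 here instead of dereferencing a NULL pointer.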
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), references to gl_ClipDistance (a float[8] array) will
be converted to references to gl_ClipDistanceMESA (a vec4[2] array).
This patch modifies the linker so that requests for transform feedback
of gl_ClipDistance are similarly converted.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipDistance".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
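A rough sketch of the float[8] to vec4[2] lowering mentioned above, assuming the elements are packed in order (element i lands in slot i / 4, component i % 4). The struct and function names are hypothetical.

```c
#include <assert.h>

struct clip_location {
   unsigned slot;       /* index into the vec4[2] array */
   unsigned component;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Map an element of the float[8] array to its place in the vec4[2] array,
 * assuming in-order packing. */
static struct clip_location
lowered_clip_distance(unsigned element)
{
   struct clip_location loc;

   assert(element < 8);
   loc.slot = element / 4;
   loc.component = element % 4;
   return loc;
}
```

Under that assumption, element 5 would map to the y component of the second vec4.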
When using transform feedback, there are three circumstances in which
it is useful for Mesa to instruct a driver to stream out just a
portion of a varying slot (rather than the whole vec4):
(a) When a varying is smaller than a vec4, Mesa needs to instruct the
driver to stream out just the first one, two, or three components of
the varying slot.
(b) In the future, when we implement varying packing, some varyings
will be offset within the vec4, so Mesa will have to instruct the
driver to stream out an arbitrary contiguous subset of the components
of the varying slot (e.g. .yzw or .yz).
(c) On drivers that set gl_shader_compiler_options::LowerClipDistance,
if the client requests that an element of gl_ClipDistance be streamed
out using transform feedback, Mesa will have to instruct the driver to
stream out a single component of one of the gl_ClipDistance varying
slots.
Previous to this patch, only (a) was possible, since
gl_transform_feedback_info specified only the number of components of
the varying slot to stream out. This patch adds
gl_transform_feedback_info::ComponentOffset, which indicates which
components should be streamed out.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
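To make the role of a component offset concrete, here is a minimal sketch of streaming out a contiguous subset of a vec4 slot. stream_out_components() is a hypothetical helper; a real driver programs hardware state rather than copying on the CPU.

```c
#include <assert.h>
#include <string.h>

/* Copy num_components floats, starting at component_offset, out of a
 * vec4 varying slot. Returns the number of floats written. */
static unsigned
stream_out_components(float *dst, const float slot[4],
                      unsigned component_offset, unsigned num_components)
{
   assert(component_offset + num_components <= 4);
   memcpy(dst, slot + component_offset, num_components * sizeof(float));
   return num_components;
}
```

Case (a) corresponds to component_offset == 0, while cases (b) and (c) need a nonzero offset (for example offset 1 with 2 components selects .yz).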
Previously, on i965 Gen6 and above, we weren't allocating space for
gl_ClipVertex in the VUE, since the VS was automatically converting it
to clip distances. This prevented transform feedback from being able
to capture gl_ClipVertex.
This patch goes ahead and allocates space for gl_ClipVertex in the
VUE on Gen6 and above. The old behavior is retained on Gen5 and
below, since (a) transform feedback is not yet supported on those
platforms, and (b) those platforms don't currently support
gl_ClipVertex anyhow.
Note: this constitutes a slight waste of VUE space for shaders that
use gl_ClipVertex and don't use transform feedback to capture it.
However, that seems preferable to making the VUE map (and all of the
state that depends on it) dependent on transform feedback settings.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipVertex".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On i965 Gen6 and above, gl_PointSize is stored in component W of the
first VUE slot (which corresponds to VERT_RESULT_PSIZ in the VUE map).
Normally we store varying floats in component X of a VUE slot, so we
need special case logic for gl_PointSize.
For Gen6, we do this with a ".wwww" swizzle in the GS. For Gen7, we
shift the component mask by 3 to select the W component.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_PointSize".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9d36c96d6e (mesa: Fix
glGetTransformFeedbackVarying()) accidentally added an extra memset()
call to the store_tfeedback_info() function, causing
prog->LinkedTransformFeedback.NumBuffers to be erased.
This patch removes the extra memset and rearranges the other
operations in store_tfeedback_info() to be in the correct order.
Fixes piglit tests "EXT_transform_feedback/api-errors *unbound*"
Reviewed-by: Eric Anholt <eric@anholt.net>
The src/dst arrays would overlap but dst was less than src so a simple
version of memcpy() would do the right thing. But this isn't guaranteed
when memcpy() is optimized.
Fixes demos/copypix when the dest region was clipped by the left side of
the window.
Reviewed-by: Adam Jackson <ajax@redhat.com>
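The overlap hazard above is the classic reason to prefer memmove() over memcpy(). A minimal sketch, with shift_row_left() as a hypothetical helper rather than the function changed by the patch:

```c
#include <assert.h>
#include <string.h>

/* Move the last (count - shift) bytes of a row to its start. Source and
 * destination overlap, so memmove() is required; memcpy() has undefined
 * behavior for overlapping regions. */
static void
shift_row_left(unsigned char *row, size_t count, size_t shift)
{
   assert(shift <= count);
   memmove(row, row + shift, count - shift);
}
```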
This is useful for apps which don't print FPS.
Only enabled in SwapBuffers.
v2: track state per drawable, use libGL prefix
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Do it after we check whether inst_end != -1.
Also move the code structure at the beginning of r300_fragment_shader_code
to detect underflows easily with valgrind.
Improves performance from about 1 fps to 23 fps in Cogs.
This new codepath is not always used, instead, there is a heuristic which
determines whether to use it. Using translate for uploads is generally
slower than what we have had already, it's a win only in a few cases.
This is for GL_ARB_vertex_type_2_10_10_10_rev.
I just took the code from u_format_table.c. It's based on pack_rgba_float.
I had no other choice. The u_format hooks are not exactly compatible
with translate. The cleanup of it is left for future work.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The conversion is limited to only a few cases, because converting to any other
type shouldn't happen in any driver.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fetching int as float and vice versa is not allowed.
Fetching unsigned int as signed int and vice versa is not allowed either.
Doing conversions like that isn't allowed for samplers in OpenGL.
The three hooks could be consolidated into one fetch hook, which would fetch
uint as uint32, sint as sint32, and everything else as float. The receiving
parameter would be void*. This would be useful for implementing vertex fetches
for shader model 4.0, which has untyped registers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Please see the diff for further info.
This paves the way for moving user buffer uploads out of drivers and should
allow cleaning up the mess in u_upload_mgr in the meantime.
For now only allowed for buffers on r300 and r600.
Acked-by: Christian König <deathsimple@vodafone.de>
We don't wanna convert per-instance or constant (zero-stride) attribs into
ordinary vertex attribs.
More importantly, the translation of instance attribs now finally works.
To match what transfer_map returns. Really, subtracting the offset leads
to bugs if someone expects it to work exactly like transfer_map.
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation was totally broken -- it was looking in an
unpopulated structure for varyings, and trying to do so using the
current list of varying names, not the list used at link time.
v2: Fix leaking of memory into the program per re-link.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 5a478976ae.
It broke the build. DRI drivers were no longer being installed by
`make install` (and probably not being built at all). It appears to be
due to a few small, subtle mistakes, and the fix isn't clear enough to
simply commit without going through review. In the meantime, revert it.
All other xorg modules require at least 2.60 (released in 2006), so we
may as well increase it to match. It's also doubtful anyone tests the
build with 2.59 (from 2003), so it may not even work anyway.
Adds two missing '|| srcFormat == GL_RG_INTEGER' in assertions and a
bunch of missing pixel conversion cases.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0ed11e3331 fixed a "use after free"
bug by getting the next pointer before deleting the current node.
Unfortunately, it also made "next" never get updated if i->need != need.
Fixes infinite loops in piglit tests fbo-depth-array and fbo-depthtex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
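The loop shape this fix calls for can be sketched as follows. This uses a simplified singly linked list with hypothetical names, not the list type used in the driver. The key point is that "next" is read at the top of every iteration, before any free() and before any early continue.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
   struct node *next;
   int need;
};

/* Remove and free every node whose 'need' matches; returns the new head. */
static struct node *
remove_matching(struct node *head, int need)
{
   struct node **link = &head;
   struct node *i, *next;

   for (i = head; i != NULL; i = next) {
      next = i->next;            /* read on every iteration, before free() */
      if (i->need != need) {
         link = &i->next;
         continue;               /* 'next' is already up to date */
      }
      *link = next;
      free(i);
   }
   return head;
}
```

Reading i->next only on the matching path would leave "next" stale on the other path, which is the kind of infinite loop described above.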
textureSize() returns an int, ivec2, or ivec3, but never an ivec4.
Creating the destination register as an ivec4 triggered later failures,
even though the register did hold the proper values.
For example, piglit test vs-textureSize-compare calls textureSize on a
2D texture and compares the result to an expected value. Unfortunately,
our generated code also tried to compare the third and fourth components
which were undefined, and failed.
Fixes piglit test vs-textureSize-compare as well as 19 subcases of
oglconform's glsl-bif-tex-size test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit d45814c925 totally added a data
dependency on _NEW_TEXTURE, even including the comment, but didn't
actually add the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is also generated by BeginTransformFeedbackEXT
if no binding points would be used, either because no program object is
active or because the active program object has specified no varying
variables to record.
...
The error INVALID_VALUE is generated by BindBufferRangeEXT or
BindBufferOffsetEXT if <offset> is not word-aligned.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors no_prog_active
- EXT_transform_feedback/api-errors interleaved_no_varyings
- EXT_transform_feedback/api-errors separate_no_varyings
- EXT_transform_feedback/api-errors bind_offset_offset_1
- EXT_transform_feedback/api-errors bind_offset_offset_2
- EXT_transform_feedback/api-errors bind_offset_offset_3
- EXT_transform_feedback/api-errors bind_offset_offset_5
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
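The word-alignment rule quoted from the spec can be sketched as a small check, assuming a word is 4 bytes. The enum values are those of the real GL enums, while validate_bind_offset() is a hypothetical helper, not the function touched by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of the real GL enums, defined here so the sketch stands alone. */
#define GL_NO_ERROR       0
#define GL_INVALID_VALUE  0x0501

/* Assumption: a "word" is 4 bytes. */
static bool
offset_is_word_aligned(uintptr_t offset)
{
   return (offset & 3) == 0;
}

static unsigned
validate_bind_offset(uintptr_t offset)
{
   return offset_is_word_aligned(offset) ? GL_NO_ERROR : GL_INVALID_VALUE;
}
```

Offsets 1, 2, 3 and 5, as exercised by the bind_offset_offset_* tests, would all be rejected by this check.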
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is generated by
BeginTransformFeedbackEXT if any transform feedback buffer object
binding point used in transform feedback mode does not have a
buffer object bound.
This required adding a new NumBuffers field to the
gl_transform_feedback_info struct, to keep track of how many transform
feedback buffers are required by the current program.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors interleaved_unbound
- EXT_transform_feedback/api-errors separate_unbound_0_1
- EXT_transform_feedback/api-errors separate_unbound_0_2
- EXT_transform_feedback/api-errors separate_unbound_1_2
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
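A minimal sketch of the unbound-buffer check that a NumBuffers count makes possible. The struct, the MAX_FEEDBACK_BUFFERS limit and the function name are hypothetical; the caller would raise INVALID_OPERATION when the check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FEEDBACK_BUFFERS 4   /* illustrative limit */

struct feedback_state {
   const void *Buffers[MAX_FEEDBACK_BUFFERS];  /* NULL when unbound */
};

/* True if each of the first num_buffers binding points has a buffer bound. */
static bool
all_required_buffers_bound(const struct feedback_state *state,
                           unsigned num_buffers)
{
   assert(num_buffers <= MAX_FEEDBACK_BUFFERS);
   for (unsigned i = 0; i < num_buffers; i++) {
      if (state->Buffers[i] == NULL)
         return false;
   }
   return true;
}
```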
Other parts of the compiler assume that expressions will have
well-formed types or the error type. Just using the type of the thing
being operated on can cause expressions like ~3.14 or ~false to not
have a well-formed type. This could then result in an assertion
failure in the context expression handler.
If there is an error processing the expression, set the type of the IR
expression to error.
Fixes piglit's bit-not-0[789].frag tests.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42755
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
Detect whether a new enough version of XCB is installed at configure
time. If it is not, don't enable the extension and don't build the
unit tests.
v2: Move the AM_CONDITIONAL outside the case-statement so that it is
invoked even for non-GLX builds. This prevents build failures with
osmesa, for example.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Robert Hooker <robert.hooker@canonical.com>
The core Mesa code does the equivalent memory allocation, image mapping,
storing and unmapping. We just need to call prep_teximage() first to
handle the 'surface_based' stuff.
The other change is to always use the level=0 mipmap image when accessing
individual mipmap level images that are stored in resources/buffers.
Apparently, we were always using malloc'd memory for individual mipmap
images, not resource buffers, before.
Signed-off-by: Brian Paul <brianp@vmware.com>
This was disabled a year ago due to not having a story for handling
the blitter at the time. We're fine with using the blitter now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new assert in intelEmitCopyBlit() gets angry if we don't align to
dwords. Rather than make the assert have a special case for height ==
1 on the assumption that the hardware doesn't use it in that case,
just supply a correct pitch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43214
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We didn't consume these flags in any way that would produce a
functional difference, but we might have some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to match the visual against the default screen. If the
drawable is on a non-default screen then the appropriate visual might not
exist on the default screen. Conversely, if the same visual is
available on multiple screens then simply selecting for the right VID is
sufficient, since the server has promised that the same visual is
compatible with multiple screens.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are a couple scenarios where the source could be zero and the
operand could be either SRC_ALPHA or ONE_MINUS_SRC_ALPHA. For
example, if the source was ZERO. This would result in something like
(0).w, and a later call to ir_validate would get angry.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42517
Coverity reported a read from pointer after free defect in
src/mesa/drivers/dri/intel/intel_mipmap_tree.c. Bug# 44205
In intel_miptree_all_slices_resolve() function, i = i->next was
executing after freeing i. I have defined a temporary variable
(next) to store the value of i->next before freeing i
Reported-by: Vinson Lee <vlee@vmware.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the interest of softpipe preferring correctness over speed and passing more
piglit tests, set this to off by default. For speed you really want llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch removes the 32-bit limitation. As a side effect, it brings support for the GL_ARB_depth_buffer_float extension.
No regressions have been found with piglit, and all tests for GL_ARB_depth_buffer_float pass.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If glUniform1i and friends are going to dump data directly in
driver-allocated storage, the pointers have to be updated when the
storage moves. This should fix the regressions seen with commit 7199096.
I'm not sure if this is the only place that needs this treatment. I'm
a little uncertain about the various functions in st_glsl_to_tgsi that
modify the TGSI IR and try to propagate changes about that up to the
gl_program. That seems sketchy to me.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2:
Revalidate when shader_program is not NULL.
Update the pointers for all _LinkedShaders.
Init glsl_to_tgsi_visitor::shader_program to NULL in the
get_pixel_transfer_visitor & get_bitmap_visitor.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
A lot of tests in 'make check' will fail under these circumstances,
but at least the build should work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new tests directory at the top-level and some extra build
infrastructure. The tests use the Google C++ Testing Framework, and
they will only be built if configure can detect its availability. The
tests are automatically wired-in to run with 'make check'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Using 'new' as a function parameter name prevents including
glxclient.h in the unit tests (future patch) that use the Google C++
Testing Framework.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is only enabled if the underlying driver advertises
support for OpenGL ES 2.0. This happens either through the getAPIMask
function in version 2 of the DRI2 extension or implicitly through
version 2 of the DRISW extension.
Since there is no OpenGL ES 2.0 protocol, this extension is marked as
only available with direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRISW
version 3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This converts all of the GLX data from glXCreateContextAttribsARB to
the values expected by the DRI driver interfaces.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the function and modifies dri2CreateNewContextForAPI to call
it. At this point only version 2 of the DRI2 API is advertised to the
loader.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Note that these extensions are not automatically enabled for screens
capable of direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRI2 version
3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
__glX_send_client_info only supports XCB, so use that instead of
__glXClientInfo when USE_XCB is defined.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This function picks the correct client-info protocol (based on the
server's GLX version and set of extensions) and sends it to the
server.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This code was generating the gcc warning:
variable ‘clearValue’ set but not used [-Wunused-but-set-variable]
Reviewed-by: Brian Paul <brianp@vmware.com>
They were always zero. When doing a sub-texture replacement we account
for the dstX/Y/Zoffsets when we map the texture image. So no need to
pass them into the texstore code anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Buffers for shader based decoding can now be
released without its component still being around.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Acked-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Changing pipe_resource was wrong, because it can be used by other contexts
at the same time. This fixes the last possible race condition in r300g
that I know of.
This also fixes blitting NPOT compressed textures. Random pixels sometimes
appeared at the right-hand edge of the texture.
Finally, this removes r300_texture_desc::stride_in_pixels. It makes little
sense with sampler views and surfaces being able to override width0, height0,
and the format entirely.
This fixes a regression (broken cube maps) caused by the
ctx->Driver.TexImage parameter simplification commit. The target var
is always GL_TEXTURE_CUBE_MAP at this point so the Face field was always
getting set to zero.
These field assignments aren't needed anyway since core Mesa sets them.
This fixes the latc fetches for llvmpipe, fixes
fbo-generatemipmap-formats GL_ARB_texture_compression
fbo-generatemipmap-formats GL_ATI_texture_compression_3dc
fbo-generatemipmap-formats GL_EXT_texture_compression_latc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
As with previous commits, the target, level and texObj info can be
obtained through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As with TexSubImage(), the target, level and texObj values can be obtained
through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no need to pass the target, level and texObj parameters since
they can be easily obtained from the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the move to Map/UnmapTextureImage, the core mesa routines are
equivalent to what the state tracker was doing.
The TexImage functions can be replaced too, but there's a few differences
that will need to be handled.
inv_swizzles is used in lp_tile_soa.py to create lp_tile_soa.c, we overwrite swizzles if they are already set.
This results in the i8 format getting alpha instead of red, and the l8 format
getting blue instead of red.
Fixes fbo-alphatest-formats, fbo-alphatest-formats ARB_texture_float,
and fbo-alphatest-formats EXT_texture_snorm on llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Introduce the vbo_sizeof_ib_type() function to return the index data type
size. Several places use switch(ib->type) to get the index data type size,
which is duplicated code.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
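A helper of the kind described above might look roughly like this. index_type_size() is a hypothetical name used for illustration (the patch's own function is vbo_sizeof_ib_type(), whose exact body is not shown here), and the enum values are those of the real GL enums.

```c
#include <assert.h>

/* Values of the real GL enums, defined here so the sketch stands alone. */
#define GL_UNSIGNED_BYTE   0x1401
#define GL_UNSIGNED_SHORT  0x1403
#define GL_UNSIGNED_INT    0x1405

/* Return the size in bytes of one index of the given type. */
static unsigned
index_type_size(unsigned type)
{
   switch (type) {
   case GL_UNSIGNED_BYTE:
      return 1;
   case GL_UNSIGNED_SHORT:
      return 2;
   case GL_UNSIGNED_INT:
      return 4;
   default:
      assert(!"unsupported index type");
      return 0;
   }
}
```

Centralizing the switch means a new index type only has to be handled in one place.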
On failure, intel_miptree_create() needs to *release* the miptree, not
just free it, so that the stencil_mt gets released too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This saves a couple of instructions on most programs with control
flow. More interestingly, 6 shaders from unigine sanctuary now fit
into 16-wide without register spilling.
We were printing out the line triggering the flush, but a variety of
different causes just printed the line number for intel_flush()'s call
of intel_batchbuffer_flush(). Plumb the line numbers from the caller
of intel_flush() on through.
Since the refactor in d7b33309fe, depth
in the miptree changed from 1 to 6, so we always decided it didn't
match, and we would relayout to something that would still not
"match".
Improves performance 23.8% (+/- 1.1%, n=4)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43329
Fixes this GCC warning.
arrayobj.c: In function '_mesa_update_array_object_max_element':
arrayobj.c:310: warning: implicit declaration of function 'ffsll'
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitset.h is still used by classic nouveau -- see `git grep '\<BITSET_'`
-- and the state stored is too big to fit in 64bit integers (it requires
approximately 87 bits), so there is no obvious alternative here.
This effectively reverts commit 196800d798.
Since commit 82b9661894 and
34eae1c72a vbo support
is mandatory for all drivers. So, remove the remaining
FEATURE_ARB_vertex_buffer_object guards.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Consider the following code:
MOV A.x, B.x
MOV B.x, C.x
After the first line, cur_value[A][0] == B, indicating that A.x's
current value came from register B.
When processing the second line, we update cur_value[B][0] to C.
However, for direct copies, we fail to reset cur_value[A][0] to NULL.
This is necessary because the value of A is no longer the value of B.
Fixes Counter-Strike: Source in Wine (where the menu rendered completely
black in DX9 mode), completely white textures in Civilization V, and the
new Piglit test glsl-vs-copy-propagation-1.shader_test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42032
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
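The invalidation rule in the copy propagation fix above can be sketched in a few lines. This is a simplified model that tracks register numbers instead of pointers (with -1 playing the role of NULL) and ignores swizzles; NUM_REGS, reset_tracking() and track_mov() are hypothetical names.

```c
#include <assert.h>

#define NUM_REGS 8   /* illustrative register count */

/* cur_value[r][c] is the register whose component c was last copied into
 * r.c, or -1 when r.c does not currently mirror another register. */
static int cur_value[NUM_REGS][4];

static void
reset_tracking(void)
{
   for (int r = 0; r < NUM_REGS; r++)
      for (int c = 0; c < 4; c++)
         cur_value[r][c] = -1;
}

/* Record "MOV dst.c, src.c". */
static void
track_mov(int dst, int src, int c)
{
   assert(dst >= 0 && dst < NUM_REGS && c >= 0 && c < 4);

   /* dst.c is overwritten, so anything that mirrored dst.c is now stale. */
   for (int r = 0; r < NUM_REGS; r++) {
      if (cur_value[r][c] == dst)
         cur_value[r][c] = -1;
   }
   cur_value[dst][c] = src;
}
```

With A = 0, B = 1 and C = 2, the two MOVs from the example leave cur_value[A][0] cleared rather than still pointing at B.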
In this code, 'i' loops over the number of virtual GRFs, while 'j' loops
over the number of vector components (0 <= j <= 3).
It can't possibly be correct to see if bit 'i' is set in the destination
writemask, as it will have values much larger than 3. Clearly this is
supposed to be 'j'.
Found by inspection.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In Android IceCreamSandwich, SurfaceFlinger requires GL_OES_EGL_image_external
for basic compositing tasks. Without the extension, SurfaceFlinger fails
to start.
Despite the incompleteness of the extension's implementation introduced by
this patch, it is good enough to enable SurfaceFlinger and to unblock the
people who need to begin testing Mesa on IceCreamSandwich.
To enable the extension, set the environment variable
MESA_EXTENSION_OVERRIDE="+GL_OES_EGL_image_external". Ideally, Android
should set this in init.rc.
WARNING: This implementation of GL_OES_EGL_image_external is not complete.
Some of it is even incorrect. When we begin to really implement
GL_OES_EGL_image_external, much of the patch will need reverting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If the meta flag MESA_META_TEXTURE is present, then disable the texture
target GL_TEXTURE_EXTERNAL_OES.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Just noticed this in passing; not sure it actually fixes any issues.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now the gl_array_object's layout matches the one used in
recalculate_input_bindings. Make use of this and remove the
bind_array_obj function.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a regression seen with the isosurf demo when switching between
glBegin/End and glDrawArrays (do it several times). The problem was the
driver wasn't getting _NEW_ARRAY when the arrays were subtly changed:
(vertex3f, normal3f) vs. (normal3f, vertex3f).
This patch fixes that by signaling _NEW_ARRAY whenever we transition
between glBegin/End and glDrawArrays mode and display lists.
The patch also fixes up the initialization of the map_vp_none[] array
to stop putting strange values in the last five elements of the array.
v2: remove DRAW_ELEMENTS, don't distinguish between glDrawArrays and
glDrawElements
v3: add DRAW_DISPLAY_LIST for the display list case, just to be safe.
Reviewed-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Tested-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Remove gl_light::_dli and gl_light::_sli.
Both are only used for a value previously used in
color indexed rendering. Also, both variables are only read
and never written.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Here is the final patch to enable dynamic eu instruction store size:
increase the brw eu instruction store size dynamically instead of just
allocating it statically with a constant limit. This fixes the mismatch
where GL_MAX_PROGRAM_INSTRUCTIONS_ARB was 16384 while the driver
limited it to 10000.
v2: comments from ken, do not hardcode the eu limit to (1024 * 1024)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A single next_insn may change the base address of instruction store
memory(p->store), so call it first before referencing the instruction
store pointer from an index.
This is the final preparatory work to enable the dynamic store size.
v2: comments from Ken, define emit_endif as bool type
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, the eu instruction store
base address (p->store) may change between the brw_JMPI() call and the
brw_land_fwd_jump() call. Thus, the safe way to reference the
jmp instruction is by index instead of by the instruction address.
v2: comments from Eric, don't change the prototype of brw_JMPI
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, the eu instruction store
base address (p->store) may change between brw_IF/ELSE() and
brw_ENDIF(). Thus let if_stack just store the instruction index. This is
more flexible and safer than storing the instruction memory address.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
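Why an index survives a growing store while a pointer does not can be shown with a small sketch. The struct layout, emit_insn() and land_jump() are hypothetical simplifications, and the jump_count arithmetic is illustrative only, not the hardware's encoding.

```c
#include <assert.h>
#include <stdlib.h>

struct insn {
   int opcode;
   int jump_count;
};

struct program {
   struct insn *store;     /* may move whenever the store grows */
   unsigned nr_insn;
   unsigned store_size;
};

/* Append an instruction, growing the store when it is full. The returned
 * index stays valid across growth; a pointer into 'store' would not. */
static unsigned
emit_insn(struct program *p, int opcode)
{
   if (p->nr_insn == p->store_size) {
      unsigned new_size = p->store_size ? p->store_size * 2 : 4;
      struct insn *grown = realloc(p->store, new_size * sizeof(*grown));

      assert(grown != NULL);
      p->store = grown;
      p->store_size = new_size;
   }
   p->store[p->nr_insn].opcode = opcode;
   p->store[p->nr_insn].jump_count = 0;
   return p->nr_insn++;
}

/* Patch a forward jump that was recorded by index. */
static void
land_jump(struct program *p, unsigned jmp_index)
{
   assert(jmp_index < p->nr_insn);
   p->store[jmp_index].jump_count = (int)(p->nr_insn - jmp_index - 1);
}
```

The jump is looked up through p->store only at patch time, after any reallocation has already happened.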
This partially reverts commit 363ff84475.
It caused severe performance drops in Nexuiz. Reported by Phoronix.
Tested by me on r300g and by IRC people on r600g.
When updating SOL indices, we were accidentally putting the starting
index in dword 1 and the SVBI number to increment in dword 2--these
should be reversed. Usually both of these values are zero, so we
didn't see any problem. However, if a transform feedback operation
spans multiple batch buffers, the starting index will be nonzero.
Fixes piglit test "EXT_transform_feedback/intervening-read output".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering triangle strips, vertices come down the pipeline in the
order specified, even though this causes alternate triangles to have
reversed winding order. For example, if the vertices are ABCDE, then
the GS is invoked on triangles ABC, BCD, and CDE, even though this
means that triangle BCD is in the reverse of the normal winding order.
The hardware automatically flags the triangles with reversed winding
order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided
coloring can be adjusted to account for the reversed order.
In order to ensure that winding order is correct when streaming
vertices out to a transform feedback buffer, we need to alter the
ordering of BCD to BDC when the first provoking vertex convention is
in use, and to CBD when the last provoking vertex convention is in
use.
To do this, we precompute an array of indices indicating where each
vertex will be placed in the transform feedback buffer; normally this
is SVBI[0] + (0, 1, 2), indicating that vertex order should be
preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we
change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,
2), depending on the provoking vertex convention.
Fixes piglit tests "EXT_transform_feedback/tessellation
triangle_strip" on Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
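The index precomputation described above can be sketched on the CPU as follows. tf_vertex_destinations() is a hypothetical helper; the actual patch does this inside the Gen6 GS program, and only the three offset triples are taken from the description.

```c
#include <stdbool.h>

/* Fill dest[i] with the buffer index at which vertex i of the triangle
 * is written, given the starting index svbi0. */
static void
tf_vertex_destinations(unsigned svbi0, bool tristrip_reverse,
                       bool first_provoking, unsigned dest[3])
{
   static const unsigned normal[3]        = { 0, 1, 2 };
   static const unsigned reverse_first[3] = { 0, 2, 1 };  /* BCD -> BDC */
   static const unsigned reverse_last[3]  = { 1, 0, 2 };  /* BCD -> CBD */
   const unsigned *order = normal;

   if (tristrip_reverse)
      order = first_provoking ? reverse_first : reverse_last;

   for (int i = 0; i < 3; i++)
      dest[i] = svbi0 + order[i];
}
```

Writing vertex i to position dest[i] turns BCD into BDC under the first provoking vertex convention and into CBD under the last, so the provoking vertex keeps its position in both cases.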
The code for storing 1D, 2D and 3D tex images (whole or sub-images) was
all pretty similar. This consolidates those six paths.
v2: rework switch statement to catch unexpected targets
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For 1D arrays, map each slice separately. Note that this was handled
correctly in _mesa_store_teximage2d() but not here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets rid of another renderbuffer->PutRow() call and _DepthBuffer
usage. We always work with 32-bit uint Z values now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Use Map/UnmapRenderbuffer() for the special, optimized cases we care about.
Note that we're dropping some seldom-used cases in the new fast-path
code, such as CI->RGB conversion and zooming.
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't want to call these functions where we'll be using
Map/UnmapRenderbuffer(). So push them further down in the drawpixels
cases so that we can switch over to Map/UnmapRenderbuffer() step by step.
Reviewed-by: Eric Anholt <eric@anholt.net>
Stop using deprecated renderbuffer PutRow() function. Note that we
aren't using Map/UnmapRenderbuffer() yet because this call is inside
a swrast_render_start/finish() pair.
v2: use _mesa_pack_uint_24_8_depth_stencil_row(), per Eric.
Hopefully glCopyPixels(GL_DEPTH_STENCIL) will be handled by the
fast copy function. Otherwise, just do the copy with separate
depth + stencil copies. That's effectively what the removed code
did anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
The functions that read depth/stencil values understand all (packed)
depth/stencil buffer formats now so there's no reason to use the
wrappers.
Also, improve the format checks in fast_copy_pixels() to catch mismatched
depth/stencil cases.
v2: fix the test for combined depth+stencil buffers, per Eric.
Stop using the deprecated renderbuffer Get/Put Row/Values functions.
Consolidate code paths, etc. The file is nearly half the size it used
to be!
Reviewed-by: Eric Anholt <eric@anholt.net>
Use format pack/unpack functions instead of deprecated renderbuffer
GetRow/PutRow functions.
v2: use get_stencil_address(), s/destVals/newVals/
Reviewed-by: Eric Anholt <eric@anholt.net>
The former was only used for clearing buffers. The latter wasn't used
anywhere! Remove them and all implementations of those functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step toward getting rid of the renderbuffer PutRow/etc functions.
v2: fix assorted depth/stencil clear bugs found by Eric
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixed the build failure, fixed a warning where the attributes and error
arguments had been inverted, and fixed another call that was missing an
argument.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes almost all of the transform feedback piglit tests. Remaining
are a few tests related to tessellation for
quads/trifans/tristrips/polygons with flat shading.
v2: Incorporate Paul's feedback (squash with previous, state flag note,
static assert, update FINISHME)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The code was relying on gs.prog_data's copy of the
number-of-verts-per-prim, which segfaulted on gen7 since it doesn't
make a GS program. We can easily calculate that value right here.
v2: Fix svbi_0_starting_index regression.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although there is not much documentation of this fact, there are in
fact two separate VF caches:
- an "index-based" cache (described in the Sandy Bridge PRM, vol 2
part 1, section 2.1.2 "Vertex Cache"). This cache stores URB
handles of vertex shader outputs; its purpose is to avoid redundant
invocations of the vertex shader when drawing in random access mode
(e.g. glDrawElements()), and the same vertex index is specified
multiple times. It is automatically invalidated between
3D_PRIMITIVE commands and between instances within a single
3D_PRIMITIVE command.
- an "address-based" cache (mentioned briefly in vol 2 part 1, section
1.7.4 "PIPE_CONTROL Command"). This cache stores the data read from
vertex buffers; its purpose is to avoid redundant memory accesses
when doing instanced drawing or when multiple 3D_PRIMITIVE commands
access the same vertex data. It needs to be manually invalidated
whenever new data is written to a buffer that is used for vertex
data.
Previous to this patch, it was not necessary for Mesa to explicitly
invalidate the address-based cache, because there were no reasonable
use cases in which the GPU would write to a vertex data buffer during
a batch, and inter-batch flushing was taken care of by the kernel.
However, with transform feedback, there is now a reasonable use case:
vertex data is written to a buffer using transform feedback, and then
that data is immediately re-used as vertex input in the next drawing
operation. To make this use case work, we need to flush the
address-based VF cache between transform feedback and the next draw
operation. Since we are already calling
intel_batchbuffer_emit_mi_flush() when transform feedback completes,
and intel_batchbuffer_emit_mi_flush() is intended to invalidate all
caches, it seems reasonable to add VF cache invalidation to this
function.
As with commit 63cf7fad13 (i965: Flush
pipeline on EndTransformFeedback), this is not an ideal solution. It
would be preferable to only invalidate the VF cache if the next draw
call was about to consume data generated by a previous draw call in
the same batch. However, since we don't have the necessary dependency
tracking infrastructure to figure that out right now, we have to
overzealously invalidate the cache.
Fixes Piglit test "EXT_transform_feedback/immediate-reuse".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After creating new binding table entries for transform feedback, we
need to set the dirty flag BRW_NEW_SURFACES, so that a new binding
table pointer will be sent to the hardware. Otherwise the new binding
table entries will not take effect.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The surface states tracked by BRW_NEW_WM_SURFACES are no longer used
for just WM. They are also used for vertex texturing and transform
feedback. To avoid confusion, this patch renames BRW_NEW_WM_SURFACES
to BRW_NEW_SURFACES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
X8 depth formats weren't supported until Ironlake (Gen 5).
Fixes GPU hangs introduced in d84a180417.
One example test case was "fbo-missing-attachment-blit from".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit tests "EXT_transform_feedback/generatemipmap buffer" and
"EXT_transform_feedback/generatemipmap prims_written" on i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although i965 gen6 does not yet support ARB_transform_feedback2 or
NV_transform_feedback2, it needs to support pause/resume functionality
so that meta-ops will work correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When transform feedback is paused, it is legal to change programs or
to perform drawing operations using a drawing mode that doesn't match
the transform feedback mode.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a client calls BeginTransformFeedback(), then
PauseTransformFeedback(), then EndTransformFeedback(), we need to make
sure that the transform feedback object is not left in a "paused"
state, otherwise the next call to BeginTransformFeedback() will leave
transform feedback paused.
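A minimal sketch of the state handling this implies, assuming a
hypothetical two-flag object (the struct and function names below are
made up for illustration and are not Mesa's):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the transform feedback object's state. */
struct tf_state {
   bool active;
   bool paused;
};

/* Begin only marks the object active; it does not touch "paused". */
void tf_begin(struct tf_state *tf)  { tf->active = true; }
void tf_pause(struct tf_state *tf)  { if (tf->active) tf->paused = true; }
void tf_resume(struct tf_state *tf) { if (tf->active) tf->paused = false; }

/* End must clear both flags so that the next begin starts unpaused. */
void tf_end(struct tf_state *tf)
{
   tf->active = false;
   tf->paused = false;
}
```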
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During meta-operations (such as _mesa_meta_GenerateMipmap()), we need
to be able to draw even if GL_RASTERIZER_DISCARD is enabled. This
patch causes _mesa_meta_begin() to save the state of
GL_RASTERIZER_DISCARD and disable it (so that drawing can be done
during the meta-op), and causes _mesa_meta_end() to restore it.
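The save/disable/restore pattern can be sketched as below. The structs
and function names are hypothetical, pared-down stand-ins, not the real
gl_context or meta save area.

```c
#include <stdbool.h>

/* Hypothetical, pared-down context and save area for illustration. */
struct ctx_sketch  { bool raster_discard; };
struct save_sketch { bool raster_discard; };

/* Remember the application's setting, then allow the meta-op to draw. */
void meta_begin_sketch(struct ctx_sketch *ctx, struct save_sketch *save)
{
   save->raster_discard = ctx->raster_discard;
   ctx->raster_discard = false;
}

/* Put the application's setting back once the meta-op is done. */
void meta_end_sketch(struct ctx_sketch *ctx, const struct save_sketch *save)
{
   ctx->raster_discard = save->raster_discard;
}
```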
Fixes piglit test "EXT_transform_feedback/generatemipmap discard" on
i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This won't be used in the client-side libGL, but the xserver has to
generate a different protocol error depending on the reason context
creation failed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
There seem to have been two different ways to communicate the
profile. There were flags and there were profiles. I've opted to
remove the profile flags and use ST_PROFILE_DEFAULT (compatibility
profile) and ST_PROFILE_OPENGL_CORE (core profile) consistently
instead.
Also change the values of the ST_CONTEXT_FLAG_DEBUG and
ST_CONTEXT_FLAG_FORWARD_COMPATIBLE flags to match the WGL and GLX
values.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
If the server returned BadContext, the error would just get dropped on
the floor.
Fixes the piglit test glx-import-context-single-process
NOTE: This is a candidate for the 7.11 branch, but it also requires
the previous patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Only initialize vlc in MPEG2 decoding once for all slices,
add more sanity checks to vlc decoding functions, support
multiple vlc input buffers, and improve documentation of the
vlc functions.
v2: also implement multiple inputs for the vlc functions
v3: some bug fixes for buffer size and alignment corner cases
v4: rework of the patch, some more improvements
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Fixes a double free and an array overflow: even if only 2 members are
used, the last one needs to be set to NULL explicitly.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Hi all
This fixes a memory leak of 32 bytes on exit.
From 924f8fdccb41b011f372bc57252005bcdb096105 Mon Sep 17 00:00:00 2001
From: Lauri Kasanen <curaga@operamail.com>
Date: Thu, 22 Dec 2011 21:28:33 +0200
Subject: [PATCH] gallivm: Close a memory leak
As reported by "valgrind --leak-check=full glxgears".
Signed-off-by: Lauri Kasanen <curaga@operamail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In the case where a front and back output are specified, the draw code will
copy the back output into the front color slot and everything is happy.
However, if no front is specified, then the draw code will do a bad copy
(separate patch), and the frag shader also won't pick up the color, as
there is no write to COLOR from the vertex shader, just BCOLOR.
This patch fixes that problem: if it can't find a vertex shader output
for the front color slot, it will look up and use one for the back color
slot.
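The fallback lookup can be sketched as follows. The enum, struct and
function names are hypothetical and only illustrate the idea; they are
not the identifiers used in the driver.

```c
/* Hypothetical semantic names and output table for illustration. */
enum sem_sketch { SEM_POSITION, SEM_COLOR, SEM_BCOLOR, SEM_GENERIC };

struct vs_output_sketch {
   enum sem_sketch name;
   unsigned index;
};

/* Return the slot of the output with this semantic, or -1 if absent. */
int find_output(const struct vs_output_sketch *outs, int n,
                enum sem_sketch name, unsigned index)
{
   for (int i = 0; i < n; i++) {
      if (outs[i].name == name && outs[i].index == index)
         return i;
   }
   return -1;
}

/* Prefer the front color; fall back to the back color if none exists. */
int find_front_color_slot(const struct vs_output_sketch *outs, int n,
                          unsigned index)
{
   int slot = find_output(outs, n, SEM_COLOR, index);

   if (slot < 0)
      slot = find_output(outs, n, SEM_BCOLOR, index);
   return slot;
}
```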
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixing these makes piglit fbo-integer pass on softpipe.
Modified to re-order things. I haven't addressed Eric's concerns: I
can't find anything in the spec that mentions sign extensions, though it
does say integers aren't clamped or modified.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This makes it easier to keep track of which dirty bits correspond to
which pieces of context, since it makes _NEW_RASTERIZER_DISCARD
correspond with ctx->RasterDiscard.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously we were storing the RasterDiscard flag (for
GL_RASTERIZER_DISCARD) in gl_context::TransformFeedback. This was
confusing, because we use the _NEW_TRANSFORM flag (not
_NEW_TRANSFORM_FEEDBACK) to track state updates to it, and because
rasterizer discard has effects even when transform feedback is not in
use.
This patch makes RasterDiscard a toplevel element in gl_context rather
than a subfield of gl_context::TransformFeedback.
Note: We can't put RasterDiscard inside gl_context::Transform, since
all items inside gl_context::Transform need to be pieces of state that
are saved and restored using PushAttrib and PopAttrib.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously, we only enabled transform feedback when
MESA_GL_VERSION_OVERRIDE was 3.0 or greater, since transform feedback
support was not completely finished, so it didn't make sense to
advertise support for it unless absolutely necessary.
Now that transform feedback is fully implemented on gen6, we can
enable this extension unconditionally.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds software-based PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries that work by keeping
track of the number of primitives that are sent down the pipeline, and
adjusting as necessary to account for the way each primitive type is
tessellated.
In the long run we'll want to replace this with a hardware-based
implementation, because the software approach won't work with geometry
shaders or primitive restart. However, at the moment, we don't have
the necessary kernel support to implement a hardware-based query (we
would need the kernel to save GPU registers when context switching, so
that drawing performed by another process doesn't get counted).
Fixes Piglit tests EXT_transform_feedback/query-primitives_generated-*
and EXT_transform_feedback/query-primitives-written-*.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, i965 only supported two query types: GL_TIME_ELAPSED_EXT
and GL_SAMPLES_PASSED_ARB, and it distinguished between the two using
if/else statements that compared query->Base.Target to
GL_TIME_ELAPSED_EXT.
This patch changes the if/else statements to switch statements so that
we can add more query types without having to have a chain of
else-ifs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't currently have kernel support for saving GPU registers on a
context switch, so if multiple processes are performing transform
feedback at the same time, their SVBI registers will interfere with
each other. To avoid this situation, we keep a software shadow of the
state of the SVBI 0 register (which is the only register we use), and
re-upload it on every new batch.
The function that updates the shadow state of SVBI 0 is called
brw_update_primitive_count, since it will also be used to update the
counters for the PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is needed by i965 to ensure that transform feedback counters are
not incremented during meta-ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function computes the number of primitives that will be generated
when the given drawing operation is performed. It accounts for the
tessellation that is performed on line strips, line loops, triangle
strips, triangle fans, quads, quad strips, and polygons, so it is
suitable for implementing the primitive counters needed by transform
feedback.
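As a rough sketch of the counting rules, assuming that quads, quad
strips and polygons are counted as the triangles they decompose into
(the enum and function name below are hypothetical; the real function
works on GL primitive modes and draw parameters):

```c
#include <stddef.h>

/* Hypothetical enum; the real code switches on GL primitive modes. */
enum prim_mode {
   PRIM_POINTS, PRIM_LINES, PRIM_LINE_LOOP, PRIM_LINE_STRIP,
   PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP, PRIM_TRIANGLE_FAN,
   PRIM_QUADS, PRIM_QUAD_STRIP, PRIM_POLYGON
};

/* Number of points, lines or triangles produced from `count` vertices. */
size_t count_primitives(enum prim_mode mode, size_t count)
{
   switch (mode) {
   case PRIM_POINTS:
      return count;
   case PRIM_LINES:
      return count / 2;
   case PRIM_LINE_LOOP:
      return count >= 2 ? count : 0;
   case PRIM_LINE_STRIP:
      return count >= 2 ? count - 1 : 0;
   case PRIM_TRIANGLES:
      return count / 3;
   case PRIM_TRIANGLE_STRIP:
   case PRIM_TRIANGLE_FAN:
   case PRIM_POLYGON:
      return count >= 3 ? count - 2 : 0;
   case PRIM_QUADS:
      return (count / 4) * 2;          /* two triangles per quad */
   case PRIM_QUAD_STRIP:
      return count >= 4 ? (count / 2 - 1) * 2 : 0;
   }
   return 0;
}
```

For instanced drawing the result would be multiplied by the instance
count.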
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It isn't necessary to call FLUSH_VERTICES from bind_buffer_range,
because transform feedback buffers are not allowed to be changed when
transform feedback is active.
Thanks to Marek Olšák for pointing out this bug.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch enables rasterizer discard functionality (a part of
transform feedback) in Gen6, by generating an alternate GS program
when rasterizer discard is active. Instead of forwarding vertices
down the pipeline, the alternate GS program uses a URB Write message
to deallocate the URB entry that was allocated by FF sync and
terminate the thread.
Note: parts of the Sandy Bridge PRM seem to imply that we could do
this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit,
and not allocating a URB entry at all. However, it's not clear how we
are supposed to terminate the thread if we do that. Volume 2 part 1,
section 4.5.4, says "GS threads must terminate by sending a URB_WRITE
message with the EOT and Complete bits set.", and my experiments so
far confirm that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A common use case for transform feedback is to perform one draw
operation that writes transform feedback output to a buffer, followed
by a second draw operation that consumes that buffer as vertex input.
Since vertex input is consumed at an earlier pipeline stage than
writing transform feedback output, we need to flush the pipeline to
ensure that the transform feedback output is completely written before
the data is consumed.
In an ideal world, we would do some dependency tracking, so that we
would only flush the pipeline if the next draw call was about to
consume data generated by a previous draw call in the same batch.
However, since we don't have that sort of dependency tracking
infrastructure right now, we just unconditionally flush the buffer
every time glEndTransformFeedback() is called. This will cause a
performance hit compared to the ideal case (since we will sometimes
flush the pipeline unnecessarily), but fortunately the performance hit
will be confined to circumstances where transform feedback is in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, the function intel_batchbuffer_emit_mi_flush()
was a bit of a misnomer. On Gen4+, when not using the blit engine, it
didn't actually flush the pipeline--it simply generated a PIPE_CONTROL
command with the necessary bits set to flush GPU caches. This was
usually sufficient, since in most situations where
intel_batchbuffer_emit_mi_flush() was called, all we really cared about
was ensuring cache coherency.
However, with the advent of OpenGL 3.0, there are two cases in which
data output by one stage of the pipeline might be consumed, in a later
draw operation, by an earlier stage of the pipeline:
(a) When using textures in the vertex shader.
(b) When drawing with a vertex buffer that was previously
generated using transform feedback.
This patch addresses case (a) by changing
intel_batchbuffer_emit_mi_flush() so that on Gen6+, it sets the
PIPE_CONTROL_CS_STALL bit (this forces the pipeline to actually
flush). (Case (b) will be addressed by the next patch in the series).
This is not an ideal solution--in a perfect world, the driver would
have some buffer dependency tracking so that we would only have to
flush the pipeline in the two cases above. Until that dependency
tracking is implemented, however, it seems prudent to have
intel_batchbuffer_emit_mi_flush() actually flush the pipeline, so that
we get correct rendering, at the expense of a (hopefully small)
performance hit.
The change is only applied to Gen6+, since at the moment only Gen6+
supports the OpenGL 3.0 features that make a full pipeline flush
necessary.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch advertises support for EXT_transform_feedback on Intel
Gen6.
Since transform feedback support is not completely finished yet, for
now we only advertise support for it when MESA_GL_VERSION_OVERRIDE is
3.0 or greater (since transform feedback is required by GL version
3.0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds basic transform feedback capability for Gen6 hardware.
This consists of several related pieces of functionality:
(1) In gen6_sol.c, we set up binding table entries for use by
transform feedback. We use one binding table entry per transform
feedback varying (this allows us to avoid doing pointer arithmetic in
the shader, since we can set up the binding table entries with the
appropriate offsets and surface pitches to place each varying at the
correct address).
(2) In brw_context.c, we advertise the hardware capabilities, which
are as follows:
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 4
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 16
OpenGL 3.0 requires these values to be at least 64, 4, and 4,
respectively. The reason we advertise a larger value than required
for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already
set aside 64 binding table entries, so we might as well make them all
available in both separate attribs and interleaved modes.
(3) We set aside a single SVBI ("streamed vertex buffer index") for
use by transform feedback. The hardware supports four independent
SVBIs, but we only need one, since vertices are added to all
transform feedback buffers at the same rate. Note: at the moment this
index is reset to 0 only when the driver is initialized. It needs to
be reset to 0 whenever BeginTransformFeedback() is called, and
otherwise preserved.
(4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
program to output transform feedback data as a side effect.
(5) In gen6_gs_state.c, we configure the geometry shader stage to
handle the SVBI pointer correctly.
Note: ordering of vertices is not yet correct for triangle strips
(alternate triangles are improperly oriented). This will be addressed
in a future patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch stores the geometry shader VUE map from a local variable in
compile_gs_prog() to a field in the brw_gs_compile struct, so that it
will be available while compiling the geometry shader. This is
necessary in order to support transform feedback on Gen6, because the
Gen6 geometry shader code that supports transform feedback needs to be
able to inspect the VUE map in order to find the correct vertex data
to output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Sandy Bridge PRM, volume 4, part 2, section 5.3.10 ("5.3.10
Register Region Restrictions") contains the following restriction on
the execution size and operand width of instructions:
"3. ExecSize must be equal to or greater than Width."
When emitting an IF instruction in single program flow mode on Gen6+,
we use an ExecSize of 1, therefore the Width of each operand must also
be 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_BindBufferRange(), we need to verify that the offset and size
specified by the client do not exceed the size of the underlying
buffer. We were accidentally doing this check using ">=" rather than
">", so we were generating a bogus error if the client specified an
offset and size that fit exactly in the underlying buffer.
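The intended condition is that a range is rejected only when offset +
size is strictly greater than the buffer size. A small sketch of that
check, written so that the addition cannot overflow (the function name
and the use of 64-bit integers are assumptions for illustration, not
Mesa's code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when [offset, offset + size) lies within the buffer.
 * Equivalent to !(offset + size > buffer_size), but without computing
 * offset + size, so it cannot wrap around.
 */
bool range_fits(uint64_t offset, uint64_t size, uint64_t buffer_size)
{
   return offset <= buffer_size && size <= buffer_size - offset;
}
```

With a 32-byte buffer, an offset of 16 and a size of 16 fit exactly and
must be accepted; the ">=" comparison would have rejected them.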
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds two new fields to the gl_transform_feedback_info
struct:
- BufferStride records the total number of components (per vertex)
that transform feedback is being instructed to store in each buffer.
- Outputs[i].DstOffset records the offset within the interleaved
structure of each transform feedback output.
These values are needed by the i965 gen6 and r600g back-ends, so it
seems better to have the linker provide them rather than force each
back-end to compute them independently.
Also, DstOffset helps pave the way for supporting
ARB_transform_feedback3, which allows the transform feedback output to
contain holes between attributes by specifying
gl_SkipComponents{1,2,3,4} as the varying name.
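For interleaved mode, the two values can be derived with a running sum,
as in the sketch below. The struct is a hypothetical stand-in for the
linker's data, and measuring DstOffset in components (to match
BufferStride) is an assumption made for this example.

```c
/* Hypothetical stand-in for one transform feedback output. */
struct tf_output_sketch {
   unsigned num_components; /* input: 1..4 for float/vec2/vec3/vec4 */
   unsigned dst_offset;     /* output: offset in components         */
};

/*
 * Assign each output its offset within the interleaved per-vertex
 * structure and return the stride (total components per vertex).
 */
unsigned assign_interleaved_offsets(struct tf_output_sketch *outputs,
                                    unsigned n)
{
   unsigned stride = 0;

   for (unsigned i = 0; i < n; i++) {
      outputs[i].dst_offset = stride;
      stride += outputs[i].num_components;
   }
   return stride;
}
```

Under this scheme, a hole of the kind ARB_transform_feedback3 allows
would simply advance the running sum without writing any output.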
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Mapping to software and uploading again for clearing is killing performance.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Valgrind complains about a definitely lost block allocated in
intelNewTextureImage(). This leak was apparently created by
6e0f9001fe, "mesa: move
gl_texture_image::Data, RowStride, ImageOffsets to swrast", as it
removes the free() from _mesa_delete_texture_image().
Put the free() back, fixes a Valgrind error.
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no point in having them when we distribute eglext.h.
As for unofficial extensions, there is a chance that we might remove some of
them eventually. Keeping the #ifdef's for now should make that easier.
Update to revision 15052.
EGL_MESA_drm_image is now official. But apparently we have our own extension
to it and we need this in eglmesaext.h:
#ifdef EGL_MESA_drm_image
/* Mesa's extension to EGL_MESA_drm_image... */
#ifndef EGL_DRM_BUFFER_USE_CURSOR_MESA
#define EGL_DRM_BUFFER_USE_CURSOR_MESA 0x0004
#endif
#endif
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we advertised 0 VS texture units. Now that we have proper
support for using the sampling engine in the VS, we can advertise 16,
which is conveniently the number required for OpenGL 3.0.
v2: Enable on Gen4. I hacked up my tests to not use flat ivec varyings
and they pass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This should avoid state-dependent FS recompiles when samplers that are
only used by the VS change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The idea is to reuse this for the VS and (in the future) GS as well.
v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The visit() half computes the values to put in the header based on the
IR and simply stuffs that in the vec4_instruction; the emit() half uses
this to set up the message header. This works out well since emit() can
use brw_reg directly and access individual DWords without kludgery.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll want to reuse this for the VS, and it's complex enough that I'd
rather not cut and paste it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This translates the GLSL compiler's IR into vec4_instruction IR,
generating code to load coordinates, LOD info, shadow comparators, and
so on into the appropriate message registers.
It turns out that the SIMD4x2 parameters are identical on Gen 5-7, and
the Gen4 code is similar enough that, unlike in the FS, it's easy enough
to support all generations in a single function.
v2: Load zeros for missing coordinates (fixing vs-texelFetch-sampler1D
and 2D on G45), and fix G45 message length for shadow comparisons.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the part that takes the vec4_instruction IR and turns it into
actual Gen ISA.
v2: Add Gen4 messages, don't retype m0 to UW.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to Ironlake, cube maps were stored as 3D textures. In recent
refactoring, we removed a separate "layers" parameter in favor of using
depth. Unfortunately, depth was getting minified, which is only correct
for actual 3D textures.
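A minimal sketch of the distinction, using hypothetical helper names
(these are not the driver's functions): the depth is halved per mipmap
level only for real 3D textures, while a cube map stored with its faces
in the depth dimension keeps the same depth at every level.

```c
#include <stdbool.h>

/* Halve a dimension once per mipmap level, never going below 1. */
unsigned minify_sketch(unsigned size, unsigned level)
{
   unsigned s = level < 32 ? size >> level : 0;

   return s > 0 ? s : 1;
}

/* is_3d is false for cube maps whose faces are stored as "depth". */
unsigned level_depth_sketch(bool is_3d, unsigned depth0, unsigned level)
{
   return is_3d ? minify_sketch(depth0, level) : depth0;
}
```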
Fixes piglit tests:
- bugs/crash-cubemap-order
- fbo/fbo-cubemap
- texturing/cubemap
Also changes texturing/cubemap npot from abort to fail.
This hasn't seen a full test run since Piglit on Mesa master hangs
GM45 a lot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the extensions require that both libGL and either the server or
the direct rendering driver (or both) enable the extension before it's
advertised. It seems safe to assume that none of the other components
on OS X will enable these extensions, so all the #ifdef blocks here
just clutter the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jeremy Huddleston <jeremyhu@apple.com>
There are a few unsupported extensions (e.g., the ATI and NV float
extensions) that are still in the list. There is some small chance
that these may be supported some day.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
__glXInitialize calls AllocAndFetchScreenConfigs.
AllocAndFetchScreenConfigs unconditionally sends a glXQueryServerString
request to the server. This request is only supported with GLX 1.1 or
later, so we were already implicitly incompatible with GLX 1.0
servers. How many more similar bugs lurk in the code that nobody has
noticed in years?
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the share_xid was only set in the glXImportContextEXT path,
and it was left set to None in all of the other create-context paths.
Fixes the piglit test glx-query-context-info-ext.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Send the DestroyContext protocol immediately when glXDestroyContext is
called, and never call it when glXFreeContextEXT is called. In both
cases, either destroy the client-side structures or, if the context is
current, set xid to None so that the client-side structures will be
destroyed later.
I believe this restores the behavior of the original SGI code. See
src/glx/x11 around commit 5df82c8. The spec doesn't say anything
about glXDestroyContext not really destroying imported contexts (it
acts like glXFreeContextEXT instead), but that's what the original
code did. Note that glXFreeContextEXT on a non-imported context does
not destroy it either.
Fixes the piglit test glx-free-context.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the piglit test glx-get-context-id.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary problem was that the number of reply bytes read is clamped
to sizeof(propList), but the loop that processes the properties tries
to examine all of the properties sent by the server. If the server
sends 47,000 properties, we only read 3 but process all 47,000.
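The essence of the fix is that the processing loop must be bounded by
what was actually read, not by what the server claims to have sent. A
sketch, with a hypothetical constant and function name (the capacity of
3 is taken from the example above):

```c
#include <stddef.h>

/* Capacity of the local property buffer in this sketch. */
#define PROP_LIST_CAPACITY 3

/*
 * Number of properties that are safe to process: never more than fit
 * in the fixed-size buffer the reply was read into.
 */
size_t props_to_process(size_t num_sent_by_server)
{
   return num_sent_by_server < PROP_LIST_CAPACITY ? num_sent_by_server
                                                  : PROP_LIST_CAPACITY;
}
```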
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Each of the DRI, DRI2, and DRISW backends contain code like the
following in their create-context routine:
    if (shareList) {
        pcp_shared = (struct dri2_context *) shareList;
        shared = pcp_shared->driContext;
    }
This assumes that the glx_context *shareList is actually the correct
derived type. However, if shareList was created as an
indirect-rendering context, it will not be the expected type. As a
result, shared will contain garbage. This garbage will be passed to
the driver, and the driver will probably segfault. This can be
observed with the following GLX code:
    ctx0 = glXCreateContext(dpy, visinfo, NULL, False);
    ctx1 = glXCreateContext(dpy, visinfo, ctx0, True);
Create-context is the only case where this occurs. In all other cases
where a context is passed to the backend, it is the 'this' pointer
(i.e., we got to the backend by calling something from ctx->vtable).
To work around this, check that the shareList->vtable->destroy method
is the same as the destroy method of the expected type. We could also
check that shareList->vtable matches the expected vtable, or add a "tag"
to glx_context to identify the derived type.
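The destroy-method comparison can be sketched as below. All types and
functions here are hypothetical miniatures of the real glx_context and
backend vtables, and what a caller does when the check fails is left
open.

```c
#include <stddef.h>

struct ctx_sketch;

/* Hypothetical miniature of a per-backend vtable. */
struct vtable_sketch {
   void (*destroy)(struct ctx_sketch *ctx);
};

/* Hypothetical miniature of the base context type. */
struct ctx_sketch {
   const struct vtable_sketch *vtable;
};

/* Stand-ins for the destroy methods of two different backends. */
void dri2_destroy_sketch(struct ctx_sketch *ctx)     { (void) ctx; }
void indirect_destroy_sketch(struct ctx_sketch *ctx) { (void) ctx; }

/*
 * Return share only if it was created by the backend whose destroy
 * method is dri2_destroy_sketch; otherwise return NULL so the caller
 * never casts it to the wrong derived type.
 */
struct ctx_sketch *checked_share(struct ctx_sketch *share)
{
   if (share != NULL && share->vtable->destroy == dri2_destroy_sketch)
      return share;
   return NULL;
}
```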
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is not exposed generally yet because some of the swrast paths hit
in piglit (drawpixels, copypixels, blit) aren't yet converted to
MapRenderbuffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a little more unusual than the separate MESA_FORMAT_S8_Z24
support, because in addition to storing the real stencil data in a
MESA_FORMAT_S8 miptree, we also make the Z miptree be
MESA_FORMAT_Z32_FLOAT instead of the requested format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With separate stencil GL_DEPTH32F_STENCIL8, the miptree will have a
really different format (MESA_FORMAT_Z32_FLOAT) from the teximage
(MESA_FORMAT_Z32_FLOAT_X24S8).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The format handling here is tricky, because we're not actually
generating a Z32_FLOAT_X24S8 miptree, so we're guessing the format
that GL wants based on seeing Z32_FLOAT with a separate stencil.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the operations were just trying to get at irb->wrapped_depth->mt,
which is the same as irb->mt now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen7 only supports the non-packed formats, even if you associate a
real separate stencil buffer -- otherwise it's as if the depth test
always fails.
This requires a little bit of care in the match_texture_image case,
since the miptree format no longer matches the texture image format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This little bit of logic was duplicated, which isn't much, but I was
going to need to duplicate a bit of additional logic in the next
commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There were only two places it was really used at this point, which was
in the batchbuffer emit of the separate stencil packets for gen6/7.
Just write in the ->stencil_mt reference in those two places and ditch
all this flailing around with allocation and refcounts.
v2: Fix separate stencil on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
this mentions which channels are used for slice and depth comparison values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The 4th texcoord is used in this case for the comparison.
This fixes piglit glsl-fs-shadow2DArray* on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The code didn't handle the case where front wasn't specified in the vertex
shader outputs, but back was.
In that case we were doing a copy from back to a non-existent front.
This code checks that both front and back exist and only does the copy
in that case.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Sets rgba layer as zeroth layer if a custom background_surface is specified.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
It's harmless to add support for attributes we don't support,
since they require a feature enabled for them to affect
something. As long as they aren't enabled, nothing happens.
This enables support for custom colorspaces and background colors.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Currently only validating, since nothing else can be done with it yet
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
v2: removed check_video_surface
Signed-off-by: Christian König <deathsimple@vodafone.de>
This sample compare was always doing linear, and this makes the
glsl-fs-shadow1DArray test render like the Intel driver.
fix wrong 0->j from initial patch
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the first part of a fix to piglit glsl-fs-shadow1DArray
also fix the passing of unused r[2] in the normal 1D case.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The piglit draw-pixel-with-texture was asserting in the glsl->tgsi code,
due to 0 texture target, this makes sure the texture target is copied over
correctly when we copy instructions around.
v2: drive-by fix bitmap on the way past.
This avoids the assertion, have to contemplate fixing things as per the spec
later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be especially useful for loading texturing parameters, where I
need to (for example) reference m3.xz<D>.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copy and pasted from fs_inst::is_tex(), but without TXB.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll be reusing most of these for the VS shortly. The one exception is
TXB (texturing with LOD bias), which is explicitly forbidden in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a regression since d2235b0f46,
in my new textureSize sampler(1DArrayShadow|2DShadow|2DArrayShadow)
piglit tests, though I'm not honestly sure how this ever worked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Noticed a "warning: array subscript is above array bounds" given at one of
the existing sanity-check asserts. Turns out all the arrays of strings
haven't matched the corresponding enum values in a while, if ever.
I didn't know the proper names for any of these and couldn't find
them in the base specs aside from "result.pointsize" in
ARB_vertex_program, so I just filled in the enum's value
as was done with other slots.
Also add four STATIC_ASSERT()s to be sure and catch future additions
or bumps to MAX_VARYING/etc again, and some more non-static asserts
where there weren't any before.
(Note, the fragment enum that corresponded to result.color(half) was removed in
8d475822e6e19fa79719c856a2db5b6a205db1b9.)
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.1svn r145714 moved global variables into a new TargetOptions
class. TargetMachine constructor now needs a TargetOptions object as
well.
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This helper function is used during mipmap generation to prepare space
for the destination mipmap levels.
This improves/fixes two things:
1. If the texture object was created with glTexStorage2D, calling
_mesa_TexImage2D() to allocate the new image would generate
INVALID_OPERATION since the texture is marked as immutable.
2. _mesa_TexImage2D() always frees any existing texture image memory
before allocating new memory. That's inefficient if the existing
image is the right size already.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Namely:
- EXT_transform_feedback
- ARB_transform_feedback2
- ARB_transform_feedback_instanced
The old interface was not useful for OpenGL and had to be reworked.
This interface was originally designed for OpenGL, but additional
changes have been made in order to make st/d3d1x support easier.
The most notable change is the stream-out info must be linked
with a vertex or geometry shader and cannot be set independently.
This is due to limitations of existing hardware (special shader
instructions must be used to write into stream-out buffers),
and it's also how OpenGL works (stream outputs must be specified
prior to linking shaders).
Other than that, each stream output buffer has a "view" into it that
internally maintains the number of bytes which have been written
into it. (one buffer can be bound in several different transform
feedback objects in OpenGL, so we must be able to have several views
around) The set_stream_output_targets function contains a parameter
saying whether new data should be appended or not.
Also, the view can optionally be used to provide the vertex
count for draw_vbo. Note that the count is supposed to be stored
in device memory and the CPU never gets to know its value.
OpenGL way | Gallium way
------------------------------------
BeginTF = set_so_targets(append_bitmask = 0)
PauseTF = set_so_targets(num_targets = 0)
ResumeTF = set_so_targets(append_bitmask = ~0)
EndTF = set_so_targets(num_targets = 0)
DrawTF = use pipe_draw_info::count_from_stream_output
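As a rough model of the append semantics in the table above (this is not the
Gallium function signature; the struct and function names are hypothetical),
each target keeps a running byte count that is reset unless its bit is set
in append_bitmask:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stream-output target "view". */
struct so_target {
   unsigned bytes_written;   /* running count kept by the view */
};

/*
 * Bind num_targets targets. Bit i of append_bitmask set means "keep
 * appending to target i"; bit i clear means "start writing at offset 0".
 */
static void
model_set_so_targets(struct so_target **targets, unsigned num_targets,
                     unsigned append_bitmask)
{
   assert(num_targets <= 32);
   for (unsigned i = 0; i < num_targets; i++) {
      if (!(append_bitmask & (1u << i)))
         targets[i]->bytes_written = 0;
   }
}
```

In this model BeginTF corresponds to append_bitmask = 0 (every count is
reset) and ResumeTF to append_bitmask = ~0 (every count is preserved).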
v2: * removed the reset_stream_output_targets function
* added a parameter append_bitmask to set_stream_output_targets,
each bit specifies whether new data should be appended to each
buffer or not.
v3: * added PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME for ARB_tfb2,
note that the draw-auto subset is always required (for d3d10),
only the pause/resume functionality is limited if the CAP is not
advertised
v4: * update gallium/docs
v5: * compactified struct pipe_stream_output_info, updated dump/trace
It's like DrawArrays, but the count is taken from a transform feedback
object.
This removes DrawTransformFeedback from dd_function_table and adds the same
function to GLvertexformat (with the function parameters matching GL).
The vbo_draw_func callback has a new parameter
"struct gl_transform_feedback_object *tfb_vertcount".
The rest of the code just validates states and forwards the transform
feedback object into vbo_draw_func.
Xa doesn't support it yet. Trying to do that would cause a segfault.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
When doing format conversion copies between a format without an
alpha channel and a format with an alpha channel, make sure the
destination alpha is set to 1.
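For illustration only, assuming 8-bit channels packed as X8R8G8B8 and
A8R8G8B8 (the helper name and the layout are assumptions, not the Xa code),
forcing destination alpha to 1 amounts to setting the top byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy pixels from a format without alpha (X8R8G8B8) to one with alpha
 * (A8R8G8B8). The X byte is undefined in the source, so the destination
 * alpha is forced to 0xff, i.e. 1.0.
 */
static void
copy_xrgb_to_argb(uint32_t *dst, const uint32_t *src, size_t count)
{
   assert(dst != NULL && src != NULL);
   for (size_t i = 0; i < count; i++)
      dst[i] = src[i] | 0xff000000u;
}
```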
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Backends indicate that they support this extension by returning
EGL_TRUE when native_display::get_param() is called with
NATIVE_PARAM_PRESENT_REGION and NATIVE_PARAM_PRESERVE_BUFFER.
native_present_control is extended to include the region that should
be presented. When native_present_control::num_rects is zero,
the whole surface is to be presented.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
The comment said they deserved to be in emit_depthbuffer, and at this
point they were all there already.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have miptrees for everything, we can more easily test for
!has_separate_stencil completeness. Also, test for whether the
stencil rb is the wrong kind of format for separate stencil, or if we
are trying to do packed to different images of a single miptree.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now there's the thing that CALLOCs and sets up window system vtable,
and the thing that CALLOCs and sets up user renderbuffer vtable. The
user renderbuffer vtable gets replaced later by
intel_renderbuffer_update_wrapper for wrapped renderbuffers (things
with name == ~0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing it in the caller in the renderbuffer code, but it was
missed in the separate stencil creation for textures. Apparently our
testing was using renderbuffers or pre-aligned sizes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This used to be needed because irb->mt would be unset for fake packed
depth/stencil, but no longer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The cool part was that in the "fbo-depthstencil -drawpixels
GL_DEPTH24_STENCIL8 32F_24_8_REV" testcase, the shifting happened to
end up with a value awfully close to the expected value, except for
every other pixel being 0 (the stencil value, shifted away to
nothing).
Reviewed-by: Brian Paul <brianp@vmware.com>
vlVdpPresentationQueueDisplay shouldn't scale, so
use size of destination surface as source rectangle.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Correctly use destination_rect and destination_video_rect
in the mixer, and also use a dirty area tracking for output surfaces.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Take viewport and scissors into account and make
the dirty area a parameter instead of a member.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Invalid shaders containing the character % at an unexpected location
would cause Bison to call yyerror with a message of:
syntax error, unexpected '%'
Bison expects yyerror() to take a string, while _mesa_glsl_error() is a
printf-style function. This hit the classic printf string escape issue:
_mesa_glsl_error(loc, state, "unexpected '%'"); // invalid!
_mesa_glsl_error(loc, state, "%s", "unexpected '%'"); // correct.
This caused assertion failures after ralloc_asprintf_append called
vsnprintf to determine the length of the text that would be printed:
vsnprintf would see the invalid format and return -1, an invalid length.
The solution is to define a proper yyerror() wrapper function that calls
_mesa_glsl_error with the "%s". Since we compile with -p "_mesa_glsl",
yyerror is defined as:
#define yyerror _mesa_glsl_error
So we have to #undef yyerror in order to be able to declare it.
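A self-contained sketch of the wrapper pattern follows. report_error is a
hypothetical printf-style function standing in for _mesa_glsl_error, and
report_plain_error plays the role of the yyerror() wrapper; neither is the
real Mesa code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical printf-style reporter (stand-in for _mesa_glsl_error). */
static int
report_error(char *out, size_t out_size, const char *fmt, ...)
{
   va_list ap;
   int len;

   va_start(ap, fmt);
   len = vsnprintf(out, out_size, fmt, ap);
   va_end(ap);
   return len;
}

/*
 * Plain-string wrapper (stand-in for yyerror): the message is passed as
 * data through "%s", so a '%' inside it is never parsed as a conversion.
 */
static int
report_plain_error(char *out, size_t out_size, const char *msg)
{
   assert(msg != NULL);
   return report_error(out, out_size, "%s", msg);
}
```

Passing Bison's message through report_plain_error keeps the '%' intact and
gives vsnprintf a valid format string to measure.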
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43564
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
The versions in the xserver and in libGL have diverged enough that the
xserver doesn't want these.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
glext.h doesn't have GL_MIN_PROGRAM_TEXEL_OFFSET_EXT or
GL_MAX_PROGRAM_TEXEL_OFFSET_EXT. Using them in the XML causes code to
be generated for the xserver that won't compile. Use the names that
exist instead.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
That file was removed from the xserver with commit:
commit a80780a7638f847c3be20e5e0c7fe85e83d9bdd1
Author: Adam Jackson <ajax@redhat.com>
Date: Wed Nov 17 09:03:06 2010 -0500
glx: Remove swap barrier and hyperpipe support
Never implemented in any open source driver. The implementation
assumed explicit DDX driver knowledge of how the client-side driver
worked, since at the time the server's GL renderer was not a DRI driver.
But now, it is, so any implementation of these should be done with
additional DRI driver API, like the swap control extension.
Reviewed-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is only temporary until a better solution is available.
v2: print warnings and add gallium CAPs
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since gl_framebuffer::_DepthBuffer and _StencilBuffer are only used
by swrast, do the validation of those fields in swrast too.
The main/depthstencil.[ch] code is no longer used and will be removed
next.
Reviewed-by: Eric Anholt <eric@anholt.net>
These files are copies of main/depthstencil.[ch] with s/mesa/swrast/.
The main/depthstencil.[ch] will go away soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions update the gl_framebuffer::_DepthBuffer and _StencilBuffer
fields, possibly creating renderbuffer wrappers that make a shared
depth+stencil accessible as depth-only or stencil only.
This stuff is only used by swrast now and will be moved there next.
Reviewed-by: Eric Anholt <eric@anholt.net>
We're just looking at the depth/stencil renderbuffers to do error
checking. We don't need to look at the depth/stencil wrappers to do
that. Also, remove pointless readRb = depthRb = NULL assignments.
Reviewed-by: Eric Anholt <eric@anholt.net>
We never want to use the depth/stencil buffer wrappers so always just
use the attachment renderbuffers. This is a step toward removing the
_DepthBuffer, _StencilBuffer fields.
Reviewed-by: Eric Anholt <eric@anholt.net>
GLfloat doesn't have enough precision to exactly represent 0xffffff
and 0xffffffff. (and a reciprocal of those, if I am not mistaken)
If -ffast-math is enabled, using GLfloat causes assertion failures in:
- fbo-blit-d24s8
- fbo-depth-sample-compare
- fbo-readpixels-depth-formats
- glean/depthStencil
For example:
fbo-depth-sample-compare: main/format_unpack.c:1769:
unpack_float_z_Z24_X8: Assertion `dst[i] <= 1.0F' failed.
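A minimal sketch of the idea, with hypothetical helper names rather than the
real format_unpack.c functions: keep the scale factor in double precision
and narrow to float only at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Unpack a 24-bit normalized depth value using a double-precision scale. */
static float
unpack_z24_to_float(uint32_t z24)
{
   const double scale = 1.0 / (double) 0xffffff;

   assert(z24 <= 0xffffff);
   return (float) (z24 * scale);
}

/* Same idea for a 32-bit normalized depth value. */
static float
unpack_z32_to_float(uint32_t z32)
{
   const double scale = 1.0 / (double) 0xffffffffu;

   return (float) (z32 * scale);
}
```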
Reviewed-by: Brian Paul <brianp@vmware.com>
fixes the following build error since
c83fb4d45f:
/usr/include/strings.h:46:13: error: expected declaration specifiers or
‘...’ before numeric constant
/usr/include/strings.h:46:13: error: conflicting types for ‘memset’
In file included from
../../../../src/gallium/winsys/g3dvl/xlib/xsp_winsys.c:34:0:
../../../../src/gallium/auxiliary/util/u_inlines.h: In function
‘pipe_buffer_create’:
../../../../src/gallium/auxiliary/util/u_inlines.h:189:4: error: too
many arguments to function ‘memset’
/usr/include/strings.h:46:13: note: declared here
bzero is defined in X11 as: #define bzero(b,len) memset(b,0,len)
including strings.h after the X11 header results in preprocessor
replacing 'bzero' in strings.h and generating unbuildable code.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
This fixes the segfault, and seems to put this closer to where other
properties are being set. Hopefully it still conforms.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the correct checks and asserts in the right places. This doesn't
fix all the tests that I've sent to piglit; to fix it up properly we need to
add int paths alongside the uint paths that don't go via float.
I'm not sure how much of that could be templated/shared; I will have a look
once I write it the long way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It sets the wrong values (GL_XXX_LEFT instead of GL_XXX), and no other
Mesa driver does this, given that Mesa sets the right draw/read buffers
provided the Mesa visual has the doublebuffer flag filled correctly
which is the case.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids forming invalid pointers needlessly, which even if
never dereferenced is undefined behavior. It also makes
_mesa_validate_pbo_access() more comprehensible.
Reviewed-by: Brian Paul <brianp@vmware.com>
NULL as an error indicator is meaningless, since it will return NULL
on success anyway if the caller passes in zero as the image's address
and asks to calculate the offset of the first pixel. For example,
_mesa_validate_pbo_access() does this.
This also matches the code in the non-GL_BITMAP codepath, which
already has an assert like this.
v2: Per Brian Paul's review, remove the function call entirely
and tighten the assert to only accept the two formats compatible with
GL_BITMAP. They always have one component per pixel.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The bug, reported to me by Vadim Girlin on IRC, was causing overzealous
elimination of code in parallel if statements such as the following:
if (x) {
r = false;
}
if (y) {
r = true;
}
Before this commit, the assignment inside the first if block would be
misdetected as dead code and removed.
Number of fragment shader variants is not very representative of the
memory used by LLVM, neither is number of shader instructions, as often
texture sampling constitutes most of the generated code.
This change adds an additional trim criterion: least recently used
fragment shader variants will be freed until the total number of LLVM IR
instructions falls below a specified threshold.
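The trimming loop can be sketched as below. This is an illustrative model,
not the llvmpipe code: the struct, the list helpers, and the choice to stop
once the total is at or below the limit are all assumptions. The list uses a
sentinel head, so emptiness is checked by comparing against the head rather
than against NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant record on a doubly linked list with a sentinel head. */
struct variant {
   struct variant *prev, *next;
   unsigned nr_instrs;          /* LLVM IR instructions in this variant */
};

static void
list_init(struct variant *head)
{
   head->prev = head->next = head;
   head->nr_instrs = 0;
}

static int
list_is_empty(const struct variant *head)
{
   return head->next == head;
}

/* Most recently used variants go to the front. */
static void
list_push_front(struct variant *head, struct variant *v)
{
   v->next = head->next;
   v->prev = head;
   head->next->prev = v;
   head->next = v;
}

static void
list_remove(struct variant *v)
{
   v->prev->next = v->next;
   v->next->prev = v->prev;
   v->prev = v->next = v;
}

/*
 * Unlink least recently used variants (from the tail) until the running
 * instruction total is no longer above max_instrs or the list is empty.
 * Returns the new total; freeing the unlinked variants is left to the caller.
 */
static unsigned
trim_variants(struct variant *head, unsigned total_instrs, unsigned max_instrs)
{
   while (total_instrs > max_instrs && !list_is_empty(head)) {
      struct variant *lru = head->prev;

      assert(total_instrs >= lru->nr_instrs);
      total_instrs -= lru->nr_instrs;
      list_remove(lru);
   }
   return total_instrs;
}
```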
Reviewed-by: Brian Paul <brianp@vmware.com>
u_simple_list.h uses a sentinel element, and not a NULL element. So
ensure list is not empty when reducing the list of shader variants.
Something I noticed while trying to free variants more aggressively.
Reviewed-by: Brian Paul <brianp@vmware.com>
In a few places we need to allocate space for some number of generic
pixels. Use this new define instead of a magic number like 16 or
4 * sizeof(GLuint).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copying these files is the first step in moving the software buffer
code from main/renderbuffer.c to swrast/s_renderbuffer.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Implemented in terms of renderbuffer mapping/unmapping and format
packing/unpacking functions.
The swrast and state tracker code for implementing accumulation are
unused and will be removed in the next commit.
v2: don't use memcpy() in _mesa_clear_accum_buffer()
v3: don't allocate MAX_WIDTH arrays, be more careful with mapping flags
Reviewed-by: Eric Anholt <eric@anholt.net>
Change and document the interpretation of the color conversion matrix
in order to make the function more versatile and to simplify the
generated shader.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
In Gen6, transform feedback is accomplished by having the geometry
shader send vertex data to the data port using "Streamed Vertex Buffer
Write" messages, while simultaneously passing vertices through to the
rest of the graphics pipeline (if rendering is enabled).
This patch adds a geometry shader program that simply passes vertices
through to the rest of the graphics pipeline. The rest of transform
feedback functionality will be added in future patches.
To make the new geometry shader easier to test, I've added an
environment variable "INTEL_FORCE_GS". If this environment variable
is enabled, then the pass-through geometry shader will always be used,
regardless of whether transform feedback is in effect.
On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no
Piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
R02_PRIM_END and R02_PRIM_START don't actually refer to bits in DWORD
2 of R0 (as the name, and comments in the code, would seem to
indicate). Actually they refer to bits in DWORD 2 of the header for
URB_WRITE messages.
This patch renames the defines to reflect what they actually mean. It
also adds a define URB_WRITE_PRIM_TYPE_SHIFT, which previously was
just hardcoded in .c files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prior to this patch, in the Gen4 and Gen5 GS, we used GRF 0 (called
"R0" in the code) as a staging area to prepare the message header for
the FF_SYNC and URB_WRITE messages. This cleverly avoided an
unnecessary MOV operation (since the initial value of GRF 0 contains
data that needs to be included in the message header), but it made the
code confusing, since GRF 0 could no longer be relied upon to contain
its initial value once the GS started preparing its first message.
This patch avoids confusion by using a separate register ("header") as
the staging area, at the cost of one MOV instruction.
Worse yet, prior to this patch, the GS would completely overwrite the
contents of GRF 0 with the writeback data it received from a completed
FF_SYNC or URB_WRITE message. It did this because DWORD 0 of the
writeback data contains the new URB handle, and that needs to be
included in DWORD 0 of the next URB_WRITE message header. However,
that caused the rest of the message header to be corrupted either with
undefined data or zeros. Astonishingly, this did not produce any
known failures (probably by dumb luck). However, it seems really
dodgy--corrupting FFTID in particular seems likely to cause GPU hangs.
This patch avoids the corruption by storing the writeback data in a
temporary register and then copying just DWORD 0 to the header for the
next message. This costs one extra MOV instruction per message sent,
except for the final message.
Also, this patch moves the logic for overriding DWORD 2 of the header
(which contains PrimType, PrimStart, PrimEnd, and some other data that
we don't care about yet). This logic is now in the function
brw_gs_overwrite_header_dw2() rather than in brw_gs_emit_vue(). This
saves one MOV instruction in brw_gs_quads() and brw_gs_quad_strip(),
and paves the way for the Gen6 GS, which will need more complex logic
to override DWORD 2 of the header.
Finally, the function brw_gs_alloc_regs() contained a benign bug: it
neglected to increment the register counter when allocating space for
the "temp" register. This turned out not to have any effect because
the temp register wasn't used on Gen4 and Gen5, the only hardware
models (so far) to require a GS program. Now, all the registers
allocated by brw_gs_alloc_regs() are actually used, and properly
accounted for.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the GS is not in use, the entire URB space is available for the
VS. When the GS is in use, we split the URB space 50/50.
The 50/50 split is probably not optimal--we'll probably want to tune this
for performance in a future patch. For example, in most situations,
it's probably worth allocating more than 50% of the space to the VS,
since VS space is used for vertex caching. But for now this is good
enough.
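The policy is simple enough to state as a short sketch. The function name
and the unit of "total" are assumptions; the real driver works in
hardware-specific URB entry sizes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Divide the URB between the VS and GS: everything goes to the VS when
 * the GS is idle, otherwise the space is split 50/50.
 */
static void
split_urb(unsigned total, int gs_active, unsigned *vs_size, unsigned *gs_size)
{
   assert(vs_size != NULL && gs_size != NULL);

   if (gs_active) {
      *gs_size = total / 2;
      *vs_size = total - *gs_size;
   } else {
      *vs_size = total;
      *gs_size = 0;
   }
}
```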
Based on previous work by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never filled this in before because we didn't care.
I'm skeptical these are correct; my sources indicate that both the VS
and GS # of entries are 256 on both GT1 and GT2.
I'm also loath to change it and break stuff.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Normally when outputting instructions in SPF (single program flow)
mode, we convert IF and ELSE instructions to conditional ADD
instructions applied to the IP register. On platforms prior to Gen6,
flow control instructions cause an implied thread switch, so this is a
significant savings.
However, according to the SandyBridge PRM (Volume 4 part 2, p79):
[Errata DevSNB{WA}] - When SPF is ON, IP may not be updated by
non-flow control instructions.
So we have to disable this optimization on Gen6.
On later platforms, there is no significant benefit to converting flow
control instructions to ADDs, so for the sake of consistency, this
patch disables the optimization on later platforms too.
The reason we never noticed this problem before is that so far we
haven't needed to use SPF mode on Gen6. However, later patches in
this series will introduce a Gen6 GS program which uses SPF mode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, GS generation code contained a lookup table that mapped
primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped
LINESTRIP to LINELIST, and left all other primitives unchanged. This
was silly, because we never generate a GS program for those primitive
types anyhow.
This patch removes the unnecessary lookup table.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a new bit to the ctx->NewState bitfield,
_NEW_TRANSFORM_FEEDBACK, to track state changes that affect
ctx->TransformFeedback. This bit can be used by driver back-ends to
avoid expensive recomputations when transform feedback state has not
been modified.
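A hedged sketch of how a back-end might consume such a bit. The bit values,
struct, and function are hypothetical and only illustrate the dirty-flag
pattern; they are not Mesa's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty bits; the real values live in Mesa's headers. */
#define HYP_NEW_TRANSFORM_FEEDBACK (1u << 0)
#define HYP_NEW_OTHER_STATE        (1u << 1)

/* Hypothetical driver-side cache of derived transform feedback state. */
struct tfb_cache {
   unsigned recompute_count;
};

/* Recompute derived state only when the transform feedback bit is dirty. */
static void
validate_tfb_state(struct tfb_cache *cache, unsigned new_state)
{
   assert(cache != NULL);

   if (!(new_state & HYP_NEW_TRANSFORM_FEEDBACK))
      return;   /* nothing relevant changed, skip the expensive path */

   cache->recompute_count++;   /* stand-in for the expensive recomputation */
}
```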
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When driCreateScreen calls driConvertConfigs to try to convert the
configs for swrast, it fails and returns NULL. Instead of checking,
it just clobbers psc->base.configs. Then, when the application asks
for the FBconfigs, there aren't any.
Instead, make the caller responsible for freeing the old modes lists
if both calls to driConvertConfigs succeed.
Without the second fix, glxinfo fails unless you run it with
LIBGL_ALWAYS_INDIRECT:
$ glxinfo
name of display: :0.0
Error: couldn't find RGB GLX visual or fbconfig
$ LIBGL_ALWAYS_INDIRECT=1 glxinfo
name of display: :0.0
display: :0 screen: 0
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
[...]
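The "check before clobbering" pattern described above might look like this
in outline. The types and the function are hypothetical simplifications: the
existing lists are replaced only when both conversions produced a result,
and freeing the old lists is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the config lists and the screen. */
struct config_list {
   int count;
};

struct screen_configs {
   struct config_list *visuals;
   struct config_list *configs;
};

/*
 * Install the converted lists only if both conversions succeeded.
 * Returns 1 on success (the caller then frees the old lists) and 0 when
 * either conversion failed, in which case the screen is left untouched.
 */
static int
replace_configs(struct screen_configs *s,
                struct config_list *new_visuals,
                struct config_list *new_configs)
{
   assert(s != NULL);

   if (new_visuals == NULL || new_configs == NULL)
      return 0;

   s->visuals = new_visuals;
   s->configs = new_configs;
   return 1;
}
```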
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch fixes the samplerCubeShadow support in the GLSL shader compiler.
The shader compiler was picking the 'r' texture coordinate for shadow
comparison when the expected behaviour is to use the 'q' texture coordinate
in the case of cube shadow maps.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes piglit tests fbo-array, fbo-depth-array, fbo-generatemipmap-array,
and array-texture, as well as the array variants of my new textureSize
and texelFetch tests.
Not a candidate for 7.11 because EXT_texture_array wasn't supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes many crashes on Ivybridge due to upload_sf_state calling
brw_depthbuffer_format without an actual depth buffer. This was a
recent regression on master.
+3992 piglits on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This evolved over several commits, and I also wanted to document some
new information about how we handle formats.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Now that all RBs have miptrees, and miptree mapping covered these last
two code paths, consistently use them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Right now the fake packed d/s RBs are creating two sub-renderbuffers
with their own storage, and the hardware setup and the mapping code
have been explicitly referencing them. By setting miptrees on them,
we'll be able to make our renderbuffer code for fake packed
depth/stencil more consistent with all our other renderbuffers.
The interesting new behavior here is that there is now a mt with a
non-depthstencil format (X8Z24) that has a stencil_mt field
associated. This looks like it should be safe, and we'll need to be
able to do this for floating point depth/stencil as well.
Before, we had an uncached read of S8 to untile, then a RMW (so
uncached penalty) of the packed S8Z24 to store the value, then the
consumer would uncached read that once per pixel. If data was written
to the map, we would then have to uncached read the written data back
out and do the scatter to the tiled S8 buffer (also uncached access
penalties, since WC couldn't actually combine). So 3 or 5 uncached
accesses per pixel in the ROI (and we were ignoring the ROI, so it
was the whole image).
Now we get an uncached read of S8 to untile, and an uncached read of
Z. The consumer gets to do cached accesses. Then if data was
written, we do streaming Z writes (WC success), and scattered S8
tiling writes (uncached penalty). So 2 or 3 uncached accesses per
pixel in the ROI.
This should be a performance win, to the extent that anybody is doing
software accesses of packed depth/stencil buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't gripe about void * arithmetic for our driver, and this
prevents silly casting when assigning the result of mapping to
non-byte types.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We're going to want to reuse this logic in mapping of fake packed
miptrees wrapping separate depth/stencil miptrees.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This code will be incrementally moving to a model like intel_fbo.c's
renderbuffer mapping with helper functions, as I move that code here.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will be used for things like packed depth/stencil temporaries and
making LLC-cached temporary mappings using blits.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will let us share teximage mapping logic with renderbuffer
mapping, which has an intel_mipmap_tree but not a gl_texture_image.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This required is_hiz_depth_format to start returning true on S8_Z24 as
well, since that's the format we have here. The two previous callers
are only calling it on non-depthstencil formats.
This avoids us needing to have HiZ working on a new Z format
immediately upon exposing the format (particularly painful for
Z32_FLOAT_X24S8, which means all the fake packed depth/stencil paths).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Some hardware can't reinterpret the format of hardware buffers and thus
the X server needs to know the format when the buffer is created.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Daenzer <michel@daenzer.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
See intel_vertical_texture_alignment_unit() in intel_tex_layout.c;
certain surface types require setting this to VALIGN_4.
Analogous to commit dd0e46c410 on Gen6.
Fixes piglit test fbo-generatemipmap-formats with the
GL_ARB_depth_texture and GL_EXT_packed_depth_stencil arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The code forces single program flow to be enabled on Ironlake, or
equivalently, disables multiple program flow. The comment was reversed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This moves the detiling to the fbo mapping, r200 depth is always tiled,
and we can't detile it with the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could have been split up better, but the driver is just broken now,
so bisecting the brokenness is going to be painful no matter what.
This adds renderbuffer mapping/unmapping along with texture image allocation.
It drops all the old texture upload paths, some of which could possibly be
reimplemented with the blitter later.
It also redoes the span code paths to use its own set of image mapping handlers,
along with removing the tiling decode paths for the color buffers, since
we now hope to use the blitter for this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I think there is a missing state update or flush somewhere, and every
so often PP_CNTL goes to the kernel with a texture enabled but no texture.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously a zero writemask would result in dst_chan == -1, meaning an
unnecessary MOV with the destination register dictated by undefined
memory contents would be emitted before returning. This caused
intermittent GPU hangs, e.g. with glean/texCombine.
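A minimal sketch of the guard described above, assuming a 4-bit writemask and
a channel search that reports -1 when no bit is set. The function names are
hypothetical and are not the driver's actual code:

```c
/* Return the index (0..3) of the first enabled channel in a 4-bit
 * writemask, or -1 if the writemask is zero. */
static inline int first_enabled_channel(unsigned writemask)
{
   for (int i = 0; i < 4; i++) {
      if (writemask & (1u << i))
         return i;
   }
   return -1;
}

/* Hypothetical emitter: returns the number of MOVs it would emit.
 * With a zero writemask there is nothing to write, so bail out before
 * any instruction is built from an invalid channel index. */
static inline int emit_mov_for_writemask(unsigned writemask)
{
   int dst_chan = first_enabled_channel(writemask);

   if (dst_chan < 0)
      return 0;

   /* A real emitter would build the MOV for dst_chan here. */
   return 1;
}
```

The early return is the whole point: a channel index of -1 must never reach
the code that picks the destination register.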
Reviewed-by: Eric Anholt <eric@anholt.net>
Anything of less than (bw, bh) size is possible when you consider
rectangular textures, and this code is (now) safe for those. Even for
power-of-two textures, width could be 4 for FXT1 while not being
aligned to block size.
Fixes piglit compressedteximage GL_COMPRESSED_RGB_FXT1_3DFX
Reviewed-by: Brian Paul <brianp@vmware.com>
Generally this code works with width and height aligned to compressed
blocks, but at the 2x2 and 1x1 levels of a square texture (or height <
bh in general), we were skipping uploading our single row of blocks.
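As an illustration only (the original expression is not quoted in this
message, so the truncating form below is an assumption), the difference
between dropping and keeping the partial row of blocks looks like this:

```c
/* Truncating division: yields 0 rows of blocks when height < bh, so the
 * single partial row is never uploaded. */
static inline unsigned block_rows_truncated(unsigned height, unsigned bh)
{
   return height / bh;
}

/* Rounding up: any height >= 1 covers at least one row of blocks. */
static inline unsigned block_rows_rounded_up(unsigned height, unsigned bh)
{
   return (height + bh - 1) / bh;
}
```

For a 4-pixel block height, the 2x2 and 1x1 mip levels get zero rows from the
first helper and one row from the second.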
Fixes piglit compressedteximage GL_COMPRESSED_RGBA_S3TC_DXT5_EXT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the MapTextureImage changes on Intel, nwn had corruption in the
scrollbar at the load game menu, and corrupted ground textures in the
starting zone. Heroes of Newerth's intro screen was also thoroughly
garbled. A new piglit test "compressedteximage" was created to
regression test this.
The issue was this code now seeing dstRowStride aligned to hardware
requirements instead of a temporary buffer that gets uploaded to
hardware later. The existing code was just trying to memcpy
srcRowStride * height / bh, while the glCompressedTexSubImage2D()
storage code nearby did the correct walking by blockheight rows at a
time. Just reuse the subimage upload instead of duplicating that
logic.
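A sketch of the "walk by rows of blocks" copy, assuming both strides are given
in bytes per row of blocks and the source is tightly packed. The function name
and parameters are hypothetical, not the actual texstore code:

```c
#include <stddef.h>
#include <string.h>

/* Copy a block-compressed image one row of blocks at a time.  A single
 * memcpy of the whole image is only correct when dst_row_stride equals
 * src_row_stride; a hardware-aligned destination stride breaks that. */
static inline void
copy_compressed_rows(unsigned char *dst, size_t dst_row_stride,
                     const unsigned char *src, size_t src_row_stride,
                     unsigned block_rows)
{
   for (unsigned r = 0; r < block_rows; r++) {
      memcpy(dst + r * dst_row_stride,
             src + r * src_row_stride,
             src_row_stride);
   }
}
```

With an 8-byte source stride and a 16-byte destination stride, the second row
of blocks lands at byte 16 of the destination rather than byte 8.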
v2: Update comment at the top of the function (suggestion by Joel
Forsberg)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41451
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We checked if srcType == GL_UNSIGNED_BYTE earlier so there was no
way to reach this code. This was left-over code from the GLchan
removal work.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If mode is not GL_POINT/LINE/FILL we'll have already reported the
error earlier in the function and returned so we can never get here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We shouldn't call _mesa_error() if the target is a proxy texture.
Errors are handled later in the function.
Fixes a Coverity warning.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Attempting to move an MRF to a MRF is not only pointless, it will fail
because MRFs are read-only, resulting in garbage in your register.
If we already set up a MRF source, there's nothing to resolve anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BITSET64_{TEST,SET,CLEAR}_RANGE macros only work on ranges
either in the lower 32 or in the upper 32 bits of the bitset.
This change extends these macros to work on arbitrary ranges
possibly crossing the bitset word boundary.
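One way to handle a range that straddles the word boundary, sketched under the
assumption that the bitset is stored as two 32-bit words. The type and
function names are hypothetical and do not reproduce the BITSET64 macros:

```c
#include <stdint.h>

/* Hypothetical 64-bit bitset: w[0] holds bits 0..31, w[1] holds 32..63. */
typedef struct { uint32_t w[2]; } bitset64;

/* Mask of bits lo..hi (inclusive) in one word; requires lo <= hi <= 31. */
static inline uint32_t word_mask(unsigned lo, unsigned hi)
{
   uint32_t upto_hi = (hi == 31) ? 0xffffffffu : ((1u << (hi + 1)) - 1u);
   uint32_t below_lo = (1u << lo) - 1u;
   return upto_hi & ~below_lo;
}

/* Set bits b..e (inclusive, b <= e <= 63).  A range crossing bit 31/32
 * is split into a tail of word 0 and a head of word 1. */
static inline void bitset64_set_range(bitset64 *s, unsigned b, unsigned e)
{
   if (e < 32) {
      s->w[0] |= word_mask(b, e);
   } else if (b >= 32) {
      s->w[1] |= word_mask(b - 32, e - 32);
   } else {
      s->w[0] |= word_mask(b, 31);
      s->w[1] |= word_mask(0, e - 32);
   }
}
```

Setting bits 28..35 touches the top four bits of the low word and the bottom
four bits of the high word.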
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add support for GL_OES_compressed_ETC1_RGB8_texture to core mesa. There is no
driver support yet.
Unlike desktop GL compressed texture formats, GLES compressed texture formats
usually can only be used with glCompressedTexImage2D. All other gl*Tex*Image*
functions are updated to check for that.
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture. These routines
will be used in the following commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
In swrast_map_renderbuffer negative strides lead to
render buffer map pointers that are off by 2^32.
Make sure that intermediate negative values are not
converted to an unsigned.
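The arithmetic behind the 2^32 offset can be sketched as follows. This assumes
a 32-bit unsigned row index multiplied by a 32-bit signed stride, which is a
guess at the shape of the bug rather than the actual swrast code:

```c
#include <stdint.h>

/* Buggy shape: mixing unsigned and signed 32-bit operands makes the
 * product unsigned, so a negative stride wraps modulo 2^32 before it is
 * widened to a 64-bit offset. */
static inline int64_t row_offset_buggy(uint32_t y, int32_t stride)
{
   uint32_t wrapped = y * (uint32_t) stride;
   return (int64_t) wrapped;
}

/* Fixed shape: widen to a signed 64-bit type first, then multiply. */
static inline int64_t row_offset_fixed(uint32_t y, int32_t stride)
{
   return (int64_t) y * (int64_t) stride;
}
```

For y = 2 and stride = -16 the fixed form gives -32, and the buggy form gives
a value exactly 2^32 larger.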
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes these GCC warnings.
u_vbuf.c: In function ‘u_vbuf_draw_begin’:
u_vbuf.c:839:20: warning: ‘max_index’ may be used uninitialized in this function [-Wuninitialized]
u_vbuf.c:838:20: warning: ‘min_index’ may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We weren't doing the necessary byte swap.
v2: use same arithmetic as unpack_ARGB1555() to be consistent.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
In the refactor for handling user-defined out params, we failed to set
up the new color output tracking when there was no color drawbuffer in
place but alpha testing was on. Just always set up at least one when
handling gl_FragColor, since we won't make use of its value unless we
need to.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42806
In 6d874d0ee1, I checked whether a
register that had been stored was BAD_FILE (as opposed to a legitimate
GRF), but actually the unset register was ARF NULL because it had been
memset to 0. Finding BAD_FILE for unset values in debugging was my
intention with that file, so make it the case more often by
rearranging the enum. There was only one place we relied on the magic
enum register_file to hardware register file correspondence anyway.
It is useful to have this option for shader-db, and it was also good
at the time where we were rejecting shaders due to various internal
limits we hadn't supported yet. However, at this point the precompile
step takes extra time (since not all NOS is known at link time) and
spews misleading debug in the common case of debugging a real app.
This is left in place for VS, where we still have a couple of codegen
failure paths that result in link failure through precompile. Those
need to be fixed.
shader-db can still get at the debug info it wants using
"shader_precompile=true" driconf option. Long term, we can probably
build a good-enough app for shader-db to trigger real codegen.
When new MESA_FORMAT_x enums are added we need to add a new entry in
the table of texture fetch functions. In the past this has been
missed if swrast isn't actually tested. Using a static assertion
should help with that.
This can be used to check that tables have the right number of entries,
etc. at compile-time. This will hopefully catch things that are missed
if particular drivers aren't tested, for example.
v2: Simplify the macro to omit the extra line number info (the compiler
already indicates the line number). And wrap the macro for readability.
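A sketch of how such a compile-time check can be built and used to guard a
table, assuming the negative-array-size trick. The macro, enum and table names
here are hypothetical, chosen for illustration:

```c
/* Fails to compile when COND is false, because the array type then has
 * a negative size.  Generates no code when COND is true. */
#define SKETCH_STATIC_ASSERT(COND) \
   do { \
      (void) sizeof(char [1 - 2 * !(COND)]); \
   } while (0)

/* Hypothetical format enum; FMT_COUNT must stay last. */
enum sketch_format { FMT_A, FMT_B, FMT_C, FMT_COUNT };

/* One entry per format.  Adding FMT_D without a fourth entry here would
 * break the build instead of misbehaving at run time. */
static const int fetch_table[] = { 10, 20, 30 };

static inline int lookup_fetch(enum sketch_format f)
{
   SKETCH_STATIC_ASSERT(sizeof(fetch_table) / sizeof(fetch_table[0])
                        == FMT_COUNT);
   return fetch_table[f];
}
```

The check costs nothing at run time and fires even if the code path that uses
the table is never exercised by a test.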
There was only one consumer of this API, meta.c, which was intending
to ask "is this format just stencil index (and nothing else)?".
Instead, if one tried to glDrawPixels of GL_DEPTH_STENCIL-type
formats, it would just try to draw the stencil parts. Nothing good
came of this.
This function looks rather silly at this point, but I'm leaving it in
place to be the obvious parallel API to _mesa_is_depth_format(). Note
that if you want the old behavior, you should use it as
(_mesa_is_stencil_format() || _mesa_is_depthstencil_format()) like is
commonly done for depth-related tests.
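The narrowed predicate and the combined test can be sketched like this, using
a hypothetical stand-in enum instead of the real GLenum values and function
names:

```c
/* Hypothetical base formats standing in for the GL enums. */
enum sketch_base {
   SK_STENCIL_INDEX,
   SK_DEPTH_COMPONENT,
   SK_DEPTH_STENCIL,
   SK_RGBA
};

/* "Is this format just stencil index (and nothing else)?" */
static inline int sk_is_stencil_format(enum sketch_base f)
{
   return f == SK_STENCIL_INDEX;
}

static inline int sk_is_depthstencil_format(enum sketch_base f)
{
   return f == SK_DEPTH_STENCIL;
}

/* The old, broader question: does the format carry stencil at all? */
static inline int sk_has_stencil(enum sketch_base f)
{
   return sk_is_stencil_format(f) || sk_is_depthstencil_format(f);
}
```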
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Asking for the datatype of MESA_FORMAT_Z32_FLOAT_X24S8 is a bit funny
-- there's a float depth channel, and a stencil channel that doesn't
have a particular GLenum associated with its type, so what's the
correct response?
Because there is no query for stencil, just make this format's
datatype be that of the depth channel. It fixes the depth query (and
thus a failure in piglit gl-3.0-required-sized-formats), and none of
the other consumers of the _mesa_get_format_datatype() API care.
v2: Add a comment for why the DataType is this way for this format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were already doing it through the shader (layered underneath
GL_EXT_texture_swizzle) in the shadow compare case. This avoids
having per-format logic for switching out the surface format dependent
on the depth mode.
v2: Also do the swizzling for DEPTH_STENCIL. oops.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tripped over this bug in the next commit, relying on our
EXT_texture_swizzle to do some shadow sampler-related swizzling. If a
writemask was masking out a channel of the destination that was a live
channel of the texture swizzle, it would read undefined values.
Fixes piglit ARB_fragment_program_shadow/masked.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will make handling new formats (like actually exposing Z32F)
easier and more reliable.
v2: Remove the check for hiz buffer -- the MESA_FORMAT should really
be giving us the value we want even for hiz.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Complicates Gallium3D development and doesn't seem to have active users.
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
For the non-separate-stencil-only case, we've been using a NULL
surface for depth, so we didn't have to care. However, to support
separate stencil with no depthbuffer, we have to make the depth
surface non-NULL or the stencil test always fails thanks to separate
stencil inheriting the surface type of depth.
Fixes hiz-depth-stencil-test-d0-s8.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Page 77 (page 91 of the PDF) says about glGetActiveAttrib:
"The returned attribute name can be the name of a generic
attribute or a conventional attribute (which begin with the prefix
"gl_", see the OpenGL Shading Language specification for a
complete list)."
Page 261 (page 275 of the PDF) says about glGetProgramiv:
"If pname is ACTIVE_ATTRIBUTES, the number of active attributes in
program is returned."
It doesn't say anything about built-in vs. user-defined attributes.
From the language around glGetActiveAttrib and the lack of an
exclusion of built-in attributes, which exists in other places (e.g.,
around glBindAttribLocation), we can infer that GL_ACTIVE_ATTRIBUTES
should include active built-in attributes in its count. They should
also be included in the values returned by glGetActiveAttrib.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43138
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Yi Sun <yi.sun@intel.com>
To each switch statement in s_texfilter.c, add a break statement to the
default case.
Eliminates the Eclipse static analysis warning: No break at the end of
this case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Replace the distinct struct gl_client_array members in gl_array_object by
an array of gl_client_arrays indexed by VERT_ATTRIB_*.
Renumber the vertex attributes slightly to keep the old semantics of the
distinct array members. Make use of the upper 32 bits in VERT_BIT_*.
Update all occurrences of the distinct struct members with the array
equivalents.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Make gl_program::InputsRead a 64 bits bitfield.
Adapt the intel and radeon driver to handle a 64 bits
InputsRead value.
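A small sketch of why the wider type matters, assuming attribute indices can
exceed 31. The macro and function names are hypothetical, and index 40 is an
arbitrary example rather than a real VERT_ATTRIB_* value:

```c
#include <stdint.h>

/* Build the bit from a 64-bit one; a plain (1 << b) is an int shift and
 * is not valid for b >= 32. */
#define SKETCH_BIT64(b) ((uint64_t) 1 << (b))

/* Nonzero if the given attribute index is set in a 64-bit InputsRead-style
 * mask. */
static inline int reads_attrib(uint64_t inputs_read, unsigned attrib)
{
   return (inputs_read & SKETCH_BIT64(attrib)) != 0;
}
```

Any driver code that stores such a mask in a 32-bit variable silently drops
the upper attributes, which is why the drivers have to be adapted along with
the core change.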
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Introduce a set of defines for VERT_ATTRIB_* and VERT_BIT_*
that will be used in the followup patches.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to table 6.12 (Vertex Array Object State) on page 515 of the
OpenGL 4.2 spec, the element buffer object is part of vertex array object
state.
So, move the ElementArrayBufferObj inside gl_array_object to make
element buffer object per-vao.
This fixes most of the Intel oglc VAO test failures (3 remain).
NOTE: this is a candidate for the 7.11 branch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The position of the red and green bits was misstated in the comments.
Arguably, the names of these formats should be changed to "GR" to reflect
the component ordering and to be consistent with other formats.
And warn loudly in case people want to use it. Too many testers report
GPU hangs on IRC that we end up root-causing to this ...
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If alpha test is enabled and there's no color buffers we still need the
fragment shader to emit a color.
v2: add _NEW_COLOR flag in _mesa_update_state_locked()
Fixes piglit fbo-alphatest-nocolor-ff failures with Gallium drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net> (i965)
The source array elements are 8-bytes (float + uint) so we need
to multiply the src index by 2 to get the right array stride.
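The indexing can be sketched as below, assuming the source is viewed as an
array of 32-bit words holding interleaved (float depth, uint stencil) pairs.
The function name is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Pull n depth values out of interleaved (float, uint32) pairs.  Each
 * element is two 32-bit words wide, so element i starts at word i * 2. */
static inline void
unpack_depth_from_pairs(const uint32_t *src, float *dst, unsigned n)
{
   for (unsigned i = 0; i < n; i++) {
      /* memcpy reinterprets the bits without aliasing problems. */
      memcpy(&dst[i], &src[i * 2], sizeof(float));
   }
}
```

Indexing with src[i] instead would read the stencil word of the first pixel
as the depth of the second.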
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This format is used in the ARB_texture_rgb10_a2ui spec.
It adds core mesa support, texformat + texstore support, format_unpack
and fbobject.c (all patches from list merged + fixed up).
also fixes some whitespace issues.
Parts were:
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These codepaths were missing the cases for BGR_INTEGER/BGRA_INTEGER.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
After reading ARB_texture_rgb10_a2ui it appears the packed formats
for integer types are only specified via this extension, and not via
the original ones. So condition the checks on this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MESA_FORMAT_RGBX8888_REV is one of the opaque pixel formats used on Android.
Thanks to texture-from-pixmap, drivers may actually see texture images with
this format on Android.
MESA_FORMAT_RGBX8888 is added only for completeness.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: Move the new formats after MESA_FORMAT_ARGB8888_REV in gl_format. I
accidentally moved them to the wrong place when preparing the patch.]
GLX functions are sometimes directly available in the current binary. In such
cases, we do not need any alternate library loaded using dlopen. Otherwise,
dlopen may find the wrong libGL library and get functions that conflict with
the current loaded ones.
For example, on Debian Sid with nvidia binary drivers, using mesa's libEGL with
GLX driver leads to wrong glXGetFBConfigs symbol loaded (or loaded twice?),
which leads to "GLX: failed to create any config" error message as the
glXGetFBConfigs symbol seems to return garbage. If the binary is linked with
nvidia's libGL, the GLX symbols are already available.
Without this patch, convert_fbconfig (src/egl/drivers/glx/egl_glx.c:233) fails
for every config found, after glXGetFBConfigAttrib(... GLX_RENDER_TYPE, ...)
call, as the value returned has GLX_COLOR_INDEX_BIT and not GLX_RGBA_BIT.
[olv: initialize handle, prepend egl_glx to the commit log]
Also fix up Makefiles to use the default mesa compilation flags.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Also remove some unused variables in the st/xa makefile.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
With ICS (Android 4.0), several headers and structs are renamed. Define
ANDROID_VERSION so that we can choose a different path depending on the
platform version.
I've tested only softpipe and llvmpipe. r600g is also reported to work.
Enable the bit 3DSTATE_DEPTH_BUFFER.Tiled_Surface. From the Sandybridge
PRM, Volume 2, Part 1, Section 7.5.5.1.1 3DSTATE_DEPTH_BUFFER, Bit 1.27
Tiled Surface:
[DevGT+]: This field must be set to TRUE.
Fixes GPU hangs on the following Piglit tests:
hiz-stencil-test-fbo-d0-s8
hiz-stencil-read-fbo-d0-s8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
v2: Guard against rb->mt being NULL, since we may enter the draw
regions path before intel_prepare_render() has been called to set
them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
This is a no-op change on gen6, but should result in some
actually-unsupported formats on gen4 no longer being chosen (like
RGBA_FLOAT32 now being RGBA_FLOAT16).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3.0 specifies GL_RGB10_A2 as a required sized format for rendering
and texturing.
This introduces two piglit regressions: one due to fbo-mipmap-copypix
hitting swrast GetRow (we want to convert swrast to MapRenderbuffer),
and one due to fbo-blending-formats being too picky while leaving
dithering on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the rest of the driver is driven off of the surface
formats table, all we really need to do is add the mapping from
MESA_FORMAT to BRW_SURFACEFORMAT. However, we also add format
override for I16/L16 render targets at the same time, so that existing
users of I16 that were getting promoted to I32 and then getting the
I32->R32 override still get FBO support.
Fixes failures in piglit gl-3.0-required-sized-texture-formats, and
will prevent regressions in ARB_texture_float on gen4 when moving to
fully table-driven texture format setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until GL 3.0, there isn't any requirement on the actual sizes of
channels chosen. By falling back to 16 here, we can correctly support
ARB_texture_float on original i965 hardware, which can't correctly
filter 32-bit floats.
Not all i965 hardware can do RGB float16, and this will at least save
half the memory and have expected behavior in terms of precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be a no-op change. The initializers are reordered to
match the ordering of the enum, since there isn't a clearly sensible
ordering, but "the order they were added to the driver, sort of" is
definitely not one.
Also, the unsupported formats are explicitly initialized to 0, so it's
more obvious what we aren't claiming to support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is currently duplicated with intel_context.c's setup of the
formats table, and sets true for exactly the same set of formats on
gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've never seen a use for the thread ID value, but knowing the format
being rendered is kind of a big deal.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We are already testing this if appropriate in
intel_validate_framebuffer (FBO completeness), so no need to avoid
attaching the texture to the renderbuffer here.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to now be renderable as a texture
attachment on i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to go writing GetRow/PutRow for every format required by
GL 3.0, when it's very hard to get those functions called, and in
every case we want to make swrast do direct mapping through
MapRenderbuffer anyway.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to be considered complete on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves any chipset-dependent logic we want for render target
format choices to init time as well. There is still logic left at
state update for SRGB handling, where format choices change based on
GL state.
The brw_render_target_supported() function should now return correct
results, instead of relying on the limited results from
intel_span_supports_format() to avoid lying about FBO completeness.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to drive choosing formats and determining framebuffer
completeness, instead of the bunch of ad-hoc checks we have had until
now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The formats.c code's "datatype" value is "what does this value mean",
i.e. unorm or snorm or float, and is the return value from the
GL_TEXTURE_RED_TYPE class of queries. The depth formats were marked
as GL_UNSIGNED_INT, which is what we use for integer, and not what we
should be returning from the glGetTexLevelParameter.
In texstore, we were inappropriately using it as an argument to
_mesa_unpack_depth_span() that was expecting a value like
GL_UNSIGNED_INT or GL_UNSIGNED_SHORT. Just hardcode
_mesa_unpack_depth_span()'s arguments for now, though it looks like
the consumers of that interface would be happier with using
MESA_FORMAT.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_TEXTURE_WHATEVER_SIZE entrypoints were checking if the
specified base type of the texture allowed that channel to be present
before reporting the size of the channel, so that GL_RGB didn't end up
with an alpha size if the hardware driver had to store it that way.
The GL_TEXTURE_WHATEVER_TYPE entrypoints weren't checking it, so you
would end up with strange responses from the GL involving 0-bit
floating-point alpha components in GL_RGB32F, even though it says
GL_NONE as expected for other 0-sized channels.
Make _TYPE check _BaseFormat the same as _SIZE, which makes most of the
GL_RGB* testcases of gl-3.0-required-sized-formats pass on i965.
v2: Add a default case with a warning (suggestion by Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
The motivation behind this is to add some self-documentation in the code
about how each CAP can be used.
The idea is:
- enum pipe_cap is only valid in get_param
- enum pipe_capf is only valid in get_paramf
Which CAPs are floating-point has been determined based on how everybody
except svga implemented the functions. svga has been modified to match all
the other drivers.
Besides that, the floating-point CAPs are now prefixed with PIPE_CAPF_.
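A sketch of the split, using hypothetical enum members, function names and
return values rather than the real Gallium interface:

```c
/* Integer/boolean caps: only meaningful to the integer query. */
enum sketch_cap {
   SKETCH_CAP_MAX_TEXTURE_LEVELS,
   SKETCH_CAP_NPOT_TEXTURES
};

/* Floating-point caps: only meaningful to the float query. */
enum sketch_capf {
   SKETCH_CAPF_MAX_LINE_WIDTH,
   SKETCH_CAPF_MAX_POINT_WIDTH
};

static inline int sketch_get_param(enum sketch_cap cap)
{
   switch (cap) {
   case SKETCH_CAP_MAX_TEXTURE_LEVELS: return 12;  /* made-up value */
   case SKETCH_CAP_NPOT_TEXTURES:      return 1;   /* made-up value */
   default:                            return 0;
   }
}

static inline float sketch_get_paramf(enum sketch_capf cap)
{
   switch (cap) {
   case SKETCH_CAPF_MAX_LINE_WIDTH:  return 8.0f;   /* made-up value */
   case SKETCH_CAPF_MAX_POINT_WIDTH: return 64.0f;  /* made-up value */
   default:                          return 0.0f;
   }
}
```

C does not strictly enforce the distinction, but the parameter types document
the intent, and a compiler may warn when the wrong enum is passed.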
Regresses one Piglit test: bugs/fdo10370.
I'm not enabling HiZ for gen7 yet because it causes a mysterious
performance regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil renderbuffers, we were using separate stencil only if the
hardware required it. Since the performance gains from HiZ are so high, we
should always use separate stencil if the hardware supports it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I implemented functions for horizontal/vertical alignment units separately
because I find it easier to read that way...especially with all the
corner-cases.
[chad] Corrected the vertical alignment calculation by checking for
depthstencil formats.
v2:
- Fix typos in intel_horizontal_texture_alignment_unit():
s/height/width/ and s/VALIGN/HALIGN.
- Remove special case for compressed formats in
intel_get_texture_alignment_unit(). Compressed formats are already
handled in the halign and valign functions.
- Replace check ``_mesa_is_depth_format(...) ||
_mesa_is_depthstencil_format(...)`` with explicit checks against
GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows us to replace all the calls to
intel_get_texture_alignment_unit() with a single call at miptree creation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When a depth texture is first attached to framebuffer, allocate a HiZ
miptree for it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After brw_try_draw_prims() emits a batch, mark that the depth buffer needs
a depth resolve if the buffer was written to and if it has an accompanying
HiZ buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Resolve all buffers that will be mapped by intelSpanRenderStart. This
comprises resolving the depth buffer of each enabled texture and of the
read and draw buffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Factor the mapping loops from intelSpanRenderStart() into
intel_span_map_buffers(). This is in preparation for the next commit,
which resolves the buffers before mapping.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Before emitting primitives in brw_try_draw_prims(), resolve the depth
buffer's HiZ buffer and resolve the depth buffer of each enabled depth
texture.
v2: [anholt] The driver no longer validates drm bo's, so update a comment
to reflect that.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
To do so, we must resolve all buffers on entering a glBegin/glEnd block.
For the detailed explanation, see the Doxygen comments in this patch.
v2:
- Fix typo: s/enusure/ensure/.
- In brwPrepareExecBegin(), do the same resolves as done by
brw_predraw_resolve_buffers().
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A lot of the state manipulation is handled by the meta-op state setup.
However, some batches need manual intervention.
v2:
Do not special-case the 3DSTATE_DEPTH_STENCIL.Depth_Test_Enable bit
for HiZ in gen6_upload_depth_stencil(). The HiZ meta-op sets
ctx->Depth.Test, just read the value from that.
v3:
Add a new dirty flag, BRW_STATE_HIZ, for brw_tracked_state. Flag it
immediately before and after executing the HiZ operation in
gen6_resolve_slice(). Add the flag to the dirty bits for the
following state packets:
gen6_clip_state
gen6_depth_stencil_state
gen6_sf_state
gen6_wm_state
v4:
- Add BRW_NEW_STATE_HIZ to the dirty bit table in brw_state_upload.c.
This is needed for INTEL_DEBUG=state.
- Align brw dirty bit for gen6_depth_stencil_state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Some state batches also need to be manipulated. That's done in the next
commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
brw_context::hiz contains state needed to perform HiZ meta-ops and
indicates if a HiZ operation is currently in progress.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add the following functions:
intel_renderbuffer_resolve_hiz
intel_renderbuffer_resolve_depth
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add functions that
- set a miptree slice as needing a resolve
- resolve a single slice of a miptree
- resolve all slices of a miptree
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a map of miptree slices to needed resolves, implemented as
a linked list. A future commit will embed such a list in
intel_mipmap_tree.
If you think I'm crazy to put a list in a miptree, read the Doxygen in
this patch for intel_resolve_map.
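A minimal sketch of a linked-list map from (level, layer) to a needed resolve.
All names here are hypothetical, and the real intel_resolve_map may be laid
out differently:

```c
#include <stdlib.h>

enum sketch_resolve {
   SKETCH_RESOLVE_NONE,
   SKETCH_RESOLVE_HIZ,
   SKETCH_RESOLVE_DEPTH
};

struct sketch_resolve_node {
   unsigned level;
   unsigned layer;
   enum sketch_resolve need;
   struct sketch_resolve_node *next;
};

/* Find the node for a slice, or NULL if the slice has no entry. */
static inline struct sketch_resolve_node *
sketch_resolve_find(struct sketch_resolve_node *head,
                    unsigned level, unsigned layer)
{
   for (; head != NULL; head = head->next) {
      if (head->level == level && head->layer == layer)
         return head;
   }
   return NULL;
}

/* Record the resolve a slice needs; returns the (possibly new) head.
 * On allocation failure the list is returned unchanged. */
static inline struct sketch_resolve_node *
sketch_resolve_set(struct sketch_resolve_node *head,
                   unsigned level, unsigned layer, enum sketch_resolve need)
{
   struct sketch_resolve_node *n = sketch_resolve_find(head, level, layer);

   if (n != NULL) {
      n->need = need;
      return head;
   }
   n = malloc(sizeof(*n));
   if (n == NULL)
      return head;
   n->level = level;
   n->layer = layer;
   n->need = need;
   n->next = head;
   return n;
}

/* Slices without an entry need no resolve. */
static inline enum sketch_resolve
sketch_resolve_get(struct sketch_resolve_node *head,
                   unsigned level, unsigned layer)
{
   struct sketch_resolve_node *n = sketch_resolve_find(head, level, layer);
   return n != NULL ? n->need : SKETCH_RESOLVE_NONE;
}

static inline void sketch_resolve_free(struct sketch_resolve_node *head)
{
   while (head != NULL) {
      struct sketch_resolve_node *next = head->next;
      free(head);
      head = next;
   }
}
```

In this sketch only slices with a pending resolve occupy memory, and an empty
list means nothing needs resolving.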
v2: [anholt] Move Doxygen from function prototypes to definitions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Now that intel_renderbuffer::region has been replaced with a miptree, the
HiZ functions region parameter must be replaced with a miptree parameter.
Change the return type from bool to void.
Rename the 'depth' parameter to 'layer', because it will correspond to
irb->mt_layer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Remove the following functions:
i830_hiz_resolve_noop
i915_hiz_resolve_noop
brw_hiz_resolve_noop
My original strategy for how intel->vtbl.resolve_*buffer was used has
substantially changed. The above functions are no longer called in the
current strategy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is required to correctly implement HiZ for mipmapped and
multi-layered textures.
v2: Accommodate refcount fixes in intel_process_dri2_buffer_*() that were
introduced in v2 of commit
intel: Replace intel_renderbuffer::region with a miptree [v2]
Reviewed-by: Eric Anholt <eric@anholt>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil textures using separate stencil, we embedded a stencil
buffer in intel_texture_image. The intention was that the embedded stencil
buffer would be the golden copy of the texture's stencil bits. When
necessary, we scattered/gathered the stencil bits between the texture
miptree and the embedded stencil buffer.
This approach had a serious deficiency for mipmapped or multi-layer
textures. At any given moment the embedded stencil buffer was consistent with
exactly one miptree slice, the most recent one to be scattered. This
permitted tests of type A to pass, but broke tests of type B.
Test A:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Read and test stencil data at (level=x1, layer=y1).
4. Upload data into (level=x2,layer=y2).
5. Read and test stencil data at (level=x2, layer=y2).
Test B:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Upload data into (level=x2,layer=y2).
4. Read and test stencil data at (level=x1, layer=y1).
5. Read and test stencil data at (level=x2, layer=y2).
v2:
Only allocate stencil miptree if intel->must_use_separate_stencil,
because we don't make the conversion from must_use_separate_stencil to
has_separate_stencil until commit
intel: Use separate stencil whenever possible
v3:
Don't call ChooseNewTexture in intel_renderbuffer_wrap_miptree() in
order to determine the renderbuffer format. Instead, pass the format as
a param to that function.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is in preparation for properly implementing glFramebufferTexture*()
for mipmapped depthstencil textures. The FIXME comments deleted by this
patch give a rough explanation of what was broken.
This refactor does the following:
- In intel_update_wrapper() and intel_wrap_texture(), change the
parameters to prepare to remove functions' dependency on
gl_texture_image.
- Move the call to intel_renderbuffer_set_draw_offsets() from
intel_render_texture() into intel_update_wrapper().
Each time I encounter those functions, I dislike their vague names.
(Update which wrapper? What is wrapped? What is the wrapper?). So, while
I was mucking around, I also renamed the functions.
v2:
In addition to the ``GLenum internal_format`` parameter to
intel_wrap_miptree(), add a ``gl_format format`` parameter. This
removes the need to recalculate the true format from
internal_format with ChooseNewTextureFormat, which was just weird.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a small helper function that asserts that a given level and layer
are valid for a miptree. I will be extensively using it in the future
miptree HiZ functions.
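A minimal sketch of such a helper, assuming a simplified miptree that records its first and last level plus a per-level slice count. The struct and function names below are hypothetical and are not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_LEVELS 16

/* Hypothetical, simplified stand-in for a miptree. */
struct sketch_miptree {
   unsigned first_level;
   unsigned last_level;
   unsigned depth[SKETCH_MAX_LEVELS]; /* number of 2D slices at each level */
};

/* Returns true if (level, layer) names a slice that exists in the miptree. */
static bool
sketch_miptree_slice_is_valid(const struct sketch_miptree *mt,
                              unsigned level, unsigned layer)
{
   if (level < mt->first_level || level > mt->last_level)
      return false;
   if (level >= SKETCH_MAX_LEVELS)
      return false;
   return layer < mt->depth[level];
}

/* Assert-style wrapper, meant for the top of functions that take a slice. */
static void
sketch_miptree_check_level_layer(const struct sketch_miptree *mt,
                                 unsigned level, unsigned layer)
{
   assert(sketch_miptree_slice_is_valid(mt, level, layer));
   (void) mt; (void) level; (void) layer; /* quiet warnings under NDEBUG */
}
```

Keeping the boolean predicate separate from the asserting wrapper lets release builds compile the check away while debug builds catch bad slices early.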
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Since the renderbuffer tracks the miptree level and layer that it wraps,
the 'tex_image' and 'zoffset' params are no longer needed to calculate the draw
offsets.
Not only are they no longer needed, but their presence would prevent
calculating the renderbuffer draw offsets in situations where there is
no texture image. Such situations will occur during the HiZ meta-op and
during scatter/gather of separate stencil textures.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
TODO: Make v2 for kwg.
Add two fields to intel_renderbuffer:
mt_level
mt_layer
Multiple renderbuffers may simultaneously wrap a single texture and each
provide a different view into that texture. [Consider
glFramebufferTextureLayer()]. The new fields indicate which slice of the
miptree is wrapped by the renderbuffer.
The buffer resolve operations, to be introduced in the future, require
these fields in order to resolve the correct slice in the miptree.
To add the fields, it was necessary to replace the type of some function
parameters from gl_texture_image to gl_renderbuffer_attachment.
v2: [kwg] Replace confusing condition `CubeMapFace > 0` with the more
sensible `Target == GL_TEXTURE_CUBE_MAP`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For all texture targets except GL_TEXTURE_CUBE_MAP, the 'nr_images' and
'depth' fields of intel_mipmap_level were identical. In the exceptional
case, nr_images == 6 and depth == 1.
It is simple to determine if a texture is a cube or not, so the presence
of two fields here was not helpful. Worse, it was confusing. When we
eventually implement GL_ARB_texture_cube_map_array, this mess would have
become even more confusing.
This patch removes 'nr_images' and assigns to 'depth' a consistent
meaning: depth is the number of 2D slices at each miplevel. The exact
semantics of depth varies according to the texture target:
- For GL_TEXTURE_CUBE_MAP, depth is 6.
- For GL_TEXTURE_2D_ARRAY, depth is the number of array slices. It is
identical for all miplevels in the texture.
- For GL_TEXTURE_3D, it is the texture's depth at each miplevel. Its
value, like width and height, varies with miplevel.
- For other texture types, depth is 1.
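The rules above can be sketched as a single function. This is only an illustration: the enum and function names are hypothetical stand-ins for the GL targets, and the 3D case assumes the usual GL minification rule max(1, depth0 >> level).

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL texture targets named above. */
enum sketch_target {
   SKETCH_TEXTURE_2D,
   SKETCH_TEXTURE_CUBE_MAP,
   SKETCH_TEXTURE_2D_ARRAY,
   SKETCH_TEXTURE_3D
};

/* Number of 2D slices at `level`, given the base-level depth `depth0`
 * (array length for 2D arrays, depth in texels for 3D textures). */
static unsigned
sketch_slices_at_level(enum sketch_target target, unsigned depth0,
                       unsigned level)
{
   unsigned d;

   switch (target) {
   case SKETCH_TEXTURE_CUBE_MAP:
      return 6;                 /* always six faces */
   case SKETCH_TEXTURE_2D_ARRAY:
      return depth0;            /* identical for every miplevel */
   case SKETCH_TEXTURE_3D:
      assert(level < 32);
      d = depth0 >> level;      /* shrinks with the miplevel */
      return d > 0 ? d : 1;
   default:
      return 1;
   }
}
```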
As a consequence, parameters were removed from the following function
signatures:
intel_miptree_set_level_info
Remove 'nr_images'.
i945_miptree_layout
brw_miptree_layout_texture
brw_miptree_layout_texture_array
Remove 'slices'.
v2:
- Replace "It's" with "Its".
- Remove all hunks in intel_fbo.c. The hunks were spurious and sneaked
in during a rebase.
- Remove unneeded hunk in intel_tex_map_image_for_swrast(). It was
a little refactor of the for-loop's upper bound.
v4:
In intel_miptree_get_image_offset(), document the conditions under
which different if-branches are taken.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
They're not supported by hw directly, but it's easy to emulate
them with a shader swizzling fixup.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
[danvet: The important thing is to write a 1 to the unused alpha
channel, the ddx is relying on this for render accel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the gallium driver doesn't support PIPE_FORMAT_R16G16B16A16_SNORM
the call to st_choose_renderbuffer_format() would fail and we'd generate
a GL_OUT_OF_MEMORY error. We'd never get to the subsequent code that
handles software/malloc-based renderbuffers.
Add a special-case check for PIPE_FORMAT_R16G16B16A16_SNORM which is used
for software-based accum buffers. This could be fixed in other ways but
it would be a much larger patch. st_renderbuffer_alloc_storage() could
be reorganized in the future.
This fixes accum buffer allocation for the svga driver.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Extract the body of the inner loop into a new function,
intel_miptree_copy_slice().
This is in preparation for adding support for separate stencil and HiZ to
intel_miptree_copy_teximage(). When copying a slice of a depthstencil
miptree that uses separate stencil, we will also need to copy the
corresponding slice of the stencil miptree. The easiest way to do this
will be to call intel_miptree_copy_slice() recursively. Analogous
reasoning applies to copying a slice of a depth miptree with HiZ.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new field, intel_mipmap_level::slice, and move the offset fields
into it. Also add some much needed documentation for these fields.
Before this patch, a separate array was allocated for the
intel_mipmap_level::{x,y}_offsets. This was just silly; it incurred an
extra call to malloc and diminished memory locality.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Essentially, this patch just globally substitutes `irb->region` with
`irb->mt->region` and then does some minor cleanups to avoid segfaults
and other problems.
This is in preparation for
1. Fixing scatter/gather for mipmapped separate stencil textures.
2. Supporting HiZ for mipmapped depth textures.
As a nice benefit, this lays down some preliminary groundwork for easily
texturing from any renderbuffer, even those of the window system.
A future commit will replace intel_mipmap_tree::hiz_region with a miptree.
v2:
- Return early in intel_process_dri2_buffer_*() if region allocation
fails.
- Fix double semicolon.
- Fix miptree reference leaks in the following functions:
intel_process_dri2_buffer_with_separate_stencil()
intel_image_target_renderbuffer_storage()
v3:
- [anholt] Fix check for hiz allocation failure. Replace
``if (!irb->mt)`` with ``if(!irb->mt->hiz_region)``.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the following inline functions:
intel_get_rb_region
intel_framebuffer_has_hiz
A future commit will replace the renderbuffer's region with a miptree.
This small refactor will eliminate the need for intel_fbo.h to include
intel_mipmap_tree.h on that commit. I'd like to avoid the situation where
each header transitively includes every other header.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The only user of intel_framebuffer_get_hiz_region() was
intel_framebuffer_has_hiz(). So I folded the body of the former into the
latter.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A great refactor thrashing begins after this commit for HiZ and separate
stencil. Removing code for texture HiZ will make that refactoring easier,
because then we don't have to maintain that code during the refactor.
To disable HiZ for textures, I've removed the hook in
intel_update_wrapper() that allocates a HiZ buffer when attaching a depth
texture to a framebuffer.
HiZ was broken for textures anyway, so there's no regression here.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function gathered the stencil buffer into the depth buffer only when
the map mode contained the read bit. But we must do the gather even if the
map mode is write-only. If we do not, then, when the depth buffer's stencil
bits are scattered into the stencil buffer by intel_unmap_renderbuffer(),
some of the scattered stencil bits would be invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
1. Don't map the depthstencil buffer twice
Place a guard in intel_renderbuffer_map() to prevent a renderbuffer
from being mapped twice. This happened if a single buffer was attached to
the framebuffer's depth and stencil attachment points. (Interestingly,
because intel_map_renderbuffer_gtt() is idempotent, the double mapping did
not cause bugs for depthstencil buffers *without* separate stencil).
2. Stop overriding gl_framebuffer::_DepthBuffer,_StencilBuffer
Normally, if a depthstencil buffer is attached to the framebuffer's
depth attachment point, then _mesa_update_framebuffer() installs
a wrapper depth renderbuffer at gl_framebuffer::_DepthBuffer. Ditto for
the stencil attachment point and gl_framebuffer::_StencilBuffer.
A depthstencil intel_renderbuffer with separate stencil contains hidden
depth and stencil renderbuffers, which are the *real* renderbuffers. In
order to force swrast to work, we were installing, in
brw_update_draw_buffer(), the hidden renderbuffers at
gl_framebuffer::_DepthBuffer and _StencilBuffer, thus overriding the
behavior of _mesa_update_framebuffer(). However, now that
intel_renderbuffer_map() is implemented with MapRenderbuffer(),
overriding _mesa_update_framebuffer()'s behavior introduces bugs. This patch
removes the override code.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The special stencil span accessors, as set by intel_span_init_funcs,
perform software W detiling. Since intel_renderbuffer_map() now uses
MapRenderbuffer, rb->Data points to an *untiled* stencil buffer.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It's intended to indicate whether the driver/hardware supports reading
of the values written into shader outputs.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
texture_combine converts the result rgba to CHAN_TYPE from FLOAT. At the
same time, make sure the span->array->ChanType is changed, too.
v2: pick a nicer comment from Brian
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Parameters n and rgbaChan both come from the span structure, so pass span
as a parameter to simplify the prototype. Function texture_combine is only
used by _swrast_texture_span, so I guess it's safe to do so.
This patch is mainly for the next patch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And update r300g.
This is different from util_draw_max_index in how it obtains vertex elements
and that it doesn't have to call util_format_description due to additional
precomputed data in vertex elements.
This forks vbo_get_minmax_index. We need to know the index range when
translating non-native vertices into native ones. There is no other way
around it.
Previously we were mapping/unmapping the index buffer each time we
found the restart index in the buffer. This is bad when the restart
index is frequently used. Now just map the index buffer once, scan
it to produce a list of sub-primitives, unmap the buffer, then draw
the sub-primitives.
Also, clean up the logic of testing for indexed primitives and calling
handle_fallback_primitive_restart(). Don't call it for non-indexed
primitives.
v2: per Jose, only map the relevant part of the index buffer with
pipe_buffer_map_range()
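A minimal sketch of the single-pass scan described above, assuming the index buffer has already been mapped to a plain array of unsigned indices. The struct and function names are hypothetical; the real code works on mapped pipe buffers and several index sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one run of indices between restart markers. */
struct sketch_sub_prim {
   size_t start;   /* position of the first index of the run */
   size_t count;   /* number of indices in the run */
};

/* Scan `indices` once and record every non-empty run that contains no
 * restart index.  Returns the number of runs written to `out`; at most
 * `max_out` runs are stored. */
static size_t
sketch_find_sub_prims(const unsigned *indices, size_t num_indices,
                      unsigned restart_index,
                      struct sketch_sub_prim *out, size_t max_out)
{
   size_t num_prims = 0;
   size_t run_start = 0;
   size_t i;

   assert(out != NULL || max_out == 0);

   for (i = 0; i <= num_indices; i++) {
      /* Treat the end of the buffer like a restart marker. */
      if (i == num_indices || indices[i] == restart_index) {
         if (i > run_start && num_prims < max_out) {
            out[num_prims].start = run_start;
            out[num_prims].count = i - run_start;
            num_prims++;
         }
         run_start = i + 1;
      }
   }
   return num_prims;
}
```

Once the list is built, the buffer can be unmapped and each recorded run drawn on its own, so the map/unmap cost no longer scales with the number of restart indices.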
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On original gen4, the surface format didn't determine the return data
type from sampling like it does on g45 and later.
Fixes GL_EXT_texture_integer/texture_integer_glsl130
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Merge may produce incorrect order of operations for r600-eg:
x: inst1 R0.x, ... ; //from current group
...
t: inst0 R0.x, ... ; //from previous group, same destination
Result of inst1 will be lost.
So compare destinations and don't allow this.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Previously a vertex shader that used no samplers would get updated (by
calling the driver's ProgramStringNotify) when a sampler in the
fragment shader was updated. This was discovered while investigating
some spurious code generation for shaders in Cogs. The behavior in
Cogs is especially pessimal because it ping-pongs sampler uniform
settings:
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
glUniform1i(sampler1, 1);
glUniform1i(sampler2, 0);
draw();
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
// etc.
ProgramStringNotify is still too big of a hammer. Applications like
Cogs will still defeat the shader cache. A lighter-weight mechanism
that can work with the shader cache is needed. However, this patch at
least restores the previous behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In slow_read_depth_stencil_pixels_separate() we might have separate
depth and stencil buffers or a combined buffer. In the latter case,
don't map the buffer twice. This function is used when the depth
scale/bias pixel transfer values are not the defaults.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=42963
Reviewed-by: José Fonseca <jfonseca@vmware.com>
glspec doesn't say that we should skip the attenuation and spot
calculation for infinite light (Ppli.w == 0). Instead, it gives the same
formula to do the light calculation for both finite light and infinite
light (see page 62 of glspec 2.1.pdf).
Also from the formula (2.4) at page 62 of glspec 2.1.pdf, we can skip
attenuation calculation if Ppli.w == 0.
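As an illustration of the attenuation rule, here is a small sketch of formula (2.4) as I read it in the OpenGL 2.1 specification: the factor is 1 / (k0 + k1*d + k2*d^2) for a positional light and 1.0 when the light position's w is zero. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* light_pos_w: w component of the light position (0 means infinite light).
 * dist:        distance from the vertex to the light.
 * k0, k1, k2:  constant, linear and quadratic attenuation factors. */
static float
sketch_light_attenuation(float light_pos_w, float dist,
                         float k0, float k1, float k2)
{
   float denom;

   if (light_pos_w == 0.0f)
      return 1.0f;   /* infinite light: attenuation can be skipped */

   denom = k0 + k1 * dist + k2 * dist * dist;
   assert(denom > 0.0f);
   return 1.0f / denom;
}
```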
This fixes all the failing intel oglc l_sed subcases and introduces no
intel oglc regressions.
v2: fix a wrong indentation (comments from Brian).
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Make sure all lighting tables are updated before using the table to
calculate something, say using _SpotExpTable to calculate
_VP_inf_spot_attenuation.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression in piglit:
ARB_color_buffer_float/GL_RGBA16F-getteximage
ARB_color_buffer_float/GL_RGBA16F-readpixels
ARB_color_buffer_float/GL_RGBA32F-getteximage
ARB_color_buffer_float/GL_RGBA32F-readpixels
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes some spurious GL errors in the upcoming
gl-3.0-required-sized-formats piglit test.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
intelAllocateBuffer() was oblivious to separate stencil buffers. This
patch fixes it to allocate a non-tiled stencil buffer with special pitch,
just as the DDX does.
Without this, any app that attempted to create an EGL surface with stencil
bits would crash. Of course, this affected only environments that used the
builtin DRI2 backend, such as Android and Wayland.
Fixes GLBenchmark2.1 on Android on gen7.
Note: This is a candidate for the 7.11 branch.
Tested-by: Louie Tsaie <louie.tsai@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I changed the dimensions of the stencil buffer's region, as allocated by
the DDX, at xf86-video-intel commit
commit 3e55f3e88b40471706d5cd45c4df4010f8675c75
dri: Do not tile stencil buffer
But I forgot to make the analogous update to the Intel DRI2 glue in Mesa.
This patch makes that update.
Surprisingly, the mismatch did not cause any bugs. But the mismatch, if
left unfixed, *would* create bugs in the next commit.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When calculating the y offset needed for detiling window system stencil
buffers, replace the term
region->height * 2 + region->height % 2 - 1
with
rb->Height - 1 .
The two terms are incidentally equivalent due to some out-of-date,
incorrect code in the Intel DRI2 glue for DDX. (See
intel_process_dri2_buffer_with_separate_stencil(), line ``buffer_height /=
2;``).
Note: This is a candidate for the 7.11 branch (only the intel_span.c hunk).
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rather than redefining the BYTE/SHORT_TO_FLOAT macros, just define new
ones with different names. These macros preserve zero when converting.
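To show what "preserve zero" means here, the sketch below contrasts the classic signed-byte mapping (2b + 1) / 255, which sends 0 to 1/255, with a variant that special-cases zero. The macro names are hypothetical and the exact definitions are an assumption, not copied from the patch.

```c
/* Classic mapping of a signed byte onto [-1, 1]; note that 0 does not
 * map to 0.0f. */
#define SKETCH_BYTE_TO_FLOAT(b)  ((2.0f * (b) + 1.0f) * (1.0f / 255.0f))

/* Zero-preserving variant (assumed form): 0 maps exactly to 0.0f and
 * every other value uses the classic mapping.  The argument is evaluated
 * more than once, so avoid passing expressions with side effects. */
#define SKETCH_BYTE_TO_FLOATZ(b) \
   ((b) == 0 ? 0.0f : SKETCH_BYTE_TO_FLOAT(b))
```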
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't assert/die if a VBO is too small. Return zero instead. For
debug builds, emit a warning message since this is an unusual situation
that might indicate that there's a bug in the app.
Note that util_draw_max_index() now returns max_index+1 instead of
max_index. This lets us return zero to indicate that one of the VBOs
is too small to draw anything.
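A minimal sketch of the "max_index + 1, or zero" convention for a single vertex buffer, under the assumption that the fetch is described by a buffer size, a starting offset, a nonzero stride and an element size. The struct and function names are hypothetical and much simpler than util_draw_max_index().

```c
#include <assert.h>

/* Hypothetical description of one vertex attribute fetched from a buffer. */
struct sketch_vertex_fetch {
   unsigned buffer_size;   /* bytes in the vertex buffer */
   unsigned buffer_offset; /* byte offset of the first element */
   unsigned stride;        /* bytes between consecutive elements, nonzero */
   unsigned element_size;  /* bytes read per element */
};

/* Returns the number of vertices that can be fetched safely, that is
 * max_index + 1, or 0 if not even one element fits. */
static unsigned
sketch_draw_max_index_plus_one(const struct sketch_vertex_fetch *f)
{
   unsigned usable;

   assert(f->stride > 0);

   if (f->buffer_offset >= f->buffer_size)
      return 0;
   usable = f->buffer_size - f->buffer_offset;
   if (usable < f->element_size)
      return 0;

   /* Element i occupies [i * stride, i * stride + element_size). */
   return (usable - f->element_size) / f->stride + 1;
}
```

If the function returned max_index itself, a result of 0 could mean either "only vertex 0 is readable" or "nothing is readable"; returning the count removes that ambiguity.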
Fixes a failure with the new piglit vbo-too-small test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The swrast ReadPixels code has no dependencies on swrast since moving
to Map/UnmapRenderbuffer(). We'll be able to remove s_readpix.c and
remove the state tracker's glReadPixels code next.
Acked-by: Eric Anholt <eric@anholt.net>
This was only used by the xlib driver to add an alpha channel to the
front/window color buffer. This was no longer going to work well with
the move to direct mapping of renderbuffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Seldom used and this won't work when we move to using Map/UnmapRenderbuffer
everywhere. This will let us remove a bunch of core Mesa code too.
Reviewed-by: Eric Anholt <eric@anholt.net>
For a depthstencil buffer with separate stencil,
intel_renderbuffer::region is null. (The regions are kept in hidden depth
and stencil buffers). Since the region is null, intel_map_renderbuffer()
assumed there was no data and returned a null map pointer, which in turn
was dereferenced (!) by MapRenderbuffer's caller.
This patch fixes intel_map_renderbuffer() to map the hidden depth buffer
through the GTT and return that as the mapped pointer. Also, the stencil
bits are scattered and gathered when needed.
Fixes the following Piglit tests on gen7:
fbo/fbo-readpixels-depth-formats
hiz/hiz-depth-read-fbo-d24s8
hiz/hiz-stencil-read-fbo-d24s8
EXT_packed_depth_stencil/fbo-clear-formats
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-blit
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-drawpixels
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-readpixels
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a window system stencil buffer had a region with odd height, then the
calculated y offset needed for software detiling was off by one. The bug
existed in intel_{map,unmap}_renderbuffer_s8() and in the intel_span.c
accessors.
Fixes the following Piglit tests on gen7:
general/depthstencil-default_fb-readpixels-24_8
general/depthstencil-default_fb-readpixels-FLOAT-and-USHORT
Fixes SIGABRT in the following Piglit tests on gen7:
general/depthstencil-default_fb-blit
general/depthstencil-default_fb-copypixels
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-drawpixels-FLOAT-and-USHORT
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When gathering the temporary buffer's pixels into the gem buffer, we had
the two buffers juxtaposed. Oops.
Fixes the following Piglit tests on gen7:
general/GL_SELECT - alpha-test enabled
general/GL_SELECT - depth-test enabled
general/GL_SELECT - no test function
general/GL_SELECT - scissor-test enabled
general/GL_SELECT - stencil-test enabled
Fixes SIGABRT in Piglit tests EXT_framebuffer_object/fbo-stencil-* on
gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function already implements 3 cases (map through GTT, blit to
a temporary, and detile stencil buffer to temporary), and a 4th will be
added soon: scatter/gather for depthstencil buffers using separate
stencil. For sanity's sake, this factors each case out into its own
function.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call set_uniform_initializers if the link failed, or it would trigger
a GL_INVALID_OPERATION error. That is not the expected behavior of
glLinkProgram.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, we would fail to compile the following shader due to a bug
in lazy built-in importing:
#version 130
void main() {
float f = abs(5.0);
int i = abs(5);
}
The first call, abs(5.0), would fail to find a local signature, look
through the built-ins, and import "float abs(float)".
The second call, abs(5), would find the newly imported float signature
in the local shader, and settle for that. Unfortunately, it failed to
search the built-ins for the correct/exact signature, "int abs(int)".
Thus, abs(5) ended up being a float, causing a bizarre type error when
we tried to assign it to an int.
Fixes piglit test builtin-overload-matching.frag.
This is /not/ a candidate for stable branches, as it should only be
possible to trigger this bug using GLSL 1.30's built-in functions that
take integer arguments. Plus, the changes are fairly invasive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
match_function_by_name performs two fairly separate tasks:
1. Hunt down the appropriate ir_function_signature for the callee.
2. Generate the actual ir_call (assuming we found the callee).
Both of these are complicated. The first has to handle exact/inexact
matches, lazy importing of built-in prototypes, different scoping rules
for 1.10, 1.20+, and ES. Not to mention printing a user-friendly error
message with pretty-printed "maybe you meant this" candidate signatures.
The second has to deal with void/non-void functions, pre-call implicit
conversions for "in" parameters, and post-call "out" call conversions.
Trying to do both in one function is just too unwieldy. Time to split.
This patch purely moves the code to generate an ir_call into a separate
function and reindents it. Otherwise, the code is identical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When matching function signatures across multiple linked shaders, we
often want to see if the current shader has _any_ match, but also know
whether or not it was exact. (If not, we may want to keep looking.)
This could be done via the existing mechanisms:
sig = f->exact_matching_signature(params);
if (sig != NULL) {
exact = true;
} else {
sig = f->matching_signature(params);
exact = false;
}
However, this requires walking the list of function signatures twice,
which also means walking each signature's formal parameter lists twice.
This could be rather expensive.
Since matching_signature already internally knows whether a match was
exact or not, we can just return it to get that information for free.
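The idea can be sketched in C as a single walk that reports exactness through an out-parameter. Everything here is hypothetical and heavily simplified: signatures take one parameter, and the only implicit conversion modelled is int to float.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_type { SKETCH_INT, SKETCH_FLOAT };

/* Hypothetical one-parameter signature, enough to show the idea. */
struct sketch_signature {
   enum sketch_type param;
   const char *name;
};

/* Walk the list once.  An exact match wins immediately; otherwise remember
 * the first signature reachable by implicit int -> float conversion.
 * *is_exact tells the caller which kind of match was returned. */
static const struct sketch_signature *
sketch_matching_signature(const struct sketch_signature *sigs, size_t num_sigs,
                          enum sketch_type actual, bool *is_exact)
{
   const struct sketch_signature *inexact = NULL;
   size_t i;

   for (i = 0; i < num_sigs; i++) {
      if (sigs[i].param == actual) {
         *is_exact = true;
         return &sigs[i];
      }
      if (inexact == NULL &&
          actual == SKETCH_INT && sigs[i].param == SKETCH_FLOAT)
         inexact = &sigs[i];
   }

   *is_exact = false;
   return inexact;
}
```

A caller that gets a non-NULL result with *is_exact == false can accept it provisionally and keep searching other shaders or the built-ins for an exact match.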
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need something that looks like a compiler and not like some hacker
put some functions together. /rant
This is a band-aid for these two problems:
- The R600 and EG control-flow instructions appear in switch statements
next to each other, causing conflicts when adding new instructions.
- The ALU control-flow instructions are bitshifted by 3 (from CF_INST 26:29
to CF_INST 23:29, as is defined by r600 ISA) even for EG, where CF_INST
is 22:29.
To fix this mess, the 'inst' field is bitshifted to the left either by 22, 23,
or 26 (directly in the definitions), such that it can be just or'd when making
bytecode without any shifting. All switch statements have been divided into
two, one for R600 and the other for EG.
Of course, there is a better way to do this, but that is left for future
work.
Tested on RV730 and REDWOOD with no regressions.
v2: minor cleanup as per Alex's comment.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is also done in ir_to_mesa and st_glsl_to_tgsi, but that code
will be removed soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If they were disabled on entry, and we enabled one (like for
BlitFramebuffer), we wouldn't disable it on the way out. Retain the
attempted optimization here (don't keep calling to set each bit for
changes that won't matter) by just setting the bits directly with
appropriate flushing.
Fixes misrendering on the second draw of piglit fbo-blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's unlikely that we changed the object but no other texture
parameter, but be correct anyway. Noticed by inspection.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also, actually update const_storage_size, thereby avoiding an
unnecessary reallocation of aligned_constant_storage every single time
draw_vs_set_constants() is called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Although textureSize is represented as an ir_texture with op == ir_txs,
it doesn't have a coordinate, so normalizing it doesn't make sense.
Fixes crashes in oglconform glsl-bif-tex-size basic.samplerCube.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch solves three bugs.
1. When a texture was attached to the GL_DEPTH_STENCIL_ATTACHMENT point,
Mesa attached the texture only to the depth attachment point
gl_framebuffer::Attachment[BUFFER_DEPTH]
and failed to attach it to the stencil attachment point
gl_framebuffer::Attachment[BUFFER_STENCIL]
2. When a texture was attached to the GL_DEPTH_ATTACHMENT point and then
later attached to the GL_STENCIL_ATTACHMENT point, Mesa created two
separate renderbuffer wrappers. This caused a GL error in
glGetFramebufferAttachmentParameteriv().
3. Same as 2, but with depth and stencil juxtaposed.
Fixes Piglit test ARB_framebuffer_object/same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The compiler setup for these VF-uploaded attributes looks a little
cheesy with mixing system values and real VBO-sourced attributes. It
would be nice if we could just compute the ATTR[] map to GRF index up
front and use it at visit time instead of using ir->location in the
ATTR file. However, we don't know the reg_offset at
visit(ir_variable *) time, so we can't do the mapping that early.
Fixes piglit vertexid test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We only allow 16 vec4s of attributes in our GLSL/ARB_vp programs, and
1 more element will get used for gl_VertexID/gl_InstanceID. So it
should never have been possible to hit this fallback, unless there was
another bug. If you do hit this, you're probably using gl_VertexID
and falling back to swrast won't work for you anyway.
This also updates the limits for gen6+.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This used to be script-generated, but now it's just a bunch of static
variables in a .h file for no good reason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: check if pipe_buffer_map() returns NULL, and return NULL from
svga_vbuf_render_map_vertices(). Per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, if we failed to allocate a VBO (either for display list
compilation or immediate mode rendering) we'd eventually segfault
when trying to map the non-existent buffer or in a glVertex/Color/etc
call when we hit a null pointer.
Now we don't try to map non-existent buffers and if we do fail to
allocate a VBO we plug in no-op functions for glVertex/Color/etc
so we don't segfault.
None of the code in api_noop.c was used anymore. The new vbo_noop.c
functions are true no-ops. They'll be used to no-op glBegin/End functions
when we run out of VBO memory.
Only a handful of functions from api_noop.c are actually used by
the VBO module. Move them to the VBO module. With this change,
none of the code in api_noop.c is actually used anymore.
NEW_COLOR is only needed on Gen4-5 as brw_update_renderbuffer_surfaces
only uses ctx->Color when intel->gen < 6.
This should reduce unnecessary state updates.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Constant expressions which called GLSL's equal() and notEqual()
built-ins on bvecs would hit an assertion failure; we simply forgot to
implement them for booleans.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These simply don't exist in the 1.30 specification---none of the Offset
variants allow samplerCube. This must have been a cut and paste error
from textureGrad, which /does/ allow cubemaps.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Due to a cut and paste error, these were accidentally misnamed
textureProj() rather than textureProjOffset().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
From the GLSL 1.30 spec, section 8.7 "Texture Lookup Functions":
"In all functions below, the bias parameter is optional for fragment
shaders. The bias parameter is not accepted in a vertex shader."
This was a cut and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_wm_samplers actually enables any active samplers regardless of what
pipeline stage is using them, so it doesn't make much sense for it to be
WM-specific. So, rename it to "brw_samplers."
To properly generalize it, move sampler_count and sampler_offset from
brw_context::wm to a new brw_context::sampler that can be shared without
looking strange.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like for the WM pull constants, we can merge the former prepare/emit
stages into one tracked state atom. Furthermore, the code that used to
handle the binding table was removed in the last commit, leaving some
rather silly looking short functions that can easily be folded in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Although the hardware supports separate binding tables for each pipeline
stage, we don't see much advantage over a single shared table.
Consider the contents of the binding table:
- Textures (16)
- Draw buffers (8)
- Pull constant buffers (1 for VS, 1 for WM)
OpenGL's texture bindings are global: the same set of textures is
available to all shader targets. So our binding table entries for
textures would be exactly the same in every table.
There are only two pull constant buffers (not many), and although draw
buffers aren't interesting to the VS, it shouldn't hurt to have them in
the table. The hardware supports up to 254 binding table entries, and
we currently only use 26.
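As a rough illustration of the arithmetic above, the following sketch lays the entries out back to back in one shared table. All identifiers and the ordering of the groups are hypothetical, not the driver's actual names or layout; only the counts (16 textures, 8 draw buffers, 2 pull constant buffers, a hardware limit of 254) come from the text.

```c
#include <stdbool.h>

/* Hypothetical names; only the counts are taken from the commit message. */
enum {
    SKETCH_MAX_TEXTURES     = 16,
    SKETCH_MAX_DRAW_BUFFERS = 8,
    SKETCH_HW_MAX_ENTRIES   = 254
};

/* One possible shared layout (the ordering is an assumption). */
enum {
    SKETCH_SURF_DRAW_BASE    = 0,
    SKETCH_SURF_VS_PULL      = SKETCH_SURF_DRAW_BASE + SKETCH_MAX_DRAW_BUFFERS,
    SKETCH_SURF_WM_PULL      = SKETCH_SURF_VS_PULL + 1,
    SKETCH_SURF_TEXTURE_BASE = SKETCH_SURF_WM_PULL + 1,
    SKETCH_TABLE_SIZE        = SKETCH_SURF_TEXTURE_BASE + SKETCH_MAX_TEXTURES
};

/* Binding table index of a texture unit; identical for every shader stage. */
static int sketch_texture_entry(int unit)
{
    return SKETCH_SURF_TEXTURE_BASE + unit;
}

/* True if the shared table stays within the hardware limit. */
static bool sketch_table_fits(void)
{
    return SKETCH_TABLE_SIZE <= SKETCH_HW_MAX_ENTRIES;
}
```

With this layout every stage would look up texture unit N at the same index, which is the point of sharing one table.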
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
First, the texturing setup code is relevant for all pipeline stages,
while renderbuffer surfaces are only used by the WM.
Secondly, renderbuffer and texture setup depends on a different set of
dirty bits. There's no reason to walk the array of textures when
changing draw buffers, or vice-versa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were only split for historical reasons: brw_wm_constants used to
be the "prepare" step, while brw_wm_constant_surface was "emit". Now
that both happen at emit time, it makes sense to combine them.
Call the newly combined state atom "brw_wm_pull_constants" to help
distinguish it from the Gen6+ atoms that handle push constants.
Finally, remove the BRW_NEW_WM_CONSTBUF dirty bit entirely now that it's
never flagged nor used.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When reading the "brw_wm_constants" and "gen6_wm_constants" atoms
side-by-side, I initially failed to notice the crucial difference:
the Gen6 atoms are for Push Constants, while brw_wm_constants handles
Pull Constants. (Gen4/5 Push Constants are handled by "brw_curbe.")
Renaming these should clarify the code and save me from constant
confusion over the fact that "gen6_wm_constants" isn't just a newer
version of "brw_wm_constants."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This code is fairly fragile, as it depends on the ordering of the
entries in the binding table, which will change soon.
Also, stop listening on the BRW_NEW_WM_CONSTBUF dirty bit as it's no
longer required.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These fields control how many entries the hardware prefetches into the
state cache, so they only impact performance, not correctness. However,
it's not clear how to use this in a way that's beneficial.
According to the documentation, kernels "using a large number" of
entries may wish to program this to zero to avoid thrashing the cache;
it's unclear how many is too many. Also, Ironlake's WM was missing this
feature entirely---the count had to be zero.
The dirty bit tracking to handle this complicates the surface state
and binding table setup; removing it should simplify things and make
future refactoring easier. So just set 0 for the number of entries
rather than trying to compute and track it.
Appears to have no impact on Nexuiz and OpenArena on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The comment states that brw_update_vs_constant_surface produces a
CACHE_NEW_SURF_BIND dirty bit, but it doesn't. In fact, that bit
no longer even exists.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_vs_surfaces _produces_ the BRW_NEW_NR_VS_SURFACES dirty bit, so it
makes no sense for it to subscribe to it.
Fixes an assertion failure in many piglit tests when INTEL_DEBUG is set:
brw_state_upload.c:484: void brw_upload_state(struct brw_context *):
Assertion `!check_state(&examined, &generated)' failed.
One such piglit test is vs-uniform-array-mat2-col-rd.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Comparing brw_upload_vs_pull_constants and brw_upload_wm_pull_constants,
it became evident that something was amiss: the VS code had both
CACHE_NEW_VS_PROG and BRW_NEW_VERTEX_PROGRAM, while the WM code was
missing the CACHE_NEW_WM_PROG flag.
Not observed to fix anything, but likely necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we have vtable entries in place, we should use them. This
allows us to drop the cut and pasted Gen7 brw_tracked_state atoms as
they now do exactly the same thing as their brw_wm_surface_state
counterparts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen7+ SURFACE_STATE is different from Gen4-6, so we need separate
per-generation functions for creating and updating it. However, the
usage is the same, and callers just want to utilize the appropriate
functions with minimal pain. So, put them in the vtable.
Since these take a brw_context pointer and are only used on Gen4, just
add a forward declaration. This is the simplest (if not cleanest)
solution. It would be nicer to have a i965-specific vtable, but that's
a refactor for another day.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These two are fairly unique types so add specific cases for decoding them.
Passes piglit fbo-clear-format and fbo-generatemipmap-format tests for these
two extensions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the softpipe NV_conditional_render support to llvmpipe.
This passes the nv_conditional_render-* piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new function r600_need_cs_space. Currently, it's easy to overflow
the CS - queries are not counted in. I guess that's not the only case where
the driver may crap out.
This is the inverse operation to _mesa_pack_rgba_span_int. The 16-bit
code isn't done because of lack of testing and not being sure how sign
extension/clamping should be handled between, say, 16-bit int and
32-bit int or uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires using a new fragment shader to get the integer color
output, and a new vertex shader because #version has to match between
the two.
v2: Clarify that there's no need for BindFragDataLocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We're missing support for the software paths still, but basic
rendering is working.
v2: Override RGB_INT32/UINT32 to not be renderable, since the hardware
can't do it but we do allow texturing from it now. Drop the
DataType override, since the _mesa_problem() isn't in that path
any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, I was tracking the ir_variable * found for gl_FragColor or
gl_FragData[]. Instead, when visiting those variables, set up an
array of per-render-target fs_regs to copy the output data from. This
cleans up the color emit path, while making handling of multiple
user-defined out variables easier.
v2: incorporate idr's feedback about ir->location (changes by Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to integer color buffers, we need to be careful to use
MRFs of the correct type when emitting color writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_type_for_base_type returned UD for array variables,
similar to structures. For structures, each field may have a different
type, so every field access must explicitly override the register's type
with that field's type. We chose to return UD in this case since it was
the least common, so errors would be more obvious.
For arrays, it makes far more sense to return the type corresponding to
an element of the array. This allows normal array access to work
without the hassle of explicitly overriding the register's type.
This should obsolete a bunch of type overrides throughout the code.
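A minimal sketch of the idea, using simplified stand-in types rather than the real glsl_type and register type enums (every name below is hypothetical): arrays recurse to their element type, while structures keep the UD placeholder.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GLSL base types and register types. */
enum sketch_base_type { SK_FLOAT, SK_INT, SK_UINT, SK_STRUCT, SK_ARRAY };
enum sketch_reg_type  { SK_REG_F, SK_REG_D, SK_REG_UD };

struct sketch_type {
    enum sketch_base_type base;
    const struct sketch_type *element; /* only meaningful for SK_ARRAY */
};

static enum sketch_reg_type sketch_reg_type_for(const struct sketch_type *t)
{
    switch (t->base) {
    case SK_FLOAT:  return SK_REG_F;
    case SK_INT:    return SK_REG_D;
    case SK_UINT:   return SK_REG_UD;
    /* Arrays take the type of their element, so plain array accesses
     * need no explicit override. */
    case SK_ARRAY:  return sketch_reg_type_for(t->element);
    /* Structure fields may differ in type; callers must still override. */
    case SK_STRUCT: return SK_REG_UD;
    }
    return SK_REG_UD;
}
```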
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: s/GL_TRUE/true/, and re-enable RGB_INT32 based on discussion
yesterday about required RB formats vs texture formats.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This will let the feature be incrementally developed, hidden behind
the flag we're all using as we work on GL 3.0 support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While not required by any particular spec version, mplayer was asking
for L16 and hoping for actual L16 without checking. The 8 bits
allocated led to 10-bit planar video data stored in the lower 10 bits
giving only 2 bits of precision in video. While it was an amusing
effect, give them what they actually wanted instead.
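The precision loss can be checked with a few lines of arithmetic. The sketch below assumes the 16-bit texel is reduced to 8 bits by keeping its top byte (plain truncation; the real conversion may round instead), and counts how many distinct values survive for 10-bit samples stored in the low bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Count the distinct 8-bit values left when every 10-bit sample, stored in
 * the low 10 bits of a 16-bit texel, is reduced to the texel's top 8 bits. */
static int sketch_distinct_levels_after_8bit_storage(void)
{
    int seen[256] = {0};
    int distinct = 0;

    for (uint16_t sample = 0; sample < 1024; sample++) {
        uint8_t stored = (uint8_t)(sample >> 8); /* assumed truncation */
        if (!seen[stored]) {
            seen[stored] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

Under that assumption only four levels remain, which is the 2 bits of precision mentioned above.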
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41461
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to be able to support some formats for texturing that we can't
render to, which means that some choices for RenderbufferStorage end
up being incomplete (for example, L8 currently). For these, where we
don't render to them, we don't want to have to make up an rb->DataType
that's only used for GetRow()/PutRow().
Commit 1401b96b (radeon: cleanup radeon shared code after r300 and
r600 classic drivers removal) removed the file
src/mesa/drivers/dri/radeon/server/radeon.h, but it left behind the
symlink which was used to share that file into the
src/mesa/drivers/dri/r200/server directory.
This patch removes the dangling symlink.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
This patch modifies the GLSL linker to assign additional slots for
varying variables used by transform feedback, and record the varying
slots used by transform feedback for use by the driver back-end.
This required modifying assign_varying_locations() so that it assigns
a varying location if either (a) the varying is used by the next stage
of the GL pipeline, or (b) the varying is required by transform
feedback. In order to avoid duplicating the code to assign a single
varying location, I moved it into its own function,
assign_varying_location().
In addition, to support transform feedback in the case where there is
no fragment shader, it is now possible to call
assign_varying_locations() with a consumer of NULL.
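The assignment rule can be sketched as follows. This is not the linker's code: the struct, its fields, and the function name are hypothetical, and a boolean stands in for the "consumer may be NULL" case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a producer-stage varying. */
struct sketch_varying {
    bool used_by_consumer; /* read by the next pipeline stage */
    bool used_by_xfb;      /* captured by transform feedback */
    int location;          /* -1 means no slot assigned */
};

/* Assign a slot when the varying is (a) read by the next stage or
 * (b) required by transform feedback. have_consumer is false when there is
 * no next stage (e.g. no fragment shader). Returns the number of slots used. */
static int sketch_assign_varying_locations(struct sketch_varying *v, size_t n,
                                           bool have_consumer)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        bool consumed = have_consumer && v[i].used_by_consumer;
        v[i].location = (consumed || v[i].used_by_xfb) ? next++ : -1;
    }
    return next;
}
```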
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
This just validates the input parameters so far.
Fixes piglit's bindfragdata-invalid-parameters test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Up until now modifying the GLSL compiler has been pretty straightforward.
This is where things get interesting. But still pretty straightforward.
Switch statements can be thought of as a series of if/then/else statements.
Case labels are compared with the value of a test expression and the case
statements are executed if the comparison is true.
There are a couple of aspects of switch statements that complicate this simple
view of the world. The primary one is that cases can fall through sequentially
to subsequent cases unless a break statement is encountered, in which case
the switch statement exits completely.
But break handling is further complicated by the fact that a break statement
can impact the exit of a loop. Thus, we need to coordinate break processing
between switch statements and loop statements.
The code generated by a switch statement maintains three temporary state
variables:
    int test_value;
    bool is_fallthru;
    bool is_break;
test_value is initialized to the value of the test expression at the head of
the switch statement. This is the value that case labels are compared against.
is_fallthru is used to sequentially fall through to subsequent cases and is
initialized to false. When a case label matches the test expression, this
state variable is set to true. It will also be forced to false if a break
statement has been encountered. This forcing to false on break MUST be
after every case test. In practice, we defer that forcing to immediately after
the last case comparison prior to executing a case statement, but that is
an optimization.
is_break is used to indicate that a break statement has been executed and is
initialized to false. When a break statement is encountered, it is set to true.
This state variable is then used to conditionally force is_fallthru to false
to prevent subsequent case statements from executing.
Code generation for break statements depends on whether the break statement is
inside a switch statement or inside a loop statement. If the break statement
is inside a loop statement, the same code as before gets generated. But if it
is inside a switch statement, code is emitted to set the is_break state to
true.
Just as ASTs for loop statements are managed in a stack-like
manner to handle nesting, we also add a bool to capture the innermost switch
or loop condition. Note that we still need to maintain a loop AST stack to
properly handle for-loop code generation on a continue statement. Technically,
we don't (yet) need a switch AST stack, but I am using one for orthogonality
with loop statements, in anticipation of future use. Note that a simple
boolean stack would have sufficed.
We illustrate how a switch statement corresponds to its analogous conditional
code by examining an example.
Consider the following switch statement:
    switch (42) {
    case 0:
    case 1:
        gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
    case 2:
    case 3:
        gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
        break;
    case 4:
    default:
        gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
    }
Note that case 0 and case 1 fall through to cases 2 and 3 if they occur.
Note that case 4 and the default case must be reached explicitly, since cases
2 and 3 break at the end of their case.
Finally, note that case 4 and the default case don't break but simply fall
through to the end of the switch.
For this code, the equivalent code can be expressed as:
    int test_val = 42;               // capture value of test expression
    bool is_fallthru = false;        // prevent initial fall through
    bool is_break = false;           // capture the execution of a break stmt

    is_fallthru |= (test_val == 0);  // enable fallthru on case 0
    is_fallthru |= (test_val == 1);  // enable fallthru on case 1
    is_fallthru &= !is_break;        // inhibit fallthru on previous break
    if (is_fallthru) {
        gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
    }

    is_fallthru |= (test_val == 2);  // enable fallthru on case 2
    is_fallthru |= (test_val == 3);  // enable fallthru on case 3
    is_fallthru &= !is_break;        // inhibit fallthru on previous break
    if (is_fallthru) {
        gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
        is_break = true;             // inhibit all subsequent fallthru for break
    }

    is_fallthru |= (test_val == 4);  // enable fallthru on case 4
    is_fallthru = true;              // enable fallthru for default case
    is_fallthru &= !is_break;        // inhibit fallthru on previous break
    if (is_fallthru) {
        gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
    }
The code generated for |= and &= uses the conditional assignment capabilities
of the IR.
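The equivalence claimed above can be checked outside the compiler. The sketch below transcribes the example into plain C, with an integer tag standing in for the gl_FragColor value (1, 2 and 3 for the three assignments, 0 for "never written"); the function names are hypothetical and this is an illustration of the scheme, not the code the compiler emits.

```c
#include <stdbool.h>

/* Reference: the example switch statement, written natively. */
static int sketch_native_switch(int value)
{
    int color = 0;

    switch (value) {
    case 0:
    case 1:
        color = 1;
        /* fall through */
    case 2:
    case 3:
        color = 2;
        break;
    case 4:
    default:
        color = 3;
    }
    return color;
}

/* The same statement lowered to the three state variables. */
static int sketch_lowered_switch(int value)
{
    int color = 0;
    int test_val = value;       /* capture value of test expression */
    bool is_fallthru = false;   /* prevent initial fall through */
    bool is_break = false;      /* capture the execution of a break */

    is_fallthru |= (test_val == 0);
    is_fallthru |= (test_val == 1);
    is_fallthru &= !is_break;
    if (is_fallthru)
        color = 1;

    is_fallthru |= (test_val == 2);
    is_fallthru |= (test_val == 3);
    is_fallthru &= !is_break;
    if (is_fallthru) {
        color = 2;
        is_break = true;        /* inhibit all subsequent fallthru */
    }

    is_fallthru |= (test_val == 4);
    is_fallthru = true;         /* default case always matches */
    is_fallthru &= !is_break;
    if (is_fallthru)
        color = 3;

    return color;
}
```

Calling both functions over a range of inputs should give matching results; input 0, for instance, runs the first two bodies and is then stopped by is_break.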
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now tie the grammar to the ctors of the ASTs they reference.
This requires that we actually have definitions of the ctors.
In addition, we also need to define "print" and "hir" methods for the AST
classes. The Print methods are pretty simple to flesh out. However, at this
stage of the development, we simply stub out the "hir" methods and flesh
them out later.
Also, since actual class instances get returned by the productions in the
grammar, we also need to designate the type of the productions that
reference those instances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The grammar is modified to support switch statements. Rather than follow the
grammar in the appendix, which allows case labels to be placed ANYWHERE
as a regular statement, we follow the development of the grammar as
described in the body of the GLSL spec.
In this variation, the switch statement has a body which consists of a list
of case statements. A case statement is preceded by a list of case labels and
ends with a list of statements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Data structures for switch statement and case label are created that parallel
the structure of other AST data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is needed for nv50's new shader backend. With this change, both u_math.h
and imports.h in core mesa define the same function. I have to #undef log2f
here to avoid the conflict. Not sure if there is a better way to deal with
the situation.
Acked-by: José Fonseca <jfonseca@vmware.com>
Switch all of the code in ir_to_mesa, st_glsl_to_tgsi, glUniform*,
glGetUniform, glGetUniformLocation, and glGetActiveUniforms to use the
gl_uniform_storage structures in the gl_shader_program.
A couple of notes:
* Like most rewrite-the-world patches, this should be reviewed by
applying the patch and examining the modified functions.
* This leaves a lot of dead code around in linker.cpp and
uniform_query.cpp. This will be deleted in the next patches.
v2: Update the comment block (previously a FINISHME) in _mesa_uniform
about generating GL_INVALID_VALUE when an out-of-range sampler index
is specified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
_mesa_ir_link_shader needs to be called before cloning the IR tree so
that the var->location field for uniforms is set.
WARNING: This change breaks several integer division related piglit
tests. The tests break because _mesa_ir_link_shader lowers integer
division to an RCP followed by a MUL. The fix is to factor out more
of the code from ir_to_mesa so that _mesa_ir_link_shader does not need
to be called at all by the i965 driver. This will be the subject of
several follow-on patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These were both useful debugging aids while developing this code.
log_uniform will be used to keep the MESA_GLSL=uniform behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Connects all of the gl_program_parameter structures with the correct
gl_uniform_storage structures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These functions are used to create and destroy the connections between
a uniform and the storage used by the driver to hold its value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This function propagates the values from the backing storage of a
gl_uniform_storage structure to the driver supplied data locations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Rename class to count_uniform_size based on feedback from Eric:
"Maybe just "count_uniform_size"? "usage" makes me think "way it's
dereferenced" or something."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update a comment block about the different treatment of
location=-1 based on feedback from Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Prepend _mesa_uniform_ to the names and rework the calling
convention. The calling convention was changed for a couple reasons.
1. Having a single variable named 'location' have completely different
meanings at different places in the function is confusing. Before
calling split_location_offset the location is the encoded value
returned by glGetUniformLocation. After calling split_location_offset
it's the index of the uniform in the gl_uniform_list::Uniforms array.
2. In a later commit the original value of 'location' is needed after
split_location_offset has been called.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update some comments based on feedback from Eric Anholt.
v3: Remove gl_uniform_storage::dirty field. Make
gl_uniform_storage::initialized be bool, and make
gl_uniform_storage::sampler be uint8_t.
v4: Include stdbool.h after Tom Stellard noticed a build failure that
was introduced by the changes in v2. Oops.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
There are cases where we might want to internally query the location
of a uniform in a shader that failed linking.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Some C code will want access to the glsl_base_type and
glsl_sampler_dim enums in the near future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
The spec says "Only ClearBufferiv should be used to clear
stencil buffers." and "Only ClearBufferfv should be used to clear
depth buffers." However, on the following page it also says:
"The result of ClearBuffer is undefined if no conversion between
the type of the specified value and the type of the buffer being
cleared is defined (for example, if ClearBufferiv is called for a
fixed- or floating-point buffer, or if ClearBufferfv is called
for a signed or unsigned integer buffer). *This is not an error.*"
Emphasis mine.
Fixes problems with piglit's clearbuffer-invalid-drawbuffer test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression from the recent glReadPixels changes found
with the piglit hiz tests.
Use either MESA_FORMAT_RGBA8888 or MESA_FORMAT_RGBA8888_REV for color
buffers depending on endian-ness. Before, the gl_renderbuffer::Format
field was MESA_FORMAT_RGBA8888 but the data was really stored as
MESA_FORMAT_RGBA8888_REV when using a little endian machine.
Getting this right matters now that we can access renderbuffer data
without going through the span functions (namely glReadPixels() +
MapRenderbuffer()).
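To make the endianness dependence concrete, here is a small sketch. It assumes that the RGBA8888 format packs R into the most significant byte of a 32-bit word and that the _REV variant packs R into the least significant byte; the enum and function names are hypothetical. The goal is that the bytes in memory always read R, G, B, A.

```c
#include <stdint.h>
#include <string.h>

enum sketch_format { SKETCH_RGBA8888, SKETCH_RGBA8888_REV };

static int sketch_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Pack one pixel as a 32-bit word in the given (assumed) layout. */
static uint32_t sketch_pack(enum sketch_format f,
                            uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    if (f == SKETCH_RGBA8888)
        return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
               ((uint32_t)b << 8) | a;
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)g << 8) | r;
}

/* Pick the format whose in-memory byte order is R, G, B, A on this host. */
static enum sketch_format sketch_choose_format(void)
{
    return sketch_is_little_endian() ? SKETCH_RGBA8888_REV : SKETCH_RGBA8888;
}
```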
These vars will just get overwritten when we call _mesa_add_renderbuffer()
anyway. We only need to set the InternalFormat field when we create the
software renderbuffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the glReadPixels() regression for reading from the front/back
color buffers.
Note, we only allow one mapping of an XImage/Pixmap renderbuffer
at any time. That might need to be revisited in the future.
One of the points of GL_ARB_texture_storage is to make it impossible
to have malformed mipmap stacks. If we know the texture object is
immutable, we can skip a bunch of size checking.
Commit a73c65c534 had a typo which
accidentally enabled the workaround-free Gen7 code on Gen6.
Fixes GPU hangs in anything using pow() or integer division/modulus.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the documentation, Ivybridge's math instruction works in
SIMD16 mode for the fragment shader, and no longer forbids align16 mode
for the vertex shader.
The documentation claims that SIMD16 mode isn't supported for INT DIV,
but empirical evidence shows that it works fine. Presumably the note
is trying to warn us that the variant that returns both quotient and
remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1
would be sechalf(dst), trashing half your results. Since we don't use
that variant, we don't care and can just enable SIMD16 everywhere.
The documentation also still claims that source modifiers and
conditional modifiers aren't supported, but empirical evidence and
study of the simulator both show that they work just fine.
Goodbye workarounds. Math just works now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It seems line loop stipple in hardware needs something I don't know about; it
might need a proper geometry shader, who knows.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shader combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to another non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gbm_gallium does not depend on DRI, but its build rules depend on DRI_LIB_DEPS
being set. Output an error when the user enables gbm_gallium but disables
DRI. This is just a workaround.
If the VS has outputs that aren't consumed by the FS we were mapping
them all to one unused VS output index, but that's illegal. Instead,
map unused VS outputs to unique indexes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The code expects the geometry shader to be NULL.
We don't have geometry shaders now, but it's good to be prepared.
v2: check for support in the cso context
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shader combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to another non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The readpixels microbenchmark in mesa-demos goes from 47Mpix/sec at
1000x1000 to 450Mpix/sec. The 10x10 sizes stay about the same.
Reviewed-by: Brian Paul <brianp@vmware.com>
Renderbuffer mapping handles flushing the batchbuffer if required, so
all we need to do is make sure any pending rendering has reached the
batchbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't appear to be any particular reason for this -- it's not
like the width is changing between the deref and the use.
Reviewed-by: Brian Paul <brianp@vmware.com>
In all of piglit, only two tests hit it (reading to RGBA float, where
GetRow would drop floats into place from R, RG, or RGB). Mostly this
is because _ColorReadClamp has been causing transferOps to always be
set, skipping any fast-paths anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
This may be a bit slower than before because we're switching from
per-format compiled loops in GetRow to
_mesa_unpack_rgba_block_unpack's loop around a callback to unpack a
pixel. The solution there would be to make _mesa_unpack_rgba_block
fold the span loop into the format handlers.
(On the other hand, function call overhead will hardly matter if
MapRenderbuffer means the driver gets the data into cacheable memory
instead of uncached).
The adjust_colors code should no longer be required, since the unpack
function does the 565 to float conversion in a single pass instead of
converting it (poorly) through 8888 as apparently happened in the
past.
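For reference, a single-pass 565 to float unpack might look like the sketch below. The function name is hypothetical and the bit layout (red in the top 5 bits, green in the middle 6, blue in the low 5) is assumed; each channel is divided by its own maximum rather than being widened to 8 bits first.

```c
#include <stdint.h>

/* Unpack one RGB565 texel directly to normalized floats. */
static void sketch_unpack_rgb565_to_float(uint16_t texel, float rgba[4])
{
    rgba[0] = (float)((texel >> 11) & 0x1f) / 31.0f; /* 5-bit red */
    rgba[1] = (float)((texel >> 5) & 0x3f) / 63.0f;  /* 6-bit green */
    rgba[2] = (float)(texel & 0x1f) / 31.0f;         /* 5-bit blue */
    rgba[3] = 1.0f;                                  /* no alpha stored */
}
```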
Reviewed-by: Brian Paul <brianp@vmware.com>
This should be useful in making more generic fast paths in the pixel
paths.
v2: Add note about PACK_SWAP_BYTES, and fix up for endianness by
synchronizing with memcpy_texture paths in texstore.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
This introduces two new span helper functions we'll want to use in
several places as we move to MapRenderbuffer, which pull out integer
depth and stencil values from a renderbuffer mapping based on the
renderbuffer format.
v2: Use format_unpack helper for stencil read.
v3: Clean up comment after conversion to format_unpack.
Reviewed-by: Brian Paul <brianp@vmware.com>
This also makes it handle 24/8 vs 8/24, fixing piglit
depthstencil-default_fb-readpixels-24_8 on i965. While here, avoid
incorrectly fast-pathing if packing->SwapBytes is set.
v2: Move the unpack code to format_unpack.c, fix BUFFER_DEPTH typo
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids going through the wrapper that has to rewrite the data for
packed depth/stencil. This isn't done in _swrast_read_stencil_span
because we don't want to map/unmap for each span.
v2: Move the unpack code to format_unpack.c.
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
i965's MUL instruction can't take an immediate value as its first
argument. So normally, if constant propagation wants to propagate a
constant into the first argument of a MUL instruction, it swaps the
order of the two arguments.
This doesn't work for 32-bit integer (and unsigned integer)
multiplies, because the MUL operation is asymmetric in that case (it
multiplies 16 bits of one operand by 32 bits of the other).
Fixes piglit tests {vs,fs}-multiply-const-{ivec4,uvec4}.
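A toy model shows why swapping the operands is unsafe here. The sketch assumes the instruction uses all 32 bits of its first operand and only the low 16 bits of its second; which operand is actually truncated on the hardware is not stated above, so treat that choice, and the function name, as assumptions.

```c
#include <stdint.h>

/* Model of an asymmetric multiply: 32 bits of a times the low 16 bits of b. */
static uint32_t sketch_asymmetric_mul(uint32_t a, uint32_t b)
{
    return a * (b & 0xffffu);
}
```

With a = 0x10000 and b = 3, sketch_asymmetric_mul(a, b) keeps the full product while sketch_asymmetric_mul(b, a) discards the only set bit of a, so the two orders disagree.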
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're drawing sprites and the fragment shader needs both auto-
generated texcoords and user-defined varying vars we need to use
this fallback path.
The reason is when we enable auto texcoord generation, it gets
enabled for all texcoord sets. And that clobbers the user-defined
varying vars.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If we use the draw-module for wide point/line/etc drawing we'll need
a fragment shader too (like we pass in the vertex shader).
This fixes sprite point rendering when forcing the swtnl path.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The state tracker may generate shaders that use generic vs outputs /
fs inputs like:
    DCL IN[0], GENERIC[0]
    DCL IN[1], GENERIC[10]
    DCL IN[2], GENERIC[11]
This patch remaps 0, 10, 11 to small integers like 1, 2, 3 so that we
stay inside the SVGA3D limit (8).
The remapping is done to both the vertex shader outputs and the
fragment shader inputs. The same mapping must be used for a vs/fs
pair.
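The remapping can be pictured as a small lookup table that hands out the next free small index the first time a generic index is seen. The sketch below is illustrative only: the names, the table size of 32, and the choice to start numbering at 1 (matching the "1, 2, 3" example above) are assumptions, not the driver's implementation.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_GENERIC 32
#define SKETCH_UNMAPPED    0xff

/* One table is shared by a VS/FS pair so both sides agree on the mapping. */
struct sketch_generic_remap {
    uint8_t table[SKETCH_MAX_GENERIC];
    uint8_t next; /* next small index to hand out */
};

static void sketch_remap_init(struct sketch_generic_remap *m)
{
    memset(m->table, SKETCH_UNMAPPED, sizeof(m->table));
    m->next = 1;
}

/* Return the small index for generic_index, allocating one on first use.
 * Returns -1 if generic_index is outside the table. */
static int sketch_remap_generic(struct sketch_generic_remap *m,
                                unsigned generic_index)
{
    if (generic_index >= SKETCH_MAX_GENERIC)
        return -1;
    if (m->table[generic_index] == SKETCH_UNMAPPED)
        m->table[generic_index] = m->next++;
    return m->table[generic_index];
}
```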
Note that 'union svga_compile_key' is now 'struct svga_compile_key'
because we needed to add the register remapping table. The change in
size isn't really significant though (it's not a search key).
Also, add assertions when building up SVGA3D src/dst registers so we
don't try to store too large a value for the bitfield size.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Missed this back in the arb_robustness branch
<6b329b9274b18c50f4177eef7ee087d50ebc1525>.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The returned value should be a texture target index, not a bit.
I spotted this from seeing a new compiler warning caused by the increase
in the number of texture targets. This has been broken for a long time.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This requires tracking a couple extra fields in ir_variable:
* A flag to indicate that a variable had an initializer.
* For non-const variables, a field to track the constant value of the
variable's initializer.
For variables with non-constant initializers, ir_variable::has_initializer
will be true, but ir_variable::constant_initializer will be NULL. The
linker can use the values of these fields to check adherence to the
GLSL 4.20 rules for shared global variables:
"If a shared global has multiple initializers, the initializers
must all be constant expressions, and they must all have the same
value. Otherwise, a link error will result. (A shared global
having only one initializer does not require that initializer to
be a constant expression.)"
Previous to 4.20 the GLSL spec simply said that initializers must have
the same value. In the case of non-constant initializers, this was
impossible to determine. As a result, no vendor actually implemented
that behavior. The 4.20 behavior matches the behavior of NVIDIA's
shipping implementations.
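A rough sketch of how a linker could apply the 4.20 rule using the two
fields, assuming a simplified stand-in where a constant initializer is a
single int. The struct and function names are hypothetical and are not the
GLSL compiler's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the two fields described above. */
struct var_info {
   bool has_initializer;
   const int *constant_initializer; /* NULL if the initializer is not constant */
};

/* Returns true if two declarations of the same shared global may be linked. */
static bool initializers_compatible(const struct var_info *a,
                                    const struct var_info *b)
{
   /* At most one initializer: nothing to compare. */
   if (!a->has_initializer || !b->has_initializer)
      return true;
   /* Multiple initializers must all be constant expressions... */
   if (a->constant_initializer == NULL || b->constant_initializer == NULL)
      return false;
   /* ...and they must all have the same value. */
   return *a->constant_initializer == *b->constant_initializer;
}
```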
NOTE: This is a candidate for the 7.11 branch. This patch also needs
the preceding patch "glsl: Refactor generate_ARB_draw_buffers_variables
to use add_builtin_constant"
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Incrementing "td" before initializing it is
pointless and just leads to an uninitialized
variable warning with MSVC.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This is an OpenGL ES specific extension. External textures are textures that
may be sampled from, but not updated (no glTexSubImage*, etc.). The
image data is taken from an EGLImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
GL_TEXTURE_RECTANGLE_NV (and soon GL_TEXTURE_EXTERNAL_OES) is special. Handle
it in its own if-block. There should be no functional change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
This extension introduces a new sampler type: samplerExternalOES.
texture2D (and texture2DProj) can be used to do a texture look up in an
external texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
3-bit fields are used to store the texture target in several places. That will
fail when TEXTURE_EXTERNAL_INDEX, which happens to be the 9th texture target,
is added. Make them 4-bit fields.
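To illustrate the overflow, here is a small sketch with hypothetical structs
(not Mesa's): a 3-bit unsigned field can only hold 0..7, so the 9th target
(index 8) wraps around to 0, while a 4-bit field holds it correctly.

```c
#include <assert.h>

struct narrow { unsigned target:3; };  /* holds 0..7 */
struct wide   { unsigned target:4; };  /* holds 0..15 */

/* Store an index in the 3-bit field and read it back. */
static unsigned store_narrow(unsigned index)
{
   struct narrow n;
   n.target = index;   /* value is reduced modulo 8 */
   return n.target;
}

/* Store an index in the 4-bit field and read it back. */
static unsigned store_wide(unsigned index)
{
   struct wide w;
   w.target = index;   /* value is reduced modulo 16 */
   return w.target;
}
```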
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Remove another long if-condition test. I don't feel a strong need for
this patch, but since it makes the code a little simpler (I do think so),
I'm sending it out.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glRenderbufferStorage man page says:
GL_INVALID_VALUE is generated if either of width or height is negative,
or greater than the value of GL_MAX_RENDERBUFFER_SIZE.
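The check described by the man page could look roughly like the sketch below.
The enum and function are hypothetical, and max_size stands in for the value
of GL_MAX_RENDERBUFFER_SIZE; this is not the actual Mesa code.

```c
#include <assert.h>

enum rb_error { RB_NO_ERROR, RB_INVALID_VALUE };

/* Validate renderbuffer dimensions against the implementation limit. */
static enum rb_error check_renderbuffer_size(int width, int height,
                                             int max_size)
{
   assert(max_size >= 0);
   if (width < 0 || height < 0 || width > max_size || height > max_size)
      return RB_INVALID_VALUE;
   return RB_NO_ERROR;
}
```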
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
EXT_framebuffer_object spec says:
Get Value                         Type  Get Command                    Initial Value
--------------------------------  ----  -----------------------------  -------------
RENDERBUFFER_INTERNAL_FORMAT_EXT  Z+    GetRenderbufferParameterivEXT  RGBA
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The ARB_texture_swizzle spec says:
The error INVALID_OPERATION is generated if TexParameteri,
TexParameterf, TexParameteriv, or TexParameterfv, parameter <pname>
is TEXTURE_SWIZZLE_R, TEXTURE_SWIZZLE_G, TEXTURE_SWIZZLE_B,
or TEXTURE_SWIZZLE_A, and <param> is not RED, GREEN, BLUE, ALPHA,
ZERO, or ONE.
The error INVALID_OPERATION is generated if TexParameteriv, or
TexParameterfv, parameter <pname> TEXTURE_SWIZZLE_RGBA, and the four
consecutive values pointed to by <param> are not all RED, GREEN, BLUE,
ALPHA, ZERO, or ONE.
So, the GL_TEXTURE_SWIZZLE* pname is legal for glTexParameterf(v)
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a vertex shader input attribute is declared with an integral type
(e.g. ivec4), we need to ensure that the generated vertex shader code
addresses the vertex attribute register using the proper register
type. (Previously, we assumed all vertex shader input attributes were
floating-point).
In addition, when uploading vertex data that was specified with
VertexAttribIPointer, we need to instruct the vertex fetch unit to
convert the data to signed or unsigned int, rather than float. And
when filling in the implied w=1 on a vector with less than 4
components, we need to fill it in with the integer representation of 1
rather than the floating-point representation of 1.
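The difference between the two representations of 1 can be seen in the
following sketch, which assumes 32-bit IEEE 754 floats. The function name is
hypothetical; the driver itself selects the value through vertex fetch state
rather than code like this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the 32-bit pattern to store in an implied w component. */
static uint32_t implied_w_bits(int is_integer_attrib)
{
   float one = 1.0f;
   uint32_t bits;

   if (is_integer_attrib)
      return 1u;                      /* integer 1 */

   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&bits, &one, sizeof bits);  /* bit pattern of float 1.0 */
   return bits;
}
```

Filling an ivec4's w with the floating-point pattern would make the shader
read 0x3f800000 (1065353216) instead of 1.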
Fixes piglit tests vs-attrib-{ivec4,uvec4}-precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch ensures that gl_client_array::Integer is properly set to
GL_TRUE for vertex attributes specified using glVertexAttribIPointer,
and to GL_FALSE for vertex attributes specified using
glVertexAttribPointer, so that the vertex attributes can be
interpreted properly by driver back-ends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When converting an expression like "++x" to GLSL IR we were failing to
account for the possibility that x might be an unsigned integral type.
As a result the user would receive a bogus error message "Could not
implicitly convert operands to arithmetic operator".
Fixes piglit tests {vs,fs}-{increment,decrement}-uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TODO: check if GetImage works with passing the pitch as width, similar to PutImage,
which avoids the extra copy, ala dri_sw_displaytarget_display() in src/gallium/winsys/sw/dri/dri_sw_winsys.c
This is a cleanup of commit 02f1b50987.
Update tex buffer using a dri_drawable hook implemented in sw/drisw.c.
This saves us the duplication of dri_drawable.c.
CC: Stuart Abercrombie <sabercrombie@chromium.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Certain exports (position, point size, etc.) are treated
specially by the shader and not counted as generic exports.
Note the exports and any relevant related state bits.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes issues with the code playing fast and loose with types of
buffers, and as a bonus avoids the wrappers that were previously used
to pull bits out of packed depth/stencil buffers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Some of the return values were u32, some were 24 bits, and z16
returned 16 bits. The caller would have to do all the work of
interpreting the format all over again. However, there are no callers
of this function at this point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Perhaps the easiest implementation, nouveau can directly map buffers
even if tiled, and uses separate surfaces for its texture
renderbuffers so we don't have to worry about that offset.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike intel, we do a blit to/from GTT memory in order to
untile/retile the renderbuffer data, since we don't have fence
registers for accessing it.
(There is software tiling code in radeon_tile.c, but it's unused and
doesn't support macro tiling)
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add separate stencil S8 W-tile swizzling/deswizzling. Tested for
the swizzling case with env INTEL_SEPARATE_STENCIL=1 INTEL_HIZ=1
./bin/hiz-depth-stencil-test-fbo-d24-s8
v3: Apply Chad's fix for S8 window system buffers.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Mesa core's is generic for things like osmesa.
For swrast_dri.so, we have to do Y flipping. The front-buffer path
isn't actually tested, though, because both before and after it fails
with a BadMatch in XGetImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit abaebcee78.
The assertion I made was that "the zero-copy code in validation" would
zero copy. Of course, I deleted that check back in January because
the two sites that would trigger it (glTexImage() and this one) both
immediately bound their mt to the object, making the other check
pointless.
Removes two extra blits in glx-tfp. Also fixed the Android home
screen, which wasn't rendering because the extra copy broke the
relationship between the texture and the eglimage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42152
Tested-by: Chad Versace <chad@chad-versace.us>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
This also fixes the build error due to missing link_uniforms.cpp in the source
lists.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: the missing link_uniforms.cpp was added before this patch is committed]
With the hope that Android.mk and SConscript can share the file to reduce
future breakage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
MinGW uses MSVC's runtime DLLs for most of the C runtime's functions, so
it has the same semantics for vsnprintf.
Not sure how this worked until now -- maybe one of the internal
vsnprintf implementations was taking precedence.
Previously, the vertex and fragment shader back-ends assumed that all
varyings were floats. In GLSL 1.30 this is no longer true--they can
also be of integral types provided that they have an interpolation
qualifier of "flat".
This required two changes in each back-end: assigning the correct type
to the register that holds the varying value during shader execution,
and assigning the correct type to the register that ties the varying
value to the rest of the graphics pipeline (the message register in
the case of VS, and the payload register in the case of FS).
Fixes piglit tests fs-int-interpolation and fs-uint-interpolation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This function is similar to get_base_type(), but when called on
arrays, it returns the scalar type composing the array. For example,
glsl_type(vec4[]) => float_type.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
i965 graphics hardware has two floating point modes: ALT and IEEE. In
ALT mode, floating-point operations never generate infinities or NaNs,
and MOV instructions translate infinities and NaNs to finite values.
In IEEE mode, infinities and NaNs behave as specified in the IEEE 754
spec.
Previously, we used ALT mode for all vertex and fragment programs,
whether they were GLSL programs or ARB programs. The GLSL spec is
sufficiently vague about how infs and nans are to be handled that it
was unclear whether this mode was compliant with the GLSL 1.30 spec or
not, and it made it very difficult to test the isinf() and isnan()
functions.
This patch changes i965 GLSL programs to use IEEE floating-point mode,
which is clearly compliant with GLSL 1.30's inf/nan requirements. In
addition to making the Piglit isinf and isnan tests pass, this paves
the way for future support of the ARB_shader_precision extension.
Unfortunately we still have to use ALT floating-point mode when
executing ARB programs, because those programs require 0^0 == 1, and
i965 hardware generates 0^0 == NaN in IEEE mode.
Fixes piglit tests "isinf-and-isnan fs_fbo", "isinf-and-isnan vs_fbo",
and {fs,vs}-{isinf,isnan}-{vec2,vec3,vec4}.
The implementations are as follows:
isinf(x) = (abs(x) == +infinity)
isnan(x) = (x != x)
Note: the latter formula is not necessarily obvious. It works because
NaN is the only floating point number that does not equal itself.
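The two formulas translate directly to C, shown here as a sketch with
hypothetical function names. It assumes IEEE 754 semantics and that the
compiler is not run with options such as -ffast-math, which may optimize the
x != x test away.

```c
#include <assert.h>
#include <math.h>   /* INFINITY and NAN macros */

/* isinf(x) = (abs(x) == +infinity) */
static int sketch_isinf(float x)
{
   float ax = x < 0.0f ? -x : x;
   return ax == INFINITY;
}

/* isnan(x) = (x != x): NaN is the only value that differs from itself. */
static int sketch_isnan(float x)
{
   return x != x;
}
```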
Fixes piglit tests "isinf-and-isnan fs_basic" and "isinf-and-isnan
vs_basic".
This patch adds the extension '.ir' to all the files in
src/glsl/builtins/ir/, and changes generate_builtins.py so that it no
longer globs on '*' to find the files to build. This prevents
spurious files (such as EMACS' infamous *~ backup files) from breaking
the build.
The implementation of ir_binop_nequal in constant_expression_value()
appears to have been copy-and-pasted from the implementation of
ir_binop_equal, but with all instances of '==' changed to '!='. This
is correct except for one minor flaw: one of those '==' operators was
in an assertion checking that the types of the two arguments were
equal. That one needs to stay an '=='.
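A simplified illustration of the pattern, using a hypothetical struct rather
than the real ir_constant: the comparison flips to '!=', but the type check
in the assertion is a precondition and keeps its '=='.

```c
#include <assert.h>
#include <stdbool.h>

struct value { int type; int data; };

static bool values_equal(const struct value *a, const struct value *b)
{
   assert(a->type == b->type);   /* operands must have matching types */
   return a->data == b->data;
}

static bool values_not_equal(const struct value *a, const struct value *b)
{
   assert(a->type == b->type);   /* still '==': a precondition, not the comparison */
   return a->data != b->data;
}
```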
Fixes piglit tests {fs,vs}-inline-notequal.
svga keeps a small queue of similar primitive draws in order to coalesce
them into a single draw primitive command.
But the buffers referred to by primitives not yet emitted were being ignored
when deciding whether or not to flush the context.
This fixes piglit vbo-map-remap, vbo-subdata-sync, vbo-subdata-zero, and
Seeker.
Based on investigation and patch from Brian Paul.
Reviewed-By: Brian Paul <brianp@vmware.com>
Forgot to destroy the pipe context on xa context destroy.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
pb_debug_manager_dump was trying to take a lock already
held by all callers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This drops all the old drmSupports* checks since KMS does them all, and it
also drops R300_CLASS and R600_CLASS.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It caught one possible bug I recall in my time working on the driver,
and we haven't been setting it for non-fixed-function since the new FS
backend came along. The bug it caught was likely a confusion about
sampler mappings, which we have tests for these days.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This was the last prepare() function, and it's the first state atom,
so it must be ready to move.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I don't really want to touch this impenetrable code in this series, so
just call the one function from the other, since no other atom cares
about them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
While other units need to know about our constant buffer offsets,
nothing else cared about which particular BO other than the emit() half.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This rearranges the code a bit, and makes the upload of the binding
table take only as many surfaces as there are in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
These produce BRW_NEW_SURFACES (used by binding table emit()) and
BRW_NEW_NR_WM_SURFACES (used by WM unit emit()). Fixes a bug where
with no texturing and no color buffer, we wouldn't consider the null
renderbuffer in nr_surfaces. This was harmless because nr_surfaces is
only used for the prefetch info in the unit state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This is part of a series trying to eliminate the separate prepare()
hook in state upload. The prepare() hook existed to support the
check_aperture in between calculating state updates and setting up the
batch, but there should be no reason for that any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
As we move state to emit() time from prepare() time, a couple of the
places that flag fallbacks will move here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
We were doing the BO validate step in prepare() (brw_validate_state())
hooks of atoms so that we could check_aperture before emitting the
relocation trees during brw_upload_state() that would actually make
the batchbuffer reference too much memory to be executed. Now that
all relocations occur in the batchbuffer, we can instead
check_aperture after emitting our state into the batchbuffer, and
easily roll back, flush, and retry if we happened to go over the
limits.
This will let us remove the whole prepare() vs emit() split in our
state atoms, which is a source of tricky dependencies and duplicated
code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This will be used to avoid the prepare() step in the i965 driver's
state setup. Instead, we can just speculatively emit the primitive
into the batchbuffer, then check if the batch is too big, rollback and
flush, and replay the primitive.
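The save / emit / check / rollback / flush / replay sequence can be sketched
with a toy batch that only counts dwords. Everything here (struct batch,
BATCH_SIZE, the function names) is hypothetical and far simpler than the
real intel batchbuffer code.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 8   /* hypothetical capacity, in dwords */

struct batch {
   int used;      /* dwords written so far */
   int saved;     /* position recorded before a speculative emit */
   int flushes;   /* number of times the batch was submitted */
};

static void batch_save(struct batch *b)  { b->saved = b->used; }
static void batch_reset(struct batch *b) { b->used = b->saved; }
static void batch_flush(struct batch *b) { b->used = 0; b->flushes++; }

/* Speculatively emit a primitive; returns false if it did not fit. */
static bool emit_primitive(struct batch *b, int dwords)
{
   b->used += dwords;
   return b->used <= BATCH_SIZE;
}

static void draw(struct batch *b, int dwords)
{
   assert(dwords <= BATCH_SIZE);
   batch_save(b);
   if (!emit_primitive(b, dwords)) {
      batch_reset(b);                       /* roll back the partial state */
      batch_flush(b);                       /* submit what was there before */
      bool ok = emit_primitive(b, dwords);  /* replay into the empty batch */
      assert(ok);
      (void)ok;
   }
}
```

The point of the design is that the common case (the primitive fits) pays
nothing beyond recording one offset, and the rare overflow is handled by
redoing the work.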
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This could have broken always_flush_cache on i965, since
reserved_space doesn't reflect the size of the workaround flushes, and
we might run out of space. This should make always_flush_cache more
useful on pre-i965, anyway (since the point is to flush around each
draw call, even within a batchbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Replace pipe->flush() with pipe->texture_barrier() in
the texture upload path for the staging texture.
This should be enough to get data out of the gpu
caches ready to be read for texture fetch.
It's not useful for anything.
The rest of the patch is just a cleanup resulting
from some of the variables being no longer used.
There are no piglit regressions.
With the recent changes to the interpolation code, we can now get the value
directly from the program instead of just failing.
Fixes some of the glsl-1.30 interpolation tests with softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRI2 supports this now - and already enables it explicitly - but drisw
does not and should not. Otherwise toolkits like clutter will only ever
SwapBuffers once and wait forever for an event that's not coming.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Previously check_resources could fail, but we'd still try to optimize
the shader, do device-specific code generation, etc. In some cases,
this could explode (especially in the device-specific code
generation). I haven't found that I could trigger this with the
current code. When too many samplers were used with the new uniform
handling code, I observed several crashes deep down in the driver.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41609
Cc: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Previously a shader like
int X;
struct X { int i; };
void main() { gl_Position = vec4(0.0); }
would generate two error messages:
0:2(19): error: struct `X' previously defined
0:2(20): error: incomplete declaration
The first one is the real error, and the second is spurious.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Other parts of the code already caught things like 'float x[4][2]'.
However, nothing caught 'float [4] x[2]'.
Fixes piglit test array-multidimensional-new-syntax.vert.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The idea here is to set up the message header with the Sampler State
pointer which the hardware provides as part of the PS Thread Payload in
register g0.
Unfortunately, the existing code
fs_reg(GRF, 0, BRW_REGISTER_TYPE_UD))
actually references "virtual GRF 0" rather than the hardware g0. This
is just some arbitrary GRF temporary which will get register allocated.
So, we ended up setting up the header with garbage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the 1000000.0 overflow cases of piglit
GL_EXT_packed_float/pack.c
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_EXT_packed_float spec:
For an RGBA color, if <type> is not one of FLOAT,
UNSIGNED_INT_5_9_9_9_REV_EXT, or UNSIGNED_INT_10F_11F_11F_REV_EXT,
or if the CLAMP_READ_COLOR_ARB is TRUE, or CLAMP_READ_COLOR_ARB
is FIXED_ONLY_ARB and the selected color (or texture) buffer is
a fixed-point buffer, each component is first clamped to [0,1].
Then the appropriate conversion formula from table 4.7 is applied
to the component.
(but we previously resolved that the CLAMP_READ_COLOR bit is not
relevant to glGetTexImage())
This fixes most of the cases in piglit GL_EXT_packed_float/pack.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is only used in the code for packing to INF, and resulted in an
extra bit set that was set anyway, so it was harmless except for the
confusion caused.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From page 22 (28 of PDF) of GLSL 1.30 spec:
It is an error to provide a literal integer whose magnitude is too
large to store in a variable of matching signed or unsigned type.
Unsigned integers have exactly 32 bits of precision. Signed integers
use 32 bits, including a sign bit, in two's complement form.
Fixes piglit int-literal-too-large-0[123].frag.
v2: Take care with INT_MIN, use strtoull, and make it a function.
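One possible reading of the rule for decimal literals is sketched below. It
is an illustration, not the patch's code: the enum and function name are
hypothetical, and the patch may treat some out-of-range signed values
differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum literal_status { LITERAL_OK, LITERAL_TOO_LARGE };

/* Check a decimal integer literal such as "123" or "4000000000u". */
static enum literal_status check_decimal_literal(const char *text)
{
   size_t len = strlen(text);
   assert(len > 0);
   int is_uint = (text[len - 1] == 'u' || text[len - 1] == 'U');

   /* strtoull stops at the 'u' suffix, so it needs no special handling.
    * On overflow it returns ULLONG_MAX, which is rejected below. */
   unsigned long long value = strtoull(text, NULL, 10);

   if (is_uint)
      return value > 0xffffffffULL ? LITERAL_TOO_LARGE : LITERAL_OK;

   /* "-2147483648" reaches the lexer as unary minus applied to
    * 2147483648, so the magnitude INT_MAX + 1 has to be accepted. */
   return value > 2147483648ULL ? LITERAL_TOO_LARGE : LITERAL_OK;
}
```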
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no sense in building a broken driver. Previously, there was
the potential of building a DRI1-only driver that would work for DRI1
and fail on DRI2 because the newer libdrm code wasn't present. Now
the radeon build system should be matching intel and nouveau.
This can probably be reduced even further by moving this logic to the
scissor state update or just removing the logic entirely, but I don't
trust myself in radeon quite that much.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Diff produced by treating kernel_mm as true, deleting the DRI1 paths
that produce kernel_mm false, and deleting code.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Cleanup of the resulting dead code to follow.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
These are effectively doing type->get_base_type()->base_type, which is
equivalent to type->base_type. Just use that, as it's simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And not all existing queries. The only reason we have that list is to be able
to suspend and resume the active ones.
This reduces looping over queries when suspending and resuming.
The queries no longer have to track some of their states.
We weren't setting TEX_SEM_WAIT on instructions that read the value of a
TEX instruction and also wrote the same register as the TEX instruction.
This is the sequence we were miscompiling:
1: TEX temp[0], input[2].xy__, 2D[0]
...
16: src0.xyz = temp[22], src1.xyz = temp[0], src2.xyz = temp[19]
MAD temp[0].xyz, src0.xxx, src1.xyz, src2.xxx
https://bugs.freedesktop.org/show_bug.cgi?id=42090
This required the following changes:
- WM setup now makes the appropriate set of barycentric coordinates
(perspective vs. noperspective) available to the fragment shader,
based on whether the shader requires perspective interpolation,
noperspective interpolation, both, or neither.
- The fragment shader backend now uses the appropriate set of
barycentric coordinates when interpolating, based on the
interpolation mode returned by
ir_variable::determine_interpolation_mode().
- SF setup now uses gl_fragment_program::InterpQualifier to determine
which attributes are to be flat shaded (as opposed to the old logic,
which only flat shaded colors).
- CLIP setup now ensures that the clipper outputs non-perspective
barycentric coordinates when they are needed by the fragment shader.
Fixes the remaining piglit tests of interpolation qualifiers that were
failing:
- interpolation-flat-*-smooth-none
- interpolation-flat-other-flat-none
- interpolation-noperspective-*
- interpolation-smooth-gl_*Color-flat-*
Reviewed-by: Eric Anholt <eric@anholt.net>
The name was misleading. The actual effect of the bit is to cause
the clipper to emit *non-perspective* barycentric coordinate
information (which is only needed when doing noperspective
interpolation).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes how fs_visitor::emit_general_interpolation()
decides what kind of interpolation to do. Previously, it used the
shade model to determine how to interpolate colors, and used smooth
interpolation on everything else. Now it uses
ir_variable::determine_interpolation_mode(), so that it respects GLSL
1.30 interpolation qualifiers.
Fixes piglit tests interpolation-flat-*-smooth-{distance,fixed,vertex}
and interpolation-flat-other-flat-{distance,fixed,vertex}.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible interpolation mode.
When setting up the WM, we instruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the special case in
fs_visitor::split_virtual_grfs() that prevents splitting from being
applied to the delta_x/delta_y register pair (this register pair needs
to remain contiguous so that it can be used by the PLN instruction).
When gen>=6, this register pair is in a fixed location, not a virtual
register, so it was in no danger of being split. And
split_virtual_grfs' attempt not to split it was preventing some other
unrelated register from being split.
Reviewed-by: Eric Anholt <eric@anholt.net>
This function determines how a variable should be interpolated based
both on interpolation qualifiers and the current shade model.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we treated the 'smooth' qualifier as equivalent to no
qualifier at all. However, this is incorrect for the built-in color
variables (gl_FrontColor, gl_BackColor, gl_FrontSecondaryColor, and
gl_BackSecondaryColor). For those variables, if there is no qualifier
at all, interpolation should be flat if the shade model is GL_FLAT,
and smooth if the shade model is GL_SMOOTH.
To make this possible, I added a new value to the
glsl_interp_qualifier enum, INTERP_QUALIFIER_NONE.
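The rule described above can be sketched as follows. Only
INTERP_QUALIFIER_NONE is named in this message; the other enumerators, the
enum name and the function are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum interp_qualifier {
   INTERP_QUALIFIER_NONE,           /* no qualifier in the shader source */
   INTERP_QUALIFIER_SMOOTH,         /* assumed name */
   INTERP_QUALIFIER_FLAT,           /* assumed name */
   INTERP_QUALIFIER_NOPERSPECTIVE   /* assumed name */
};

/* Resolve the interpolation actually used for a fragment shader input. */
static enum interp_qualifier
resolve_interpolation(enum interp_qualifier q, bool is_builtin_color,
                      bool flat_shade)
{
   /* An explicit qualifier always wins, including 'smooth'. */
   if (q != INTERP_QUALIFIER_NONE)
      return q;
   /* Unqualified built-in colors follow the shade model. */
   if (is_builtin_color && flat_shade)
      return INTERP_QUALIFIER_FLAT;
   return INTERP_QUALIFIER_SMOOTH;
}
```

Without a distinct NONE value, an explicit 'smooth' on a built-in color
could not be told apart from a missing qualifier, and GL_FLAT would wrongly
override it.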
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch makes GLSL interpolation qualifiers visible to drivers via
the array InterpQualifier[] in gl_fragment_program, so that they can
easily be used by driver back-ends to select the correct interpolation
mode.
Previous to this patch, the GLSL compiler was using the enum
ir_variable_interpolation to represent interpolation types. Rather
than make a duplicate enum in core mesa to represent the same thing, I
moved the enum into mtypes.h and renamed it to be more consistent with
the other enums defined there.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without this it's possible to wind up in a draw call with the
glBegin/End VBO still in a mapped state. This is a problem for
the SVGA3D driver and probably not good for other HW drivers.
Now that texture borders are gone, we never need to allocate our
textures through non-miptrees, which simplifies some irritating paths.
v2: Remove the !mt support case from intel_map_texture_image()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This replaces software rendering of textures with the deprecated
1-pixel border (which is always bad, since mipmapping is rather broken
in swrast, and GLSL 1.30 is unsupported) with hardware rendering that
just pretends there was never a border (so you have potential seams on
apps that actually intentionally used the 1-pixel borders, but correct
rendering otherwise).
This doesn't regress any piglit tests on gen6 (since the texwrap
border/bordercolor cases already failed due to broken border color
handling), but regresses texwrap border cases on original gen4 since
those end up sampling the border color instead of the border pixels.
It's a small price to pay for not thinking about texture borders any
more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We wanted to reuse this in the Intel driver.
v2: Move the flag to ctx->Const
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
The intel driver (and gallium, it looks like, though it doesn't use
these texstore functions at this point) doesn't bother making storage
for textures with 0 width, height, or depth. This avoids them having
to deal with returning a mapping for that nonexistent data.
Fixes assertion failures with an upcoming intel driver change.
Reviewed-by: Brian Paul <brianp@vmware.com>
This can be useful if you want to create a bunch of temporary strings
with a common prefix. For example, when iterating over uniform
structure fields, one might want to create temporary strings like
"palette.primary", "palette.outline", and "palette.shadow".
This could be done by overwriting the '.' with a null-byte and calling
ralloc_asprintf_append, but that incurs the cost of strlen("palette")
every time...when this is already known.
These new functions allow you to rewrite the tail of the string, given a
starting index. If the starting index is the length of the string, this
is equivalent to appending.
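The idea can be sketched on a fixed-size buffer with a hypothetical helper,
rewrite_tail. This is not the ralloc API: the ralloc functions take
printf-style arguments and grow the allocation as needed.

```c
#include <assert.h>
#include <string.h>

/* Overwrite buf from offset 'start' with 'suffix'.
 * Returns 0 on success, -1 if the result would not fit in 'cap' bytes. */
static int rewrite_tail(char *buf, size_t cap, size_t start,
                        const char *suffix)
{
   size_t n = strlen(suffix);
   if (start + n + 1 > cap)
      return -1;
   memcpy(buf + start, suffix, n + 1);   /* copies the terminating NUL too */
   return 0;
}
```

A caller computes the prefix length once and then reuses it for every field:
with name holding "palette." and start set to 8, successive calls with
"primary", "outline" and "shadow" produce the three strings without ever
rescanning the prefix.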
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Consider the following vertex shader and fragment shader:
// vertex shader
varying vec4 v;
uniform vec4 u;
void main() { gl_Position = vec4(0.0); v = u; }
// fragment shader
void main() { gl_FragColor = vec4(0.0); }
Since the fragment shader does not use 'v', it is demoted from a
varying to a simple global variable. Once that happens, the
assignment to 'v' is useless, and it should be removed. In addition,
'u' is no longer active, and it should also be removed.
Performing extra dead code elimination after demoting shader inputs
and outputs takes care of this. This elimination must occur before
assigning uniform locations, or the declaration of 'u' cannot be
removed.
This change *breaks* the piglit test getuniform-01, but that test is
already incorrect. The test uses a vertex shader that assigns to a
user-defined varying, but it has no fragment shader. Since Mesa does
not support ARB_separate_shader_objects (we only support the EXT
version), the linker correctly eliminates the user-defined varying.
The cascading effect is that the uniform queried by the C code of the
test is also (correctly) eliminated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980
Tested-by: Brian Paul <brianp@vmware.com>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Vinson Lee <vlee@vmware.com>
Cc: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
These should be useful for doing transform feedback on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are correct to the best of my knowledge, gleaned from a variety of
internal sources. Sadly, the Sandybridge PRM has incorrect limits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The inconsistency between vs_max_threads and max_vs_entries was rather
annoying. I could never seem to remember which one was reversed, which
made it harder to find quickly. "Max __ Threads" seems more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the docs for 3DSTATE_PS (Gen7+) and 3DSTATE_WM (Gen6),
there is a platform dependent value for the minimum number of pixel
shader threads. It may also vary based on whether WIZ Hashing is on.
For example, Ivybridge requires at least 4 threads if WIZ hashing is
disabled, and 8 if it's enabled. Programming it to use fewer threads is
illegal. Sandybridge appears to have similar restrictions.
So on newer platforms, INTEL_DEBUG=sing will probably just hang the GPU.
Rather than try to patch it up for newer platforms and extend it to
support geometry shaders, just remove it as it isn't that useful anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For GL_RGB_SCALE and GL_ALPHA_SCALE targets, the API wrapper code
attempts to ensure the parameter is 1.0, 2.0, or 4.0.
This is unnecessary: set_combiner_scale in texenv.c (called by
_mesa_TexEnvfv) already checks this and raises an appropriate error.
It's also incorrect: For glTexEnvx, the API validation code directly
compares the GLfixed input parameter with a floating point constant,
prior to converting fixed-point to floating point.
Fixes an issue in the OpenGL ES 1.1 conformance suite.
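The fixed-point pitfall can be sketched as follows. The type and function names are hypothetical stand-ins, not the API wrapper code itself; the only assumption is that GLfixed is a signed 16.16 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed16_16;   /* stand-in for GLfixed (signed 16.16) */

static float
fixed_to_float(fixed16_16 x)
{
   return (float)x / 65536.0f;
}

static bool
scale_is_valid(float s)
{
   return s == 1.0f || s == 2.0f || s == 4.0f;
}

/* Buggy pattern: the raw fixed-point bits are compared with float constants. */
static bool
scale_is_valid_raw(fixed16_16 raw)
{
   return scale_is_valid((float)raw);
}

/* Convert to floating point first, then validate. */
static bool
scale_is_valid_fixed(fixed16_16 raw)
{
   return scale_is_valid(fixed_to_float(raw));
}
```

With this model, the fixed-point encoding of 2.0 (0x20000) is rejected by the raw comparison and accepted once it is converted first.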
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kill the code paths taken when src_mt is null. It is never null, otherwise
there would be a segfault on line 4 of this function:
GLuint width = src_mt->level[level].width;
(Some interleaved lines in the diff make the real diff non-obvious. All
I did was delete some code and then left-shifted what remained to correct
the indentation.)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A driver trying to set up builtin uniforms is faced with a problem:
How do I walk the ir_variable structure (representing an array of
structs, or array of matrices, or struct, or whatever), and set up
driver structures so that dereference of that uniform gets the
corresponding ParameterValues[] entry. The rule in general is that
each corresponding vector-sized field of an array of structs is one
builtin uniform state slot. i965 relied on another invariant: each
state slot has a number of unique channel swizzles corresponding to
the number of elements in the field's vector, to avoid needing to walk
the glsl_type in parallel to get at vector_elements.
All of the builtin uniforms followed this behavior, except for
gl_NormalMatrix. That's a mat3 (so 3 vec3s), but it was swizzled as 3
vec4s.
Fixes piglit glsl-fs-normalmatrix.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the new name for gl_MaxVaryingFloats now that non-float
varyings exist. Fixes piglit
glsl-1.30/execution/maximums/gl_MaxVaryingFloats
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 3e5d3626, Eric added a homebrew workaround to fix GPU hangs in
the Mesa "engine" demo and oglc's api-texcoord test.
Unfortunately, his PIPE_CONTROL contains a Depth Stall, which
necessitates the post-sync non-zero workaround.
Fixes GPU hangs in Civilization 4, PlaneShift, and 3DMMES.
Hopefully Heroes of Newerth as well, though I haven't tested that.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40324
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41096
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Eric Anholt <eric@anholt.net>
I had a colleague hitting issues compiling with an old gcc3.2
system. These patches got them through.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Use VRAM for static and immutable buffers. This restores the
recently removed r600g winsys behaviour for memory locations.
This also improves rendering times on the gpu for some
OpenSceneGraph based test cases by about 15%.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Make sure we do not run into the classic ABA problem on buffer object bind:
a name that was just deleted may be handed out again for a new object, and
if nothing rebound it in between, the stale binding is never refreshed.
The explicit rebinding to the default object in the current context
prevents the above in the current context, but another context
sharing the same objects might suffer from this problem.
Minor var renaming and comments edited by Brian.
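A simplified model of the ABA hazard, assuming a bind path that skips work when the requested name equals the currently bound name. The structs and functions are hypothetical and much smaller than Mesa's real buffer object code; the model covers only the current-context rebinding, not the shared-context case.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {
   unsigned name;   /* application-visible name, may be reused after delete */
};

struct context {
   unsigned bound_name;          /* name last passed to bind */
   const struct buffer *bound;   /* object resolved at that time */
};

/* Bind with a "same name, nothing to do" shortcut. The shortcut is where
 * ABA bites: a reused name looks identical to the deleted one. */
static void
bind_buffer(struct context *ctx, const struct buffer *buf)
{
   unsigned name = buf ? buf->name : 0;
   if (ctx->bound_name == name)
      return;
   ctx->bound_name = name;
   ctx->bound = buf;
}

/* Delete hook: explicitly rebind to the default object (name 0) when the
 * deleted buffer is bound, so a later bind of the reused name is not skipped. */
static void
delete_buffer(struct context *ctx, const struct buffer *buf)
{
   if (ctx->bound == buf)
      bind_buffer(ctx, NULL);
}
```

Without the delete hook, binding a new object that received the old name leaves the context pointing at the deleted object.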
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
Buffer objects may be shared across contexts.
Rework the array attrib push/pop implementation
to be thread safe. Make use of more library functions
for this purpose.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the past, swrast_texture_image::Data has been overloaded. It could
either point to malloc'd memory storing texture data, or it could point
to a current mapping of GPU memory.
Now, Buffer always points to malloc'd memory (if we're not using GPU
memory) and Data always points to mapped memory. The next step would
be to rename Data -> Map.
This change also involves adding swrast functions for mapping textures
and renderbuffers prior to rendering to set up the Data pointer. Plus,
corresponding functions to unmap textures and renderbuffers. This is
very much like similar code in the dri drivers.
Only swrast and the drivers that fall back to swrast need these fields now.
This removes the last of the fields related to software rendering from
gl_texture_image.
In lp_build_stencil_op() the incoming 'stencil' var is a 2-element array.
There's a front-face writemask and a back-face writemask but we're ignoring
the latter. This patch doesn't fix anything but at least points out the
problem.
v2: Avoid the C99 rounding functions, because I don't trust
get/setting the C99 rounding mode from inside our library not having
other side effects. Instead, open-code roundEven() behavior around
Mesa's IROUND, which we're already testing for C99 rounding mode
safety.
Fixes glsl-1.30/compiler/built-in-functions/round*
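For illustration, round-half-to-even can be open-coded without touching the C99 rounding mode. This sketch does not use Mesa's IROUND and is not the code from this commit; it is a standalone variant built on integer truncation, and the function name is hypothetical.

```c
#include <assert.h>

/* Round to the nearest integer, ties to even, without fesetround(). */
static float
round_even(float x)
{
   /* NaN, infinities and floats with magnitude >= 2^23 have no fractional
    * part to round, so return them unchanged. */
   if (x != x || x >= 8388608.0f || x <= -8388608.0f)
      return x;

   int i = (int)x;              /* truncates toward zero */
   float t = (float)i;
   float frac = x - t;          /* exact for |x| < 2^23 */
   float mag = frac < 0.0f ? -frac : frac;

   /* Step away from zero when past the halfway point, or exactly on it
    * with an odd truncated value. */
   if (mag > 0.5f || (mag == 0.5f && (i & 1)))
      t += (x < 0.0f) ? -1.0f : 1.0f;

   return t;
}
```

For example, 2.5 rounds to 2.0 while 3.5 rounds to 4.0, and the same holds mirrored for negative inputs.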
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- Fix _GNUC__ typo in both checks
- Fix logic error in check for gcc < 3.4 that breaks for gcc 2.x & older
Without this fix, builds with gcc 3.4.x end up depending on undefined
_mesa_bitcount instead of gcc's __builtin_popcount.
NOTE: This is a candidate for the stable branches.
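A sketch of a version check that behaves correctly for every GCC major version, including 2.x. The macro and function names are hypothetical and are not the ones used in Mesa's headers; the fallback loop only stands in for a generic bit counter.

```c
#include <assert.h>

/* True when compiling with GCC (or a compatible compiler) of at least
 * major.minor. Comparing the major version first avoids misclassifying
 * releases such as 2.95, whose minor number is larger than 4. */
#if defined(__GNUC__)
#define GCC_AT_LEAST(major, minor) \
   (__GNUC__ > (major) || (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
#define GCC_AT_LEAST(major, minor) 0
#endif

static unsigned
bitcount(unsigned n)
{
#if GCC_AT_LEAST(3, 4)
   return (unsigned)__builtin_popcount(n);
#else
   /* Portable fallback: clear the lowest set bit until none remain. */
   unsigned bits = 0;
   for (; n != 0; n &= n - 1)
      bits++;
   return bits;
#endif
}
```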
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should fall back to shader based decoding (g3dvl) for now.
This is probably broken on systems that support xvmc, because
nouveau_video_buffer_create has no way to know for what api
the buffer is created, so I think this call might need a
separate argument as workaround.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
The phase instance counts are not necessarily redeclared so with
the separation of declarations and instructions we wouldn't know
which instance count applies to which phase.
Correct linkage requires examining the signature itself, it cannot
be reconstructed from declarations only since unused registers may
have been omitted from them.
We don't want to clutter the code or handicap new hardware for
the sake of ancient GPUs on which d3d1x won't ever be used,
much less be fully compliant, anyway.
We were mis-computing the size of the user-space vertex buffer in
some circumstances. This led to a failed assertion at u_inlines.h:222
when using the VMware svga driver.
For example, if we had arrays such as:
array[0]: element_offset = 12, stride = 24
array[1]: element_offset = 0, stride = 24
We'd mistakenly compute 'bytes' to be 12 bytes too small.
I've reorganized the function too. By the time it's called, we know that
we've got interleaved arrays either all in one VBO or all in user memory
and the stride is equal for all arrays.
Move the code that lived inside the attr==0 test after the loop.
In the loop we compute the true vertex size. That size factors into the
pipe->redefine_user_buffer() call later. Using the vertex size instead
of array[0]'s element_offset fixes the failed assertion.
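The size computation can be sketched roughly as below. The struct, the element_size field and the function name are assumptions made for illustration; the real function works on Mesa's array state and calls pipe->redefine_user_buffer(), which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-array description for interleaved attributes. */
struct attrib {
   size_t element_offset;   /* byte offset of the attribute in a vertex */
   size_t element_size;     /* bytes occupied by one element (assumed) */
};

/* Bytes spanned by `num_vertices` interleaved vertices that share `stride`.
 * The vertex size is taken over all arrays, not just array[0]. */
static size_t
interleaved_bytes(const struct attrib *arrays, size_t num_arrays,
                  size_t stride, size_t num_vertices)
{
   if (num_arrays == 0 || num_vertices == 0)
      return 0;

   size_t low = arrays[0].element_offset;
   size_t high = low + arrays[0].element_size;
   for (size_t i = 1; i < num_arrays; i++) {
      size_t begin = arrays[i].element_offset;
      size_t end = begin + arrays[i].element_size;
      if (begin < low)
         low = begin;
      if (end > high)
         high = end;
   }

   size_t vertex_size = high - low;
   return stride * (num_vertices - 1) + vertex_size;
}
```

With the two arrays from the example above (offsets 12 and 0, stride 24) and an assumed 12-byte element each, four vertices span 96 bytes regardless of which array is listed first.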
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Setting MaxIfDepth to UINT_MAX effectively means "don't lower anything."
Explicitly checking for this common case allows us to avoid walking the
IR, computing nesting levels, and so on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
Commit 488fe51cf8 converted the EmitNoIfs
flag to MaxIfDepth, an unsigned integer saying "flatten if-statements
nested beyond this depth."
Unfortunately, i965 left this initialized to 0, which made ir_to_mesa
attempt to flatten all if-statements. We didn't notice right away
because we usually throw away ir_to_mesa's code in favor of the native
VS and FS backends...but this still creates a lot of unnecessary work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that this is identical to gen6_wm_constants, just use that instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it match gen6_prepare_wm_push_constants. For some reason, it
had been using AUB_TRACE_NO_TYPE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We definitely want CACHE_NEW_WM_PROG, not CACHE_NEW_VS_PROG.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The DDX may allocate a buffer with a too small size.
Instead of failing, let's pretend everything's alright.
Such bugs should be fixed in the DDX, of course.
NOTE: This is a candidate for the stable branches.
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes vs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes fs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When there is no ARB_vertex_program program enabled, the Current
pointer points at a default program, so we were always using
VERTEX_PROGRAM_TWO_SIDE, even for fixed function lighting.
Fixes piglit two-sided-lighting*
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 2.1 specification, page 114 (page 128 of the PDF):
"The version of PixelStore that takes a floating-point value
may be used to set any type of parameter; if the parameter is
boolean, then it is set to FALSE if the passed value is 0.0
and TRUE otherwise, while if the parameter is an integer, then
the passed value is rounded to the nearest integer."
Fixes piglit roundmode-pixelstore.
Note: This is a candidate for the 7.11 branch.
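A small sketch of the two conversions the quoted spec text describes. The function names are hypothetical, and the sketch rounds halfway cases away from zero, which is one reading of "nearest" and not necessarily what Mesa's IROUND does.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer parameters: round the passed float to the nearest integer. */
static int
float_param_to_int(float f)
{
   return (int)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}

/* Boolean parameters: FALSE only if the passed value is exactly 0.0. */
static bool
float_param_to_bool(float f)
{
   return f != 0.0f;
}
```

A plain C cast would truncate 3.6 to 3, whereas the rounded conversion yields 4.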
Reviewed-by: Brian Paul <brianp@vmware.com>
Simply generate a GL_INVALID_OPERATION error in display list mode. As
explained by Brian, we are going to access PBO data at compile time, so
there is no need to defer the error to execution time.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using IROUND() to convert a float depth value to a 32-bit uint Z value
didn't work (it returns a signed value). Just use a cast instead.
Fixes piglit fbo-depth-array failure with swrast.
Note: this is a candidate for the 7.11 branch.
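A sketch of an unsigned conversion, assuming the depth value is a float in [0, 1]. The function name, the clamping and the use of double for the multiply are choices made for this illustration and are not taken from the swrast code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a normalized float depth to a 32-bit unsigned Z value.
 * Going through a signed int would overflow for depth > ~0.5. */
static uint32_t
depth_to_z32(float depth)
{
   if (depth <= 0.0f)
      return 0;
   if (depth >= 1.0f)
      return UINT32_MAX;
   return (uint32_t)((double)depth * (double)UINT32_MAX);
}
```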
This code isn't really relevant since the kernel takes care not
to destroy busy GMR buffers.
Also with the advent of fence objects, the code was incorrect since
it didn't refcount fence handles.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Wrap _mesa_unpack_bitmap to handle the case where the data is stored in a
pixel buffer object.
This makes display-list calls to Bitmap work when the data is stored in a PBO.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: quote the spec; explicitly exclude the GL_BITMAP case to make code
more readable. (comments from Ian)
v3: Cast the offset by GLintptr to remove the compile warning (comments
from Brian).
I also found that I should use _mesa_sizeof_packed_type() instead,
as it includes packed pixel type, like GL_UNSIGNED_SHORT_5_6_5.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch (based on reading the emulator) came out of my attempt to
fix the oglc pbo texImage.1PBODefaults failure. This case generates a
texture whose width and height equal the window's width and height,
then textures it onto the whole window. So it's exactly one texel per
pixel, and both the min filter and mag filter are GL_LINEAR. It runs OK
with swrast, as expected, but it failed with the i965 driver.
You can't tell the difference on the screen, as the error is quite
tiny. From my digging, it seems that a tiny error creeps in while
computing the texel address. This breaks the one-texel-per-pixel rule
in this case, so the linearly filtered result is used, with a tiny
error.
This patch fixes all oglc pbo subcases that fail with the same issue on
ILK, SNB and IVB.
v2: comments from Ian, make the address_round field assignment consistent.
(the sampler is already memset to 0 by the xxx_update_sampler_state
caller, so no need to assign 0 first)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Generate the program parameters list by walking the IR instead of by
walking the list of linked uniforms. This simplifies the code quite a
bit, and is probably a bit more correct. The list of linked uniforms
should really only be used by the GL API to interact with the
application.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Having a few of these includes or forward declarations inside the
'extern "C"' block can cause problems later. Specifically, it
prevents C++ linkage functions from being added to ir_to_mesa.h and
makes G++ angry if 'struct foo' is seen both inside and outside an
'extern "C"'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fold _mesa_get_active_uniform into its only caller in the process.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplification was enabled by the earlier refactors that
eliminated the references to the assembly shaders stored in the
gl_shader_program structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Finding this bit in the documentation proved challenging. It wasn't in
the SEND instruction's message descriptor section, nor the data port
message descriptor section. It turns out to be part of the Render
Target Write message's control bits, and in the documentation is named
"Last Render Target Select".
Shaders that use Multiple Render Targets should set this bit on the last
RT write, but not on any prior ones.
The GPU does update the Pixel Scoreboard appropriately, but doesn't
document this bit as directly causing a scoreboard clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
After printing the details of a specific message, we always print out
the message length and response length with nice "mlen" and "rlen"
labels.
For Gen5+ URB writes, we were dumping mlen and rlen a second time:
urb 0 urb_write interleave used complete mlen 5, rlen 0 mlen 5 rlen 0
Also, for Gen6 data port messages, we were including mlen and rlen in
the tuple of undecipherable integers.
Both of these are completely redundant. So, remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Every brw_set_???_message function had duplicated code, per-generation,
to set the Message Descriptor and Extended Message Descriptor bits
(SFID, message length, response length, header present, end of thread).
However, these fields are actually specified as part of the SEND
instruction itself; individual types of messages don't even specify
them (except for header present, but that's in the same bit location).
Since these are exactly the same regardless of the message type, just
create a function to set them, using the generic message structs. This
not only shortens the code, but hides a lot of the per-generation
complexity (like the SFID being in destreg__conditionalmod) in one spot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code asserted that eot == 0, as it doesn't make sense for
a thread to sample a texture as the last thing it does.
It doesn't make much sense to pass around a dead parameter either.
Especially for a function which already has a long parameter list.
So, remove the parameter and just set EOT to 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When reading the data port code, it was not clear to me what these
values meant, nor where I could find them in the documentation.
Especially since the latest BSpec and older PRMs document them in
radically different places...neither of which are near the descriptions
of individual messages.
Cite the documentation, and rename them to SFID to signify that these
are Shared Function IDs that one can read about in the GPU overview,
rather than arbitrary bitfields. While we're at it, make them an enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we use the Render Cache for scratch access (read/write data)
and the Sampler Cache for all read only data (pull constants).
Reversing the condition here is clearer: if the caller requested the
Render Cache, use that. Otherwise, they requested the Data Cache
(which does not exist on Gen6) or Sampler Cache, so use the Sampler
Cache.
This should not change behavior in any way.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Using the constant cache for reads isn't going to work for scratch
reads (variably-indexed arrays or register spills), as these aren't
constant at all.
Also, in the new VS backend, use the proper message number for OWord
Dual Block Write messages. It's now 10, instead of 9.
+205 piglits.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While reviewing some compiler cleanups I'd sent out, Paul noticed that
tree grafting wasn't taking "out" parameters into account.
Further investigation revealed that it isn't strictly necessary: ir_call
ends basic blocks, and tree grafting currently only operates on basic
blocks. So calls already kill grafts.
However, just to be safe, this patch makes "out" parameters explicitly
kill grafts. Paul and I both prefer this. It's a bit clearer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The 'mode' param is a bitset of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT.
A future commit will perform buffer resolves in intel_region_map(). So,
even though the access mode is irrelevant to the GTT, the extra
information allows us to intelligently avoid unnecessary buffer resolves.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following to the vtbl:
hiz_resolve_depthbuffer
hiz_resolve_hizbuffer
For all drivers for which HiZ is not enabled, the methods are set to be
no-ops. If HiZ is enabled, the methods are currently set to empty
stubs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
intel_context::gen field is set by intelInitContext(). So, by calling
intelInitContext() before initializing the vtable, we can construct
different vtables for different gens.
Specifically, this allows us to set the HiZ operations to be no-ops for
contexts for which HiZ is not enabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
During anholt's MapTextureImage refactoring, the call to
intel_tex_image_s8z24_create_renderbuffers was misplaced. It needs to
occur *after* the miptree is allocated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Don't dereference the color buffer if one isn't attached.
This fixes the following Piglit tests in my experimental HiZ branch:
glean/logicOp
glean/paths
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is necessary because i965 will need to call vbo_bind_array() when
cleaning up after a buffer resolve meta-op.
Detailed Explanation
--------------------
The vbo module tracks vertex attributes separately from the gl_context.
Specifically, the vbo module maintains vertex attributes in
vbo_exec_context::array::inputs, which is synchronized with
gl_context::Array::ArrayObj::VertexAttrib by vbo_bind_array().
vbo_draw_arrays() calls vbo_bind_array() to perform the synchronization
before calling the real draw call, vbo_context::draw_arrays.
Intel hardware accomplishes buffer resolves with a meta-op. Frequently,
that meta-op must be performed within glDraw* in the moment immediately
before the draw occurs (The hardware designers hate us...). After
performing the meta-op, but before calling vbo_bind_array(), the
gl_context's vertex attributes will have been restored to their original
state (that is, their state before the meta-op began), but the vbo
module's vertex attributes are those used in the last meta-op. Therefore we
must manually synchronize the two with vbo_bind_array() before continuing
with the original draw command (that is, the one requested with glDraw*).
See brw_predraw_resolve_buffers(), which will be added in a future commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This hook allows the driver to prepare for a glBegin/glEnd.
i965 will use the hook to avoid recursive calls to FLUSH_VERTICES
during a buffer resolve meta-op.
Detailed Justification
----------------------
When vertices are queued during a glBegin/glEnd block, those vertices must
of course be drawn before any rendering state changes. To ensure this,
Mesa calls FLUSH_VERTICES as a prehook to such state changes. Therefore,
FLUSH_VERTICES itself cannot change rendering state without falling into
a recursive trap.
This precludes meta-ops, namely i965 buffer resolves, from occurring while
any vertices are queued. To avoid that situation, i965 must satisfy the
following condition: that it queues no vertex if a buffer needs resolving.
To satisfy this, i965 will use the PrepareExecBegin hook to resolve all
buffers on entering a glBegin/glEnd block.
--------
v2: Don't add dd_function_table::CleanupExecEnd. Anholt and I discovered
that hook to be unnecessary.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In some cases, Intel hardware requires that depth and stencil buffers be
separate. To accommodate swrast, i965 resorts to hackery that causes
a segfault in the fastpaths of draw_depth_stencil_pixels() and
read_depth_stencil_pixels().
The hack is that i965 sets framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer
and framebuffer->Attachment[BUFFER_STENCIL].Renderbuffer to a dummy
renderbuffer for which the GetRow accessors and friends are null. The real
buffers are located at framebuffer->_DepthBuffer and framebuffer->_StencilBuffer.
To fix the segfault, this patch skips the fastpath if
framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer->GetRow is null.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When i965 uses (in the near future) meta-ops to perform buffer resolves,
the meta-op stack exceeds depth 2. I bumped it to 8 because... 8 is bigger
than 2, but not too big.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If this flag is set, then _mesa_meta_begin/end will save/restore the state of
GL_SELECT and GL_FEEDBACK render modes.
Intel's future buffer resolve meta-ops will require this, since buffer resolves
may occur when the GL_RENDER_MODE is GL_SELECT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required in order for meta-ops to save/restore the GL_RENDER_MODE
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad@chad-versace.us>
Acked-by: Paul Berry <stereotype441@gmail.com>
It was previously under gpu_shader4, but I'm pretty sure everyone's
going to be doing GLSL 1.30 first (since gpu_shader4 is basically 1.30
plus a bunch of extra stuff).
Fixes piglit glsl-1.30/texel-offset-limits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When saving the active program in _mesa_meta_begin, it was actually
saving the fragment program instead. This means that if the
application binds a program that only has a vertex shader then when
the meta saved state is restored it will forget the bound program.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41969
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applications.
On converting fixed function programs to generate GLSL, the linker
became cranky that we were trying to make something that wasn't a
linked vertex+fragment program. Given that the Mesa GLES2 drivers
also support desktop GL with EXT_sso, just telling the linker to shut
up seems like the easiest solution.
As pointed out by Michel Dänzer, gcc -lstdc++ doesn't work on all systems,
because it may require other libraries which are only pulled in implicitly
by g++. And libstdc++ is available only with the GNU compiler.
Use c++ compiler for linking and remove redundant LDFLAGS += -lstdc++
all over the tree.
In addition to setting up the flags correctly, this renames the
generated libraries to ensure they get 'Mangled' in the name.
This is very useful for distros and the like, where mangled Mesa
and non-mangled GL libraries typically need to be installed
side-by-side.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Scalar instructions that need to write to the xyz components of a
register must reserve the RGB instruction slot for a REPL_ALPHA
instruction. With this commit, the scheduler will attempt to free
the RGB slot by moving the write to the w component of a register.
Introduce a simple function called copy_data to do the image data copy
for all the save_CompressedTex*Image functions. The function checks for
the NULL data case to avoid potential segfaults. This also makes the
code a bit simpler and less redundant.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fix is in {read,draw}_depth_stencil_pixels(). If depthRb == stencilRb,
then it is redundant to check depthRb->x *and* stencilRb->x.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
For glReadPixels, the user supplied pixels have format
GL_UNSIGNED_INT_24_8. But, when the depthstencil buffer's format was
MESA_FORMAT_S8_Z24, the fastpath read from the buffer without reordering
the depth and stencil bits. To fix this, this patch just skips the
fastpath when the format is not MESA_FORMAT_Z24_S8.
The problem and fix for glDrawPixels are analogous.
Fixes the Piglit tests below on i965/gen6 and causes no regressions.
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
Note: This is a candidate for the stable branches.
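The reordering that the fastpath skipped can be sketched as a pair of bit rotations. The layouts are assumptions stated in the comments (inferred from the format names), and the helper names are hypothetical; this is not the slow-path code that the patch falls back to.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layouts, most significant bits first:
 *   S8_Z24 word: [ stencil:8 | depth:24 ]
 *   24_8 word:   [ depth:24  | stencil:8 ]  (as in GL_UNSIGNED_INT_24_8)
 */
static uint32_t
s8z24_to_24_8(uint32_t v)
{
   return (v << 8) | (v >> 24);
}

static uint32_t
z24s8_to_s8z24(uint32_t v)
{
   return (v >> 8) | (v << 24);
}
```

Copying rows verbatim is only correct when the buffer already uses the 24_8 layout, which is why the fastpath has to be limited to that format.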
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required for an accurate implementation of d3d1x's
CheckFormatSupport query.
It also seems generally useful for state trackers, which could
choose alternative rendering paths or formats if blending would
come at a significant performance loss.
The texture semaphore allows for prefetching of texture data. On my
RV515, this increases the FPS of Lightsmark by 33% (This is with the
reg_rename pass enabled, which is enabled in the next commit).
There is a new env variable now called RADEON_TEX_GROUP, which allows
you to specify the maximum number of texture lookups to do at once.
The default is 8, but different values could produce better results
for various application / card combinations.
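Reading such a tunable could look roughly like this. Only the variable name RADEON_TEX_GROUP and the default of 8 come from the message above; the helper names and the rule of falling back on malformed or zero values are assumptions for this sketch, not the driver's actual parsing.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a positive decimal integer, returning `fallback` when the text is
 * missing, malformed, zero, or out of range. */
static unsigned
parse_uint_or_default(const char *text, unsigned fallback)
{
   if (text == NULL || *text == '\0')
      return fallback;

   char *end = NULL;
   unsigned long value = strtoul(text, &end, 10);
   if (*end != '\0' || value == 0 || value > UINT_MAX)
      return fallback;

   return (unsigned)value;
}

/* Look up an environment variable and parse it with the rules above. */
static unsigned
env_uint_or_default(const char *name, unsigned fallback)
{
   return parse_uint_or_default(getenv(name), fallback);
}
```

A caller would use env_uint_or_default("RADEON_TEX_GROUP", 8) once at startup and cache the result.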
We no longer emit full instructions immediately after they have been
merged. Instead merged instructions are added to the ready list and
the scheduler can commit them whenever it wants.
This is supported by the pseudo-code on pages 27 and 28 (pages 41 and
42 of the PDF) of the OpenGL 2.1 spec. The last part of the
implementation of ArrayElement is:
if (generic attribute array 0 enabled) {
    if (generic vertex attribute 0 array normalization flag is set, and
        type is not FLOAT or DOUBLE)
        VertexAttrib[size]N[type]v(0, generic vertex attribute 0 array element i);
    else
        VertexAttrib[size][type]v(0, generic vertex attribute 0 array element i);
} else if (vertex array enabled) {
    Vertex[size][type]v(vertex array element i);
}
Page 23 (page 37 of the PDF) of the same spec says:
"Setting generic vertex attribute zero specifies a vertex; the
four vertex coordinates are taken from the values of attribute
zero. A Vertex2, Vertex3, or Vertex4 command is completely
equivalent to the corresponding VertexAttrib* command with an
index of zero."
Fixes piglit test attribute0.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't allow any "CPU" buffers to be allocated by the pb_fenced
buffer manager, since we can't protect against failures during
buffer validation.
Also, add an extra slab buffer manager to allocate buffers from
the kernel if there is a failure to allocate from our big buffer pool.
The reason we use a slab manager for this, is to avoid allocating
many very small buffers from the kernel.
v2: Increased VMW_MAX_BUFFER_SIZE and fixed some comments.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Returns a configuration that makes the dri state-tracker-manager
throttle.
Also disable kernel-based throttling.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Hooks up throttling if there is a configuration function present and
it indicates that throttling is desired.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Adds a possibility for the state tracker manager to query the
target for a specific configuration.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
But don't hook it up just yet until we figure out a good way to do that.
Also, we should, in the future, add driconf options to control what
throttling reasons should be honored, and the number of outstanding
swaps allowed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The X server has limited throttle support on the server side,
but doing this in the client has some benefits:
1) X server throttling is per client. Client side throttling can be done
per drawable.
2) It's easier to control the throttling based on what client is run,
for example using "driconf".
3) X server throttling requires drm swap complete events.
So implement a dri2 throttling extension intended to be used by direct
rendering clients.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
This change releases the stw_framebuffer::mutex past creation of
the pbuffer stw_framebuffer. Without this change the pbuffer's
lock is never released. Since mutexes are recursive on win32, this
does not hurt as long as all actions on a context are done from
the same thread. But if, for example, context creation happens in
a different thread than usage, every access to the context will
block forever.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In commit 018ea68d87, when I
de-compacted clip planes on Gen6+, I updated both the old and new VS
back-ends to reflect the change in how clip planes are stored, but I
failed to change the code in gen6_vs_state.c that uploads clip plane
constants when using the old VS back-end.
As a result, if the set of enabled clip planes wasn't contiguous
starting with 0, then clipping would not occur properly. This patch
corrects gen6_vs_state.c to upload clip plane constants in the new
de-compacted form.
This only affects the old VS back-end (which is used for
fixed-function and ARB vertex programs, not for GLSL vertex shaders).
Fixes Piglit test fixed-clip-enables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41603
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a bug where we'd wind up emitting an invalid instruction like
MOVE R[0]., R[1]; - note the empty/zero writemask. If we don't write to
any dest register channels, cull the instruction.
v2: simply change/fix the existing test for instruction culling.
Instead of the renderbuffer pointer. In the future, attaching a texture
may not mean the renderbuffer pointer gets set too.
Plus, remove some commented-out assertions.
v2: add a 'reading' parameter to distinguish between reading and writing
to the renderbuffer (we don't want to check if _ColorReadBuffer is null
when we're about to draw). Eric found this mistake.
These functions were only called in framebuffer.c where they were defined.
Remove the unneeded attIndex parameter too.
Reviewed-by: Eric Anholt <eric@anholt.net>
What I would prefer to assert is that, for each region that is currently
mapped, no batch is emitted that uses that region's bo. However, it's much
easier to implement this big hammer.
Observe that this requires that the batch flush in intel_region_map() be
moved to within the map_refcount guard.
v2: Add comments (borrowed from anholt's reply) explaining why the
assertion is a good idea.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When updating a register reference to reflect the fact that we were
taking its absolute value, the fragment shader back-end failed to
clear the negate flag, resulting in abs(-x) getting computed as
-abs(x).
I also found (and fixed) a similar problem in brw_eu.h, but I'm not
aware of an actual manifestation of that problem.
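A minimal sketch of the modifier handling, assuming a register reference that carries separate negate and abs flags and hardware that applies abs before negate. The structure and function names are hypothetical, not the i965 ones.

```c
#include <stdbool.h>

/* Hypothetical register reference with source modifiers. */
struct reg_ref {
   int nr;
   bool negate;
   bool abs;
};

/* Return a reference to the absolute value of 'reg'. A pending negate
 * must be dropped: abs is applied before negate, so leaving the flag
 * set would yield -abs(x) instead of abs(-x). */
static inline struct reg_ref
reg_abs(struct reg_ref reg)
{
   reg.abs = true;
   reg.negate = false;
   return reg;
}

/* Evaluate the modifiers in the order this sketch assumes. */
static inline float
reg_eval(struct reg_ref reg, float value)
{
   if (reg.abs)
      value = value < 0.0f ? -value : value;
   if (reg.negate)
      value = -value;
   return value;
}
```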
Fixes piglit test glsl-fs-abs-neg-with-intermediate.
brw_set_compression_control took a GLboolean as an argument, then
promptly used a switch statement to compare it with various enumeration
values. Clearly it's not actually a boolean.
Introduce a new enumeration type, enum brw_compression, and use that.
Found by converting GLboolean to bool; clang then gave warnings about
switching on a boolean and ultimately duplicated case errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Neither OES_framebuffer_object nor EXT_framebuffer_object allow
querying the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously GL_DEPTH_BUFFER and GL_STENCIL_BUFFER were (incorrectly)
allowed for both. Those enums don't even really exist! Now GL_DEPTH
and GL_STENCIL are only allowed for the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support to the clear and tile caches for integer storage
and clearing, avoiding any floating paths.
Signed-off-by: Dave Airlie <airlied@redhat.com>
these are never USCALED, always UINT in reality.
taken from some work by Christoph Bumiller
v2: fixup formatting of table + tabs
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously it was getting set in draw_set_mapped_constant_buffer() but
if there were no shader constants, that function wasn't called. So the
pt.user.planes field was null and we died when we tried to access the
clip planes in the LLVM-generated code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41663
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of 12 use DRAW_TOTAL_CLIP_PLANES. The max number of user-defined
clip planes was increased to 8 so the total number of planes is 14.
This doesn't fix any specific bug, but clearly the old code was wrong.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For example, GL_TRIANGLES is converted to _3DPRIM_TRILIST.
The conversion is necessary because HiZ and MSAA resolve operations emit
a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum.
As a consequence, brw_gs_prog_key.primitive is also converted.
v2
----
- [anholt] Split brw_set_prim into brw/gen6 variants in previous commit,
since not much code is really shared between the two.
- [anholt] Replace switch statements with table lookups, since this is
a hot path.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The "slight optimization to avoid the GS program" in brw_set_prim() is not
used by Gen 6, since Gen 6 doesn't use a GS program. Also, Gen 6 doesn't use
reduced primitives.
Also, document that intel_context.reduced_primitive is only used for Gen < 6
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Now that we have integer texture types, I can drop this workaround so that
copies of values are done properly (as floats would fail in some corner cases).
Signed-off-by: Dave Airlie <airlied@redhat.com>
glDeleteProgram should only be able to remove the one refcount for the
user's reference to the program from the hash table (even though that
ref does live on in the hash table until the last other ref is
removed).
Fixes piglit ARB_shader_objects/delete-repeat.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timestamps reported by PIPE_CONTROL are 64-bit values incrementing every
80 ns, and only the low 32 bits are active (the high 32 are always 0).
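As a rough sketch of what this means for computing elapsed time: the helper below is hypothetical and assumes the raw values are read back as 64-bit words. Because only the low 32 bits count, the subtraction is done in 32-bit unsigned arithmetic, which also tolerates one counter wrap between the two samples.

```c
#include <stdint.h>

#define TIMESTAMP_TICK_NS 80u   /* tick period stated above */

/* Hypothetical helper: nanoseconds elapsed between two raw timestamps. */
static inline uint64_t
timestamp_elapsed_ns(uint64_t start_raw, uint64_t end_raw)
{
   /* Discard the inactive high halves and subtract modulo 2^32. */
   uint32_t ticks = (uint32_t)end_raw - (uint32_t)start_raw;

   return (uint64_t)ticks * TIMESTAMP_TICK_NS;
}
```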
v2: Cleaned up whitespace, function arguments (anholt).
Fixes piglit EXT_timer_query/time-elapsed
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
The rest of the linker/glsl translation code checks for NULL, so I suppose we should check here too. Fixes crash on exit with i915g instanced drawing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If there is not enough space in pushbuffer for fence emission
(nouveau_fence_emit -> nv50_screen_fence_emit -> MARK_RING),
the pushbuffer is flushed, which through flush_notify ->
nv50_default_flush_notify -> nouveau_fence_update marks currently
emitting fence as flushed. But actual emission is done after this mark.
So later when there is a need to wait on this fence and pushbuffer
was not flushed in between, fence wait will never finish causing
application to hang.
To fix this, introduce new fence state between AVAILABLE and EMITTED,
set it before emission and handle it everywhere.
Additionally obtain fence sequence numbers after possible flush in
MARK_RING, because we want to emit fences in correct order.
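The ordering problem and the extra state can be sketched like this. All names here are hypothetical stand-ins (the real code lives in the nouveau fence and nv50 screen functions named above), and the pushbuffer is reduced to a simple free-space counter.

```c
/* Hypothetical fence states; FENCE_EMITTING is the newly added one. */
enum fence_state {
   FENCE_AVAILABLE,
   FENCE_EMITTING,
   FENCE_EMITTED,
   FENCE_FLUSHED
};

struct fence {
   enum fence_state state;
   unsigned sequence;
};

/* Hypothetical channel: 'space' counts free pushbuffer slots. */
struct channel {
   struct fence *current;
   unsigned next_sequence;
   unsigned space;
};

/* Flush notification: only a fence whose commands are already in the
 * pushbuffer may be marked as flushed. */
static inline void
flush_notify(struct channel *ch)
{
   if (ch->current && ch->current->state == FENCE_EMITTED)
      ch->current->state = FENCE_FLUSHED;
   ch->space = 16;
}

static inline void
fence_emit(struct channel *ch, struct fence *f)
{
   f->state = FENCE_EMITTING;
   ch->current = f;

   if (ch->space < 2)           /* stands in for MARK_RING running out of room */
      flush_notify(ch);

   /* Take the sequence number only after the possible flush. */
   f->sequence = ++ch->next_sequence;
   ch->space -= 2;
   f->state = FENCE_EMITTED;
}
```

With the intermediate state, a flush triggered in the middle of emission leaves the fence untouched, so a later wait still sees that the fence needs a flush.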
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Note: This is a candidate for the 7.11 branch.
We need to add a new set of fragment shader variants, along with new vertex
elements for signed and unsigned clears.
The new fragment shader variants are needed because the integer values require
CONSTANT interpolation. The new vertex element descriptions are for passing
the clear color as an unsigned or signed integer value.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the various mesa->gallium and gallium->mesa format conversions
along with the GL->gallium texture choosers for integers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for unsigned/signed integer types via adding a 'pure' bit
in the format description table. It adds 4 new u_format get/put hooks,
for get/put uint and get/put sint so that accessors can get native access
to the integer bits. This is used to avoid precision loss via float converting
paths.
It doesn't add any float fetchers for these types at the moment, GL doesn't
require float fetching from these types and I expect we'll introduce a lot
of hidden bugs if we start allowing such conversions without an API mandating
it.
It adds all formats from EXT_texture_integer and EXT_texture_rg.
0 regressions on llvmpipe here with this.
(there is some more follow on code in my gallium-int-work branch, bringing
softpipe and mesa to a pretty integer clean state)
v2: fixup python generator to get signed->unsigned and unsigned->signed
fetches working.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes up the integer format choosing to pick the closest mesa format
then the most likely fallback.
(the formatting in this file needs cleaning in another patch).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds a simple packing for GL_UNSIGNED_INT/GL_INT destination formats.
This is enough for at least the gallium drivers to pack both unsigned and signed types for read pixels.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of these functions used three spaces for the first level of
indentation, but four spaces for the next level. One used tabs and then
three spaces. Some used 3/4 in a then block but 3/3 in the else block.
Normally I try to avoid field days like this, but since the functions
were so inconsistent, even internally, it was making it difficult to
edit without introducing spurious whitespace changes.
So, just get it over with. git diff -b shows 0 lines changed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
i915 and i830 hardware doesn't have HiZ, so remove all HiZ related
assertions from *update_draw_buffer().
I've removed the dead format checks completely rather than replace them
with more appropriate checks. This doesn't reduce "assertion coverage",
however, because when I added these HiZ related assertions in c8fdf66
there were no pre-existing checks there.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Silences a warning about comparing to an unsigned variable. It looks like
the result of swizzle_for_size() is always assigned to unsigned vars.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This fixes failures found with the new piglit texsubimage test.
Two things were broken:
1. The dxt code doesn't handle source images where width != row stride.
Check for that and take the _mesa_make_temp_ubyte_image() path to get
an image where width = rowstride.
2. If we don't take the _mesa_make_temp_ubyte_image() path we need to
take the source image unpacking parameters into account in order to
get the proper starting memory address of the source texels.
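Both points can be illustrated with a small sketch. The structure and helpers are hypothetical and deliberately ignore GL_UNPACK_ALIGNMENT and image (3D) skipping; they only show how the row stride and the first texel address follow from the unpacking parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the pixel-unpack state (GL_UNPACK_ROW_LENGTH,
 * GL_UNPACK_SKIP_ROWS, GL_UNPACK_SKIP_PIXELS). */
struct unpack_params {
   int row_length;   /* 0 means "use the image width" */
   int skip_rows;
   int skip_pixels;
};

static inline size_t
row_stride_bytes(const struct unpack_params *p, int width, int bytes_per_texel)
{
   int texels = p->row_length > 0 ? p->row_length : width;

   return (size_t)texels * (size_t)bytes_per_texel;
}

/* Case 1: a tightly packed temporary copy is needed when the stride
 * differs from the width of the image being stored. */
static inline int
needs_temp_image(const struct unpack_params *p, int width, int bytes_per_texel)
{
   return row_stride_bytes(p, width, bytes_per_texel) !=
          (size_t)width * (size_t)bytes_per_texel;
}

/* Case 2: otherwise, start reading at the first texel selected by the
 * skip parameters rather than at the raw 'pixels' pointer. */
static inline const uint8_t *
first_texel(const uint8_t *pixels, const struct unpack_params *p,
            int width, int bytes_per_texel)
{
   size_t stride = row_stride_bytes(p, width, bytes_per_texel);

   return pixels + (size_t)p->skip_rows * stride +
          (size_t)p->skip_pixels * (size_t)bytes_per_texel;
}
```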
Note: This is a candidate for the 7.11 branch.
The docs say that the default shader color input needs to be specified
as ARGB8888, and a clear rect prim essentially uses this value
instead of the default diffuse. Depth, on the other hand, is an IEEE
32-bit float. Clear stencil is U8.
The clear values for zone init prims are completely different.
These are specified in the actual output pixel layout (and need
to be repeated for 16-bit formats).
Clear up the confusion by adding some comments.
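To make the two kinds of clear value concrete, here is a small sketch. The helper names are hypothetical, and the rounding used for the float to 8-bit conversion is an assumption of this example rather than something taken from the driver.

```c
#include <stdint.h>

/* Convert a float in [0, 1] to an 8-bit unorm, clamping out-of-range input. */
static inline uint8_t
float_to_unorm8(float f)
{
   if (f <= 0.0f)
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}

/* Clear rect prim: the color is always given as ARGB8888, independent
 * of the render target format. */
static inline uint32_t
pack_argb8888(const float rgba[4])
{
   return ((uint32_t)float_to_unorm8(rgba[3]) << 24) |
          ((uint32_t)float_to_unorm8(rgba[0]) << 16) |
          ((uint32_t)float_to_unorm8(rgba[1]) << 8) |
          (uint32_t)float_to_unorm8(rgba[2]);
}

/* Zone init prim with a 16-bit target: the value is in the output pixel
 * layout and is repeated in both halves of the 32-bit word. */
static inline uint32_t
replicate_16bit_pixel(uint16_t pixel)
{
   return ((uint32_t)pixel << 16) | pixel;
}
```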
v2: Retain the target swizzling support added by Stephan Marchesin.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously, if the user enabled a non-consecutive set of clip planes
(e.g. 0, 1, and 3), the driver would compact them down to a
consecutive set starting at 0. This optimization was of dubious
value, and complicated the implementation of gl_ClipDistance.
This patch changes the driver so that with Gen6 and later chipsets, we
no longer compact the clip planes. However, we still discard any clip
planes beyond the highest number that is in use, so performance should
not be affected for applications that use clip planes consecutively
from 0.
With chipsets previous to Gen6, we still compact the clip planes,
since the pre-Gen6 clipper thread relies on this behavior.
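The difference between the two schemes can be sketched as follows. The function names are hypothetical; the enabled planes are assumed to be given as a bitmask with bit i set when plane i is enabled.

```c
#include <stdint.h>

/* De-compacted scheme: planes keep their own slots, so the number of
 * slots to upload is the highest enabled plane index plus one. */
static inline unsigned
decompacted_plane_count(uint32_t enabled_mask)
{
   unsigned count = 0;

   while (enabled_mask) {
      count++;
      enabled_mask >>= 1;
   }
   return count;
}

/* Compacted scheme: enabled planes are packed into consecutive slots.
 * slots[n] receives the index of the plane stored in slot n. */
static inline unsigned
compact_planes(uint32_t enabled_mask, unsigned slots[], unsigned max_planes)
{
   unsigned count = 0;

   for (unsigned i = 0; i < max_planes; i++) {
      if (enabled_mask & (1u << i))
         slots[count++] = i;
   }
   return count;
}
```

For planes 0, 1, and 3, the de-compacted scheme uses four slots (slot 2 is simply unused), while the compacted scheme uses three.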
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only remaining uses of brw_vs_prog_key::nr_userclip only occurred
when using clip planes (as opposed to gl_ClipDistance). This patch
renames the value to nr_userclip_planes and sets it to zero when
gl_ClipDistance is in use. This avoids unnecessary VS recompiles.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_compute_vue_map required an argument indicating the
number of clip planes in use, but all it did with it was check if it
was nonzero.
This patch changes brw_compute_vue_map to take a boolean instead.
This allows us to avoid some unnecessary recompilation of the Gen4/5
GS and SF threads.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, setup_uniform_clipplane_values() was setting
up clip plane uniforms based on ctx->Transform.ClipPlanesEnabled, a
piece of state not stored in the vertex shader cache key. As a
result, a change to this piece of state might not trigger a necessary
vertex shader recompile.
The patch adds a field to the vertex shader cache key,
userclip_planes_enabled, to store the current value of
ctx->Transform.ClipPlanesEnabled. Also, it changes
setup_uniform_clipplane_values() to read from this new field, so that
it's manifestly clear that the vertex shader isn't depending on state
not stored in the cache key.
Note: when the vertex shader uses gl_ClipDistance, the VS backend
doesn't need to know which clip planes are in use, so we leave the
field as zero in that case to avoid unnecessary recompiles.
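A reduced sketch of the idea, with a hypothetical two-field key standing in for the real brw_vs_prog_key: every piece of state the compiled shader depends on is copied into the key, and keys are compared bytewise to decide whether a recompile is needed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced cache key. */
struct vs_key {
   uint32_t program_id;
   uint32_t userclip_planes_enabled;
};

static inline struct vs_key
make_vs_key(uint32_t program_id, uint32_t clip_planes_enabled,
            bool uses_clip_distance)
{
   struct vs_key key;

   memset(&key, 0, sizeof(key));
   key.program_id = program_id;
   /* With gl_ClipDistance the backend does not care which planes are
    * enabled, so keep the field zero to avoid needless recompiles. */
   key.userclip_planes_enabled = uses_clip_distance ? 0 : clip_planes_enabled;
   return key;
}

static inline bool
vs_key_equal(const struct vs_key *a, const struct vs_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```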
Fixes Piglit test vs-clip-vertex-enables.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No functional change. This patch rearranges the struct
brw_vs_prog_key so that the two fields related to clipping are
together, and documents those fields. This should make the patches
that follow easier to comprehend, since they add additional
clipping-related fields to this structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The i965 driver already had a function to count bits in a 64-bit uint
(brw_count_bits()), but it was buggy (it only counted the bottom 32
bits) and it was clumsy (it had a strange and broken fallback for
non-GCC-like compilers, which fortunately was never used). Since Mesa
already has a _mesa_bitcount() function, it seems better to just
create a _mesa_bitcount_64() function rather than special-case this in
the i965 driver.
This patch creates the new _mesa_bitcount_64() function and rewrites
all of the old brw_count_bits() calls to refer to it.
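For reference, a portable 64-bit population count that looks at all 64 bits can be as simple as the sketch below. The name bitcount_64 is a placeholder, and this is not necessarily how _mesa_bitcount_64() is implemented.

```c
#include <stdint.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static inline unsigned
bitcount_64(uint64_t n)
{
   unsigned bits;

   for (bits = 0; n != 0; bits++)
      n &= n - 1;
   return bits;
}
```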
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the ES1 conformance 'userclip' test, which broke when we increased
MAX_CLIP_PLANES to 8. Core Mesa already validates incoming values
against MAX_CLIP_PLANES; we just need the ES wrapper to pass everything
through.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's required for ES 1.0 and 1.1, and isn't specified for ES 2.
While the comment says Mesa depends on it internally, removing it from
ES2 doesn't seem to regress any Piglit or ES2 conformance tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In particular, drivers don't enable this in ES 1.1 contexts.
Prior to this, none of the OpenGL ES 1.1 conformance tests passed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since core Mesa no longer depends on gl_texture_image::Data pointing to
mapped texture buffers we don't have to mess with it all over the place
in the state tracker. Now Data is only used to point to malloc'd memory
that holds images which don't fit in the texture object's mipmap buffer.
These were used to find the start of a 3D image slice (or 2D array texture
slice) given a base address. Instead, use a simple array of image slice
addresses.
This is a step toward getting rid of the gl_texture_image::ImageOffsets
field.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch implements proper support for gl_ClipVertex by causing the
new VS backend to populate the clip distance VUE slots using
VERT_RESULT_CLIP_VERTEX when appropriate, and by using the
untransformed clip planes in ctx->Transform.EyeUserPlane rather than
the transformed clip planes in ctx->Transform._ClipUserPlane when a
GLSL-based vertex shader is in use.
When not using a GLSL-based vertex shader, we use
ctx->Transform._ClipUserPlane (which is what we used prior to this
patch). This ensures that clipping is still performed correctly for
fixed function and ARB vertex programs. A new function,
brw_select_clip_planes() is used to determine whether to use
_ClipUserPlane or EyeUserPlane, so that the logic for making this
decision is shared between the new and old vertex shaders.
Fixes the following Piglit tests on i965 Gen6:
- vs-clip-vertex-const-accept
- vs-clip-vertex-const-reject
- vs-clip-vertex-different-from-position
- vs-clip-vertex-equal-to-position
- vs-clip-vertex-homogeneity
- vs-clip-based-on-position
- vs-clip-based-on-position-homogeneity
- clip-plane-transformation clipvert_pos
- clip-plane-transformation pos_clipvert
- clip-plane-transformation pos
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Before this patch, clip planes didn't work properly in Mesa when using
vertex shaders, because Mesa assigned both gl_ClipVertex and
gl_Position to the same gl_vert_result (VERT_RESULT_HPOS). As a
result, backends couldn't distinguish between the two variables, so
any shader that wrote different values to them would fail to work
properly.
This patch paves the way for proper support of gl_ClipVertex by
creating a new enumerated value in gl_vert_result for it
(VERT_RESULT_CLIP_VERTEX). After this patch, a back-end may add
support for gl_ClipVertex using the following algorithm:
- If using a user-supplied GLSL vertex shader:
  - If the bit corresponding to VERT_RESULT_CLIP_VERTEX is set in
    gl_program::OutputsWritten:
    - Clip using the vertex shader output VERT_RESULT_CLIP_VERTEX and
      the clip planes defined in gl_context::Transform.EyeUserPlane.
  - Else:
    - Clip using the vertex shader output VERT_RESULT_HPOS and the
      clip planes defined in gl_context::Transform.EyeUserPlane.
- Else (either using fixed function or an ARB vertex program):
  - Clip using the vertex shader output VERT_RESULT_HPOS and the clip
    planes defined in gl_context::Transform._ClipUserPlane (*)
where (*) represents the normal Mesa behavior before this patch.
An example of implementing the above algorithm can be found in the
patch that follows this one, which implements gl_ClipVertex in i965
Gen6.
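The decision procedure above can also be written as a small function. This is a sketch with hypothetical types; the caller is assumed to pass the bit that corresponds to VERT_RESULT_CLIP_VERTEX in the outputs-written mask.

```c
#include <stdbool.h>
#include <stdint.h>

enum clip_output { CLIP_OUTPUT_HPOS, CLIP_OUTPUT_CLIP_VERTEX };
enum clip_plane_set { PLANES_EYE_USER, PLANES_CLIP_USER };

struct clip_choice {
   enum clip_output output;      /* which vertex shader output to clip */
   enum clip_plane_set planes;   /* which set of clip planes to use */
};

static inline struct clip_choice
choose_clip_inputs(bool glsl_vertex_shader, uint64_t outputs_written,
                   uint64_t clip_vertex_bit)
{
   struct clip_choice c;

   if (glsl_vertex_shader) {
      c.planes = PLANES_EYE_USER;
      c.output = (outputs_written & clip_vertex_bit) ?
                 CLIP_OUTPUT_CLIP_VERTEX : CLIP_OUTPUT_HPOS;
   } else {
      /* Fixed function or ARB vertex program: behavior before this patch. */
      c.planes = PLANES_CLIP_USER;
      c.output = CLIP_OUTPUT_HPOS;
   }
   return c;
}
```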
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous change was not effective for lines, because there is no
4 planes 4x4 block rasterization path: it is handled by the 16x16 block
case too, and the 16x16 block was not being budged as it should.
This fixes assertion failures on line rasterization.
llvmpipe has a few special rasterization paths for triangles contained in
16x16 blocks, but it allows the 16x16 block to be aligned only to a 4x4
grid.
Some 16x16 blocks could actually intersect the tile
if the triangle is 16 pixels in one dimension but 4 in the other, causing
a buffer overflow.
The fix consists of budging the 16x16 blocks back inside the tile.
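In one dimension, the budging amounts to a clamp of the block origin. The sketch below assumes a 64-pixel tile edge and coordinates relative to the tile origin; both the constants and the helper name are assumptions of this example.

```c
#define TILE_SIZE  64   /* assumed tile edge, in pixels */
#define BLOCK_SIZE 16

/* Move a 4-aligned block origin so that the whole 16-pixel block lies
 * inside the tile. */
static inline int
budge_block_origin(int origin)
{
   if (origin + BLOCK_SIZE > TILE_SIZE)
      origin = TILE_SIZE - BLOCK_SIZE;
   if (origin < 0)
      origin = 0;
   return origin;
}
```

A block starting at 52 would cover pixels 52..67 and run past the tile edge; after budging it starts at 48 and still contains the pixels that belong to this tile.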
This just adds the entries to the table and fixes the asserts up.
The int32 one is definitely wrong, since it uses a float temp
which will lose precision, but it's no worse than now.
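The precision loss is easy to demonstrate: a 32-bit float has a 24-bit significand, so integers above 2^24 cannot all be represented. The helper below is a hypothetical illustration, not code from the patch.

```c
#include <stdint.h>

/* Push a 32-bit integer through a float temporary and back. The
 * volatile store forces the value to be rounded to float precision. */
static inline int32_t
roundtrip_via_float(int32_t value)
{
   volatile float tmp = (float)value;

   return (int32_t)tmp;
}
```

Values up to 16777216 (2^24) survive the round trip, but 16777217 does not come back unchanged.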
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is taken from reading EXT_texture_integer + EXT_texture_rg in combination.
Comments on the necessity of each format, the naming of formats, and bugs in the
format tables are welcome.
Are there any formats I've missed?
Eric looked over this to make sure it's consistent at least.
As I've changed the ordering of things in the format table, the following
patches are required to avoid regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
As per Brian's suggestion we can generate this table at first start
to make sure it's correct. This is a sad workaround for compilers which
don't support named initialisers (it's 2011).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit d1fda903 (radeon: Drop mapping we were doing around
glGetTexImage()) removed the common Radeon source file
radeon_tex_getimage.c, and pulled it out of the r200, r300, r600, and
radeon makefiles. But it left behind the symlinks that were being
used to share that file among the four directories.
This patch removes the dangling symlinks.
Reviewed-by: Brian Paul <brianp@vmware.com>
We want quad/pixel Z values to be interpolated exactly the same for
multi-pass algorithms. Because of how the optimized Z-test code is
written, we can't cull the first quad in a run even if it's totally
killed. See the comment for more info.
NOTE: This is a candidate for the 7.11 branch.
Instead of relying on the mirror in the Mesa IR assembly shader, just
use the variables actually stored in the GLSL IR. This will be a bit
slower, but nobody cares about the performance of glGetActiveAttrib.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds get_active_attrib into _mesa_GetActiveAttribARB
and moves the resulting function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This currently mirrors the state tracking
gl_shader_program::Attributes, but I'm working towards eliminating
that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds bind_attrib_location into _mesa_BindAttribLocationARB
and moves the resulting function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
hash_table_replace doesn't use get_node to avoid having to hash the key twice.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows querying the linked shader itself rather than the Mesa IR.
This is the first step towards removing gl_program::Attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The symbol table in the linked shaders may contain references to
variables that were removed (e.g., unused uniforms). Since it may
contain junk, there is no possible valid use. Delete it and set the
pointer to NULL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers in Mesa have supported this extension for eons. This
extension is an optional feature in desktop OpenGL (via
GL_ARB_draw_buffers) and OpenGL ES 2.x (via GL_NV_draw_buffers).
The extension is not usable in OpenGL ES 1.x. There is no
glDrawBuffers* entry point in OpenGL ES 1.x contexts, and glGet*v
generate errors when MAX_DRAW_BUFFERS or DRAW_BUFFERi is queried.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This also moves ATI_draw_buffers. This is to facilitate enabling
NV_draw_buffers in OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Us poor souls who cross compile mesa want to be able to specify which pkg-config to pick, or at least just change one place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Until now, we've been treating 1D arrays as a single slice, and each
array slice is actually just a row of the 2D texture. While swrast
still stores them this way, hardware drivers think that 1D arrays have
actual separate slices not stored as contiguous rows.
Reviewed-by: Brian Paul <brianp@vmware.com>
The path for ->Data was failing to be called for the FBO draw offset
fallback, and also had mismatched compressed texture support code.
This drops the intel_prepare_render() in the blit path. We aren't
copying to/from a GL_FRONT buffer, so it doesn't matter.
Too many separate functions each called from one location (in
different files). This code should all die soon when swrast starts
using MapTextureImage.
Before, we were only allocating these from our TexImage, so if the
texture image was set up in any other way (non-accelerated
glGenerateMipmaps()), they'd be missing or wrong.
Now that we can zero-copy generate the mipmaps into brand new
glTexImage()-generated storage using MapTextureImage(), we no longer
need to allocate image->Data in mipmap generate. This requires
deleting the drivers' old overrides of the miptree tracking after
calling _mesa_generate_mipmap at the same time, or the drivers
promptly lose our newly-generated data.
Reviewed-by: Eric Anholt <eric@anholt.net>
Both POW and INT DIV need a message length of 2; previously, we only
checked for POW.
Also, BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER has a response
length of 2; previously, we only checked for SINCOS. We don't use this
message, but in case we ever decide to, we may as well fix it now.
While we're at it, just move these computations into
brw_set_math_message, since they're entirely based on the function.
This fixes it for both brw_math and the old backend's brw_math_16.
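Since both lengths depend only on the math function, the computation reduces to two small lookups. The enum below is a hypothetical stand-in for the BRW_MATH_FUNCTION_* values; only the cases named in this message are distinguished.

```c
/* Hypothetical stand-ins for the BRW_MATH_FUNCTION_* values. */
enum math_function {
   MATH_INV,
   MATH_SQRT,
   MATH_POW,
   MATH_SINCOS,
   MATH_INT_DIV_QUOTIENT,
   MATH_INT_DIV_REMAINDER,
   MATH_INT_DIV_QUOTIENT_AND_REMAINDER
};

/* Two-operand functions need a message length of 2. */
static inline unsigned
math_message_length(enum math_function function)
{
   switch (function) {
   case MATH_POW:
   case MATH_INT_DIV_QUOTIENT:
   case MATH_INT_DIV_REMAINDER:
   case MATH_INT_DIV_QUOTIENT_AND_REMAINDER:
      return 2;
   default:
      return 1;
   }
}

/* Functions that return two results need a response length of 2. */
static inline unsigned
math_response_length(enum math_function function)
{
   switch (function) {
   case MATH_SINCOS:
   case MATH_INT_DIV_QUOTIENT_AND_REMAINDER:
      return 2;
   default:
      return 1;
   }
}
```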
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers implementing GLSL 1.30 want to do integer modulus, and until we
can stop generating code via ir_to_mesa, it's easier to make it silently
generate rubbish code. Multiply will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Classic compiler mistake. In the example below, the OMOD optimization
was combining instructions 4 and 10, but since there was an instruction
(#8) in between them that wrote to the same registers as instruction 10,
instruction 11 was reading the wrong value.
Example of the mistake:
Before OMOD:
4: MAD temp[0].y, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
10: MUL temp[2].x, const[1].y___, temp[0].y___;
11: FRC temp[5].x, temp[2].x___;
After OMOD:
4: MAD temp[2].x / 8, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
11: FRC temp[5].x, temp[2].x___;
https://bugs.freedesktop.org/show_bug.cgi?id=41367
Source swizzles for transcendent instructions were being stored in the X
channel regardless of what channel the instruction was writing.
This was causing problems for some helper functions that were expecting
source swizzles to occupy channels corresponding to the instruction's
writemask. This commit makes transcendent instructions follow the same
convention as normal instructions for representing source swizzles.
Previous behavior:
LG2 temp[0].y, input[0].x___;
Current behavior:
LG2 temp[0].y, input[0]._x__;
From the EXT_transform_feedback spec:
Primitives can be optionally discarded before rasterization by calling
Enable and Disable with RASTERIZER_DISCARD_EXT. When enabled, primitives
are discarded right before the rasterization stage, but after the optional
transform feedback stage. When disabled, primitives are passed through to
the rasterization stage to be processed normally. RASTERIZER_DISCARD_EXT
applies to the DrawPixels, CopyPixels, Bitmap, Clear and Accum commands as
well.
And the GL 3.2 spec says it applies to ClearBuffer* as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit d631c19db4.
The commit was broken, and ended up returning false all the time
because nobody in the world binds every single possible vertex array.
On further reflection, we don't want to discount stride == 0: This
function is just used for deciding whether to compute the
bounds on the index, and there's no sense in computing index bounds
when stride == 0.
For the separate question of "how much data do I upload for this
vertex element?", the i965 driver was fixed to upload the data.
Fixes a regression of about 2x in 3DMMES, and most importantly, makes
Hammerfight playable.
Commit d631c19db4 avoided this problem
by forcing the driver to get the min/max index, but that commit was
broken, so just fix the driver problem (confusion between "do I need
to upload any data?" and "do I need the index bounds in order to
upload any data?").
Generally we're using fragment programs in all our drivers, so wasting
4MB for code that's never called is pretty lame. Reduces i965 memory
allocation for a short shader program from 21,932,128B to 17,737,816B.
As innocuous as it seemed, ebca47a basically broke the world (e.g.,
>200 piglit regressions). In vec4_visitor::emit_block_move,
src->swizzle was expected to be BRW_SWIZZLE_NOOP before setting it to
a swizzle that would replicate the existing channels of the source
type to a vec4 (e.g., .xyyy for a vec2).
The original assertion seems to have been a little bogus. In addition
to being BRW_SWIZZLE_NOOP, src->swizzle might already be a swizzle
that would replicate the existing channels of the source type to a
vec4. In other words, it might already have the value that we're
about to assign to it.
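A minimal sketch of the relaxed check described above. The 2-bits-per-channel encoding, the SWZ macro, and both helper names are illustrative assumptions, not the driver's actual definitions:

```c
#include <stdbool.h>

/* Hypothetical swizzle encoding: 2 bits per channel, x=0 y=1 z=2 w=3. */
#define SWZ(a, b, c, d) ((a) | ((b) << 2) | ((c) << 4) | ((d) << 6))
#define SWZ_NOOP SWZ(0, 1, 2, 3)

/* Replicate the last valid channel of an n-component source into the
 * unused channels, e.g. .xyyy for n == 2. */
static unsigned replicate_swizzle(int n)
{
    int last = n - 1;
    int ch[4];

    for (int i = 0; i < 4; i++)
        ch[i] = i < last ? i : last;
    return SWZ(ch[0], ch[1], ch[2], ch[3]);
}

/* Accept either the no-op swizzle or the value that is about to be
 * assigned anyway. */
static bool swizzle_ok_for_block_move(unsigned swizzle, int n)
{
    return swizzle == SWZ_NOOP || swizzle == replicate_swizzle(n);
}
```

Under these assumptions a vec2 source passes with either .xyzw or .xyyy, while an unrelated swizzle such as .yyyy still trips the check.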
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If GL_NV_texture_env_combine4 is not supported, setting the fourth
combiner term would generate a GL error.
Of course, I noticed this right after committing the previous patch
to use a loop in the first place. <sigh>
Note that GL_EXT_texture_env_combine is always supported so the first
three combiner terms are always accepted.
There are four combiner terms (not 3) with GL_NV_texture_env_combine4.
Use a loop to make the code a little more compact.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The drivers don't need to care about the domains. All they need to set
are the bind and usage flags. This simplifies the winsys too.
This also fixes on r600g:
- fbo-depth-GL_DEPTH_COMPONENT32F-copypixels
- fbo-depth-GL_DEPTH_COMPONENT16-copypixels
- fbo-depth-GL_DEPTH_COMPONENT24-copypixels
- fbo-depth-GL_DEPTH_COMPONENT32-copypixels
- fbo-depth-GL_DEPTH24_STENCIL8-copypixels
I can't explain it.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I have moved 'last_flush' and 'binding' from r600_bo to winsys/radeon.
The other members are now part of r600_resource.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We were checking whether render_condition is set. That was not reliable,
because it's always set with trace and noop regardless of driver support.
Reviewed-by: Brian Paul <brianp@vmware.com>
This removes:
- PIPE_CAP_MAX_TEXTURE_IMAGE_UNITS
- PIPE_CAP_MAX_VERTEX_TEXTURE_UNITS
in favor of the new per-shader cap.
Reviewed-by: Brian Paul <brianp@vmware.com>
All drivers support it (well, except Cell). The boolean option is going away
from core Mesa too.
This is a follow-up to Ian Romanick's patch
"mesa: Remove ARB_texture_mirrored_repeat extension enable flag".
Reviewed-by: Brian Paul <brianp@vmware.com>
This is from a Coverity defect report.
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
1314 void
1315 vec4_visitor::emit_block_move(dst_reg *dst, src_reg *src,
1316 const struct glsl_type *type, bool predicated)
...
1351 /* Do we need to worry about swizzling a swizzle? */
->1352 assert(src->swizzle = BRW_SWIZZLE_NOOP);
1353 src->swizzle = swizzle_for_size(type->vector_elements);
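The flagged line uses "=" where "==" was intended. A small sketch of the difference, using a made-up constant and helper names that are not part of the driver:

```c
#include <stdbool.h>

/* Illustrative value only; not taken from the driver headers. */
#define EXAMPLE_SWIZZLE_NOOP 0xE4u

/* Buggy form: "=" stores the constant, so the condition is just the
 * constant's truth value and the original swizzle is overwritten. */
static bool check_with_assignment(unsigned *swizzle)
{
    return (*swizzle = EXAMPLE_SWIZZLE_NOOP) != 0;
}

/* Intended form: "==" compares and leaves the swizzle untouched. */
static bool check_with_comparison(const unsigned *swizzle)
{
    return *swizzle == EXAMPLE_SWIZZLE_NOOP;
}
```

Because assert() expands to nothing when NDEBUG is defined, the buggy form also makes debug and release builds behave differently.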
Reported-by: Vinson Lee <vlee@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40158
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As written, this test correctly raises an error for #elif being used
with an undefined macro (and not as an argument to "defined"). If the
preceding #if were '#if 1' then this diagnostic would correctly be
hidden. That allows code such as the following to not raise an error:
#ifndef MAYBE_UNDEFINED
#elif MAYBE_UNDEFINED < 5
...
#endif
So this test case is working as expected already. We add it here just
to improve test coverage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The specification reserves any macro name containing two consecutive
underscores, (anywhere within the name). Previously, we only raised
this error for macro names that started with two underscores.
Fix the implementation to check for two underscores anywhere, and also
update the corresponding 086-reserved-macro-names test.
This also fixes the following two piglit tests:
spec/glsl-1.30/preprocessor/reserved/double-underscore-02.frag
spec/glsl-1.30/preprocessor/reserved/double-underscore-03.frag
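The difference between the old and new check can be sketched as follows. Both function names are hypothetical and this is not the preprocessor's actual code:

```c
#include <stdbool.h>
#include <string.h>

/* Old behavior: only names that start with two underscores. */
static bool reserved_prefix_only(const char *name)
{
    return strncmp(name, "__", 2) == 0;
}

/* New behavior: two consecutive underscores anywhere in the name. */
static bool reserved_anywhere(const char *name)
{
    return strstr(name, "__") != NULL;
}
```

A name such as FOO__BAR is caught only by the second form.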
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is as simple as abstracting one existing block of code into a
function call and then adding a single call to that function for the
case of a non-function-like macro.
This fixes the recently-added 097-paste-with-non-function-macro test
as well as the following piglit tests:
spec/glsl-1.30/preprocessor/concat/concat-01.frag
spec/glsl-1.30/preprocessor/concat/concat-02.frag
Also, the concat-04.frag test now passes for the right reason. The
test is intended to fail the compilation, but before this commit it
was failing compilation (and hence passing the test) for the wrong
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
Apparently we never implemented this, (but we've got a GLSL 1.30 test
in piglit that is exercising this case).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
There was already a loop here to look for multiple token pastes, but
it was mistakenly incrementing the iterator counter after performing
one paste.
Instead, leave the loop iterator in place to coalesce as many tokens
as necessary into one.
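A sketch of the loop shape described above, over a plain array of token strings. The array representation, the size limits, and the function name are assumptions for illustration; the real preprocessor works on its own token lists:

```c
#include <string.h>

#define MAX_LEN 64

/* Collapse "a ## b ## c" sequences in place; returns the new count. */
static int paste_tokens(char tokens[][MAX_LEN], int count)
{
    int i = 0;

    while (i < count) {
        if (i + 2 < count && strcmp(tokens[i + 1], "##") == 0) {
            /* Merge the right operand into tokens[i]. */
            strncat(tokens[i], tokens[i + 2],
                    MAX_LEN - strlen(tokens[i]) - 1);
            /* Drop the "##" and the right operand. */
            for (int j = i + 3; j < count; j++)
                strcpy(tokens[j - 2], tokens[j]);
            count -= 2;
            /* Do not advance i: the merged token may be followed by
             * another "##". */
        } else {
            i++;
        }
    }
    return count;
}
```

With the five tokens a, ##, b, ##, c this yields the single token abc. Advancing i after the first paste would leave "ab ## c" unmerged.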
This fixes the recently added 096-paste-twice test as well as the
following piglit test:
spec/glsl-1.30/preprocessor/concat/concat-03.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is something that piglit is exercising that currently fails.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The GL spec says that luminance values are returned as (l, 0, 0, 1),
L/A values as (l, 0, 0, a) and intensity values as (i, 0, 0, 1).
Use the pixel transfer scale controls to implement that.
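The target mapping can be written out directly. This sketch only shows the expected result per base format; it does not use the pixel transfer scale controls that the patch relies on, and the enum and function names are hypothetical:

```c
enum base_format { BASE_LUMINANCE, BASE_LUMINANCE_ALPHA, BASE_INTENSITY };

/* texel[0] holds L or I; texel[1] holds A for luminance-alpha. */
static void expand_to_rgba(enum base_format fmt, const float texel[2],
                           float rgba[4])
{
    rgba[0] = texel[0];
    rgba[1] = 0.0f;
    rgba[2] = 0.0f;
    rgba[3] = (fmt == BASE_LUMINANCE_ALPHA) ? texel[1] : 1.0f;
}
```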
This fixes a few failures in the new piglit getteximage-formats
test when getting a compressed L or L/A image.
If color material mode is enabled, constant buffer entries related
to the material coefficients will depend on glColor. So add
_NEW_CURRENT_ATTRIB to the bitset returned for material-related
constants in _mesa_program_state_flags().
This fixes a bug exercised by the new piglit draw-arrays-colormaterial
test.
Note: This is a candidate for the 7.11 branch.
This hasn't been needed so far since none of the core Mesa code paths
that call ctx->Driver.AllocTextureImageBuffer() are used with the
state tracker. That will change in upcoming patches.
Note that this function duplicates some code seen in the st_TexImage()
function. That can be cleaned up later.
The target, level and texObj can be obtained through the texImage
parameter. We could make similar changes for the TexImage() hooks too.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a build error introduced with commit
"winsys/svga: Update to vmwgfx kernel module 2.1"
if both the svga driver and the xorg state tracker were enabled
at the same time.
If needed we can re-add a minimal target for basic functionality.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When an FBO is rendering to a texture (rather than a renderbuffer),
Gallium sets up an internal renderbuffer to handle the rendering, and
copies over enough texture state to make this work.
InternalFormat was missed out, causing glCopyTexImage to take a slow
path unnecessarily.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41263
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Introduces fence objects and a size limit on query buffers.
The possibility to map the fifo from user-space is gone, and
replaced by an ioctl that reads the 3D capabilities.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecranz <jakob@vmware.com>
Don't store references to these on the surface but on the context.
References to transfers are still stored on the surface since we allow
only a single map of a surface at a time.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, and savage
(Savage3D and other pre-Savage4).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on i810, mach64, mga,
savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, or r128.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
It looks like the only hardware supported by Mesa that cannot do
ARB_texture_env_combine is pre-NV10 NVIDIA chips. It appears that
these chips cannot do the GL_SUBTRACT mode. Based on looking at older
copies of nvOpenGLspecs.pdf found on the net, NVIDIA never supported
ARB_texture_env_combine on those chips either.
This extension was previously not supported on mach64, mga (G200),
r128, savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
This extension was previously not supported on mach64, mga (G200),
savage (Savage3D and other pre-Savage4), sis, and tdfx (Voodoo
Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_CLIENT_ACTIVE_TEXTURE in OpenGL ES
2.x). This patch does not change the situation in any way.
This extension was previously not supported on i810, mga (G200), or
tdfx (Voodoo Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can use the core Mesa code for glGetTexImage() since it handles the
image mapping/unmapping now. We'll keep the decompress_with_blit() path
in the hope that it's faster than core Mesa's software decompression code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41312
The formula we were previously using for asinh:
asinh x = ln(x + sqrt(x * x + 1))
is numerically unstable: when x is a large negative value, the quantity
x + sqrt(x * x + 1)
is a small positive value (on the order of 1/(2|x|)). Since the
logarithm function is very sensitive in this range, any error in the
computation of the square root manifests as a large error in the
result.
This patch changes to the equivalent formula:
asinh x = sign(x) * ln(abs(x) + sqrt(x * x + 1))
which is only slightly more expensive to compute, and is numerically
stable for all x.
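The size estimate and the equivalence of the two formulas for negative x follow from multiplying by the conjugate. A short derivation:

```latex
% For x < 0 (so that -x = |x|):
\[
x + \sqrt{x^2 + 1}
  = \frac{\bigl(\sqrt{x^2 + 1} + x\bigr)\bigl(\sqrt{x^2 + 1} - x\bigr)}
         {\sqrt{x^2 + 1} - x}
  = \frac{1}{\sqrt{x^2 + 1} + |x|}
  \approx \frac{1}{2|x|} \quad (|x| \gg 1)
\]
% Taking logarithms:
\[
\ln\bigl(x + \sqrt{x^2 + 1}\bigr)
  = -\ln\bigl(|x| + \sqrt{x^2 + 1}\bigr)
  = \operatorname{sign}(x)\,\ln\bigl(|x| + \sqrt{x^2 + 1}\bigr),
  \qquad x < 0
\]
```

The rewritten form never subtracts two nearly equal quantities, which is where the original formula lost its precision.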
Fixes piglit tests
spec/glsl-1.30/execution/built-in-functions/[fv]s-asinh-*.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the glsl-1.30/compiler/built-in-functions/trunc-* tests under 1.30.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Somehow we managed to get the unsigned int vectors, but not scalar.
Fixes _mesa_problem complaints in piglit's uint tests.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bitshifts are one of the rare places that GLSL allows mixed base types
without an implicit conversion occurring.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For hardware drivers, we only have ir_to_mesa called for the purposes
of potential swrast fallbacks (basically never on a 1.30 driver),
which we don't really care about. This will allow 1.30 to be
implemented without rewriting swrast for it.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On pre-GEN6 chips, the VUE slots set aside for clip distance aren't
actually used, so there is no reason for the clipper to waste time
interpolating them.
When commit 62bad54727 changed the enum
value used to represent these VUE slots, that caused the clipper to
start interpolating them as an accidental side effect. This patch
reverts to the old clipper behavior.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch corrects two errors in the computation of the psiz/flags
VUE slot on pre-GEN5 when using the new VS backend:
- The clip flags (which should be stored in the w component of the
first VUE slot) were being accidentally duplicated in all other
components of that VUE slot, causing partially clipped triangles to
sometimes disappear completely.
- The OR instruction wasn't being stored in "inst", causing the
BRW_PREDICATE_NORMAL flag to be applied to the wrong instruction.
This patch fixes regressions in clipping behavior when using shaders
on GEN4-5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This constructor was storing its argument in the wrong field of the
"imm" union, resulting in it being converted to a float when it should
have remained an unsigned integer. This was preventing clipping from
working properly on pre-GEN6.
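The effect of writing to the wrong member can be reproduced with a small stand-in union. The type and function names below are hypothetical, and the expected bit pattern assumes IEEE 754 single-precision floats:

```c
#include <stdint.h>

/* Stand-in for an immediate value that can be viewed as several types. */
union imm_value {
    float f;
    int32_t i;
    uint32_t u;
};

/* Wrong: assigning to .f converts the integer to a float first. */
static union imm_value make_imm_wrong(uint32_t v)
{
    union imm_value r;
    r.f = (float)v;
    return r;
}

/* Right: assigning to .u keeps the integer bit pattern. */
static union imm_value make_imm_right(uint32_t v)
{
    union imm_value r;
    r.u = v;
    return r;
}
```

For v = 3 the wrong form stores the bits of 3.0f (0x40400000) rather than 0x00000003, which is no longer usable as a flag mask.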
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In pre-GEN6, when using clip planes, both the vertex shader and the
clipper need access to the client-supplied clip planes, since the
vertex shader needs them to set the clip flags, and the clipper needs
them to determine where to insert new vertices.
With the old VS backend, we used a clever optimization to avoid
placing duplicate copies of these planes in the CURBE: we used the
same block of memory for both the clipper and vertex shader constants,
with the clip planes at the front of it, and then we instructed the
clipper to read just the initial part of this block containing the
clip planes.
This optimization was tricky, of dubious value, and not completely
working in the new VS backend, so I've removed it. Now, when using
the new VS backend, separate parts of the CURBE are used for the
clipper and the vertex shader. Note that this doesn't affect the
number of push constants available to the vertex shader, it simply
causes the CURBE to occupy a few more bytes of URB memory.
The old VS backend is unaffected. GEN6+, which does clipping entirely
in hardware, is also unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that i965 supports 8 clip planes instead of 6, the size of the
brw_vs_compile::userplane array needs to be increased to 8. Changed
the array size to MAX_CLIP_PLANES so that if the number changes again
in the future, this array size won't be missed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using user-defined clipping planes, the i965 driver compacts the
array of clipping planes so that disabled clipping planes do not
appear in it--this saves precious push constant space and makes it
easier to generate the pre-GEN6 clip program. As a result, when
enabling clipping planes in GEN6+ hardware, we always enable clipping
planes 0 through n-1 (where n is the number of clipping planes
enabled), regardless of which clipping planes the user actually
requested.
However, we can't do this when using gl_ClipDistance, because it would
be prohibitively complex to compact the gl_ClipDistance array inside
the user-supplied vertex shader. So, when enabling clipping planes in
GEN6+ hardware, if gl_ClipDistance is in use, we need to pass the
user-supplied enable flags directly through to the hardware rather
than just enabling the first n planes.
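The two cases can be sketched as a single helper that computes the enable bits sent to the hardware. The function names and the bitmask representation are assumptions for illustration:

```c
#include <stdbool.h>

/* Count the set bits in an enable mask. */
static unsigned count_bits(unsigned mask)
{
    unsigned n = 0;

    for (; mask != 0; mask >>= 1)
        n += mask & 1u;
    return n;
}

/* user_enables has bit i set when the user enabled clip plane i. */
static unsigned hw_clip_enables(unsigned user_enables, bool uses_clip_distance)
{
    if (uses_clip_distance)
        return user_enables;        /* the shader's array is not compacted */

    /* Compacted plane array: enable planes 0 through n-1. */
    return (1u << count_bits(user_enables)) - 1u;
}
```

With planes 0 and 2 enabled (mask 0x05), the compacted case yields 0x03 while the gl_ClipDistance case keeps 0x05.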
Fixes Piglit test vs-clip-distance-enables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the i965 driver supports 8 clipping planes now, we need 4 bits
to store the number of user clipping planes, not 3.
In theory this isn't strictly necessary, since brw_clip.h is only used
on pre-GEN6, and pre-GEN6 only advertises support for 6 clipping
planes, but it seems wise to err on the safe side.
In the process I removed the pad0 element of struct
brw_clip_prog_key--it doesn't seem necessary because the compiler
automatically inserts padding if needed.
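Why 3 bits are not enough for a count of 8 can be shown with two stand-in bit-field structs. The struct and field names are hypothetical, not copied from brw_clip.h:

```c
/* A 3-bit field holds 0..7; a 4-bit field holds 0..15. */
struct key_3bit { unsigned nr_userclip : 3; };
struct key_4bit { unsigned nr_userclip : 4; };

static unsigned store_in_3_bits(unsigned n)
{
    struct key_3bit k;
    k.nr_userclip = n;      /* stored modulo 8 */
    return k.nr_userclip;
}

static unsigned store_in_4_bits(unsigned n)
{
    struct key_4bit k;
    k.nr_userclip = n;      /* stored modulo 16 */
    return k.nr_userclip;
}
```

Storing 8 in the 3-bit field reads back as 0, so a key with eight user clip planes would look the same as one with none.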
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Override the context's GLSL version if the environment variable
MESA_GLSL_VERSION_OVERRIDE is set. Valid values for
MESA_GLSL_VERSION_OVERRIDE are integers, such as "130".
MESA_GLSL_VERSION_OVERRIDE has the same behavior as INTEL_GLSL_VERSION,
except that it applies to all drivers, not just Intel's. Since the former
supersedes the latter, this patch disables the latter.
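A sketch of how such an override might be read. Only the variable name MESA_GLSL_VERSION_OVERRIDE comes from this patch; the helper names and the choice to ignore malformed values are assumptions:

```c
#include <stdlib.h>

/* Parse an override string such as "130"; keep 'current' on bad input. */
static int parse_version_override(const char *value, int current)
{
    char *end;
    long v;

    if (value == NULL || *value == '\0')
        return current;

    v = strtol(value, &end, 10);
    if (*end != '\0' || v <= 0)
        return current;
    return (int)v;
}

/* Apply the environment override, if any, to the given version. */
static int glsl_version_with_override(int current)
{
    return parse_version_override(getenv("MESA_GLSL_VERSION_OVERRIDE"),
                                  current);
}
```

Running a program with MESA_GLSL_VERSION_OVERRIDE=130 set would then report 130 in place of the driver's own value.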
Reviewed-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Again, the check was needlessly specific: this works fine on Gen7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The check was designed to forbid it on old generations (Gen5/Ironlake),
not on new ones. It just works on Gen7/Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The mesa core code uses MapTextureImage() like we need now.
v2: Drop mapping around _mesa_generate_mipmap for compressed, since
the whole path ends up going through MapTextureImage(), and the
meta decompression code ended up causing us to lose track of the
region that was originally mapped and assertion fail.
v2: Changes by Brian to MapTexImage in the decompression path.
v3: Changes by anholt to fix srcRowStride for decompression of NPOT.
Tested-by: Brian Paul <brianp@vmware.com> (v2)
EXT_texture_integer also specifies that the border color should be a color
union; the values are used according to the texture sampler format.
(update docs)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is necessary to manually set the GL version to 3.0 in order to run
Piglit tests that use glGetUniform*().
This patch allows one to override the version of the OpenGL context by
setting the environment variable MESA_GL_VERSION_OVERRIDE.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is a follow-up to commit
2d686fe911, which added decoding of
GL_CLIP_DISTANCE[67] to the _mesa_set_enable() function. This patch
makes the following additional fixes:
- Uses GL_CLIP_DISTANCEi enums consistently within enable.c rather
than the deprecated GL_CLIP_PLANEi enums.
- Generates an error if the user tries to access a clip flag that is
unsupported by the hardware.
- Applies the same change to _mesa_IsEnabled(), so that querying clip
flags using glIsEnabled() works properly.
- Applies corresponding changes to get.c, so that querying clip flags
using glGet*() works properly.
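The range check implied by the list above can be sketched as a decode helper. The constant 0x3000 is the value of GL_CLIP_DISTANCE0 (shared with GL_CLIP_PLANE0); the helper name and the idea of returning false where the caller would raise a GL error are assumptions:

```c
#include <stdbool.h>

#define CLIP_DISTANCE0 0x3000u   /* value of GL_CLIP_DISTANCE0 */

/* Returns true and stores the plane index if 'cap' names a clip flag
 * that the hardware supports. */
static bool decode_clip_flag(unsigned cap, unsigned max_clip_planes,
                             unsigned *index)
{
    unsigned p;

    if (cap < CLIP_DISTANCE0)
        return false;

    p = cap - CLIP_DISTANCE0;
    if (p >= max_clip_planes)
        return false;            /* the caller would report an error here */

    *index = p;
    return true;
}
```

Sharing one helper between the enable, glIsEnabled() and glGet*() paths keeps all three consistent about which flags exist.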
Fixes piglit test clip-flag-behavior.
Reviewed-by: Brian Paul <brianp@vmware.com>
We get called for TexImage higher up, and in a relatively normal way
(pixels == NULL is common for FBO setup).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing in our normal texture path we need for this. We don't
PBO upload blit it. We don't need to worry about flushing because
MapTextureImage handles it. hiz scattergather doesn't apply, but MTI
handles it too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It's totally gratuitous -- the image's miptree will be checked for
binding to the object later, anyway, with zero-copy or blitting as
appropriate.
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_reference_renderbuffer already short-circuits equality, and
intel_miptree_release does nothing on NULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This mirrors the structure Eric used in the new VS backend, and seems
simpler. In particular, the math1/math2 split will avoid having to
figure out how many operands there are, as this is already known by the
caller.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
_swrast_choose_texture_sample_func() handles null texture object pointers
and will return the "null" sampler function which returns (0,0,0,1). This
fixes a minor regression from ce82914f5a
All drivers remaining in Mesa support this extension. This extension
is required in desktop OpenGL. The existing support is already partially
broken in Mesa (e.g., using format=GL_ABGR for glTexImage2D in OpenGL ES 2.x).
This patch does not change the situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either a required or an optional feature in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
EXT_texture_format_BGRA8888 is mostly a subset of EXT_bgra. The only
difference seems to be that EXT_texture_format_BGRA8888 allows GL_BGRA
as an internal format to glTexImage2D and friends.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is always enabled, and drivers do not have
the option to disable it.
I kept this one separate from the others because I was a little
uncertain about the changes to get.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Mesa has never supported any portion of this extension, and neither has any
other vendor.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The year 2006 apparently came from the "Last Modified Date" in the
spec header. However, the revision history at the bottom says "2/22/00
mjk - added NVIDIA Implementation Details." From that we can safely
infer that the spec is from at least 2000, and it may even be older.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following extensions are always enabled, and drivers do not have
the option to disable them:
GL_ARB_multisample
GL_ARB_texture_compression
GL_ARB_vertex_buffer_object / GL_OES_mapbuffer
GL_EXT_copy_texture
GL_EXT_multi_draw_arrays / GL_SUN_multi_draw_arrays
GL_EXT_polygon_offset
GL_EXT_subtexture
GL_EXT_texture_edge_clamp / GL_SGIS_texture_edge_clamp
GL_EXT_vertex_array
GL_SGIS_generate_mipmap
This set was picked because they are all either required or optional
features in desktop OpenGL, OpenGL ES 1.x, and OpenGL ES 2.x. The
existing support for some is already partially broken in Mesa (e.g.,
proxy texture targets in OpenGL ES). This patch does not change the
situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes OpenArena on Gen7. Technically, adding only the first depth stall
fixes it, but the documentation says to do all three, and the Windows
driver seems to do it.
Not observed to fix anything on Gen6 yet.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38863
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It seems that GT1/GT2 sorts of variations are here to stay, and more
special cases will likely be required in the future. Checking by PCI ID
via the IS_xxx_GTx macros is cumbersome; introducing a new 'gt' field
analogous to intel->gen will make this easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Seeing as they were only used once (in the same function they were
defined), having them as context members seemed rather pointless.
Remove them entirely (rather than using local variables) since the
chipset generation checks are actually just as straightforward.
While we're at it, clean up the remainder of the if-tree that set them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
At one point, the documentation said that max thread count in 3DSTATE_PS
was at bit offset 23, but it's actually 24 on Ivybridge. Not only did
this halve our thread count, it caused us to write 1 into bit 23, which
is marked as MBZ (must be zero). Furthermore, it made us write an even
number into this field, which is apparently not allowed. Apparently we
were just lucky it worked.
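The effect of the off-by-one shift can be illustrated with a few helpers. The "count minus one" encoding, the treatment of bits 31:24 as the whole field, and the thread count used in the note below are assumptions for illustration, not values from the documentation:

```c
#include <stdint.h>

/* Pack a thread count into a dword at the given bit offset. */
static uint32_t pack_max_threads(unsigned max_threads, unsigned shift)
{
    return (uint32_t)(max_threads - 1u) << shift;
}

/* What the hardware would read if the field starts at bit 24. */
static unsigned field_at_bit_24(uint32_t dw)
{
    return dw >> 24;
}

/* The must-be-zero bit just below the field. */
static unsigned bit_23(uint32_t dw)
{
    return (dw >> 23) & 1u;
}
```

For a hypothetical count of 16, shifting by 23 makes the field read 7 instead of 15 and sets bit 23; shifting by 24 gives 15 and leaves bit 23 clear.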
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- first determine the buffer range to upload for each buffer by walking over
vertex elements
- take buffer_offset into account
- take src_offset into account
- take src_format into account in more places
- don't just blindly upload (stride*count) bytes
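The range computation in the list above can be sketched as follows. The struct layouts and names are hypothetical simplifications, with format_size standing in for the byte size implied by src_format:

```c
#include <stddef.h>

struct vertex_element {
    unsigned buffer_index;   /* which vertex buffer the element reads */
    size_t src_offset;       /* offset of the element within a vertex */
    size_t format_size;      /* bytes occupied by the element's format */
};

struct byte_range { size_t start; size_t end; };   /* [start, end) */

/* Smallest byte range of one buffer touched by the given elements. */
static struct byte_range
upload_range(const struct vertex_element *elems, size_t num_elems,
             unsigned buffer_index, size_t buffer_offset,
             size_t stride, size_t vertex_count)
{
    struct byte_range r = { 0, 0 };
    int found = 0;

    if (vertex_count == 0)
        return r;

    for (size_t i = 0; i < num_elems; i++) {
        size_t first, last;

        if (elems[i].buffer_index != buffer_index)
            continue;
        first = buffer_offset + elems[i].src_offset;
        last = first + (vertex_count - 1) * stride + elems[i].format_size;
        if (!found || first < r.start)
            r.start = first;
        if (!found || last > r.end)
            r.end = last;
        found = 1;
    }
    return r;
}
```

With two elements of 12 and 8 bytes at offsets 0 and 12, a stride of 32, a buffer offset of 64 and 10 vertices, the range is [64, 372), which is 308 bytes rather than stride*count = 320.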
NOTE: This is a candidate for the 7.11 branch.
It can now override both buffer offsets and strides in addition to resources.
Overriding buffer offsets was kinda hackish and could cause issues with
non-native vertex formats.
intel_image->mt might be NULL, for example when a border width is set. That would
trigger a segfault in the intel_map/unmap_texture_image functions.
This fixes the oglc misctest(basic.textureBorderIgnore) failure.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fence emission can flush the push buffer, which through flush_notify
unreferences the recently emitted fence. If the ref count is increased after
fence emission, the unreference deletes the fence, which causes a SIGSEGV.
Backtrace:
nouveau_fence_del
nouveau_fence_ref
nouveau_fence_next
nouveau_pushbuf_flush
MARK_RING
nv50_screen_fence_emit
nouveau_fence_emit
nv50_flush
This bug manifested as an assertion failure in nouveau_fence.c, because the
SIGSEGV handler tried to shut down the application and used the messed-up
fence.
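The ordering problem can be modelled with a toy reference count. Everything here is a simplified stand-in (the real code frees the fence instead of setting a flag), and all names are hypothetical:

```c
#include <stdbool.h>

struct fence {
    int refcount;
    bool destroyed;   /* stands in for the fence having been freed */
};

static void fence_ref(struct fence *f)
{
    f->refcount++;
}

static void fence_unref(struct fence *f)
{
    if (--f->refcount == 0)
        f->destroyed = true;
}

/* Stand-in for an emission that flushes: the flush callback drops one
 * reference to the fence being emitted. */
static void emit_with_flush(struct fence *f)
{
    fence_unref(f);
}

/* Problematic order: the fence may be gone before we reference it. */
static bool emit_then_ref(struct fence *f)
{
    emit_with_flush(f);
    if (f->destroyed)
        return false;
    fence_ref(f);
    return true;
}

/* Safe order: hold our own reference across the emission. */
static bool ref_then_emit(struct fence *f)
{
    fence_ref(f);
    emit_with_flush(f);
    return !f->destroyed;
}
```

Starting from a single reference, the first order ends with a destroyed fence, while the second leaves it alive with one reference remaining.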
This issue was reported by Maxim Levitsky.
Note: This is a candidate for the 7.11 branch.
Without this we'd miss the last update in a sequence like {COLOR0, COLOR1},
{COLOR0}, {COLOR0, COLOR1}. I originally had a patch for this that called
updated_drawbuffers() when the buffer count changed, but later realized that
was wrong. The ARB_draw_buffers spec explicitly says "The draw buffer for
output colors beyond <n> is set to NONE.", and this is queryable state.
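A sketch of the update rule the spec quote implies: every slot is compared, with slots beyond n treated as NONE, so a change is detected even when only the count differs. The array size, constant names, and function name are assumptions:

```c
#include <stdbool.h>

#define MAX_DRAW_BUFFERS 8
#define BUF_NONE 0u

/* Store the new draw buffer list; returns true if any slot changed. */
static bool set_draw_buffers(unsigned state[MAX_DRAW_BUFFERS],
                             unsigned n, const unsigned *bufs)
{
    bool changed = false;

    for (unsigned i = 0; i < MAX_DRAW_BUFFERS; i++) {
        /* Outputs beyond n are set to NONE. */
        unsigned value = i < n ? bufs[i] : BUF_NONE;

        if (state[i] != value) {
            state[i] = value;
            changed = true;
        }
    }
    return changed;
}
```

For the sequence {COLOR0, COLOR1}, {COLOR0}, {COLOR0, COLOR1}, each call reports a change because slot 1 alternates between COLOR1 and NONE.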
This fixes piglit arb_draw_buffers-state_change.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes an "Unresolved Symbols" run time error when using G3DVL
through the VDPAU state tracker, by linking the vdpau targets with librt.
Reported by Arkadiusz Miśkiewicz.
Caused by this commit :
commit e911dbb563
Author: Emeric Grange <emeric.grange@gmail.com>
Date: Mon Sep 12 23:39:33 2011 +0200
Signed-off-by: Emeric Grange <emeric.grange@gmail.com>
This should bring g3dvl back to work until we figure out
how SCALED types should really work.
Signed-off-by: Christian König <deathsimple@vodafone.de>
There is no guarantee that the TGSI tokens will persist beyond the
create_fs_state. The pipe driver (and therefore the draw module) is
responsible for making copies of the TGSI tokens when it needs them.
Reviewed-by: Brian Paul <brianp@vmware.com>
i915_miptree_layout, i945_miptree_layout, and brw_miptree_layout always
just return GL_TRUE, so there's really no point to it. Change them to
void functions and remove the (dead) error checking code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For some reason I thought subexpressions were chained off the top-level
one. This isn't the case, so just create a temporary context and free
it. All of this memory would be eventually freed, but now is freed
much sooner.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Very simple shaders don't actually use GLSL built-ins. For example:
- gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
- gl_FragColor = vec4(0.0);
Both of the shaders used by _mesa_meta_glsl_Clear() also qualify.
By waiting to initialize the built-ins until the first time we need to
look for a signature, we can avoid the overhead entirely in these cases.
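The deferred initialization amounts to a guard on the lookup path. A minimal sketch with hypothetical names and a counter standing in for the expensive setup:

```c
#include <stdbool.h>
#include <stddef.h>

static bool builtins_ready = false;
static int builtin_init_count = 0;   /* how often the setup actually ran */

/* Stand-in for the costly built-in function setup. */
static void init_builtins(void)
{
    builtin_init_count++;
    builtins_ready = true;
}

/* Initialize on first use; a shader that never looks up a built-in
 * signature never pays for the setup. */
static bool lookup_builtin_signature(const char *name)
{
    if (!builtins_ready)
        init_builtins();
    return name != NULL;   /* placeholder for the real lookup */
}
```

This sketch is not thread-safe; a real implementation would need whatever locking the surrounding code already uses.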
Makes piglit run roughly 18% faster (255 vs. 312 seconds).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we conditionally set up the SF pipeline stage with a
urb_entry_read_offset of 2 when clipping was in use, and 1 otherwise,
causing the clip distance VUE slots to be skipped if present. This
was an extremely minor savings (it saved the SF unit from reading 2
vec4s out of the URB, but it didn't affect any computation, since we
only instruct the SF unit to perform interpolation on VUE slots that
are actually used by the fragment shader).
GLSL 1.30 requires an interpolated version of gl_ClipDistance to be
available for reading in the fragment shader, so we need the SF's
urb_entry_read_offset to be 1 when the fragment shader reads from
gl_ClipDistance.
This patch just unconditionally sets the urb_entry_read_offset to 1 in
all cases; this is sufficient to make gl_ClipDistance available to the
fragment shader when it is needed, and the performance loss should be
negligible when it isn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
When gl_ClipDistance is in use, the contents of the gl_ClipDistance
array just need to be copied directly into the clip distance VUE
slots, so we re-use the code that copies all other generic VUE slots
(this has been extracted to its own method). When gl_ClipDistance is
not in use, the vertex shader needs to calculate the clip distances
based on user-specified clipping planes.
This patch also removes the i965-specific enum values
BRW_VERT_RESULT_CLIP[01], since we now have generic Mesa enums that
serve the same purpose (VERT_RESULT_CLIP_DIST[01]).
Reviewed-by: Eric Anholt <eric@anholt.net>
When the vertex shader writes to gl_ClipDistance, we do clipping based
on clip distances rather than user clip planes, so don't waste push
constant space storing user clip planes that won't be used.
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 requires gl_ClipDistance to be formatted as an array of 2 vec4's
(as opposed to an array of 8 floats), so enable the lowering pass that
performs this conversion.
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to support 8 clip distances, we need to properly decode when
the user sets the GL_CLIP_DISTANCE6 and GL_CLIP_DISTANCE7 enable
flags.
For clarity, this patch changes the names GL_CLIP_PLANE[0-5] in the
switch statement to the equivalent names GL_CLIP_DISTANCE[0-5], since
the GL_CLIP_PLANE names are deprecated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch assigns enumerated values for gl_ClipDistance in the
gl_vert_result and gl_frag_attrib enums, so that driver back-ends can
assign gl_ClipDistance to the appropriate hardware registers. It also
adjusts the functions _mesa_vert_result_to_frag_attrib() and
_mesa_frag_attrib_to_vert_result() (which translate between the two
enums) to correctly translate the new enumerated values.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
GLSL 1.30 requires us to use gl_ClipDistance for clipping if the
vertex shader contains a static write to it, and otherwise use
user-defined clipping planes. Since the driver needs to behave
differently in these two cases, we need a flag to record whether the
shader has written to gl_ClipDistance.
The new flag is called UsesClipDistance. We initially store it in
gl_shader_program (since that is the data structure that is available
when we check to see whether gl_ClipDistance was written to), and we
later copy it to a flag with the same name in gl_vertex_program, since
that is a more convenient place for the driver to access it (in i965,
at least).
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance
needs to be laid out as a pair of vec4's (the first containing clip
distances 0-3, and the second containing clip distances 4-7).
However, it is declared in GLSL as an array of 8 floats.
This lowering pass acts at the GLSL level, modifying the declaration
of gl_ClipDistance so that it is an array of vec4's rather than an
array of floats, and renaming it to gl_ClipDistanceMESA. In addition,
it modifies all accesses to the array so that they access the
appropriate component of one of the vec4's.
Since some hardware may not internally represent gl_ClipDistance as a
pair of vec4's, this lowering pass is optional. To enable it, set the
LowerClipDistance flag in gl_shader_compiler_options to true.
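The index arithmetic behind the lowering can be sketched as follows. The
struct and function names are hypothetical and this is plain C rather than
the IR-level pass; it only illustrates that float element i ends up in
vec4 number i / 4, component i % 4.

```c
#include <assert.h>

struct vec4 { float v[4]; };

/* Location of one clip distance inside the lowered vec4 array
 * (hypothetical helper type). */
struct clip_dist_slot {
   unsigned vec;    /* which vec4: 0 holds distances 0-3, 1 holds 4-7 */
   unsigned comp;   /* component within that vec4: 0=x, 1=y, 2=z, 3=w */
};

/* Map an index into the original float[8] array to its lowered slot. */
struct clip_dist_slot
lower_clip_distance_index(unsigned i)
{
   assert(i < 8);
   struct clip_dist_slot s = { i / 4, i % 4 };
   return s;
}

/* Read clip distance i from the lowered representation. */
float
read_lowered_clip_distance(const struct vec4 lowered[2], unsigned i)
{
   const struct clip_dist_slot s = lower_clip_distance_index(i);
   return lowered[s.vec].v[s.comp];
}
```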
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes a bug in ir_hierarchical_visitor: when traversing an
exec_list representing the formal or actual parameters of a function,
it modified base_ir to point to each parameter in turn, rather than
leaving it as a pointer to the enclosing statement. This was a
problem, since base_ir is used by visitor classes to locate the
statement containing the node being visited (usually so that
additional statements can be inserted before or after it). Without
this fix, visitors might attempt to insert statements into parameter
lists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only the XYZ components are checked to be negative by SVGA3DOP_TEXKILL.
GL_ARB_fp requires all four components be checked. Emit a second texkill
for W if needed.
We only need to do the divide by Q step for TXP instructions.
This fixes the incorrectly rendered soft shadow test in Lightsmark.
Along with the previous texture swizzle commit, this also fixes all
the piglit glsl-fs-shadow2d-XX.shader_test failures.
This exposes the GL_EXT_texture_swizzle extension and allows the various
depth texture modes to be implemented properly. This, plus a follow-on
texture/shadow change fixes quite a few piglit GLSL shadow sampler test
failures.
Emit the SVGA3D_RS_POINTSPRITEENABLE render state.
When sprite_coord_mode=PIPE_SPRITE_COORD_LOWER_LEFT emit extra frag
shader code to invert the Y coordinate of the incoming texcoord.
Accurately describe what operations are supported when a format caps
entry is not advertised by the host, and which formats are never
supported, instead of making ad-hoc and often incorrect assumptions.
It is sometimes useful to examine the first frame or an early frame of a
quickly executing and non-repeating application. This change introduces a new
environment variable that is checked when creating contexts. If
GALLIUM_RBUG_START_BLOCKED is set, then each context that is created is started
in a blocked state. This allows time to connect rbug before anything is
rendered in the context.
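A minimal sketch of the start-blocked check, assuming that "set" simply
means the variable is present in the environment. The struct and function
names are hypothetical, and the real driver may parse the value as a
boolean rather than only testing for presence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-context state; only the blocked flag matters here. */
struct rbug_context_sketch {
   bool blocked;
};

/* 'start_blocked_env' is the raw environment value, or NULL if unset. */
void
rbug_context_sketch_init(struct rbug_context_sketch *ctx,
                         const char *start_blocked_env)
{
   assert(ctx != NULL);
   ctx->blocked = (start_blocked_env != NULL);
}

/* Convenience wrapper that consults the process environment. */
void
rbug_context_sketch_init_from_env(struct rbug_context_sketch *ctx)
{
   rbug_context_sketch_init(ctx, getenv("GALLIUM_RBUG_START_BLOCKED"));
}
```

Splitting the environment lookup from the decision keeps the decision
testable without touching the process environment.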
There are already comments showing how to detect a null texture. Fix the
code to match the comments.
This fixes the oglc divzero(basic.texQOrWEqualsZero) and
divzero(basic.texTrivialPrim) test case failures.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix the constant interpolation enable bit mask for flat light mode.
FRAG_BIT_COL0 attribute bit might be 0, in which case we need to
shift one more bit right.
This fixes the oglc specularColor test failure on both Sandybridge and
Ivybridge.
v2: move the constant interp bitmask setup code into the for(; attr <
FRAG_ATTRIB_MAX; attr++) loop, as suggested by Eric.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Since the blit gets sequenced after other batchbuffer rendering like
normal, there's no need to push things out early.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All that matters here is the format of the texture, not the
internalformat (which might mean various different pixel formats). In
one case, the pbo upload for MESA_FORMAT_YCBCR would have swapped the
channels for MESA_FORMAT_YCBCR_REV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This also improves the debugging output in the failure paths so you
get more than just "failed", and don't get spammed with "failed" when
you didn't even have a PBO to try.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There were notes about the possibility of slowdowns due to zcopy from
a PBO due to thrashing around of the region. Slowdowns are even more
likely now that textures are generally tiled, which a zcopy wouldn't
get. Additionally, there were no checks on the buffer size to ensure
that the hardware-required rounding was present, which could result in
GPU hangs on large zcopy PBOs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This doesn't cover support for this format as a renderbuffer yet. The
spec allows implementations to not support it, though it is something
we do want to support.
Only one failure in piglit on gen6, which is texwrap with bordercolor
(as usual).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AFAIK, there are few users of this extension and I can see a couple
reasons why this is probably broken in Mesa anyway.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The pipe_sampler_view::format field should be preferred over the resource/
texture format. The former is used to override the texture format for
sRGB decode enable/disable, etc.
Also, use new util_format_is_srgb() helper to catch all sRGB formats.
This fixes the piglit tex-srgb test for GL_EXT_texture_sRGB_decode.
This fixes a regression from a8cf4b6acf
The problem occurred when two successive glDrawArrays calls accessed
subsequent elements in user-space arrays. The user-space array
from the first call wasn't being grown to accommodate the second
draw call's elements.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
While the program won't successfully link in the end, this avoids
possible assertion failure in the driver during linking if
this->result isn't initialized with something already.
Fixes piglit:
vertex-program-two-side enabled front back front2 back2
vertex-program-two-side enabled front back
vertex-program-two-side enabled front2 back2
We now raise a GL_INVALID_ENUM error in glBegin() if mode is illegal, as was
done in Yuanhan Liu's original patch.
Take geometry shader support into account too.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Checking if the paints are opaque in renderer_validate_blend() does not
work. We could be drawing images. Remove the check from
renderer_validate_blend() and take image drawing into consideration in
blend_use_shader().
The bug was introduced by 3f0a966807,
which affects the lookup demo.
vg_context_is_object_valid() checks if a handle is valid by checking if
the handle is a valid key of the object hash table. However, the keys
of the object hash table were object pointers.
Fix vg_context_add_object() to use the handles as the keys so that
vg_context_is_object_valid() works. This bug was introduced by
99c67f27d3.
Use _mesa_set_enable() to avoid a redundant context lookup.
Need to disable the texture target in decompress_texture_image() so the
unit isn't still enabled after glGetTexImage() returns. Arguably, the
meta restore code should do this, but it doesn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're generating a mipmap for an sRGB texture we need to bypass
sRGB->linear conversion. Otherwise the destination mipmap level
(drawn with a textured quad) will have the wrong colors.
If we can't turn off sRGB->linear conversion (GL_EXT_texture_sRGB_decode)
we need to use the software fallback for mipmap generation.
Note: This is a candidate for the 7.11 branch.
The 1-bit alpha channel was incorrectly encoded. Previously, any non-zero
alpha value for the ubyte alpha value would set A=1. Instead, use the
most significant bit of the ubyte alpha to determine the A bit. This is
consistent with the other channels and other OpenGL implementations.
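A sketch of the corrected packing, assuming an ARGB1555 layout with A in
bit 15 followed by 5-bit R, G, and B. The function name and the layout are
assumptions made for illustration; the point is that alpha keeps only its
most significant bit, just as each color channel keeps its top 5 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rgba[] holds R, G, B, A as 8-bit values. */
uint16_t
pack_argb1555(const uint8_t rgba[4])
{
   assert(rgba != NULL);
   const unsigned r = rgba[0] >> 3;   /* top 5 bits */
   const unsigned g = rgba[1] >> 3;
   const unsigned b = rgba[2] >> 3;
   const unsigned a = rgba[3] >> 7;   /* top bit only, not "any non-zero" */
   return (uint16_t)((a << 15) | (r << 10) | (g << 5) | b);
}
```

With this rule an alpha of 0x7f packs to A=0 and 0x80 packs to A=1, where
the old "non-zero" rule would have produced A=1 for both.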
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
builtin_stubs.cpp is only supposed to be used for builtin_compiler. It
contains a stub version of _mesa_glsl_initialize_functions() that does
nothing.
libglsl.a already contains builtin_function.cpp, the generated file that
contains a version of _mesa_glsl_initialize_functions() that actually
initializes all the built-in functions.
By mistakenly linking to builtin_stubs, glsl_compiler and glsl_test are
unable to compile any shaders that use built-in functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since Mesa is now capable of supporting up to 8 clipping planes
instead of 6, this patch updates Gallium internals to support 8
clipping planes as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
draw_pipe_clip.c contained an ifdef to ensure that its local
definition of MAX_CLIPPED_VERTICES would not take effect if the global
MAX_CLIPPED_VERTICES (defined in src/mesa/main/config.h) was already
defined. This was unnecessary because draw_pipe_clip.c doesn't
directly or indirectly include src/mesa/main/config.h. Removed the
ifdef to reduce confusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow drivers to increase ctx->Const.MaxClipPlanes to 8,
which is required for GLSL-1.30 compliance.
No driver behavior should be affected. However, many data structures
use MAX_CLIP_PLANES as an array size, so these arrays will get
slightly larger.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously this value was set to MAX_CLIP_PLANES, which is defined to
be 6. But MAX_CLIP_PLANES needs to be increased to 8 to support
GLSL-1.30-compliant drivers. This patch hard-codes the default value
of ctx->Const.MaxClipPlanes to 6, so that when MAX_CLIP_PLANES is
increased, it won't affect drivers that do not support 8 clip planes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch removes the assertion "MAX_CLIP_PLANES == 6" from the i965
driver. This assertion is unnecessary; nothing in the driver requires
MAX_CLIP_PLANES to be 6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To support GLSL 1.30, we will need to increase MAX_CLIP_PLANES to 8.
To avoid breaking drivers that do not yet support 8 clip planes, this
patch modifies the Mesa core code that pertains to clipping to use
ctx->Const.MaxClipPlanes rather than MAX_CLIP_PLANES, since
ctx->Const.MaxClipPlanes will remain 6 for drivers that only support 6
clip planes.
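A sketch of the distinction, using hypothetical cut-down structs rather
than the real gl_context: the compile-time MAX_CLIP_PLANES sizes the
arrays, while the runtime Const.MaxClipPlanes value bounds what the API
accepts.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLIP_PLANES 8   /* compile-time array size */

struct gl_constants_sketch {
   unsigned MaxClipPlanes;  /* runtime limit, at most MAX_CLIP_PLANES */
};

struct context_sketch {
   struct gl_constants_sketch Const;
   float ClipPlane[MAX_CLIP_PLANES][4];
   unsigned ClipPlanesEnabled;   /* one bit per enabled plane */
};

void
init_context_sketch(struct context_sketch *ctx)
{
   memset(ctx, 0, sizeof(*ctx));
   ctx->Const.MaxClipPlanes = 6;   /* default; a driver may raise it to 8 */
}

/* Returns 1 on success, 0 where the GL would reject the plane number. */
int
enable_clip_plane_sketch(struct context_sketch *ctx, unsigned plane)
{
   assert(ctx->Const.MaxClipPlanes <= MAX_CLIP_PLANES);
   if (plane >= ctx->Const.MaxClipPlanes)
      return 0;
   ctx->ClipPlanesEnabled |= 1u << plane;
   return 1;
}
```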
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We generate silly code for array access, and it's easier to generally
support the cleanup than to specifically avoid the bad code in each
place we might generate it.
Removes 4.6% of instructions from 41.6% of shaders in shader-db,
particularly savage2/hon and unigine.
v2: Fixes by Ken: Make is_zero/one member functions, and fix a
progress flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_NEW_WINDOW_POS wasn't a real Mesa state flag, but we were missing
_NEW_BUFFERS to update the stipple offset when FBO binding or window
size changed, and _NEW_POLYGON to update when stippling gets enabled.
Fixes oglconform's tristrip test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Because we skip the pattern upload when stippling is disabled, we need
to check again when it might have been turned on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Trigger a GL_INVALID_ENUM error if the face parameter is not a valid value.
Trigger a GL_INVALID_VALUE error if the GL_SHININESS value is outside
[0, ctx->Const.MaxShininess].
v2: fix the max shininess value.
v3: suggested by Brian, move the face check into glMaterialfv function
to reduce code duplication. Also, refactor the error message.
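The two checks can be sketched as a standalone validator. The SKETCH_
macros carry the same numeric values as the corresponding GL enums, and
the function name is hypothetical; the real code reports errors through
the context rather than returning them.

```c
#include <assert.h>

#define SKETCH_FRONT          0x0404   /* GL_FRONT */
#define SKETCH_BACK           0x0405   /* GL_BACK */
#define SKETCH_FRONT_AND_BACK 0x0408   /* GL_FRONT_AND_BACK */

#define SKETCH_NO_ERROR       0
#define SKETCH_INVALID_ENUM   0x0500   /* GL_INVALID_ENUM */
#define SKETCH_INVALID_VALUE  0x0501   /* GL_INVALID_VALUE */

/* Returns the error a glMaterial-style call with GL_SHININESS would raise. */
unsigned
validate_material_shininess(unsigned face, float shininess, float max_shininess)
{
   assert(max_shininess >= 0.0f);

   if (face != SKETCH_FRONT &&
       face != SKETCH_BACK &&
       face != SKETCH_FRONT_AND_BACK)
      return SKETCH_INVALID_ENUM;

   if (shininess < 0.0f || shininess > max_shininess)
      return SKETCH_INVALID_VALUE;

   return SKETCH_NO_ERROR;
}
```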
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The null platform has no window or pixmap surfaces (only pbuffer surfaces),
and the only valid display is EGL_DEFAULT_DISPLAY. It is useful for
offscreen rendering. It works everywhere because no window system is
required.
All of the extensions actually supported by Mesa have been remapped by
remap.c for a long time. Emitting all of these data structures is
just clutter.
Drivers that need additional functions remapped should add
'offset="assign"' to the function definition in the .xml file.
The changes to remap_helper.h are in a follow-on ~8700 line patch that
would surely be rejected by the mailing list.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Since GL_EXT_blend_logic_op is removed, _mesa_rgba_logicop_enabled(ctx)
just returns ctx->Color.ColorLogicOpEnabled. That seems kind of silly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Since GL_EXT_blend_logic_op is removed, _LogicOpEnabled and
ColorLogicOpEnabled always have the same value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Support is removed for four reasons:
1. The implementation was broken with respect to separate blend
equations. The GL_EXT_blend_equation_separate spec says:
"If EXT_blend_logic_op and EXT_blend_equation_separate are both
supported, the logic op blend equation should be supported separately
for RGB and alpha as with the other blend equation modes."
But Mesa's implementation of GL_LOGIC_OP specifically forbids this.
2. No hardware supported by Mesa can support separate blend equations
involving GL_LOGIC_OP.
3. No applications could be found that use this extension.
4. No other Linux OpenGL drivers support this extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Cc: Brian Paul <brianp@vmware.com>
The last user of this function was driInitExtensions, and that function
was removed in a previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for conditional rendering (Section 2.X), the error INVALID_OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-while-active.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for conditional rendering (Section 2.X), the error INVALID_OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
st_glsl_to_tgsi.cpp was completely ignored by makedepend because it was
not included in ALL_SOURCES, so the file was not recompiled
when certain header files were changed (like glsl/ir.h).
The first part of this commit is just a consolidation.
The second part is the fix.
When copy propagating a value into an instruction that negates its
argument, we need to invert the sense of the value's "negate" flag, so
that -(+x) becomes -x and -(-x) becomes +x.
Previously, we were always setting the value's "negate" flag to true
in this circumstance, so that both -(+x) and -(-x) turned into -x.
Fixes Piglit test vs-double-negative.shader_test.
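The corrected rule amounts to toggling the flag instead of forcing it. A
minimal sketch, with a hypothetical src_reg struct standing in for the
driver's register type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source operand: a register index plus a negate modifier. */
struct src_reg {
   int index;
   bool negate;
};

/* Replace the operand 'use' with the value it was copied from.  If the use
 * negates its argument, the value's negate flag is inverted, so -(+x)
 * becomes -x and -(-x) becomes +x. */
struct src_reg
propagate_copy(const struct src_reg *use, const struct src_reg *value)
{
   assert(use != NULL && value != NULL);
   struct src_reg result = *value;
   if (use->negate)
      result.negate = !value->negate;
   return result;
}
```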
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was really broken before. A lot of the error checks were
done much later (too late), and some of the error checks would fail.
The underlying problem is that Mesa doesn't ever keep compressed paletted
textures in their original format. The textures are immediately
converted to some RGB or RGBA format.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39991
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Jin Yang <jin.a.yang@intel.com>
According to the man page, GL_INVALID_VALUE is generated if access has any
bits set other than the valid defined bits.
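A sketch of the bitmask check, assuming the access parameter is the
glMapBufferRange-style bitfield (the message does not name the entry
point). The SKETCH_ macros use the same values as the GL_MAP_*_BIT enums;
the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAP_READ_BIT              0x0001
#define SKETCH_MAP_WRITE_BIT             0x0002
#define SKETCH_MAP_INVALIDATE_RANGE_BIT  0x0004
#define SKETCH_MAP_INVALIDATE_BUFFER_BIT 0x0008
#define SKETCH_MAP_FLUSH_EXPLICIT_BIT    0x0010
#define SKETCH_MAP_UNSYNCHRONIZED_BIT    0x0020

#define SKETCH_VALID_ACCESS_BITS (SKETCH_MAP_READ_BIT | \
                                  SKETCH_MAP_WRITE_BIT | \
                                  SKETCH_MAP_INVALIDATE_RANGE_BIT | \
                                  SKETCH_MAP_INVALIDATE_BUFFER_BIT | \
                                  SKETCH_MAP_FLUSH_EXPLICIT_BIT | \
                                  SKETCH_MAP_UNSYNCHRONIZED_BIT)

#define SKETCH_INVALID_VALUE 0x0501   /* GL_INVALID_VALUE */

/* Returns true if 'access' is acceptable; otherwise stores the error code
 * in *error and returns false. */
bool
check_map_access(unsigned access, unsigned *error)
{
   assert(error != NULL);
   if (access & ~(unsigned)SKETCH_VALID_ACCESS_BITS) {
      *error = SKETCH_INVALID_VALUE;
      return false;
   }
   return true;
}
```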
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, GL_INVALID_OPERATION should be generated if
glPixelZoom is executed between the execution of glBegin and the
corresponding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, GL_INVALID_OPERATION should be generated if
glIsEnabled is executed between the execution of glBegin and the
corresponding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fix error handling while calling glTexEnv with invalid texture
environment parameters.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, it should trigger a GL_INVALID_OPERATION
while calling some glGet* functions inside glBegin and glEnd.
This patch handles the following functions:
glGetBooleanv
glGetFloatv
glGetIntegerv
glGetInteger64v
glGetDoublev
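The shape of the check can be sketched with a toy context. The struct, its
fields, and the function below are hypothetical; Mesa uses its own macros
for the begin/end test. The sketch records the error and leaves the output
untouched when called between glBegin and glEnd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_ERROR          0
#define SKETCH_INVALID_OPERATION 0x0502   /* GL_INVALID_OPERATION */

/* Hypothetical cut-down context. */
struct get_context_sketch {
   bool inside_begin_end;   /* true between glBegin and glEnd */
   unsigned error;          /* first recorded error, if any */
   int max_lights;          /* some state a glGet could return */
};

void
sketch_get_integerv(struct get_context_sketch *ctx, int *params)
{
   assert(ctx != NULL && params != NULL);

   if (ctx->inside_begin_end) {
      /* Keep only the first error, as the GL error flag does. */
      if (ctx->error == SKETCH_NO_ERROR)
         ctx->error = SKETCH_INVALID_OPERATION;
      return;
   }

   *params = ctx->max_lights;
}
```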
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, trigger an error when calling glEvalMesh1/2D inside
glBegin/glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We've had a hack to fix this in Gentoo on Solaris for a while.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This moves the gallium interface for clears from using a pointer to 4 floats to a pointer to a union of float/unsigned/int values.
Notes:
1. the value is opaque.
2. only when the value is used should it be interpreted according to
the surface format it is going to be used with.
3. float clears on integer buffers and vice-versa are undefined.
v2: fixed up vega and graw, dropped hunks that shouldn't have been in
patch.
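A sketch of such a union and of format-dependent interpretation. The type
and function names here are hypothetical, and the three-way surface_kind
enum is a simplification of real surface formats; the point is that the
same four words are read as float, signed, or unsigned only at the moment
of use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque clear value: four channels, interpreted by the consumer. */
union clear_color_sketch {
   float f[4];
   int32_t i[4];
   uint32_t ui[4];
};

/* Hypothetical classification of the destination surface format. */
enum surface_kind {
   SURFACE_FLOAT,
   SURFACE_SINT,
   SURFACE_UINT
};

/* Read one channel according to the surface it will be written to. */
double
clear_channel_value(const union clear_color_sketch *c,
                    enum surface_kind kind, unsigned chan)
{
   assert(c != NULL && chan < 4);
   switch (kind) {
   case SURFACE_SINT:
      return (double)c->i[chan];
   case SURFACE_UINT:
      return (double)c->ui[chan];
   case SURFACE_FLOAT:
   default:
      return (double)c->f[chan];
   }
}
```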
Signed-off-by: Dave Airlie <airlied@redhat.com>
Do it during swrast state validation since the FetchTexel() functions
are only called from swrast now and not core Mesa.
Remove assertions in mipmap.c since they're no longer appropriate.
Pass an explicit surface format as we do with pipe_put_tile_rgba_format().
This fixes the piglit fbo-srgb-blit test. With GL_EXT_framebuffer_sRGB we
override the resource's format with an explicit format (linear vs. sRGB).
We need to do so both when getting and putting tiles.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40402
Reviewed-by: Dave Airlie <airlied@redhat.com>
We can have constant-interpolated values now, so set have_perspective
if nothing else is set to avoid a GPU hang.
Signed-off-by: Dave Airlie <airlied@redhat.com>
TGSI CONSTANT interpolation is just flat, and we just read the values
direct from the LDS into the GPR without doing any interpolation on them.
This is needed to pass integer types into the fragment shader.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we get a scaled type assume it's a real integer type (as textures are).
Also fixup the blend bypass and blend clamp flags on evergreen as per the
docs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
LLVM 3.0svn added SubtargetInfo as additional parameter to
createMCDisassembler() and createMCInstPrinter().
See revision 139237 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
If we're drawing to a luminance, luminance/alpha or intensity surface
we have to adjust (rebase) the fragment/quad colors before writing them
to the tile cache. The tile cache always stores RGBA colors but if
we're caching a L/A surface (for example) we need to be sure that R=G=B
so that subsequent reads from the surface cache appear to return L/A.
We previously had a special case for RGB (no alpha) surfaces. This
change generalizes that for the other base formats.
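A sketch of the rebase step on one RGBA color. The enum and function are
hypothetical. The R=G=B rule for luminance formats comes from the text
above; forcing alpha to 1 for luminance-only and RGB surfaces, and copying
R into alpha for intensity, are assumptions about the remaining channels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base-format classification of the destination surface. */
enum base_format_sketch {
   BASE_RGBA,
   BASE_RGB,
   BASE_LUMINANCE,
   BASE_LUMINANCE_ALPHA,
   BASE_INTENSITY
};

/* Adjust an RGBA color in place so the tile cache holds values that read
 * back as if the surface really stored only its base-format channels. */
void
rebase_color(enum base_format_sketch base, float rgba[4])
{
   assert(rgba != NULL);
   switch (base) {
   case BASE_RGB:
      rgba[3] = 1.0f;                         /* no alpha stored */
      break;
   case BASE_LUMINANCE:
      rgba[1] = rgba[2] = rgba[0];            /* R=G=B */
      rgba[3] = 1.0f;                         /* assumed: no alpha stored */
      break;
   case BASE_LUMINANCE_ALPHA:
      rgba[1] = rgba[2] = rgba[0];            /* R=G=B, alpha kept */
      break;
   case BASE_INTENSITY:
      rgba[1] = rgba[2] = rgba[3] = rgba[0];  /* assumed: R=G=B=A */
      break;
   case BASE_RGBA:
      break;                                  /* nothing to adjust */
   }
}
```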
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40408, but sRGB
formats are still failing. That'll be addressed in a later patch.
When compiling glDrawPixels, glTexImage(), etc. and we're copying
the user's image we need to be careful about GL error checking.
Previously, we were incorrectly generating GL_OUT_OF_MEMORY in
unpack_image() if width <= 0 or height <= 0 or for invalid format/type
values. We now check those arguments in unpack_image() and return NULL
if there's a bad value. The command will get compiled with the
arguments as-is and image=NULL. Later, when the command is executed the
correct errors will be generated.
This issue was reported by Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
I'm not 100% sure about this, it may need a version check or it might
be completely wrong.
Added multisample ones as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The array_lvalue field was attempting to enforce the restriction that
whole arrays can't be used on the left-hand side of an assignment in
GLSL 1.10 or GLSL ES, and can't be used as out or inout parameters in
GLSL 1.10.
However, it was buggy (it didn't work properly for built-in arrays),
and it was clumsy (it unnecessarily kept track on a
variable-by-variable basis, and it didn't cover the GLSL ES case).
This patch removes the array_lvalue field completely in favor of
explicit checks in ast_parameter_declarator::hir() (this check is
added) and in do_assignment (this check was already present).
This causes a benign behavioral change: when the user attempts to pass
an array as an out or inout parameter of a function in GLSL 1.10, the
error is now flagged at the time the function definition is
encountered, rather than at the time of invocation. Previously we
allowed such functions to be defined, and only flagged the error if
they were invoked.
Fixes Piglit tests
spec/glsl-1.10/compiler/qualifiers/fn-{out,inout}-array-prohibited*
and
spec/glsl-1.20/compiler/assignment-operators/assign-builtin-array-allowed.vert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents lockups with piglit tests draw-elements and draw-vertices using large
numbers of vertices.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alex.deucher@amd.com>
If we are called via the legacy DRI interface, and we don't support
legacy DRI (InitScreen is NULL), print a debug message, so it is easy
to see why the driver fails to initialize.
See https://bugs.freedesktop.org/show_bug.cgi?id=40437
Also includes loading of shared shader library code (used for f64
and integer division) and setting up the immediate array buffer
which is appended to the code.
Per the GL spec, clamp incoming colors prior to blending depending on
whether the destination buffer stores normalized (non-float) values.
Note that the constant blend color needs to be clamped too (we always
get the unclamped color from Mesa).
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40412
This introduces an UNCLAMPED_FLOAT_TO_UBYTE x 4 inline function, as
suggested by Brian. It uses it in a few places I noticed from previous
color changes, and also some core mesa places. I haven't updated other places
yet.
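A sketch of what a four-channel conversion helper might look like. The
function names are hypothetical and the round-to-nearest step is an
assumption; the real macro may use a different (faster) conversion trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a float to [0, 1] and scale it to 0..255, rounding to nearest. */
static uint8_t
unclamped_float_to_ubyte(float f)
{
   if (!(f > 0.0f))        /* also catches NaN */
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}

/* Convert all four channels of a color in one call. */
void
unclamped_float_to_ubyte_rgba(uint8_t dst[4], const float src[4])
{
   assert(dst != NULL && src != NULL);
   for (int i = 0; i < 4; i++)
      dst[i] = unclamped_float_to_ubyte(src[i]);
}
```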
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces a new gl_color_union union and moves the current
ClearColorUnclamped to use it. It removes the current ClearColor completely and
renames CCU to CC; all drivers are then modified to expect unclamped floats instead.
Also fixes st to use the translated color in one place where it wasn't.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_texture_integer issues section says:
Should pixel transfer operations be defined for the integer pixel
path?
RESOLVED: No. Fragment shaders can achieve similar results
with more flexibility. There is no need to aggrandize this
legacy mechanism.
v2: fix comments, fix unpack paths, use same comment/code
v3: fix last comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since TGSI now has a UARL opcode that takes an integer as the source, it is
no longer necessary to hack around the lack of an integer ARL opcode using I2F.
UARL is only emitted when native integers are enabled; ARL is still used
otherwise.
Reviewed-by: Brian Paul <brianp@vmware.com>
==27540== Invalid read of size 4
==27540== at 0x96277B7: _mesa_make_extension_string (string3.h:144)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
==27540== Address 0xad35b30 is 3,688 bytes inside a block of size 3,691 alloc'd
==27540== at 0x4025315: calloc (vg_replace_malloc.c:467)
==27540== by 0x9627641: _mesa_make_extension_string (extensions.c:910)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
And:
==28351== Invalid write of size 2
==28351== at 0x4C087CC: _mesa_make_extension_string (string3.h:144)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f3 is 19 bytes inside a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351==
==28351== Invalid read of size 4
==28351== at 0x4C087EC: _mesa_make_extension_string (extensions.c:806)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f4 is 0 bytes after a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
The first part adds 2, because ' ' and '\0' may be written at the end
of the buffer.
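The sizing rule can be sketched with a simplified string builder. This is
a hypothetical function, not the real _mesa_make_extension_string, and it
assumes every name is followed by one space; the allocation therefore
reserves one byte per separator plus one for the terminating '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join 'count' names into one string, each followed by a space.
 * Returns a heap buffer the caller must free, or NULL on failure. */
char *
make_extension_string_sketch(const char *const *names, size_t count)
{
   size_t length = 0;
   for (size_t i = 0; i < count; i++)
      length += strlen(names[i]) + 1;      /* +1 for the trailing ' ' */

   char *s = calloc(length + 1, 1);        /* +1 for the final '\0' */
   if (s == NULL)
      return NULL;

   size_t pos = 0;
   for (size_t i = 0; i < count; i++) {
      const size_t n = strlen(names[i]);
      memcpy(s + pos, names[i], n);
      pos += n;
      s[pos++] = ' ';
   }
   assert(pos == length);                  /* every byte accounted for */
   s[pos] = '\0';
   return s;
}
```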
These two functions were nearly the same with lots of duplicated code.
Now pass in a boolean 'elts' flag and use a few conditionals to implement
the linear vs. indexed cases.
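A sketch of the merged form, using a hypothetical gather routine rather
than the real draw code: one loop serves both cases, and the 'elts' flag
selects whether the vertex number comes from the index list or is simply
start + i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy 'count' vertices into 'out'.  With elts=false the vertices are
 * start, start+1, ...; with elts=true they are looked up through
 * elts_list[start], elts_list[start+1], ... */
void
gather_vertices(const float *verts, const unsigned *elts_list, bool elts,
                unsigned start, unsigned count, float *out)
{
   assert(verts != NULL && out != NULL);
   assert(!elts || elts_list != NULL);

   for (unsigned i = 0; i < count; i++) {
      const unsigned idx = elts ? elts_list[start + i] : start + i;
      out[i] = verts[idx];
   }
}
```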
Reviewed-by: José Fonseca <jfonseca@vmware.com>
==5715== Invalid read of size 4
==5715== at 0x4AA590B: _mesa_make_extension_string (extensions.c:908)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
==5715== Address 0x4795730 is 0 bytes inside a block of size 1 alloc'd
==5715== at 0x4025315: calloc (vg_replace_malloc.c:467)
==5715== by 0x4AA5B4C: _mesa_make_extension_string (extensions.c:772)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
This fixes piglit/fbo-generatemipmap-array.
It looks like SQ_TEX_SAMPLER_WORD0_0.TEX_ARRAY_OVERRIDE should be set
for array textures in order to disable filtering between slices,
which adds a dependency between sampler views and sampler states.
This patch reworks sampler state updates such that they are postponed until
draw time. TEX_ARRAY_OVERRIDE is updated according to bound sampler views.
This also consolidates setting the texture state between vertex and
pixel shaders.
The only purpose this call served in the DRI swrast driver was to
initialize the remap table. Core Mesa already does the dispatch
offset remapping for every function that could possibly ever be
supported. There's no need to continue using that cruft in the
driver.
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_logic_op,
and EXT_blend_minmax are no longer advertised. These all resulted in
software fallbacks, so their loss will not be mourned.
EXT_blend_subtract is, however, explicitly added to the list.
GL_FUNC_SUBTRACT is fully accelerated, but GL_FUNC_REVERSE_SUBTRACT
(still) results in a software fallback.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Dave Airlie <airlied@redhat.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added
with a dependency on the drmSupportsBlendColor flag.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_minmax, and
EXT_blend_subtract are explicitly added to the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: Viktor Novotný <noviktor@seznam.cz>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not clear if these are acceptable cases so issue a one-time warning
in debug builds when we hit them.
Fixes segfault in piglit fbo-mipmap-copypix test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The test was of an enum, attIndex, which should be unsigned. The
explicit check for < 0 was replaced with a cast to unsigned in an
assertion that attIndex is less than the size of the array it will be
used to index.
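A minimal sketch of the pattern described above, under the assumption of a
made-up enum and array (the names attachment_index and attachments are
hypothetical, not the ones in Mesa). The cast to unsigned makes a negative
value wrap to a very large number, so a single upper-bound comparison covers
both ends of the range:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical enum and table, for illustration only. */
enum attachment_index { ATT_COLOR0, ATT_DEPTH, ATT_STENCIL, ATT_COUNT };

static int attachments[ATT_COUNT];

/* Returns 1 when idx can safely index attachments[].  A negative idx
 * becomes a huge unsigned value and fails the same comparison, so no
 * separate "idx < 0" test is needed. */
static int attachment_index_is_valid(enum attachment_index idx)
{
   return (unsigned) idx < ARRAY_SIZE(attachments);
}

static int *get_attachment(enum attachment_index idx)
{
   assert(attachment_index_is_valid(idx));
   return &attachments[idx];
}
```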
Reviewed-by: Eric Anholt <eric@anholt.net>
Trivially silence the compiler by adding '(void) foo;' for each unused
parameter. These parameters could not be removed. They are part of
an interface used elsewhere in Mesa, and some of the other customers
actually use these parameters.
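For illustration, a small sketch of the '(void) foo;' idiom. The callback
type store_func and the function store_row are hypothetical; the point is
only that the signature is fixed by a shared interface, so unused
parameters are silenced rather than removed:

```c
#include <assert.h>

/* Hypothetical callback type shared by several implementations. */
typedef int (*store_func)(int width, int height, const void *user_data);

/* This implementation only needs width.  The other parameters must stay
 * because the signature is dictated by store_func, so they are marked as
 * intentionally unused. */
static int store_row(int width, int height, const void *user_data)
{
   (void) height;
   (void) user_data;
   return width * 4;   /* bytes per row, assuming 4 bytes per pixel */
}
```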
The internalFormat, format, and type parameters were not used by
either try_pbo_upload or try_pbo_zcopy, so remove them. The width
parameter was also not used by try_pbo_zcopy (because it doesn't
actually copy anything), so remove it too.
Eric Anholt notes:
The current structure of this code is so hateful I can't bring
myself to say anything about whether changing the current code is
good or bad.
I have a dream that one call would try to make a surface
(miptree/region) out of the PBO, then we'd see about whether it
matches up nicely and zero-copy/blit using that. That would be
reusable for texsubimage, which is currently awful in this
respect.
At some point we should revisit this code with pitchforks and torches.
The depth0 parameter was not used in intel_miptree_create_for_region,
so remove it. All of the places that call this function pass 1 for
that parameter, and the place where it looks like it should have been
used (the call to intel_miptree_create_internal) also had 1 hard
coded.
Reviewed-by: Eric Anholt <eric@anholt.net>
The GLenum target parameter was not used in intel_copy_texsubimage, so
remove it. Also remove the GLenum internalFormat parameter. Each
caller just copied this out of the intel_texture_image that is already
passed to intel_copy_texsubimage.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intel_context and tiling parameters were not used by any of the
i9[14]5_miptree_layout or the functions they call, and the tiling parameter was
not used by brw_miptree_layout. Remove the unnecessary parameters.
Also clean-up some of the naming, etc. in
intel_buffer_object_purgeable. 'intel' is usually used as the name of
an intel_context pointer, and intel_obj is usually used as the name of
an intel_*_obj pointer. These changes were suggested by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Remove the assertion in intel_batchbuffer_space:
assert((intel->batch.state_batch_offset - intel->batch.reserved_space)
>= intel->batch.used*4);
After reviewing all the places where this is called, I'm (fairly)
comfortable that this assertion was redundant. Having the assertion
adds ~20KiB to a driver build:
text data bss dec hex filename
903173 26392 1552 931117 e352d i965_dri.so
924093 26392 1552 952037 e86e5 i965_dri.so
Based on feedback from Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
This differs from the FS in that we track constants in each
destination channel, and we have to look at all the swizzled source
channels. Also, the instruction stream walk is done in an O(n) manner
instead of O(n^2).
Across shader-db, this reduces 8.0% of the instructions from 60.0% of
the vertex shaders, leaving us now behind the old backend by 11.1%
overall.
Tracking virtual GRFs has tension between using a packed array per
virtual GRF (which is good for register allocation), and sparse arrays
where there's an element per actual register (so the first and second
column of a mat2 can be distinguished inside of an optimization pass).
The FS mostly avoided the need for this second sparse array by doing
virtual GRF splitting, but that meant that in instances where virtual
GRF splitting didn't work, instructions using those registers got much
less optimized.
Now instead of env INTEL_NEW_VS=1 to get it, you need INTEL_OLD_VS=1
to not get it. While it's not quite to the same codegen efficiency as
the old backend, it is not regressing piglit on G965 and G45, and
actually fixing bugs on gen6, and the remaining codegen quality
regressions all appear tractable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't expect uniform accesses to generally go away from being dead
code at this point, and we will want to have uniforms packed before
spilling them out to pull constants when we are forced to do that.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The offset to the arrays after the first was mis-scaled, so we'd go
access off the end of the surface and read 0s. Fixes
glsl-vs-uniform-array-3.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
While we had nice debug output for most of the instruction stream, it
was terminated by a series of anonymous MOVs and a send.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It maps to MESA_FORMAT_RGBA8888_REV. Surfaces of the format can only be
sampled from but not rendered to.
Only i915 is tested.
Reviewed-by: Eric Anholt <eric@anholt.net>
[olv: add a check in intel_image_target_renderbuffer_storage]
Add a new format token, __DRI_IMAGE_FORMAT_ABGR8888, to __DRI_IMAGE. It
maps to MESA_FORMAT_RGBA8888_REV in core mesa or
PIPE_FORMAT_R8G8B8A8_UNORM in gallium. The format is used by
translucent surfaces on Android.
We were splitting on each side of an unlinked program, and the two
sides lost track of which variables they referenced, resulting in
assertion failure during validation. Fixes piglit
link-struct-uniform-usage.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it would produce:
Failed to compile FS: 0:6(7): error: non-lvalue in assignment
and now it produces:
Failed to compile FS: 0:5(7): error: whole array assignment is not
allowed in GLSL 1.10 or GLSL ES 1.00.
Also, add spec quotation to the two places we have code for array
lvalues in GLSL 1.10.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just want to mark the whole thing used, not mark from each element
the whole size in use. Fixes undefined URB entry writes on i965,
which blew up with debugging enabled.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Uses the new _mesa_decompress_image() function. Unlike the meta path
that uses textured quad rendering to do decompression, this works with
signed formats as well.
We'd still accept the GL_PALETTE[48]_* formats in glCompressedTexImage2D,
but they wouldn't be listed if you queried whether they were supported.
Signed-off-by: Adam Jackson <ajax@redhat.com>
From section 7.1 (Vertex Shader Special Variables) of the GLSL 1.30
spec:
"It is an error for a shader to statically write both
gl_ClipVertex and gl_ClipDistance."
Fixes piglit test mixing-clip-distance-and-clip-vertex-disallowed.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The check now applies both when explicitly declaring the size of
gl_TexCoord and when implicitly setting the size of gl_TexCoord by
accessing it using integral constant expressions.
This is prep work for adding similar size checks to gl_ClipDistance.
Fixes piglit tests texcoord/implicit-access-max.{frag,vert}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GLSL 1.30 spec, section 7.1 (Vertex Shader Special Variables):
The gl_ClipDistance array is predeclared as unsized and must be
sized by the shader either redeclaring it with a size or indexing it
only with integral constant expressions.
Fixes piglit tests clip-distance-implicit-length.vert,
clip-distance-implicit-nonconst-access.vert, and
{vs,fs}-clip-distance-explicitly-sized.shader_test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-g3 causes binaries to be 3x - 10x bigger, not only on MinGW w/ dwarf
debugging info, but linux as well.
Stick with -g (which defaults to -g2), like autoconf does.
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha.
(Based on Chia-I Wu's patch)
[olv: remove the use of param_premultiplied_alpha from the original
patch]
Handle "format" events and return configs for the supported formats.
(Based on Chia-I Wu's patch)
[olv: update and explain why PIPE_FORMAT_B8G8R8A8_UNORM should not be
enabled without HAS_ARGB32]
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha. Currently, it means when argb32 and
argb32_pre are both supported.
When wl_drm is available and enabled, handle "format" events and return
configs for the supported formats. Otherwise, assume all formats of
wl_shm are supported.
EGL does not export this capability of a display server. But wayland
makes use of EGL_VG_ALPHA_FORMAT to achieve it.
So, when the native display returns true for the parameter, st/egl will
set EGL_VG_ALPHA_FORMAT_PRE_BIT for all EGLConfig's with non-zero
EGL_ALPHA_SIZE. EGL_VG_ALPHA_FORMAT attribute of a surface will affect
how the surface is presented.
Because st/vega does not support EGL_VG_ALPHA_FORMAT_PRE_BIT,
EGL_OPENVG_BIT will be cleared.
Replace the parameters of native_surface::present by a struct,
native_present_control. Using a struct allows us to add more control
options without having to update each backend every time.
The opcodes and strings were reversed. Quotient means division, and
modulus means remainder.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Follow a subset of changes in 7b1d94e5d1.
There are known issues, but it works to a certain degree. Non-working
demos also fail gracefully. More importantly, it fixes the build.
The list of numbers in (constant type (<numbers>)) needs to contain
exactly type->components() numbers (16 for a mat4, 3 for a vec3, etc.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Each of these vecN constants only provided one component, which is
illegal. The printed IR is meant to contain exactly as many components
as are necessary; the IR reader does not splat single values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were failing to relocate, so on the first draw run our scratch
would tend to get written to 0x0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were passing an MRF as the source argument, instead of using the
implied move and putting the MRF number in the proper place in the
instruction encoding.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On the old backend, we used scalar mode because Mesa IR math is
result.xyzw = math(op0.xxxx), which matched up well. However, in GLSL
IR we do things like result.xy = math(op0.xy), so we want vector mode.
For the common case of result.x = math(op0.x), performance will be the
same (no cost for un-executed channels), though result.xyzw =
math(op0.xxxx) would be worse.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we tried to retype a brw_null_reg() in CMP(), the retyping didn't
take effect because HW_REG just ignores the type field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you get your total GRF count wrong, you write over some other
shader's g0, and the GPU fails shortly thereafter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need this for the upcoming fix for sw texture_from_pixmap.
Signed-off-by: Stuart Abercrombie <sabercrombie@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Currently just the items that have been removed from Mesa are mentioned
in the release notes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL_COLOR_INDEX produced the same result (because GL_BITMAP is always
used for stencil glDrawPixels), but it was confusing to read. I spent
about 15 minutes wondering, "WTF?"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_make_temp_float_image can't work on color-index textures, but
there is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These sampling functions don't work on color-index textures, but there
is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These enums were only valid with the paletted texture extensions.
This allows a couple other trivial clean-ups.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing left that can call any of these functions. This also
removes the meta-ops code that implemented the first two.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was also discussed at XDS 2010. However, actually making the
change was delayed because several drivers still exposed these
extensions to significant benefit (e.g., tdfx). Now that those
drivers have been removed, this code can be removed as well.
v2: A lot of bits that were missed in the previous patch have been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we now lay out the VUE the same way regardless of whether
two-sided color is enabled, brw_compute_vue_map() no longer needs to
know whether two-sided color is enabled. This allows the two-sided
color flag to be removed from the clip, GS, and VS keys, so that fewer
GPU programs need to be recompiled when turning two-sided color on and
off.
Reviewed-by: Eric Anholt <eric@anholt.net>
When doing two-sided color on GEN6+, we use the SF unit's
INPUTATTR_FACING mode to cause front colors to be used on front-facing
triangles, and back colors to be used on back-facing triangles. This
mode requires that the front and back colors be adjacent in the VUE.
Previously, we would only place front and back colors adjacent in the
VUE when two-sided color was enabled. Now we place them adjacent in
the VUE whether two-sided color is enabled or not. (We still only
swizzle the colors when two-sided color is enabled, so there should be
no user-visible change).
This simplifies the implementation of the VUE map and reduces the
amount of code that is dependent on two-sided color mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous computation had two bugs: (a) it used a formula based on
Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact
that PSIZ is stored in the VUE header. Fortunately, both bugs caused
it to compute a URB size that was too large, which was benign. This
patch computes the URB size directly from the VUE map, so it gets the
result correct in all circumstances.
Reviewed-by: Eric Anholt <eric@anholt.net>
The variables offset[], idx_to_attr[], nr_bytes, nr_attrs, and
header_regs were all serving purposes which are now served by the VUE
map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_clip_interp_vertex() iterated only through the
"non-header" elements of the VUE when performing interpolation
(because header elements don't need interpolation). This code now
refers exclusively to the VUE map to figure out which elements need
interpolation, so that brw_clip_interp_vertex() doesn't need to know
the header size.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch replaces some ad-hoc computations using ATTR_SIZE and the
offset[] array with calls to the VUE map functions
brw_vert_result_to_offset() and brw_vue_slot_to_offset().
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we would examine the offset[] array (since an offset of 0
meant "not in use"). This paves the way for removing the offset[]
array.
Reviewed-by: Eric Anholt <eric@anholt.net>
The offsets within the VUE of HPOS and NDC are needed only in a few
auxiliary clipping functions. This patch moves computation of those
offsets into the functions that need them, and does the computation
using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes get_attr_override() (which computes the
relationship between vertex shader outputs and fragment shader inputs)
to use the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch removes the variables nr_attrs and nr_setup_attrs, whose
purpose is now being served by the VUE map. nr_attr_regs and
nr_setup_regs are still needed, however they are now computed using
the VUE map rather than by counting the number of vertex shader
outputs (which caused subtle bugs when gl_PointSize was written).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the SF used nr_setup_attrs to determine whether it was
looking at the last element of the VUE. Changed this code to use the
VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
These data structures were serving the same purpose as the VUE map,
but were buggy. Now that the code has been transitioned to use the
VUE map, they are not needed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, SF code used the idx_to_attr[] array to compute the
location of entries in the VUE map. This array didn't properly
account for gl_PointSize. Now we use the VUE map directly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, some of the code in SF erroneously used bitfields based on
the gl_frag_attrib enum when actually referring to vertex results.
This worked, because coincidentally the particular enum values being
used happened to match between gl_frag_attrib and gl_vert_result. But
it was fragile, because a future change to either gl_vert_result or
gl_frag_attrib would have made the enum values stop matching up. This
patch switches the SF code to use the correct enum.
Reviewed-by: Eric Anholt <eric@anholt.net>
The new function, called get_vert_result(), uses the VUE map to find
the register containing a given vertex attribute. Previously, we used
the attr_to_idx[] array, which served the same purpose but didn't
account for gl_PointSize correctly.
This fixes a bug on pre-Gen6 wherein the back side of a triangle would
be rendered incorrectly if the vertex shader wrote to gl_PointSize.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch moves the computation of the SF URB entry read offset from
upload_sf_unit() to its own function, so that it can be re-used when
creating the gen4-5 SF program.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend computed the size of the URB entry by
counting the number of MRFs used in emitting the URB entry. Now it
just gets it straight from the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
max_usable_mrf has been carefully set such that (max_usable_mrf -
base_mrf) is a multiple of 2, so that an even number of VUE slots are
emitted with each URB write (which Gen6 requires). This patch adds an
assertion to confirm that this is the case, and moves the comment to
this effect to be near the assertion.
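As a rough sketch of the kind of assertion meant here: the function name
slots_per_urb_write is hypothetical and the values are supplied by the
caller; only the (max_usable_mrf - base_mrf) expression comes from the text
above.

```c
#include <assert.h>

/* Hypothetical helper: number of VUE slots carried by one URB write,
 * following the (max_usable_mrf - base_mrf) expression in the text. */
static int slots_per_urb_write(int base_mrf, int max_usable_mrf)
{
   int slots = max_usable_mrf - base_mrf;

   /* Gen6 requires an even number of VUE slots per URB write, so catch
    * any future change to either constant that breaks this. */
   assert(slots % 2 == 0);
   return slots;
}
```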
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend used two functions,
emit_vue_header_gen6() and emit_vue_header_gen4() to emit the fixed
parts of the VUE, and then a pair of carefully-constructed loops to
emit the rest of the VUE, leaving out the parts that were already
emitted as part of the header.
This patch changes the new VS backend to use the VUE map to emit the
entire VUE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, emit_vue_header_gen4() used local variables to keep track
of which registers were storing the NDC and HPOS. This patch uses the
output_reg[] array instead, so that the code that manipulates NDC and
HPOS can be more easily refactored.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the old VS backend computed the URB entry size by adding
the number of vertex shader outputs to the size of the URB header.
This often produced a larger result than necessary, because some
vertex shader outputs are stored in the header, so they were being
double counted. This patch changes the old VS backend to compute the
URB entry size directly from the number of slots in the VUE map.
Note: there's a subtle change in that we no longer count header
registers towards the size of the VF input. I believe this is
correct, because the header is only emitted in the output of the VS
stage--it is not present in the input. (As evidence for this, note
that brw_vs_state.c sets urb_entry_read_offset to 0--it does not
include space for the header as part of the VS input).
Reviewed-by: Eric Anholt <eric@anholt.net>
Some parts of the i965 driver keep track of locations within the VUE
(vertex URB entry) using byte offsets. This patch adds inline
functions to compute these byte offsets using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Several places in the i965 code make implicit assumptions about the
structure of data in the VUE (vertex URB entry). This patch adds a
function, brw_compute_vue_map(), which computes the structure of the
VUE explicitly. Future patches will modify the rest of the driver to
use the explicitly computed map rather than rely on implicit
assumptions about it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, this conversion was duplicated in several places in the
i965 driver. This patch moves it to a common location in mtypes.h,
near the declaration of gl_vert_result and gl_frag_attrib.
I've also added comments to remind us that we may need to revisit the
conversion code when adding elements to gl_vert_result and
gl_frag_attrib.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just adds the opcodes for evergreen, need to work on r600 and cayman
implementations.
Don't advertise native integers yet, until we work out all the regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds all the API checks for vertex arrays using 2101010 types.
2101010 is also useable with GL_BGRA.
v2: fix whitespace.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the vertex processing paths for the 2101010 types. It converts
the attributes to floats for all the immediate entry points; some entry
points are normalized, and the attrib APIs take a normalized parameter.
There are four main paths:
ui10 -> float unnormalized
i10 -> float unnormalized
ui10 -> float normalized
i10 -> float normalized
along with the ui2/i2 equivs.
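The four paths can be sketched roughly as below. This is not the code from
the patch: the helper names are made up, and the signed normalized rule
(divide by 511, clamp to -1) is an assumption chosen for illustration; the
exact signed formula depends on which version of the GL specification is
followed. In the packed 2_10_10_10_REV layout the x, y and z fields start at
bits 0, 10 and 20.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the unsigned 10-bit field that starts at bit `shift`. */
static unsigned ui10(uint32_t packed, unsigned shift)
{
   return (packed >> shift) & 0x3ffu;
}

/* Same field, sign-extended from 10 bits (two's complement). */
static int i10(uint32_t packed, unsigned shift)
{
   int v = (int) ui10(packed, shift);
   return (v & 0x200) ? v - 0x400 : v;
}

/* ui10 -> float unnormalized */
static float ui10_to_float(uint32_t p, unsigned s)
{
   return (float) ui10(p, s);
}

/* i10 -> float unnormalized */
static float i10_to_float(uint32_t p, unsigned s)
{
   return (float) i10(p, s);
}

/* ui10 -> float normalized: maps 0..1023 onto 0.0..1.0 */
static float ui10_to_norm_float(uint32_t p, unsigned s)
{
   return (float) ui10(p, s) / 1023.0f;
}

/* i10 -> float normalized.  ASSUMPTION: divide by 511 and clamp to -1;
 * other GL versions define this conversion differently. */
static float i10_to_norm_float(uint32_t p, unsigned s)
{
   float f = (float) i10(p, s) / 511.0f;
   return f < -1.0f ? -1.0f : f;
}
```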
Signed-off-by: Dave Airlie <airlied@redhat.com>
add new APIs to the internal mesa driver interface + set funcs in vtxfmt.c
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are the new API entrypoints for ARB_vertex_type_2_10_10_10_rev
extension, along with the new INT_2_10_10_10_REV enum.
v2: fixup crazy whitespace cut-n-paste mess
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers supporting native integers set UniformBooleanTrue to the integer value
that should be used for true when uploading uniform booleans. This is ~0 for
Gallium and 1 for i965.
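A small sketch of how such a per-driver value might be applied when a
boolean uniform is uploaded. The macro and function names are hypothetical;
only the two values (~0 and 1) come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring the two values named in the text. */
#define BOOL_TRUE_ALL_ONES (~(uint32_t) 0)   /* the "~0" convention */
#define BOOL_TRUE_ONE      ((uint32_t) 1)    /* the "1" convention  */

/* Convert a C truth value into the dword the driver wants to see.
 * Any non-zero input becomes the driver's chosen "true" pattern. */
static uint32_t bool_to_uniform_dword(int value, uint32_t uniform_boolean_true)
{
   return value ? uniform_boolean_true : 0;
}
```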
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just reorgs one define in csv file, and adds all the new formats
that are needed for this extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes all but one of the piglit regressions from enabling native integers
in softpipe. The change to fix the last regression is still being discussed.
The preferred solution to keeping track of the picture structure
has been putting it in the state tracker, so use picture_structure
instead of frame_started to check if a frame needs to begin.
If picture_structure has been changed, end the frame and start again.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This could happen in 3 different cases, and errno can explain what
happened. First case would be EIO (gpu hang), second EINVAL (something is
wrong inside the batch), and we also discovered that sometimes it happens
with ENOSPC. All of those cases are different, so it is worth at
least knowing what happened.
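One way the distinction could be reported, as a sketch only: the helper
describe_exec_failure is hypothetical and the message strings are made up;
the errno constants and strerror() are standard C/POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn the errno left by a failed batch submission
 * into a short hint for the log message. */
static const char *describe_exec_failure(int err)
{
   switch (err) {
   case EIO:
      return "GPU hang";
   case EINVAL:
      return "something is wrong inside the batch";
   case ENOSPC:
      return "no space left";
   default:
      /* Fall back to the generic system description. */
      return strerror(err);
   }
}
```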
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to the comment, we need to load /some/ push constants on
pre-Gen6 hardware or the GPU will hang. The existing code set these
bogus parameters to NULL pointers; unfortunately, the code in
brw_curbe.c that loads them dereferences those pointers. So, change
them to be pointers to an actual floating point value of 0.0.
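The shape of the fix, sketched with hypothetical names (params,
NUM_DUMMY_PARAMS, setup_dummy_params and load_params are not the driver's
identifiers): every dummy slot points at a real 0.0 so that the loader can
dereference it safely.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_DUMMY_PARAMS 4   /* hypothetical count */

static const float *params[NUM_DUMMY_PARAMS];

/* Point every dummy parameter at a real zero instead of NULL. */
static void setup_dummy_params(void)
{
   static const float zero = 0.0f;
   size_t i;

   for (i = 0; i < NUM_DUMMY_PARAMS; i++)
      params[i] = &zero;
}

/* Stand-in for the loader: it dereferences each pointer unconditionally,
 * which is why NULL entries were a problem. */
static void load_params(float *dst)
{
   size_t i;

   for (i = 0; i < NUM_DUMMY_PARAMS; i++)
      dst[i] = *params[i];
}
```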
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As per Brian's suggestion, add caps for drivers that support texture
offsets to advertise a min/max via TGSI, also use it in the state tracker.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds tokens for texture offsets, to store 4 * swizzled vec 3
for use in TXF and other opcodes.
It also contains TGSI exec changes for softpipe to use this code,
along with GLSL->TGSI support for TXF.
v2: add some more comments, add back padding I removed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the swrast failures for piglit's fbo-generatemipmap-formats
test (for uncompressed formats). At some point down the road this code
will go away so I haven't checked all the other store_texel() functions.
Simple demos such as test-opengl-gl_basic work. SurfaceFlinger does not
work yet due to missing GL_OES_draw_texture support (and maybe more).
Reviewed-by: Chad Versace <chad@chad-versace.us>
In preparation for porting i915 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android build from breaking as often.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a better, more fine-grained way of lowering if statements. Fixes the
game And Yet It Moves on nv50.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't want to set the pixmap bit in the EGL config if the DRI
config we're adding is a double buffered config. However, don't clear
any other bits the platform might pass in in the surface_type
argument.
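In bit-mask terms this amounts to clearing one bit and leaving the rest
alone. A sketch, assuming a hypothetical helper filter_surface_type; the
bit values are repeated from EGL/egl.h so the example builds on its own:

```c
#include <assert.h>

/* Same values as in EGL/egl.h, repeated here so the sketch is standalone. */
#define EGL_PBUFFER_BIT 0x0001
#define EGL_PIXMAP_BIT  0x0002
#define EGL_WINDOW_BIT  0x0004

/* Hypothetical helper: drop only the pixmap bit for double-buffered
 * configs; every other bit passed in by the platform is preserved. */
static int filter_surface_type(int surface_type, int double_buffered)
{
   if (double_buffered)
      surface_type &= ~EGL_PIXMAP_BIT;
   return surface_type;
}
```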
Using multiply and reciprocal for integer division involves potentially
lossy floating point conversions. This is okay for older GPUs that
represent integers as floating point, but undesirable for GPUs with
native integer division instructions.
TGSI, for example, has UDIV/IDIV instructions for integer division,
so it makes sense to handle this directly. Likewise for i965.
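To make the lossiness concrete, here is a sketch in plain C (not the
lowering pass itself; both function names are made up). A 32-bit float has
a 24-bit significand, so an integer such as 16777217 cannot survive the
round trip through float, even when dividing by 1:

```c
#include <assert.h>

/* Integer division emulated with a float multiply by the reciprocal.
 * The volatile temporaries force each step to be rounded to float. */
static int div_via_reciprocal(int a, int b)
{
   volatile float fa = (float) a;
   volatile float rcp = 1.0f / (float) b;
   volatile float q = fa * rcp;

   return (int) q;
}

/* Native integer division, as a UDIV/IDIV-style instruction would do. */
static int div_native(int a, int b)
{
   return a / b;
}
```

With these definitions, div_native(16777217, 1) yields 16777217, while
div_via_reciprocal(16777217, 1) yields 16777216 because the operand is
rounded when it is converted to float.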
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add generator instructions for the scratch opcodes.
Add emit_before() for handling ->ir and ->annotation inheritance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This DP4 had one of its operands missing, so we were generating
garbage clip distances. Using the per-opcode instruction generators
made it obvious.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Set ctx->WindowRenderBuffer to EGL_BACK_BUFFER. As EGL_WINDOW_BIT of a
config is set only when there is dri_double_buffer, that makes sure
window surfaces are always double-buffered and contexts will render to
the back buffer.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Advertising different format support based on sample count was a
bad idea: it made resolve to window work, but resolve to anything
else would fail.
See 9f4998639c.
By emitting code before generate_code(), we ended up in align1 mode
where writemasks don't exist, so we rescaled gl_Vertex.w and things
went badly. By moving GL_FIXED support to the visitor, we end up with
normal codegen, and as a bonus the GL_FIXED setup ends up getting
printed appropriately in debug output.
Fixes gtf/GL2Tests/fixed_data_type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At some point we need to also move uniform accesses out to pull
constants when there are just too many in use, but we lack tests for
that at the moment.
Fixes glsl-vs-large-uniform-array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids the massive conditional move array access, and brings code
generation quality for the new VS backend into the realm of efficiency
of the old backend (roughly 20% more instructions generated than
before across shader-db, instead of assertion failing for generating
over 10,000 instructions on many shaders!).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We sometimes want to put an instruction somewhere besides the end of
the instruction stream, and we also want per-opcode instruction
generation to enable compile-time checking of operands.
We'll be using that to track things for the new VS backend, and this will
avoid cluttering brw_vs_surface_state.c for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were primarily failing to convert in the NativeIntegers case, which
this fixes. However, we were also just truncating float uniforms when
converting to integer, which does not appear to be the correct
behavior. Note, however, that the NVIDIA drivers also truncate
instead of rounding.
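The difference between the two conversions can be sketched as follows. The function names are hypothetical, and rounding halves away from zero is an assumption made for illustration; the exact rounding rule is whatever the quoted spec text requires.

```c
#include <assert.h>

/* Truncation: drops the fractional part, as the old code did. */
static int float_to_int_truncate(float f)
{
    assert(f > -1.0e9f && f < 1.0e9f);   /* keep the cast in range */
    return (int)f;
}

/* Rounding to the nearest integer (halves away from zero here, which
 * is an assumption of this sketch, not a statement about Mesa). */
static int float_to_int_round(float f)
{
    assert(f > -1.0e9f && f < 1.0e9f);   /* keep the cast in range */
    return (int)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}
```

For a float uniform holding 2.7, truncation reports 2 while rounding reports 3.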
GL_DOUBLE return type is dropped because it was never used and
completely broken. It can be added when there's test code.
Fixes piglit ARB_shader_objects/getuniform
v2: This is a rewrite of my previous glGetUniform patch, which Ken
pointed out missed storage_type-based conversions to integer,
which was totally broken still thanks to a typo in the testcase.
v3: Quote the spec justifying the rounding behavior.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
At least for Intel, all our uniform components are of uint32_t size, either
float or signed or unsigned int. For uploading uniform data in the driver,
it's much easier to upload a full dword per uniform element instead of trying
to pick out the bool byte and then fill in the top 3 bytes of pad with 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Replace each occurrence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dri_common is a static library that contains the sources in
src/mesa/drivers/dri/common. Each DRI driver should link to it.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In src/mesa/Android.mk, it is non-trivial to determine which variables are
imported by `include sources.mak`. So document them.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dricore.a is analogous to the libmesa.a built by the Autoconf
build.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In order that the Autoconf and Android build can share the same source
lists, move the lists from
src/mesa/drivers/dri/Makefile.defines
into
src/mesa/drivers/dri/common/Makefile.sources
I would like for Android to just reuse Makefile.defines, but the file is
unsuitable for reuse.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
driverfuncs.o is already contained in libmesa.a, so remove it from the
following source lists:
src/mesa/drivers/dri/Makefile.defines:COMMON_SOURCES
src/mesa/drivers/dri/swrast/Makefile:SWRAST_COMMON_SOURCES
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Remove definition of COMMON_SOURCES from {r300,r660}/Makefile. The
definition is a duplicate of that found in
src/mesa/drivers/dri/Makefile.defines.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The layersize calculation is slightly different on +evergreen.
This makes mpeg2 video decoding and piglit's texture-packed-formats
test work correctly on this hardware.
I noticed that a thread was created every time async flush was called,
so I moved it and used some semaphores to synchronize.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This prevents null dereferences in validation of interdependent
state after a switch to a pipe context where we mark all state
as dirty but where not all state is valid / set yet.
The window system buffer will be BGRA and applications will try to
directly resolve to it, which would trigger an INVALID_OPERATION in
BlitFramebuffer if the multisample renderbuffer is RGBA.
All commonly used windows toolchains define wgl entrypoints in the windows
headers, and mesa_wgl.h not only is unnecessary but actually often stands
in the way due to slight inconsistencies.
So remove it.
This is a port of vec4_visitor::try_rewrite_rhs_to_dst to fs_visitor.
Not only is this technique less invasive and more robust, it also
generates better code. Over and above the previous technique, this
reduced instruction count in shader-db by 0.28% on average and 1.4% in
the best case.
In no case did this technique result in more code than the prior method.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This reverts commit 53c89c67f3, along with
the subsequent this->result = reg_undef additions it required.
Both Eric and I agree that the way he did this is really fragile; if you
forget to add this->result = reg_undef before calling accept(), it may
end up using the same register for two separate things, breaking things
in strange and mysterious ways.
The next commit will port over the new VS backend's method for solving
this problem, which is simpler, less intrusive, and still manages to
avoid MOVs in the common case.
Nothing in Mesa supports color-index textures, and most of the other
infrastructure that could allow such support has already been removed.
This puts the final nail in the coffin.
Also clean out some GL_COLOR_INDEX comments in formats.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This came from the "kill it with fire" discussion at XDS 2010.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This continues to allocate texImage->Data as before, so
drivers calling these functions need to use that when present.
Reviewed-by: Brian Paul <brianp@vmware.com>
ctx->Driver.MapTextureImage() / UnmapTextureImage() will be called by
the glTex[Sub]Image(), glGetTexImage() functions, etc. when we're
accessing texture data, and also for software rendering when accessing
texture data.
Reviewed-by: Brian Paul <brianp@vmware.com>
All driver implementations of FreeTextureImageBuffer already check
that Data != NULL and free it. However, this means that we will also
free driver storage if the driver storage wasn't in the form of a Data
pointer.
This was produced by the following semantic patch:
@@
expression C;
expression T;
@@
- if (T->Data) {
- C->Driver.FreeTextureImageBuffer(C, T);
+ C->Driver.FreeTextureImageBuffer(C, T);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was produced by sed, except for one hunk in driverfuncs.c where
trailing whitespace was dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add platform_android.c that supports _EGL_PLATFORM_ANDROID. It works
with drm_gralloc, where back buffers of windows are backed by GEM
objects.
In Android a native window has a queue of back buffers allocated by the
server, through drm_gralloc. For each frame, EGL needs to:
- dequeue the next back buffer
- render to the buffer
- enqueue the buffer
After enqueuing, the buffer is no longer valid to EGL. A window has no
depth buffer or other aux buffers. They need to be allocated locally by
EGL.
Reviewed-by: Benjamin Franzke <benjaminfranzke@googlemail.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: with assorted minor changes, mostly suggested during the review]
Add rgba_masks to dri2_add_config. When it is non-NULL, the DRI config
is accepted only when the offsets and sizes of its channels match
rgba_masks.
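The acceptance rule can be sketched as below. This is an illustrative stand-in, not the dri2_add_config code: the function name, the R, G, B, A ordering, and the use of plain unsigned masks are all assumptions. Comparing the masks for equality is one way to check that both the offset and the size of each channel agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check. Masks are ordered R, G, B, A. A NULL rgba_masks
 * means "no restriction", so every config is accepted. */
static bool config_matches_masks(const unsigned dri_masks[4],
                                 const unsigned *rgba_masks)
{
    assert(dri_masks != NULL);
    if (rgba_masks == NULL)
        return true;
    for (int i = 0; i < 4; i++) {
        /* Equal masks imply equal channel offset and size. */
        if (dri_masks[i] != rgba_masks[i])
            return false;
    }
    return true;
}
```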
Reviewed-by: Chad Versace <chad@chad-versace.us>
Quickly tested with 945GME. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work and some do not.
Quickly tested with VMWare Workstation 7.1.4 on Linux with GeForce
GT220. SurfaceFlinger (the display server and compositor) works. 2D
apps with RGB visuals work. However, due to missing
PIPE_FORMAT_R8G8B8A8_UNORM support, those with RGBA visuals do not.
Factor out C_SOURCES from Makefile to Makefile.sources, and let Makefile
and SConscript share it.
Note that
$(TOP)/src/glsl/ralloc.c and
$(TOP)/src/mesa/program/register_allocate.c
are removed from C_SOURCES in Makefile.sources and added back in
Makefile and SConscript. The idea is that they are not part of r300g.
But having them in libr300.a makes building non-GL targets such as the
compiler tests or g3dvl much easier. Also, for practical reasons, TOP
would be an undefined variable in Makefile.sources.
drmVersion and driver specific ioctls are used to get the PCI ID from a
DRM fd. Expand the mechanism to nouveau and vmwgfx, except that for
nouveau, only the vendor ID is needed, and for vmwgfx, always assume
SVGA II.
When generating dispatch templates, emit the '(void) blah;' magic to
make GCC happy. This reduces a lot of warning spam if you build with
-Wunused-parameter or -Wextra.
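A minimal sketch of the idiom, using a hypothetical stub rather than an actual generated dispatch function:

```c
#include <assert.h>

/* Hypothetical dispatch stub: 'level' is part of the signature, but
 * the stub has no use for it. */
static int dispatch_stub_sketch(int target, int level)
{
    (void) level;   /* marks the parameter as deliberately unused */
    assert(target >= 0);
    return target;
}
```

Casting the parameter to void counts as a use, so GCC no longer reports it under -Wunused-parameter, and the statement generates no code.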
Reviewed-by: Chia-I Wu <olv@lunarg.com>
In preparation for porting i965 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android build from breaking as often.
Acked-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If the state tracker tries to map the resource directly but we can't or don't
want to do that, fail to create a transfer.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Compiling some (large) files with i686-pc-mingw32-gcc 4.2.2 (at least)
and the -gstabs option triggers a compiler error. Use this work-around
to simply compile the affected files without -gstabs.
Make setting the quant matrices a generic interface.
Also remove setting the quant matrix from the XvMC interface.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Make the picture_structure enum spec compliant.
Also remove it from the compositor.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Revert back to a macroblock based interface. The structure used
tries to keep as close to the spec as possible.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Implement PIPE_CAP_NUM_BUFFERS_DESIRED giving the decoder control over
the number of buffers a state tracker should allocate.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
First of all, get rid of the decode_buffer structure, while still giving
the decoder the ability to organize its buffers depending on the needs
of the state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Instructions with 3 source operands have no write mask, so we may replace their
destinations with PV/PS in the next group even if their dst.write is 0.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to do full check when not all bank swizzles in the group are forced
(e.g. when trying to merge interp_* group with the next instruction)
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nothing in Mesa generates these opcodes, and i965 hardware cannot
support them natively. If support were ever added for these opcodes in
Mesa, there had better be a lowering pass for hardware that doesn't
support them natively.
While debugging texelFetchOffset we kept hitting the assert.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we continue and hit the "Illegal formal parameter mode"
assertion.
Fixes negative compile test texelFetchOffset.frag in piglit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds texelFetch support to translate from GLSL to TGSI TXF opcode.
I've tested this works with an r600g and softpipe backend.
v2: drop comments, fix title
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
This just calls the texel fetch functions directly, bypassing the sampling.
Notes:
1: loops inside switch should be more optimal.
2: borders can be sampled, though only up to border depth; outside that
it's undefined.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a straight texel fetch with no filtering or clamping. It uses
integers to specify the i/j/k (from EXT_gpu_shader4).
To enable this I had to add another hook into the tgsi sampler so that
we could easily bypass all the filtering sample does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds the get_dims callback that is called from the tgsi exec_txq.
It returns values as per EXT_gpu_program4.
v2: fix one indent + use a switch (slightly modified from Brian)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds another callback in the sampler struct containing the get_dims
entry point. This is used to query the driver for the texture resource
dimensions for the resource bound to the current sampler.
v2: remove unused variable, fix indent
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's the same as GL_AMD_conservative_depth. The specs have slight
differences in wording, but don't differ in content or behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested with a Radeon HD 6250. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work but some don't (with serious rendering defects).
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Put restrict in the function definitions to silence MSVC warnings
about incompatible assignments in "func = lp_tile_foobar;" when func
was declared with restrict keywords but the rhs function wasn't.
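The mismatch can be sketched as follows, assuming a C99 compiler. The typedef and function names are hypothetical stand-ins for the lp_tile_* functions, whose real signatures are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function-pointer type declared with restrict. */
typedef void (*tile_copy_func)(float *restrict dst,
                               const float *restrict src, size_t n);

/* The definition repeats restrict so that its parameter list is
 * spelled exactly like the pointer type it gets assigned to. */
static void tile_copy_sketch(float *restrict dst,
                             const float *restrict src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

An assignment such as `tile_copy_func func = tile_copy_sketch;` then pairs two identically qualified signatures.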
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch documents some Mesa coding style conventions that came up
during the discussion of commit 67b5a32 (Perform implicit type
conversions on function call out parameters).
Before, if we ended up here without a BO for our image, but did choose
a miptree that had active rendering in the command buffer, our
teximage data would jump ahead of the rendering using the old texture
contents.
This showed up as breakage in gen-teximage and friends in the
following commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Several drivers have these fields in their subclasses of gl_texture_image.
They'll be useful for core Mesa too...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The use of mmap() in winsys requires large file support. Not all OSes
have LFS so a wrapper should be used. In particular, os_mmap() should
call __mmap2() on Android.
Replace all calls to dd_function_table::MapBuffer with appropriate
calls to dd_function_table::MapBufferRange, then remove all the cruft.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code previously passed GL_DYNAMIC_DRAW for the access parameter.
By inspection, I believe that all drivers would treat this as
GL_READ_WRITE because it's not GL_READ_ONLY and it's not
GL_WRITE_ONLY.
It appears the i965 code wants GL_WRITE_ONLY (it's about to write a
bunch of data in, never read data), while the arrayelt code is
GL_READ_ONLY (just dereffed as arguments to CALL_Whatever*v).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unfortunately, since a previous efficiency improvement, we no longer
have any open-source testcases producing register spilling, so this
code was untested in the fragment shader path. That should change
when we get proper temporary array support in the fragment shader.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40194
Also, remove the BRW_SAMPLER_MESSAGE_SIMD8_RESINFO #define because
there totally isn't a SIMD8 variant.
Unfortunately, resinfo returns FLOAT32 on Broadwater/Crestline, unlike
G45 which returns a proper UINT32. This turns out to be simple,
however: when we emit MOVs to select the desired half of the SIMD16
result, we can simply override the register type to be float so it's
converted to an integer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Not all texturing operations return floating point data. For example,
the resinfo message (textureSize or TXS) returns integer data. In the
future, we'll also add integer texture support.
ir_texture's type field contains this information; use its base type to
appropriately type the destination register. We want to keep it as a
four component vector, however, since SIMD8 samplers always have a
response length of 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formats were based on a patch sent to xf86-video-nouveau by Bryan Cain
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
[Michel Dänzer: Add xorg_xvmc.c to SConscript.]
The driver may install its own vertex shader. _mesa_set_vp_override
must be called so that core mesa can generate the correct fragment program.
Reviewed-by: Brian Paul <brianp@vmware.com>
Factor out source lists from Makefile to Makefile.sources, and let
Makefile, SConscript, and Android.mk share it.
Note that files in $(GENERATED_SOURCES) are removed from $(C_SOURCES).
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
ParseSourceList() can be used to parse a source list file and returns
the source files defined in it. It is supposed to be used like this:
# get the list of source files from C_SOURCES in Makefile.sources
sources = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
The syntax of a source list file is compatible with GNU Make. This
effectively allows SConscript and Makefile to share the source lists.
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
There is no ir_hierarchical_visitor::visit(ir_if *) method, since ir_if
is not a leaf node. Instead, there are visit_enter and visit_leave
methods. Use visit_enter arbitrarily (either would work fine, though
visit_enter will catch errors sooner).
Found thanks to a warning emitted by Clang.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When intel_context requires separate stencil but the DRI2 separate stencil
handshake fails, then abort and emit an error instructing the user to
upgrade the DDX to 2.16.0.
CC: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Implement the any() part of the operation the same way regular ir_unop_any
is implemented.
This is a port of commit e7bf096e8b to glsl_to_tgsi, with added integer
support.
Logical-or is implemented using addition (followed by clamping to [0,1]) on
values of 0.0 and 1.0. Replacing the logical-or operators with addition gives
a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic value to
[0,1]. In a fragment shader, using a saturate on the add has the same effect.
Adding the saturate to the add is free, so (at least) one instruction is
saved. In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the SNE
instruction. It must be emulated using two SLT instructions and an ADD. On
these architectures, the single SLT saves two instructions.
Note that SNE is still used when integers are used for boolean values, since
there is no such thing as an integer saturate, and older shader architectures
without SNE don't support integers.
This is a port of commit 41f8ffe5e0 to glsl_to_tgsi with integer support
added.
Since this is the software path, set GRALLOC_USAGE_SW_WRITE_OFTEN when
PIPE_BIND_RENDER_TARGET, and set GRALLOC_USAGE_SW_READ_OFTEN when
PIPE_BIND_SAMPLER_VIEW.
libGLES_mesa with swrast should link in these libraries
libmesa_egl
libmesa_egl_gallium
libmesa_st_egl
libmesa_st_mesa
libmesa_glsl
libmesa_glsl_utils
libmesa_pipe_softpipe
libmesa_winsys_sw_android
libmesa_gallium
Reviewed-by: Chad Versace <chad@chad-versace.us>
This builds the static library libmesa_glsl and executable glsl_compiler
from glsl. glsl_compiler is only installed for engineering builds.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is the first step to integrate Mesa into Android(-x86) build
system. You can git clone mesa under the external/ directory of the Android
source tree and build Android with
$ make BOARD_GPU_DRIVERS=swrast
It will build libGLES_mesa that will be loaded by Android runtime.
libGLES_mesa is still a stub in this commit.
Both HW and SW rendering are supported for Android. For SW rendering,
we use the generic gralloc lock/unlock for mapping and unmapping color
buffers (in winsys/android).
For HW rendering, we need to know the real type of color buffers. This
backend works with drm_gralloc, where a color buffer is backed by a GEM
object.
On Android, color buffers are passed between server and clients as
opaque buffer_handle_t. This winsys makes use of gralloc, which
provides a generic way to map and unmap buffer_handle_t for CPU access.
Add EGL_ANDROID_image_native_buffer and EGL_ANDROID_swap_rectangle.
There is no spec for them though.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Android uses the Linux kernel and its own C runtime. It resembles
PIPE_OS_LINUX a lot with some minor exceptions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move vbo_exec_FlushVertices_internal out of FEATURE_beginend.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Makes the new vertex shader backend work on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When ctx->Const.NativeIntegers is set, Core Mesa loads integer/boolean
uniforms directly, rather than loading the floating point equivalent.
So, when that's set, we don't need to perform any conversions.
Unfortunately, we can't properly support native integers with the old
vertex shader backend, so this patch leaves them disabled for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, native integer support was based on whether the driver
advertised GLSL 1.30 or not. However, drivers that natively support
integers may wish to do so for older GLSL versions as well. Adding this
new opt-in flag allows them to do so.
Currently disabled by default on all drivers, which was the existing
behavior (no drivers currently implement GLSL 1.30).
Fixes piglit tests on i965 with INTEL_GLSL_VERSION=130 set:
- spec/glsl-1.10/fs-uniform-int-110.shader_test
- spec/glsl-1.30/fs-uniform-int-130.shader_test
(it was doubly converting the data)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes vs-atan-* and several others. This is not the real solution we
eventually want, which will pack floats, vec2s, and vec3s into vec4
registers, but this code should provide the framework for that.
This is a rather pessimistic calculation, since it doesn't distinguish
individual channels of a vec4, or elements of an array, but should be
a minimum start for register allocation.
The areamap contains precomputed data on different aliasing types.
It is necessary for good performance.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The state tracker expects depth and stencil pixels interleaved.
Evergreen can bind an interleaved depth-stencil resource as a colorbuffer,
but not as a zbuffer.
The hardware can do the interleaving for us when decompressing.
Such that it actually works in apps which use both.
A separate buffer is allocated for stencil. The only exception is
the window-system-provided depth-stencil buffer, where depth and stencil
share the same buffer.
This fixes:
- fbo-depthstencil-GL_DEPTH24_STENCIL8-clear
- fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-FLOAT-and-USHORT
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
This was an unfinished to-do item before.
With this patch and the two preceding patches, piglit's
fbo-generatemipmap-array test runs and passes instead of generating
a GL error and dying on an assertion.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We could do 1D/2D arrays with textured quad rendering, but it'll take
some work (as with 3D textures).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Q should not be significant for OPCODE_TEX, but it winds up getting
passed to the compute_lambda() function. Make sure it's 1.0 to
prevent garbage values, which is effectively what we get when the
swizzle is coord.xyzz (which is what GLSL gives us).
Part of the fix for piglit's fbo-generatemipmap-array test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Declare _mesa_meta_begin()/end() in meta.h so that drivers can write
custom meta-ops (such as HiZ resolves for i965).
This necessitates moving the META_* macros into meta.h. To prevent
naming collisions, this commit renames each macro to be MESA_META_*.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit 6eff33dc (glapi: generate ES dispatch headers from core mesa)
replaced the autogenerated files
src/mapi/es1api/main/{dispatch,remap_helper}.h with new autogenerated
files src/mesa/main/api_exec_es{1,2}_{dispatch,remap_helper}.h. This
patch updates the .gitignore files to properly ignore the new
autogenerated files, and stop ignoring the old autogenerated files.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
The flush extension's flush call indicates end of frame and should only
be called once per frame. However, in the dri2SwapBuffer fallback
path, we call flush and then call dri2CopySubBuffer, which also calls
flush. Refactor the code to only call flush once.
Needed for GL3.
v2: evergreen support
I don't set PA_SU_SC_MODE_CNTL.MULTI_PRIM_IB_ENA.
piglit/primitive-restart does pass though. Tested on RV730 and EG-REDWOOD.
The MUL opcode does a 16bit * 32bit multiply, and we need to do the
MACH to get the top 16bit * 32bit added in.
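The arithmetic behind this can be sketched in portable C. This models only the math, not the hardware: which operand is treated as 16-bit and how MACH uses the accumulator are not represented, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a lone 16-bit x 32-bit multiply yields: only the low 16 bits
 * of 'a' take part in the product. */
static uint32_t mul_low16(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) * b;
}

/* Low 32 bits of a * b, built from the two 16 x 32 partial products. */
static uint32_t mul_32(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0xffffu) * b;   /* low 16 bits of a times b */
    uint32_t high = (a >> 16) * b;      /* top 16 bits of a times b */
    uint32_t result = low + (high << 16);
    assert(result == a * b);            /* matches a plain 32-bit multiply */
    return result;
}
```

For a = 0x10000 and b = 3 the 16-bit-only product is 0, while adding in the top half gives the expected 0x30000.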
Fixes fs-op-mult-int-*, fs-op-mult-ivec*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shader Model 3.0[1] requires that shaders be able to execute at least
65536 instructions. Bump Mesa maxExec to that limit. This allows
several vertex shaders in the OpenGL ES 2.0 conformance test suite to
run to completion.
1: http://en.wikipedia.org/wiki/High_Level_Shader_Language
Reviewed-by: Eric Anholt <eric@anholt.net>
This cleans up some code generated by the IR-to-Mesa pass for i915.
In particular, some shaders involving arrays of constant matrices
result in really bad code.
v2: Silence several warnings from merging the gl_constant_value work.
Fix DP[23] folding. Add support for a bunch more opcodes that appear
in piglit runs on i915.
Reviewed-by: Eric Anholt <eric@anholt.net>
!a && b occurs frequently when nested if-statements have been
flattened. It should also be possible to use a MAD for (a && b) || c,
though that would require a MAD_SAT.
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_all_equal is !(a.x != b.x || a.y != b.y || a.z
!= b.z || a.w != b.w). Logical-or is implemented using addition
(followed by clamping to [0,1]) on values of 0.0 and 1.0. Replacing
the logical-or operators with addition gives !bool(int(a.x != b.x) +
int(a.y != b.y) + int(a.z != b.z) + int(a.w != b.w)). This can be
implemented using a dot-product with a vector of all 1.0. After the
dot-product, the value will be an integer on the range [0,4].
Previously a SEQ instruction was used to clamp the resulting logic
value to [0,1] and invert the result. Using an SGE instruction on the
negation of the dot-product result has the same effect. Many older
shader architectures do not support the SEQ instruction. It must be
emulated using two SGE instructions and a MUL. On these
architectures, the single SGE saves two instructions.
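The SGE-on-negation trick can be sketched on the CPU as below. The functions model the shader opcodes as ordinary C helpers returning 0.0 or 1.0; the names are hypothetical and this is not the code the IR-to-Mesa pass emits.

```c
#include <assert.h>

/* SNE and SGE modeled as C helpers that yield 1.0 or 0.0. */
static float sne(float a, float b) { return a != b ? 1.0f : 0.0f; }
static float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

static float dot4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Count mismatching channels with a dot product against all ones,
 * then test -count >= 0, which holds only when count is 0. */
static float all_equal_sketch(const float a[4], const float b[4])
{
    const float ones[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
    const float ne[4] = { sne(a[0], b[0]), sne(a[1], b[1]),
                          sne(a[2], b[2]), sne(a[3], b[3]) };
    float count = dot4(ne, ones);
    assert(count >= 0.0f && count <= 4.0f);
    return sge(-count, 0.0f);
}
```

Because the count is never negative, its negation is at least zero only when no channel differs, so a single SGE both clamps and inverts the result.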
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_any_nequal is (a.x != b.x) || (a.y != b.y) ||
(a.z != b.z) || (a.w != b.w), and that is the same as any(bvec4(a.x !=
b.x, a.y != b.y, a.z != b.z, a.w != b.w)). Implement the any() part
the same way the regular ir_unop_any is implemented.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just like the ir_binop_logic_or case. The operation
ir_unop_any is (a.x || a.y || a.z || a.w). Logical-or is implemented
using addition (followed by clamping to [0,1]) on values of 0.0 and
1.0. Replacing the logical-or operators with addition gives (a.x +
a.y + a.z + a.w). This can be implemented using a dot-product with a
vector of all 1.0.
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the
dot-product has the same effect. Adding the saturate to the
dot-product is free, so (at least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the dot-product
result has the same effect. Many older shader architectures do not
support the SNE instruction. It must be emulated using two SLT
instructions and an ADD. On these architectures, the single SLT saves
two instructions.
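A small Python model of the three variants described above (old SNE path,
fragment-shader saturate, vertex-shader SLT). It is a sketch under the
assumption that booleans are 0.0/1.0 floats; all function names here are
hypothetical and do not come from the Mesa sources.

```python
from itertools import product

def saturate(x):
    """Clamp to [0, 1], modelling the free saturate modifier."""
    return min(max(x, 0.0), 1.0)

def slt(a, b):
    """Model of SLT: 1.0 if a < b, else 0.0."""
    return 1.0 if a < b else 0.0

def sne_emulated(a, b):
    """SNE built from two SLTs and an ADD, for hardware without SNE."""
    return slt(a, b) + slt(b, a)

def dot_ones(v):
    """Dot product with a vector of all 1.0."""
    return sum(c * 1.0 for c in v)

def any_old(v):
    return sne_emulated(dot_ones(v), 0.0)

def any_fragment(v):
    return saturate(dot_ones(v))

def any_vertex(v):
    return slt(-dot_ones(v), 0.0)

def all_paths_agree():
    """Check every bvec4 input, encoded as 0.0/1.0 components."""
    return all(
        any_old(v) == any_fragment(v) == any_vertex(v) == float(any(v))
        for v in product([0.0, 1.0], repeat=4)
    )
```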
Reviewed-by: Eric Anholt <eric@anholt.net>
Logical-or is implemented using addition (followed by clamping to
[0,1]) on values of 0.0 and 1.0. Replacing the logical-or operators
with addition gives a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the add has
the same effect. Adding the saturate to the add is free, so (at
least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the
SNE instruction. It must be emulated using two SLT instructions and
an ADD. On these architectures, the single SLT saves two
instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the inclusion of fpu_control.h from compiler.h. Since Bionic lacks
fpu_control.h, this fixes the Android build.
Also remove the sole use of the fpu_control bits, which was in debug.c.
Those were brianp's debug bits, and he approved of their removal.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
We can't just look at the instruction that happens to appear at the
start of the loop, because it might be some other exec size and cause
us to only loop on the first N channels. We always want 8 in our
current code (since 16 doesn't work so we don't do 16-wide fragment in
that case).
Fixes loop-03.vert, which was triggering the assertions.
Link failure is something that shouldn't happen, but we sometimes want
it during development. The precompile also allows analysis of shader
codegen with shader-db.
This fixes most of the regressions in the vs array test set from the
varying array indexing work, since the giant array that was originally
allocated in virtual GRF space never gets used and is only ever
read/stored from scratch space.
We keep building these strange interfaces for DP read/write where
there's a helper function with some partially-specific,
partially-general controls, which is used in exactly one place in code
generation. Making these public will let us set up those instructions
in the one place they're to be generated.
For structs/arrays/matrices, they were ending up as uint because we
forgot to set them. All varyings in GLSL 1.20 are of base type float,
so just force the matter here (which gets inherited at
emit_urb_writes() time).
Fixes vs-varying-array-mat2-col-rd.
The low-level IR is a mashup of brw_fs.cpp and ir_to_mesa.cpp. It's
currently controlled by the INTEL_NEW_VS=1 environment variable, and
only tested for the trivial "gl_Position = gl_Vertex;" shader so far.
This will be used by the new vertex shader backend. The scalarizing
passes are skipped for non-fragment, since vertex and geometry threads
are based on vec4s.
This patch fixes a bug when lowering an integer division:
x/y
to a multiplication by a reciprocal:
int(float(x)*reciprocal(float(y)))
If x was a plain int and y was an ivecN, the lowering pass
incorrectly assigned the type of the product to be float, when in fact
it should be vecN. This caused mesa to abort with an IR validation
error.
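To illustrate the typing issue, here is a rough Python model of the lowered
expression in which a scalar operand is broadcast against a vector operand.
It is only a sketch: lower_int_div is a hypothetical name, vectors are
plain tuples, and it says nothing about how the real pass builds IR.

```python
def lower_int_div(x, y):
    """Model x / y as int(float(x) * reciprocal(float(y))), per component.

    x and y are each an int or a tuple of ints. A scalar is broadcast to
    the width of the vector operand, so the intermediate product has as
    many components as the widest operand.
    """
    width = max(len(v) if isinstance(v, tuple) else 1 for v in (x, y))
    xs = x if isinstance(x, tuple) else (x,) * width
    ys = y if isinstance(y, tuple) else (y,) * width
    result = tuple(int(float(a) * (1.0 / float(b))) for a, b in zip(xs, ys))
    return result if width > 1 else result[0]
```

With a plain int for x and a three-component y, the result has three
components, which is why the product must be typed as a vector and not as a
scalar float.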
Fixes piglit tests {fs,vs}-op-div-int-ivec{2,3,4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I also needed to make some changes in u_vbuf_mgr in order to override
the caps from the driver and enable the fallback even though the driver
claims the format is supported.
This flushes the client's commands in eglWaitClient(). Before this,
EGL applications using a pixmap or pbuffer flickered because nothing was flushed.
Reviewed-by: Alan Hourihane
The vs-varying-array-mat2-col-row-wr test writes a mat2[3] constant to
a mat2[3] varying out array, and also statically accesses element 1 of
it on the VS and FS sides. At link time it would get trimmed down to
just 2 elements, and then codegen of the VS would end up generating
assignments to the unallocated last entry of the array. On the new
i965 VS backend, that happened to land on the vertex position.
Some issues remain in this test on softpipe, i965/old-vs and
i965/new-vs on visual inspection, but i965 is passing because only one
green pixel is probed, not the whole split green/red quad.
This patch extends ir_validate.cpp to check the following
characteristics of each ir_call:
- The number of actual parameters must match the number of formal
parameters in the signature.
- The type of each actual parameter must match the type of the
corresponding formal parameter in the signature.
- Each "out" or "inout" actual parameter must be an lvalue.
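The three checks can be sketched like this. The Param and Arg classes and
validate_call are stand-ins invented for this example; the real checks
operate on ir_call and ir_function_signature objects in ir_validate.cpp.

```python
from dataclasses import dataclass

@dataclass
class Param:
    """Formal parameter of a signature (hypothetical model)."""
    type: str
    mode: str = "in"   # "in", "out" or "inout"

@dataclass
class Arg:
    """Actual parameter at a call site (hypothetical model)."""
    type: str
    is_lvalue: bool = False

def validate_call(formals, actuals):
    """Return a list of problems found; an empty list means the call is valid."""
    if len(formals) != len(actuals):
        return ["parameter count mismatch"]
    errors = []
    for i, (formal, actual) in enumerate(zip(formals, actuals)):
        if formal.type != actual.type:
            errors.append(f"parameter {i}: type mismatch")
        if formal.mode in ("out", "inout") and not actual.is_lvalue:
            errors.append(f"parameter {i}: {formal.mode} argument is not an lvalue")
    return errors
```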
Reviewed-by: Chad Versace <chad@chad-versace.us>
These functions don't modify the target instruction, so it makes sense
to make them const. This allows these functions to be called from ir
validation code (which uses const to ensure that it doesn't
accidentally modify the IR being validated).
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When an out parameter undergoes an implicit type conversion, we need
to store it in a temporary, and then after the call completes, convert
the resulting value. In other words, we convert code like the
following:
void f(out int x);
float value;
f(value);
Into IR that's equivalent to this:
void f(out int x);
float value;
int out_parameter_conversion;
f(out_parameter_conversion);
value = float(out_parameter_conversion);
This transformation needs to happen during ast-to-IR conversion (as
opposed to, say, a lowering pass), because it is invalid IR for formal
and actual parameters to have types that don't match.
Fixes piglit tests
spec/glsl-1.20/compiler/qualifiers/out-conversion-int-to-float.vert and
spec/glsl-1.20/execution/qualifiers/vs-out-conversion-*.shader_test,
and bug 39651.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39651
Reviewed-by: Chad Versace <chad@chad-versace.us>
libGLw is an old OpenGL widget library with optional Motif support.
It almost never changes and very few people actually still care about
it, so we've decided to ship it separately.
The new home for libGLw is: git://git.freedesktop.org/mesa/glw/
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously if-statements were lowered from inner-most to outer-most
(i.e., bottom-up). All assignments within an if-statement would have
the condition of the if-statement appended to its existing condition.
As a result the assignments from a deeply nested if-statement would
have a very long and complex condition.
Several shaders in the OpenGL ES2 conformance test suite contain
non-constant array indexing that has been lowered by the shader
writer. These tests usually look something like:
if (i == 0) {
value = array[0];
} else if (i == 1) {
value = array[1];
} else ...
The IR for the last assignment ends up as:
(assign (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@20) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@22) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@24) ) (var_ref if_to_cond_assign_condition@26) ) ) ) ) (x) (var_ref value) (array_ref (var_ref array) (constant int (5)))
The Mesa IR that is generated from this is just as awesome as you
might expect.
Three changes are made to the way if-statements are lowered.
1. Two condition variables, if_to_cond_assign_then and
if_to_cond_assign_else, are created for each if-then-else structure.
The former contains the "positive" condition, and the latter contains
the "negative" condition. This change was implemented in the previous
patch.
2. Each condition variable is added to a hash-table when it is created.
3. When lowering an if-statement, assignments to existing condition
variables get the current condition anded. This ensures that nested
condition variables are only set to true when the condition variable
for all outer if-statements is also true.
Changes #1 and #3 combine to ensure the correctness of the resulting
code.
4. When a condition assignment is encountered with a condition that is
a dereference of a previously added condition variable, the condition
is not modified.
Change #4 prevents the continuous accumulation of conditions on
assignments.
If the original if-statements were:
if (x) {
if (a && b && c && d && e) {
...
} else {
...
}
} else {
if (g && h && i && j && k) {
...
} else {
...
}
}
The lowered code will be
if_to_cond_assign_then@1 = x;
if_to_cond_assign_then@2 = a && b && c && d && e
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@2 = !if_to_cond_assign_then@2
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@1 = !if_to_cond_assign_then@1;
if_to_cond_assign_then@3 = g && h && i && j && k
&& if_to_cond_assign_else@1;
...
if_to_cond_assign_else@3 = !if_to_cond_assign_then@3
&& if_to_cond_assign_else@1;
...
Depending on how instructions are emitted, there may be an extra
instruction due to the duplication of the '&&
if_to_cond_assign_{then,else}@1' on the nested else conditions. In
addition, this may cause some unnecessary register pressure since in
the simple case (where the nested conditions are not complex) the
nested then-condition variables are live longer than strictly
necessary.
Before this change, one of the shaders in the OpenGL ES2 conformance
test suite's acos_float_frag_xvary generated 348 Mesa IR instructions.
After this change it only generates 124. Many, but not all, of these
instructions would have also been eliminated by CSE.
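As a sanity check on the scheme, the following Python sketch compares the
nested if-statements with the flattened condition variables for every
input. It is a model only: p stands for (a && b && c && d && e), q for
(g && h && i && j && k), and the function names are invented here.

```python
from itertools import product

def nested(x, p, q):
    """Reference: which branch the original nested if-statements take."""
    if x:
        return "then2" if p else "else2"
    return "then3" if q else "else3"

def lowered(x, p, q):
    """The same logic as flat condition variables guarding assignments."""
    then1 = x
    then2 = p and then1
    else2 = (not then2) and then1
    else1 = not then1
    then3 = q and else1
    else3 = (not then3) and else1
    result = None
    # Each branch body becomes an assignment guarded by one variable.
    if then2:
        result = "then2"
    if else2:
        result = "else2"
    if then3:
        result = "then3"
    if else3:
        result = "else3"
    return result

def lowering_is_equivalent():
    return all(nested(*v) == lowered(*v)
               for v in product([False, True], repeat=3))
```

Note that each guarded assignment tests a single variable, which is the
point of change #4: the condition no longer grows with nesting depth.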
Reviewed-by: Eric Anholt <eric@anholt.net>
Now the condition (for the then-clause) and the inverse condition (for
the else-clause) get written to separate temporary variables. In the
presence of complex conditions, this shouldn't result in more code
being generated. If the original if-statement was
if (a && b && c && d && e) {
...
} else {
...
}
The lowered code will be
if_to_cond_assign_then = a && b && c && d && e;
...
if_to_cond_assign_else = !if_to_cond_assign_then;
...
Reviewed-by: Eric Anholt <eric@anholt.net>
EGL doesn't define how to manage different native platforms.
So Mesa has a build-time configurable default platform,
which a non-standard environment variable (EGL_PLATFORM) overrides.
This caused needless bug reports when EGL_PLATFORM was forgotten.
Detection is grouped into basic types of NativeDisplays (which itself
needs to be detected). The final decision is based on characteristics
of these basic types:
File descriptor based platforms (fbdev):
- fstat(2) to check for being a fd that belongs to a character device
- check kernel subsystem (todo)
Pointers to structures (x11, wayland, drm/gbm):
- mincore(2) to check whether it's a valid pointer to some memory.
- magic elements (e.g. pointers to exported symbols):
o wayland display stores interface type pointer (first elm.)
o gbm stores pointer to its constructor (first elm.)
o x11 as a fallback (FIXME?)
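The file-descriptor half of the detection can be illustrated with the
following Python sketch. It only mirrors the fstat(2) check; the function
name is hypothetical, and the pointer-based checks (mincore(2), magic first
elements) are not modelled because they have no safe Python equivalent.

```python
import os
import stat

def looks_like_chardev_fd(value):
    """True if `value` is an open fd that refers to a character device."""
    try:
        st = os.fstat(value)
    except (OSError, ValueError, TypeError, OverflowError):
        # Not a usable file descriptor at all (e.g. a pointer-sized value).
        return False
    return stat.S_ISCHR(st.st_mode)
```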
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
GLESv1 and GLESv2 have their own dispatch.h and remap_helper.h. These
headers are only used by api_exec_es1.c and api_exec_es2.c in core mesa.
Move the rules to generate them from glapi to core mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to fix SCons build]
glapi_gen.mk is supposed to be included by glapi users to simplify
header generation. This commit also makes es1api, es2api, and
shared-glapi use it.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to prefix all variables in glapi_gen.mk by
glapi_gen]
glapi/gen-es/ defines two sets of GLAPI XMLs for OpenGL ES 1.1
(es1_API.xml) and 2.0 (es2_API.xml) respectively. They are used to
generate dispatch.h and remap_helper.h for GLES. Together with
gl_and_es_API.xml, we have to maintain three sets of GLAPI XMLs.
This commit makes dispatch.h and remap_helper.h for GLES be generated
from gl_and_es_API.xml.
Reviewed-by: Brian Paul <brianp@vmware.com>
add gl_api::filter_functions and gl_function::filter_entry_points to
filter out unwanted functions and entry points.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the list of entry points belonging to GLES from mapi_abi.py to a new
file.
Until we figure out how to describe the APIs an entry point belongs to
in the XML file, and how to handle the case where an entry point that others
alias is missing in some APIs, this is an easier solution than
maintaining another two sets of XMLs in glapi/gen-es/.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the 'f' suffix from a float literal.
- .float 0.0f+1.0
+ .float 1.0
This fixes the following compile error with clang:
error: unexpected token in directive
.float 0.0f+1.0
^
Note: This is a candidate for the stable branches.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Optional parallel rendering of spans using OpenMP.
Initial implementation for aa triangles. A new option for scons is
also provided to activate the openmp support (off by default).
Signed-off-by: Brian Paul <brianp@vmware.com>
After copy buffer on pre-GEN6, it is necessary to wait for the blit to
complete before returning data to the user.
This should fix the piglit test: copy_buffer_coherency (pre-GEN6).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"reg" was set in only one case, virtual GRFs pre register allocation,
and would be unset and have hw_reg set after allocation. Since we
never bothered with looking at virtual GRF number after allocation
anyway, just use the same storage and avoid confusion.
Besides separating out a logical step of the giant register allocator
function, this now communicates a bunch of the allocator information
through entries in brw_context, which will make this code partially
reusable for caching the expensive allocator setup.
It's fewer pointers to track, and when we start caching the register
set, should be algorithmically better in the cache hit case (lookup in
a byte-per-register array, instead of a linear walk through
description of register classes to find how to translate that class).
This was a debugging aid at one point -- virtual grf 0 should never be
allocated, and it would be used if undefined register access occurred
in codegen. However, it made the confusing register allocation code
even more confusing by indexing things off of 1 all over.
At least one of the invariants verified by IR validation concerns the
relative ordering of toplevel constructs in the IR: references to
global variables must come after the declarations of those global
variables.
Since linking affects the ordering of toplevel constructs in the IR,
it's possible that a bug in the linker will cause invalid IR to be
generated, even if all the pre-linked shaders are valid. (In fact,
such a bug was fixed by the previous commit.)
Bugs like this are easily masked by further optimization passes,
particularly inlining. So to make them easier to track down, this
patch adds an IR validation step right after linking, and before
final optimization occurs. The validation only occurs on debug
builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When link_functions.cpp adds a new function to the final linked
program, it needs to add it after any global variable declarations
that the function refers to, otherwise the IR will be invalid (because
variable declarations must occur before variable accesses). The
easiest way to do that is to have the linker emit functions to the
tail of the final linked program.
The linker used to emit functions to the head of the final linked
program, in an effort to keep callees sorted before their callers.
However, this was not reliable: it didn't work for functions declared
or defined in the same compilation unit as main, for diamond-shaped
patterns in the call graph, or for some obscure cases involving
overloaded functions. And no code currently relies on this sort
order.
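The ordering invariant and the effect of emitting at the tail can be shown
with a toy model. Everything here is illustrative: the IR is a plain list
of tuples, and validate_order and link_function are names made up for this
sketch, not functions in link_functions.cpp or ir_validate.cpp.

```python
def validate_order(program):
    """Check that every global is declared before any function refers to it.

    program is a list of ("var", name) or ("func", name, [globals used]).
    """
    declared = set()
    for node in program:
        if node[0] == "var":
            declared.add(node[1])
        elif any(g not in declared for g in node[2]):
            return False
    return True

def link_function(program, func, at_tail=True):
    """Add a function to the linked program, at the tail or at the head."""
    return program + [func] if at_tail else [func] + program
```

Emitting at the tail keeps the invariant for any function whose globals are
already in the program; emitting at the head breaks it as soon as the
function refers to a global.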
No Piglit regressions with i965 Ironlake.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
process_array_type() contains an assertion to verify that no IR
instructions are generated while processing the expression that
specifies the size of the array. This assertion needs to happen
_after_ checking whether the expression is constant. Otherwise we may
crash on an illegal shader rather than reporting an error.
Fixes piglit tests array-size-non-builtin-function.vert and
array-size-with-side-effect.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rearranged the logic for converting the ast for a function call to
hir, so that we constant fold before emitting any IR. Previously we
would emit some IR, and then only later detect whether we could
constant fold. The unnecessary IR would usually get cleaned up by a
later optimization step, however in the case of a builtin function
being used to compute an array size, it was causing an assertion.
Fixes Piglit test array-size-constant-relational.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38625
The ast-to-hir conversion needs to emit function signatures in two
circumstances: when a function declaration (or definition) is
encountered, and when a built-in function is encountered.
To avoid emitting a function signature in an illegal place (such as
inside a function), emit_function() checked whether we were inside a
function definition, and if so, emitted the signature before the
function definition.
However, this didn't cover the case of emitting function signatures
for built-in functions when those built-in functions are called from
inside the constant integer expression that specifies the length of a
global array. This failed because when processing an array length, we
are emitting IR into a dummy exec_list (see process_array_type() in
ast_to_hir.cpp). process_array_type() later checks (via an assertion)
that no instructions were emitted to the dummy exec_list, based on the
reasonable assumption that we shouldn't need to emit instructions to
calculate the value of a constant.
This patch changes emit_function() so that it emits function
signatures at toplevel in all cases.
This partially fixes bug 38625
(https://bugs.freedesktop.org/show_bug.cgi?id=38625). The remainder
of the fix is in the patch that follows.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
opt_dead_functions contained a shortcut to skip processing the first
function's body, based on the assumption that IR functions are
topologically sorted, with callees always coming before their callers
(therefore the first function cannot contain any calls).
This assumption turns out not to be true in general. For example, the
following code snippet gets translated to IR that violates this
assumption:
void f();
void g();
void f() { g(); }
void g() { ... }
In practice, the shortcut didn't cause bugs because of a coincidence
of the circumstances in which opt_dead_functions is called:
(a) we do inlining right before dead function elimination, and
inlining (when successful) eliminates all calls.
(b) for user-defined functions, inlining is always successful, because
previous optimization passes (during compilation) have reduced
them to a form that is eligible for inlining.
(c) the function that appears first in the IR can't possibly call a
built-in function, because built-in functions are always emitted
before the function that calls them.
It seems unnecessarily fragile to have opt_dead_functions depend on
these coincidences. And the next patch in this series will break (c).
So I'm reverting the shortcut. The consequence will be a slight
increase in link time for complex shaders.
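A toy model of why the shortcut is unsafe, using the f/g example above.
This is a sketch only: functions are a dict of name to callee list in IR
order, and remove_dead_functions with its skip_first flag is a hypothetical
stand-in for the real pass.

```python
def remove_dead_functions(functions, entry="main", skip_first=False):
    """Return the names of functions that survive dead function elimination.

    functions maps each function name to the list of functions it calls,
    in IR order. With skip_first=True the first body is not scanned,
    modelling the shortcut that is being reverted.
    """
    bodies = list(functions.values())
    if skip_first:
        bodies = bodies[1:]
    called = set()
    for callees in bodies:
        called.update(callees)
    return [name for name in functions if name == entry or name in called]
```

With f first in the IR and f calling g, skipping the first body loses the
only call to g, so g would be removed even though it is still needed.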
This reverts commit c75427f4c8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts an unnecessary part of commit 4683529048 and fixes misrendering
and an assertion failure in Cogs.
Fixes freedesktop.org bug 39888.
Reviewed-by: Brian Paul <brianp@vmware.com>
If there are any cases left where the st thinks that RGBA -> BGRA
will swap components, it will get what it deserves.
Now the GPU's 2D engine goes unused. What a shame.
validate_program relies on validate_shader_program to fill in errMsg;
empirically, there exist cases where that doesn't happen.
While tracking those down may be worthwhile, initializing the string so
we don't try to ralloc_strdup random garbage also seems wise.
Fixes issues caught by valgrind while running some test case.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
DRI2 will throw BadRequest for this when the client is not local, but
DRI2 is an implementation detail and not something callers should have
to know about. Silently swallow errors in this case, and just propagate
the failure through DRI2Connect's return code.
Note: This is a candidate for the stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28125
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This saves both register space and upload bandwidth for unused values.
Note that previously we were relying on the visitor not initially
generating references to different sets of uniforms between the 8-wide
and 16-wide code generation, and now we're relying on them dead-code
eliminating the same stuff, too.
We should remove the relocations which caused a validation failure
from the list, so that the kernel receives only the validated ones.
NOTE: This is a candidate for the 7.11 branch.
That code drops performance in Unigine Heaven and Tropics
by a factor of 10. That's too crazy even for a debug build.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike C++, empty declarations such as
float;
should be valid. The spec is not explicit about this actually.
Some apps that generate their shader sources may rely on this. This was
noted when porting one of them to Linux from Windows.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Note: this is a candidate for the 7.11 branch.
Resolve via glBlitFramebuffer allows resolving a sub-region of a
renderbuffer to a different location in any mipmap level of some
other texture, and, with a new extension, even scaling. Therefore,
location and size parameters are needed.
The mask parameter was added because resolving only depth or only
stencil of a combined buffer is possible as well.
Full information about the blit operation allows the drivers to
take the most efficient path they possibly can.
This avoids the following runtime error with EGL on platforms that
require linking with libm for nontrivial math functions:
failed to load module: /xorg/lib64/gbm/gbm_gallium_drm.so: undefined
symbol: powf
(Based on Kristóf RALOVICH's patch and Ian's suggestions in
http://lists.freedesktop.org/archives/mesa-dev/2011-August/010036.html)
Use backend_map kernel query if supported, otherwise analyze ZPASS_DONE
results to get the mask.
Fixes lockups with predicated rendering due to incorrect query buffer
initialization on some cards.
Note: this is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These looked more like copy-and-paste to me than the others (which
looked more like possibly someone forgot to write some code in a
refactor), so I didn't verify where they came from.
This makes piglit a lot more happy. The errors are logged when
INTEL_DEBUG=fallbacks because the application is about to hit a big
software fallback. We frequently ask people to run applications that
are hitting software fallbacks with INTEL_DEBUG=fallbacks so that we
can help them debug the reason for the software fallback.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This can only happen in GLSL shaders because assembly shaders that use
too many temps are rejected by core Mesa. It is easiest to make this
happen with shaders that contain flow-control that could not be lowered.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rely on the driver to do the right thing. This probably means falling
back to software. Page 88 of the OpenGL 2.1 spec specifically says:
"A shader should not fail to compile, and a program object should
not fail to link due to lack of instruction space or lack of
temporary variables. Implementations should ensure that all valid
shaders and program objects may be successfully compiled, linked
and executed."
There is no provision for saying "No" to a valid shader that is
difficult for the hardware to handle, so stop doing that.
On i915 this causes a large number of piglit tests to change from FAIL
to WARN. The warning is because the driver still emits messages to
stderr like "i915_program_error: Unsupported opcode: BGNLOOP".
It also fixes ES2 conformance CorrectFull_frag and CorrectParse1_frag
on i915 (and probably other hardware that can't handle loops).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This prevents assertion failures in ralloc_strcat. The ralloc_free in
_mesa_free_shader_program_data can be omitted because freeing the
gl_shader_program in _mesa_delete_shader_program will take care of
this automatically.
A bunch of this code could use a refactor to use ralloc a bit more
effectively. A bunch of the things that are allocated with malloc and
owned by the gl_shader_program should be allocated with ralloc (using
the gl_shader_program as the context).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
linker_warning is a new function. It's identical to linker_error
except that it doesn't set LinkStatus=false and it prepends "warning: "
on messages instead of "error: ".
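The difference between the two functions amounts to the following. This is
a Python model with invented names (ShaderProgram, link_status, info_log);
the real functions are C and take printf-style arguments.

```python
class ShaderProgram:
    """Minimal stand-in for gl_shader_program (hypothetical fields)."""
    def __init__(self):
        self.link_status = True
        self.info_log = ""

def linker_error(prog, msg):
    """Log the message and mark the link as failed."""
    prog.info_log += "error: " + msg
    prog.link_status = False

def linker_warning(prog, msg):
    """Log the message but leave the link status untouched."""
    prog.info_log += "warning: " + msg
```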
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the other places that set LinkStatus to false since they all
immediately follow a call to linker_error. The function linker_error
was previously known as linker_error_printf. The name was changed
because it may seem surprising that a printf function will set an
error flag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For power-of-two sizes, h0 == mt->height0 since it's already a multiple
of two. However, for NPOT, they're different; h1 should be computed
based on the original size.
Fixes piglit test "cubemap npot" and oglconform test "textureNPOT".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Before, if any uniform or constant array was accessed with indirect
addressing, st_translate_program() would emit uniform constants in the place
of immediates. This behavior was unavoidable with ir_to_mesa/mesa_to_tgsi, but
glsl_to_tgsi can work around it since the GLSL IR backend and the TGSI
emission are both inside the state tracker.
Fixes a regression unintentionally introduced by "glsl_to_tgsi: fix shaders with
indirect addressing of temps" that caused missing leaves in 3dmark01 test 4 (Nature)
and missing/displaced textures on human models in Counter-Strike: Source.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Thanks to Kenneth Graunke for pointing out that glsl_type::get_instance(base, 4, 1)
is the same as glsl_type::get_vec4_type(base).
The function was only used in st_glsl_to_tgsi, and this commit replaces that usage
with get_instance.
Disabled by default on all drivers. To enable it, change ctx->GLSLVersion to 130
in st_extensions.c. Currently, softpipe is the only driver with integer support.
The functionality is not used by anything yet, and the glUniform functions will
need to be reworked before this can reach its full usefulness. It is
nonetheless a step towards integer support in the state tracker and classic drivers.
It is still a work in progress at this point, but it produces working and
reasonably well-optimized code.
Originally based on ir_to_mesa and st_mesa_to_tgsi, but does not directly use
Mesa IR instructions in TGSI generation, instead generating TGSI from the
intermediate class glsl_to_tgsi_instruction. It also has new optimization
passes to replace _mesa_optimize_program.
The previous formula for atan(x,y) returned a value of +/- pi whenever
|x|<0.0001, and used a formula based on atan(y/x) otherwise. This
broke in cases where both x and y were small (e.g. atan(1e-5, 1e-5)).
This patch modifies the formula so that it returns a value of +/- pi
whenever |x|<1e-8*|y|, and uses the formula based on atan(y/x)
otherwise.
The previous formula for asin(x) was algebraically equivalent to:
sign(x)*(pi/2 - sqrt(1-|x|)*(A + B|x| + C|x|^2))
where A, B, and C were arbitrary constants determined by a curve fit.
This formula had a worst case absolute error of 0.00448, an unbounded
worst case relative error, and a discontinuity near x=0.
Changed the formula to:
sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3))
where A and B are arbitrary constants determined by a curve fit. This
has a worst case absolute error of 0.00039, a worst case relative
error of 0.000405, and no discontinuities.
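The new formula can be written out as below. The commit message does not
give the fitted values of A and B, so this sketch takes them as parameters
and makes no claim about the error bounds; asin_approx is a name chosen for
this example.

```python
import math

def asin_approx(x, A, B):
    """sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3)).

    A and B are the curve-fit constants, supplied by the caller.
    Valid for -1 <= x <= 1.
    """
    ax = abs(x)
    poly = math.pi / 2 + (math.pi / 4 - 1) * ax + A * ax ** 2 + B * ax ** 3
    sign = (x > 0) - (x < 0)
    return sign * (math.pi / 2 - math.sqrt(1 - ax) * poly)
```

Whatever A and B are, the fixed coefficients pin the result to 0 at x=0 and
to pi/2 at x=1, and give a slope of 1 at x=0 (pi/4 from the sqrt term minus
(pi/4 - 1) from the linear term), which is consistent with the new formula
having no discontinuity near zero.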
I don't expect a significant performance degradation, since the extra
multiply-accumulate should be fast compared to the sqrt() computation.
Fixes piglit tests {vs,fs}-asin-float and {vs,fs}-atan-*
The function used a variable named 'score', which was an outright lie.
A signature matches or it doesn't; there is no fuzzy scoring.
Change the return type of parameter_lists_match() to an enum, and
let ir_function::matching_signature() switch on that enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Array constructors obey narrower conversion rules than other constructors
[1] --- they use the implicit conversion rules [2] instead of the scalar
constructor conversions [3]. But process_array_constructor() was
incorrectly applying the broader rules.
[1] GLSL 1.50 spec, Section 5.4.4 Array Constructors, page 52 (58 of pdf)
[2] GLSL 1.50 spec, Section 4.1.10 Implicit Conversions, page 25 (31 of pdf)
[3] GLSL 1.50 spec, Section 5.4.1 Conversion, page 48 (54 of pdf)
To fix this, first check (with glsl_type::can_be_implicitly_converted_to)
if an implicit conversion is legal before performing the conversion.
Fixes:
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bool-float.vert
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bvec*-vec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The function is no longer used and has been replaced by
glsl_type::can_implicitly_convert_to().
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Context
-------
In ast_function_expression::hir(), parameter_lists_match() checks if the
function call's actual parameter list matches the signature's parameter
list, where the match may require implicit conversion of some arguments.
To check if an implicit conversion exists between individual arguments,
type_compare() is used.
Problems
--------
type_compare() allowed the following illegal implicit conversions:
bool -> float
bvecN -> vecN
int -> uint
ivecN -> uvecN
uint -> int
uvecN -> ivecN
Change
------
type_compare() is buggy, so replace it with glsl_type::can_be_implicitly_converted_to().
This comprises a rewrite of parameter_lists_match().
Fixes piglit:spec/glsl-1.20/compiler/built-in-functions/outerProduct-bvec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This method checks if a source type is identical to or can be implicitly
converted to a target type according to the GLSL 1.20 spec, Section 4.1.10
Implicit Conversions.
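As an illustration only, the GLSL 1.20 rule can be sketched in C as follows. The struct and function names are hypothetical and are not the glsl_type API; the sketch assumes the only implicit conversions are int to float and ivecN to vecN, which is what Section 4.1.10 of the 1.20 spec lists.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a GLSL scalar/vector type. */
enum glsl_base { BASE_BOOL, BASE_INT, BASE_UINT, BASE_FLOAT };

struct simple_type {
    enum glsl_base base;
    unsigned components;  /* 1 for scalars, 2..4 for vectors */
};

/* True if 'from' is identical to 'to' or implicitly converts to it. */
static bool can_convert(struct simple_type from, struct simple_type to)
{
    if (from.base == to.base && from.components == to.components)
        return true;      /* identical types always match */
    if (from.components != to.components)
        return false;     /* no implicit conversion changes the size */
    /* The only implicit conversion assumed here: int/ivecN -> float/vecN. */
    return from.base == BASE_INT && to.base == BASE_FLOAT;
}
```

Under this rule bool to float, int to uint and uint to int are all rejected, which matches the list of conversions the related bugfix describes as illegal.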
The following commits use the method for a bugfix:
glsl: Fix implicit conversions in non-constructor function calls
glsl: Fix implicit conversions in array constructors
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is part of fixing a ~1% performance regression in OpenArena when
changing the fixed function fragment shader to use the new backend.
Right now this just avoids the LINTERP of the projector, not the math
using it.
Certain attributes (position, psize, etc.) don't
count as params; they are handled separately by the hw.
However, the VS is required to export at least one param
and r600_shader_from_tgsi() takes care of adding a dummy
export if there is none. Make sure the VS param export
count in the SPI properly accounts for this.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was done in the old codegen path, but not the new one. Caught by
piglit fbo tests after the conversion to GLSL ff_fragment_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The FF VS generation happens just after the FF FS generation in
state.c, so the ctx->VP._Current value is for the previous state
update's vertex shader, not the one that will be chosen as a result of
this state update. The vertexShader and vertexProgram variables
should be accurately telling us whether there's going to be a
ctx->VP._Current (except on _MaintainTnlProgram drivers, where it's
always true).
The glsl-vs-statechange-1 test was created to test for this, but it
turns out that the bug is hidden by the fact that we call
_mesa_update_state() twice per draw call -- once from
_mesa_valid_to_render() and once from vbo_draw_arrays(), and the
second one was fixing up the first one.
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make it through this loop processing the color multiple
times, so we can't go overwriting it on our first color buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
When we do a glReadPixels into the temporary buffer, we don't want to
use GL_LUMINANCE, GL_LUMINANCE_ALPHA or GL_INTENSITY since they will
compute L=R+G+B which is not what we want.
This bug has existed all along but was only exposed by the elimination
of the driver hook for glCopyTexImage() in
5874890c26.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=39604
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit removed the last use of this field.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The purpose of the (irb->draw_offset & 4095) != 0 check was to ensure
that we don't have X/Y offsets into a tile, since Gen4 hardware doesn't
support that. However, it's insufficient: there are cases where
draw_offset & 4095 is 0 but we still have a Y-offset. This leads to an
assertion failure in brw_update_renderbuffer_surface with tile_y != 0.
Instead, simply call intel_renderbuffer_tile_offsets to compute the
actual X/Y offsets and check if either are non-zero. This makes both
the workaround and the assertion check the same things.
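One way the old check can miss a Y offset, sketched in C under stated assumptions: the byte offset is computed linearly from the draw origin, and the surface is X-tiled with 512-byte by 8-row tiles. All names below are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

#define TILE_BYTES     4096u  /* matches the 4095 mask in the old check */
#define TILE_ROW_BYTES 512u   /* assumed X-tile row width in bytes */
#define TILE_ROWS      8u     /* assumed rows per tile: 512 * 8 = 4096 */

struct draw_origin { uint32_t x, y; };  /* in pixels */

/* Assumption: the draw offset is a plain linear byte offset. */
static uint32_t
linear_offset(struct draw_origin o, uint32_t pitch_bytes, uint32_t cpp)
{
    return o.y * pitch_bytes + o.x * cpp;
}

/* Old-style check: looks only at the low 12 bits of the byte offset. */
static int old_check_detects_offset(uint32_t draw_offset)
{
    return (draw_offset & (TILE_BYTES - 1)) != 0;
}

/* New-style check: compute the intra-tile X/Y offsets and test both. */
static int new_check_detects_offset(struct draw_origin o, uint32_t cpp)
{
    uint32_t tile_x = o.x % (TILE_ROW_BYTES / cpp);
    uint32_t tile_y = o.y % TILE_ROWS;
    return tile_x != 0 || tile_y != 0;
}
```

With a 4096-byte pitch, an origin of (0, 1) gives a linear offset of exactly 4096, so the mask test sees nothing even though the origin sits one row into a tile.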
Fixes piglit test fbo-generatemipmap-formats, and should also fix
bugs #34009 and #39487.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34009
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39487
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We were neglecting to load dvdx and dvdy. v is not optional.
Fixes glslparsertests tex-grad-0[12345].frag on Broadwater/Crestline.
(We still need an execution test using sampler1D.)
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The constant used in the radians() function didn't have enough
precision, causing a relative error of 1.676e-5, which is far worse
than the precision of 32-bit floats. This patch reduces the relative
error to 1.14e-9, which is the best we can do in 32 bits.
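The two error figures can be reproduced with a short C sketch. The two decimal literals below are assumptions: they are not quoted in this message, and were picked only because they yield the stated relative errors against pi/180.

```c
/* Relative error of an approximation against a positive reference value. */
static double relative_error(double approx, double exact)
{
    double diff = approx - exact;
    if (diff < 0.0)
        diff = -diff;
    return diff / exact;
}

static const double PI_OVER_180 = 0.017453292519943295;  /* pi / 180 */

/* Assumed literals, inferred from the quoted error figures. */
static const double COARSE_CONSTANT  = 0.017453;
static const double REFINED_CONSTANT = 0.0174532925;

static double coarse_error(void)
{
    return relative_error(COARSE_CONSTANT, PI_OVER_180);   /* about 1.676e-5 */
}

static double refined_error(void)
{
    return relative_error(REFINED_CONSTANT, PI_OVER_180);  /* about 1.14e-9 */
}
```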
Fixes piglit tests {fs,vs}-radians-{float,vec2,vec3,vec4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
What a beast.
r300g doesn't depend on files from r300c anymore, so r300c is now left
to its own fate. BTW 'make test' can be invoked from the gallium/r300
directory to run some compiler unit tests.
The implementation deviated slightly from the GL_EXT_texture_sRGB spec
and from other implementations. A giant comment block was added to
justify the somewhat odd behavior of this function.
In addition, the interface had unnecessary cruft. The 'all' parameter
was false at all callers, so it has been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
If an application requests a generic compressed format for a texture
and the driver does not pick a specific compressed format, return the
generic base format (e.g., GL_RGBA) for the GL_TEXTURE_INTERNAL_FORMAT
query.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=3165
Reviewed-by: Brian Paul <brianp@vmware.com>
lower_variable_index_to_cond_assign runs until it can't make any more
progress. It then returns the result of the last pass which will
always be false. This caused the lowering loop in
_mesa_ir_link_shader to end before doing one last round of
lower_if_to_cond_assign. This caused several if-statements (resulting
from lower_variable_index_to_cond_assign) to be left in the IR.
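The shape of the bug can be sketched in C. This is a toy model with hypothetical names, not the lowering pass itself: a run-to-fixpoint loop that returns the result of its final iteration always reports "no progress", whereas accumulating the result reports the truth.

```c
#include <stdbool.h>

/* Hypothetical pass: does one unit of work per call while any remains. */
static bool run_pass_once(int *work_left)
{
    if (*work_left == 0)
        return false;
    (*work_left)--;
    return true;
}

/* Buggy shape: returns the last iteration's result, which is always false. */
static bool lower_until_done_buggy(int *work_left)
{
    bool progress;
    do {
        progress = run_pass_once(work_left);
    } while (progress);
    return progress;
}

/* Fixed shape: remembers whether any iteration changed something. */
static bool lower_until_done_fixed(int *work_left)
{
    bool any_progress = false;
    while (run_pass_once(work_left))
        any_progress = true;
    return any_progress;
}
```

A caller that loops "while any pass made progress" stops one round too early with the buggy variant, which is the effect described above.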
In addition to this change, lower_variable_index_to_cond_assign should
take a flag indicating whether or not it should even generate
if-statements. This is easily controlled by
switch_generator::linear_sequence_max_length. This would generate
much better code on architectures without any flow control.
Fixes i915 piglit regressions glsl-texcoord-array and
glsl-fs-vec4-indexing-temp-src.
Reviewed-by: Eric Anholt <eric@anholt.net>
The index buffer state emit only occurred if there was an IB in place
and we were in either a new batch or a new IB state. But because we
only flagged new IB state if IB state changed from the last IB state
we calculated, we could simply never emit IB state after batchbuffer
wraps if the first draw didn't use the IB and we didn't actually
change the IB.
Fixes piglit glx-multi-context-ib-1.
Fixes user-clip on 965 with 3D clears enabled. I created a separate
flag because I wanted to avoid the overhead of the matrix operations
in this path.
Reviewed-by: Brian Paul <brianp@vmware.com>
It turns out that internally the texture cache gets flushed in a
couple of cases, particularly around 2D operations mixed with 3D. In
almost all cases one of those happens between rendering to an
FBO-attached texture and rendering from that texture. However, as of
the next patch, glean tfbo (and the new fbo-flushing-2 test) would
manage to get stale texture values because one of those flushes didn't
occur. The intention of this code was always to get the render cache
cleared and ready to be used from the sampler cache (and it does on <=
gen4), so this just catches gen5 up.
This patch was also tested to fix fbo-flushing on gen7.
When emitting a MAC instruction in a vertex shader, brw_vs_emit()
calls accumulator_contains() to determine whether the accumulator
already contains the appropriate addend; if it does, then we can avoid
emitting an unnecessary MOV instruction.
However, accumulator_contains() wasn't checking the val.negate or
val.abs flags. As a result, if the desired value was the negation, or
the absolute value, of what was already in the accumulator, we would
generate an incorrect shader.
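A simplified C sketch of the kind of check that was missing. The struct and function names are hypothetical, and treating any negate/abs modifier as "not in the accumulator" is one conservative option, not necessarily what the patch does.

```c
#include <stdbool.h>

/* Hypothetical, simplified register operand. */
struct operand {
    int  file;    /* register file */
    int  nr;      /* register number */
    bool negate;  /* source modifier: use -value */
    bool abs;     /* source modifier: use |value| */
};

/*
 * True only if 'val' names the register last written to the accumulator
 * and carries no source modifier. A negated or absolute-valued source is
 * a different value, so the MOV cannot be skipped in that case.
 */
static bool
accumulator_holds(const struct operand *last_dest, const struct operand *val)
{
    if (val->negate || val->abs)
        return false;
    return last_dest->file == val->file && last_dest->nr == val->nr;
}
```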
Fixes piglit test vs-refract-vec4-vec4-float.
Tested on Gen5 and Gen6.
Reviewed-by: Eric Anholt <eric@anholt.net>
On Ivybridge, the shadow comparator goes in the first slot, rather than
at the end. It's not necessary to send u, v, and r.
Fixes tests texturing/texdepth and glean/fbo.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 53c89c67f3 ("i965: Avoid generating
MOVs for assignments of expressions.") added the line "this->result =
reg_undef" all over the code. Unfortunately, since Eric developed his
patch before I landed Ivybridge support, he missed adding it to
fs_visitor::emit_texture_gen7() after rebasing.
Furthermore, since I developed TXD support before Eric's patch, I
neglected to add it to the gradient handling when I rebased.
Neglecting to set this causes the visitor to use this->result as storage
rather than generating a new temporary. These missing statements
resulted in the same register being used to store several different
values.
Fixes the following piglit tests on Ivybridge:
- glsl-fs-shadow2dproj.shader_test
- glsl-fs-shadow2dproj-bias.shader_test
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The blend_quad function clobbers the actual render target color/alpha
values while applying the destination blend factor, which results in
restoring the wrong value during the masking stage for write-disabled
channels.
Reviewed-by: Brian Paul <brianp@vmware.com>
Just like the non-constant array index lowering pass, compare all N
indices at once. For accesses to a vec4, this saves 3 comparison
instructions on a vector architecture.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the code would just look at deref->array->type to see if it
was a constant. This isn't good enough because deref->array might be
another ir_dereference_array... of a constant. As a result,
deref->array->type wouldn't be a constant, but
deref->variable_referenced() would return NULL. The unchecked NULL
pointer would shortly lead to a segfault.
Instead just look at the return of deref->variable_referenced(). If
it's NULL, assume that either a constant or some other form of
anonymous temporary storage is being dereferenced.
This is a bit hinkey because most drivers treat constant arrays as
uniforms, but the lowering pass treats them as temporaries. This
keeps the behavior of the old code, so this change isn't making things
worse.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
Reviewed-by: Eric Anholt <eric@anholt.net>
Leaving the unused registers with other values caused assertion
failures and other problems in places that blindly iterate over all
sources.
brw_vs_emit.c:1381: get_src_reg: Assertion `c->regs[file][index].nr !=
0' failed.
Fixes i965 piglit:
vs-uniform-array-mat[234]-col-row-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-row-rd
vs-uniform-mat[234]-col-row-rd
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes many cases of accessing arrays of matrices using
non-constant indices at each level.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
Fixes swrast piglit:
fs-temp-array-mat[234]-index-col-rd
fs-temp-array-mat[234]-index-col-row-rd
fs-temp-array-mat[234]-index-col-wr
fs-uniform-array-mat[234]-index-col-rd
fs-uniform-array-mat[234]-index-col-row-rd
fs-varying-array-mat[234]-index-col-rd
fs-varying-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-rd
vs-varying-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condition on that assignment would be lost.
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condition on that assignment would be lost.
Fixes i965 piglit:
fs-temp-array-mat[234]-col-row-wr
fs-temp-array-mat[234]-index-col-row-wr
fs-temp-array-mat[234]-index-col-wr
fs-temp-array-mat[234]-index-row-wr
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous implementation could easily get tricked if the LHS of an
assignment included a non-constant index that was "inside" another
dereference. For example:
mat4 m[2];
m[0][i] = vec4(0.0);
Due to the way it tracked whether the array was being assigned, it
would think that the non-constant index was in an r-value. The new
code fixes that by tracking l-values and r-values differently. The
index is also replaced by cloning the IR and replacing the index
variable instead of the odd way it was done before.
v2: Apply some simplifications suggested by Eric Anholt. Making
assignment_generator::rvalue be ir_dereference instead of ir_rvalue
simplified the code a bit.
Fixes i965 piglit fs-temp-array-mat[234]-index-wr and
vs-varying-array-mat[234]-index-wr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34691
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no reason for it to be there, and another class that may not
have access to the visitor will need it soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
Not sure how I computed these, but they were wrong (which explains why
bumping the polynomial order before never improved precision).
This allows the EXP test cases of the PSPrecision/VSPrecision DCTs to pass.
Add an iteration step, which makes rsqrt precision go from 12 bits to
24, and fixes the RSQ/NRM test case of the PSPrecision/VSPrecision DCTs.
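Assuming the iteration step is a standard Newton-Raphson refinement (the message does not say which iteration is used), the effect can be sketched in C. Each step roughly squares the relative error, which is why one step takes an estimate good to about 12 bits to about 24 bits. The function names are hypothetical.

```c
/* One Newton-Raphson step for y ~= 1/sqrt(x): y' = y * (1.5 - 0.5*x*y*y). */
static float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Relative error of an approximation against a positive exact value. */
static float rel_error(float approx, float exact)
{
    float diff = approx - exact;
    if (diff < 0.0f)
        diff = -diff;
    return diff / exact;
}
```

For example, with x = 4 (exact result 0.5) and a starting estimate that is off by one part in 4096, a single call to rsqrt_refine() brings the relative error down to the order of single-precision rounding.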
There are no uses of this function outside shader translation.
These tests invoke do_lower_jumps() in isolation (using the glsl_test
executable) and verify that it transforms the IR in the expected way.
The unit tests may be run from the top level directory using "make
check".
For reference, I've also checked in the Python script
create_test_cases.py, which was used to generate these tests. It is
not necessary to run this script in order to run the tests.
Acked-by: Chad Versace <chad@chad-versace.us>
This patch adds a new build artifact, glsl_test, which can be used for
testing optimization passes in isolation.
I'm hoping that we will be able to add other useful standalone tests
to this executable in the future. Accordingly, it is built in a
modular fashion: the main() function uses its first argument to
determine which test function to invoke, removes that argument from
argv[], and then calls that function to interpret the rest of the
command line arguments and perform the test. Currently the only test
function is "optpass", which tests optimization passes.
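The dispatch scheme described above might look roughly like this in C. It is a sketch, not the glsl_test source: optpass_stub() and dispatch() are hypothetical names, and the stub merely reports the argument count it receives.

```c
#include <stddef.h>
#include <string.h>

typedef int (*test_fn)(int argc, char **argv);

/* Hypothetical test function: reports how many arguments it was given. */
static int optpass_stub(int argc, char **argv)
{
    (void)argv;
    return argc;
}

static const struct {
    const char *name;
    test_fn     fn;
} tests[] = {
    { "optpass", optpass_stub },
};

/*
 * Picks a test by argv[1], removes that argument, and hands the rest of
 * the command line to the test. Returns -1 if the name is missing or
 * unknown. Assumes argv[argc] is NULL, as C guarantees for main().
 */
static int dispatch(int argc, char **argv)
{
    if (argc < 2)
        return -1;

    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
        if (strcmp(argv[1], tests[i].name) == 0) {
            /* Shift argv[2..argc] (including the NULL) down by one slot. */
            memmove(&argv[1], &argv[2], (size_t)(argc - 1) * sizeof(argv[0]));
            return tests[i].fn(argc - 1, argv);
        }
    }
    return -1;
}
```

Adding another standalone test then only requires a new row in the table.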
This patch moves the following functions from main.cpp (the main cpp
file for the standalone executable that is used to create the built-in
functions) to standalone_scaffolding.cpp, so that they can be re-used
in other standalone executables:
- initialize_context()*
- _mesa_new_shader()
- _mesa_reference_shader()
*initialize_context contained some code that was specific to main.cpp,
so it was split into two functions: initialize_context() (which
remains in main.cpp), and initialize_context_from_defaults() (which is
in standalone_scaffolding.cpp).
Several Mesa headers redundantly define the INLINE macro. Adding this
guard prevents the compiler from complaining about macro redefinition.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is an alternative to the draw module's polygon stipple stage.
The softpipe implementation here is just a test. The advantage of
using the new polygon stipple utility module (with other drivers)
is we can avoid software vertex processing in the draw module and
get much better performance.
Polygon stipple doesn't require special vertex processing like
the other draw module stage.
u_vbuf_upload_buffers modifies the buffer offsets. If they are not
restored, and any of the vertex formats is not supported natively, the
next u_vbuf_mgr_draw_begin call will translate the vertex buffers with
incorrect buffer offsets.
ES 2.0.25 page 127 says:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
querying any other pname will generate INVALID_ENUM.
See also:
b9e9df78a0
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL 1.20 and later specs say:
"Recursion is not allowed, not even statically. Static recursion is
present if the static function call graph of the program contains
cycles."
Recursion is detected and rejected both at compile time and at
link time. The compile-time check happens to detect some cases that
may be removed by various optimization passes. The spec doesn't seem
to allow this, but other vendors (e.g., NVIDIA) appear to only check
at link-time after all optimizations.
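Detecting cycles in a static call graph can be sketched in C with a depth-first search. This is an illustration under assumed data structures (a small adjacency matrix with hypothetical names), not necessarily how the patch implements the check.

```c
#include <stdbool.h>

#define MAX_FUNCS 8

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Hypothetical call graph: calls[f][g] is true if f contains a call to g. */
struct call_graph {
    int  count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Depth-first search; reaching an IN_PROGRESS node again means a cycle. */
static bool visit(const struct call_graph *g, int f, enum visit_state *state)
{
    state[f] = IN_PROGRESS;
    for (int callee = 0; callee < g->count; callee++) {
        if (!g->calls[f][callee])
            continue;
        if (state[callee] == IN_PROGRESS)
            return true;  /* back edge: f is reachable from itself */
        if (state[callee] == UNVISITED && visit(g, callee, state))
            return true;
    }
    state[f] = DONE;
    return false;
}

/* True if the static call graph contains any cycle. */
static bool has_static_recursion(const struct call_graph *g)
{
    enum visit_state state[MAX_FUNCS] = { UNVISITED };

    for (int f = 0; f < g->count; f++) {
        if (state[f] == UNVISITED && visit(g, f, state))
            return true;
    }
    return false;
}
```

Because the check works on the static graph, a call that could never execute at run time still counts, in line with the spec wording quoted above.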
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33885
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The behavior of flushes in the hardware is a maze of twisty passages,
and strangely the VS constants appear to be loaded during a pipeline
flush instead of at the time of the packet emit according to the
simulator. On moving the STATE_BASE_ADDRESS packet to where it really
needed to live (in order for data loads by other packets to be
correct), we sometimes no longer got a flush between those packets
where we apparently needed it. This replicates the flushes implied by
a STATE_BASE_ADDRESS update, fixing the GPU hangs in OGLC and the
"engine" demo.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36821
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39257
Tested-by: Keith Packard <keithp@keithp.com> (bzflag and etracer fixed)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's scary stuff going on in PIPE_CONTROL internals, and if the
BSpec says to do this to make PIPE_CONTROL work, I'll go ahead and do
it because we'll probably never be able to debug it after the fact.
v2: Use stall at scoreboard instead of depth stall, as noted by Ken.
For this and occlusion queries, we're trying to avoid setting
I915_GEM_DOMAIN_RENDER for the write domain, because the data written
is definitely not going through the render cache, but we do need to
tell the kernel that the object has been written. However, with using
I915_GEM_DOMAIN_GTT, the kernel on retiring the batchbuffer sees that
the w/a BO has a write domain of GTT, and puts it on the flushing
list. If something tries to wait for that BO to finish rendering
(such as the AUB dumper reading the contents of BOs), we get into
wait_request (since obj->active) but with a 0 seqno (since the object
is on the flushing list, not actually on a ringbuffer), and BUG_ONs.
To avoid the kernel bug (which I'm hoping to delete soon anyway), just
use I915_GEM_DOMAIN_INSTRUCTION like occlusion queries do. This
doesn't result in more flushing, because we invalidate INSTRUCTION on
every batchbuffer now that we're state streaming, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This cuts out a large portion of the overhead of glClear() from
resetting the texenv state and recomputing the fixed function
programs. It also means less use of fixed function internally in our
GLES2 drivers, which is rather bogus.
Reviewed-by: Brian Paul <brianp@vmware.com>
When parsing S-Expressions, we need to store nul-terminated strings for
Symbol nodes. Prior to this patch, we called ralloc_strndup each time
we constructed a new s_symbol. It turns out that this is obscenely
expensive.
Instead, copy the whole buffer before parsing and overwrite it to
contain \0 bytes at the appropriate locations. Since atoms are
separated by whitespace, (), or ;, we can safely overwrite the character
after a Symbol. While much of the buffer may be unused, copying the
whole buffer is simple and guaranteed to provide enough space.
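The buffer trick can be sketched in C as follows. This is a simplified illustration with hypothetical names, not the s_expression reader: it scans the original text, leaves it untouched, and nul-terminates each symbol inside a single copy, so no per-symbol allocation is needed. For brevity it treats ';' purely as a delimiter.

```c
#include <stdlib.h>
#include <string.h>

/* Characters that end a symbol in this sketch. */
static int is_delim(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n' ||
           c == '(' || c == ')' || c == ';';
}

/*
 * Stores up to max_syms symbol pointers in out[] and returns how many were
 * found. Every pointer refers into *copy_out, one copy of the whole input
 * that the caller must free() after it is done with the symbols.
 */
static size_t
split_symbols(const char *src, char **copy_out, char **out, size_t max_syms)
{
    size_t len = strlen(src);
    size_t n = 0;
    size_t i = 0;
    char *buf = malloc(len + 1);

    *copy_out = buf;
    if (buf == NULL)
        return 0;
    memcpy(buf, src, len + 1);

    while (i < len && n < max_syms) {
        if (is_delim(src[i])) {
            i++;
            continue;
        }
        out[n++] = buf + i;       /* the symbol starts here in the copy */
        while (!is_delim(src[i]))
            i++;
        buf[i] = '\0';            /* overwrite the character after it */
    }
    return n;
}
```

Since the delimiters are only overwritten in the copy, the parser can still read parentheses from the original text.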
Prior to this, running piglit-run.py -t glsl tests/quick.tests with GLSL
1.30 enabled took just over 10 minutes on my machine. Now it takes 5.
NOTE: This is a candidate for stable release branches (because it will
make running comparison tests so much less irritating.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 1a339b6c71 made
st_ChooseTextureFormat map GL_RGBA with type GL_UNSIGNED_BYTE
to PIPE_FORMAT_A8B8G8R8_UNORM.
The image format for ARGB pixmaps is PIPE_FORMAT_B8G8R8A8_UNORM
however. This mismatch caused the texture to be recreated in
st_finalize_texture.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39209
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This fixes a regression introduced by commit
a26121f375 (fd.o bug #39219).
Since the __glXInitialize() call should be unnecessary anyway, this is
probably a nicer fix for the original problem too.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: padfoot@exemail.com.au
Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
W-Major Tile Format is used for separate stencil.
The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.
This fix touches the following portions of code:
- In intel_allocate_renderbuffer_storage(), allocate the stencil
buffer with I915_TILING_NONE.
- In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
not tiled.
- In the stencil buffer's span functions, the tile's layout must be
decoded in software.
This commit mutually depends on the xf86-video-intel commit
dri: Do not tile stencil buffer
Author: Chad Versace <chad@chad-versace.us>
Date: Mon Jul 18 00:38:00 2011 -0700
On Gen6 with separate stencil enabled, fixes the following Piglit tests:
bugs/fdo23670-drawpix_stencil
general/stencil-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
spec/EXT_packed_depth_stencil/readpixels-24_8
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn introduced a new type system. It defines a new way to create
named structs and removes the (now not needed) LLVMInvalidateStructLayout
function. See revision 134829 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
In a rare case of building gallium only, we need to
check if the required packages are available
libdrm_[intel|nouveau] - gallium[i915 i965|nouveau]
v2: r300g and r600g do not need libdrm_radeon
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Including the full "3DSTATE_VF_STATISTICS" should make it easier to
cross-reference the code and documentation.
Also, move the 965/GM45 suffix to the beginning for consistency with
newer #defines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The documentation uses 3DSTATE_DRAWING_RECTANGLE, and we already had it
defined in brw_defines.h; we were simply using an old #define from
intel_reg.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is useful for shadow map generation. Tested with glsl-bug-22603,
which rendered the depth textures with fallbacks before.
Acked-by: Chad Versace <chad@chad-versace.us>
We were updating our new viewport using the old buffers' _WindowMap.m.
We can do less math and avoid using that deprecated matrix by just
folding the viewport calculation right in to the driver.
Fixes piglit fbo-depthtex.
i915_update_draw_buffers() already handles the fallback bit for
missing stencil region, so here we just need to handle whether the GL
thinks we have stencil data or not (and disable the test if so).
We were disabling it once at the moment we changed draw buffers, but
later enabling of depth test could turn it back on. Fixes
fbo-nodepth-test.
Note that ctx->DrawBuffer has to be checked because during context
create we get called while it's still unset. However, we know we'll
get an intel_draw_buffer() after that, so it's safe to make a silly
choice at this point.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30080
The illusion of shared code here wasn't fooling anybody. It was
tempting to keep i830 and i915 still shared, but I think I actually
want to make them diverge shortly.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This brings us into compliance with page 17 (page 22 of the PDF) of
the GLSL 1.20 spec:
"[Sampler types] can only be declared as function parameters or
uniform variables (see Section 4.3.5 "Uniform"). ... [Samplers]
cannot be used as out or inout function parameters."
The spec isn't explicit about whether this rule applies to
structs/arrays containing samplers, but the intent seems to be to
ensure that it can always be determined at compile time which sampler
is being used in each texture lookup. So to avoid creating a
loophole, the rule needs to apply to structs/arrays containing samplers
as well.
Fixes piglit tests spec/glsl-1.10/compiler/samplers/*.frag, and fixes
bug 38987.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38987
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The new location, as a member function of glsl_type, is more
consistent with queries like is_sampler(), is_boolean(), is_float(),
etc. Placing the function inside glsl_type also makes it available to
any code that uses glsl_types.
These happen to work because their values are the same as the equivalent
PIPE_TRANSFER_* flags, but it's still misleading.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The GLSL spec says:
"If a built-in function is redeclared in a shader (i.e., a
prototype is visible) before a call to it, then the linker will
only attempt to resolve that call within the set of shaders that
are linked with it."
This patch enforces this behavior. When a function call is processed
a flag is set in the ir_call to indicate whether the previously seen
prototype is the built-in or not. At link time a call will only bind
to an instance of a function that matches the "want built-in" setting
in the ir_call.
This has the odd side effect that the first call to abs() in the shader
below will call the built-in and the second will not:
float foo(float x) { return abs(x); }
float abs(float x) { return -x; }
float bar(float x) { return abs(x); }
This seems insane, but it matches what the spec says.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31744
The following resolves the build issues and missing symbols
Add "xvmc-nouveau/target.c" - missing symbol "driver_description"
Add "drivers/nvc0/libnvc0.a" - missing symbol "nvc0_screen_create"
Remove "drivers/softpipe/libsoftpipe.a" - unnecessary dependency
resolves build (when building without swrast)
Add "drivers/trace/libtrace.a" in Makefile
Note: With/without those patches xvmc-nouveau still segfaults
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Use all zpass data for predication instead of the last block only.
Use query buffer as a ring instead of reusing the same area
for each new BeginQuery. All query buffer offsets are in bytes
to simplify offsets math.
commit 1856230d9fa61710cce3e152b8d88b1269611a73
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:41:27 2011 +0100
make: Use better var names on packaging.
commit d1ae72d0bd14e820ecfe9f8f27b316f9566ceb0c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:38:21 2011 +0100
make: Apply several of Dan Nicholson's suggestions.
commit f27cf8743ac9cbf4c0ad66aff0cd3f97efde97e4
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 14:18:20 2011 +0100
make: Put back the tar.bz2 creation rule.
Removed by accident.
commit 34983337f9d7db984e9f0117808274106d262110
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:59:29 2011 +0100
make: Determine tarballs contents via git ls-files.
The wildcards were a mess:
- lots of files for non Linux platforms missing
- several files listed and archived twice
Using git-ls-files ensures nothing is lost when making the tarballs.
commit 34a28ccbf459ed5710aafba5e7149e8291cb808c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:07:14 2011 +0100
glut: Remove GLUT source.
Most distros ship freeglut, and most people don't care one vs the other,
and it hasn't been really maintained.
So it is better to have Mesa GLUT be revisioned and built separately
from Mesa.
commit 5c26a2c3c0c7e95ef853e19d12d75c4f80137e7d
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:31:02 2011 +0100
Ignore the tarballs.
commit 26edecac589819f0d0efe2165ab748dbc4e53394
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:30:24 2011 +0100
make: Create the Mesa-xxx-devel symlink automatically.
Also actually remove the intermediate uncompressed tarballs.
This moves fetching the context into the debug path of this function.
Spotted while trawling callgrind traces for other things.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The forward references to video enum types in p_context.h causes
a massive number of compiler warnings (ISO C forbids forward references
to ‘enum’ types).
By putting the new video enums in a separate header that can be included
by p_context.h and p_screen.h we can avoid this.
Acked-by: Christian König <deathsimple@vodafone.de>
Inline the hot path, where the reference remains the same. This shouldn't
penalise the slow path at all, but it improves the hot path because we no
longer have to jump to the function.
It also moves some assert checks under an #ifndef NDEBUG.
Minor clean-ups added by Brian.
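A minimal sketch of the pattern described above, assuming a hypothetical
refcounted struct object (not Mesa's actual type) and hypothetical function
names. The inline wrapper handles the common "pointer unchanged" case and
only calls out of line when the reference really changes:

```c
#include <stddef.h>

/* Hypothetical refcounted object; stands in for whatever is referenced. */
struct object {
   int refcount;
};

/* Slow path: drop the old reference and take the new one.
 * A real implementation would also free the object at refcount zero. */
static void
reference_object_slow(struct object **ptr, struct object *obj)
{
   if (*ptr)
      (*ptr)->refcount--;
   if (obj)
      obj->refcount++;
   *ptr = obj;
}

/* Hot path, inlined at each call site: nothing to do when unchanged. */
static inline void
reference_object(struct object **ptr, struct object *obj)
{
   if (*ptr != obj)
      reference_object_slow(ptr, obj);
}
```

Re-binding the same object costs one pointer compare and no function call.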
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Because we don't support them.
For instance, R32G32B32 is not R32G32B32X32 as was assumed.
Add support for R8G8B8X8_UNORM instead of R8G8B8_UNORM surfaces.
I think the times when the gallium interface changed constantly are behind
us. Since it no longer does, there is no reason to always compile the libs
when they are not needed.
There seems to be a bug in r600g when uploading more than one layer of a
3D resource at once with a hardware blit.
So just do them one at a time to work around this.
Bitmap caching shouldn't affect the results of the queries and
conditional render.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We were failing at rounding, misplacing the non-baselevels. Fixes:
3DFX_texture_compression_FXT1/fbo-generate-mipmaps
ARB_texture_compression/fbo-generate-mipmaps
EXT_texture_compression_s3tc/fbo-generate-mipmaps
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The first rendering after context create didn't know of the color
buffer yet, triggering a sw fallback. The intel_prepare_render() from
intelSpanRenderStart then found the buffer and turned off fallbacks,
but intelSpanRenderFinish was never called and things were left
mapped. By checking buffers before making the call on whether to do
the fallback pipeline or not, we avoid the fallback change inside of
the rendering pipeline.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31561
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In some cases _mesa_create_context() can return NULL, and in the mesa
state tracker we do not consider that case, which may cause issues
within st_create_context_priv().
This patch adds a simple check (similar to the one in the dri drivers).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
As Chia-I Wu said 'There are two libGL providers, Xlib and DRI based
they cannot coexist'
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This version is mostly Dan's post to the mesa-dev mailing list on
6/22/2011.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
According to the GLSL 1.20 specification, "it is a semantic error if
there are multiple ways to apply [implicit] conversions [...] such that
the call can be made to match multiple signatures."
Fixes a regression caused by 60eb63a855,
which implemented the wrong policy of finding a "closest" match.
However, this is not a revert, since the original code failed to
continue looking for an exact match once it found two inexact matches.
It's OK to have multiple inexact matches if there's also an exact match.
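The policy described above can be sketched as follows. This is an
illustration only, not the code in ir_function.cpp; the enum and function
names are assumptions. Each candidate signature is assumed to have already
been classified as no match, inexact match, or exact match:

```c
#include <stddef.h>

enum match { MATCH_NONE, MATCH_INEXACT, MATCH_EXACT };

/* Returns the index of the chosen signature, or -1 if nothing matches
 * or if the call is ambiguous (several inexact matches, no exact one). */
int
choose_signature(const enum match *results, size_t count)
{
   int inexact = -1;
   int ambiguous = 0;

   for (size_t i = 0; i < count; i++) {
      if (results[i] == MATCH_EXACT)
         return (int) i;      /* an exact match always wins */
      if (results[i] == MATCH_INEXACT) {
         if (inexact >= 0)
            ambiguous = 1;    /* note it, but keep looking for an exact match */
         else
            inexact = (int) i;
      }
   }
   return ambiguous ? -1 : inexact;
}
```

The key point is that the second inexact match does not end the search; the
ambiguity is only reported once every signature has been examined.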
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38971
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is exactly analogous to Eric's Gen6 change in commit
6861a70177. His explanation:
"This is just like PointSprite overrides, but it's always on for that
attribute."
Fixes glsl-fs-pointcoord and gtf/point_sprites.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
f304bb8a5d. His explanation:
"We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases."
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
e7280b16d6.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is just barely more pretty-printing than we previously had, but
at least it doesn't leave out unit states in the log.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is quite a bit of spam, but I think it's useful to have in a full
INTEL_DEBUG=batch dump. And a lot of this spam on glxgears is just
because we're awful at handling our constants :/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous brw_state_dump output was rather useless -- last used
program per batch, and just the hex. Now we dump all programs (since
we don't know which were used), and disassemble them. But that's a
ton of spam, and usually when looking into program contents we use
INTEL_DEBUG={vs,wm,misc,other} and when looking into state updates we
use INTEL_DEBUG=batch, so this dump usually just massively clutters up
the output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we're using state base addresses for most things, we're less
interested in the absolute address of the state, and more in its
offset from the state base address (start of batchbuffer). Also,
reorder the printout so it looks more like the batchbuffer dump.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to make brw_state_dump.c handle more than just the last
statechange, so I want to keep track of what's in the batch state. By
using AUB file numbering for most of these packets, this may be
reusable for aub dumping.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will let me hang cached compiler structs off of the context
without having to worry about cleaning them up at destroy time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no pretty way to avoid the overwriting of the src operands, so
just use a temporary destination and rely on the MOV optimization.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were stomping over the source for the body of the LIT instruction
when doing the MOV of 1.0 to the uninteresting channels.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From ARB_framebuffer_object:
If a buffer is specified in <mask> and does not exist in both the
read and draw framebuffers, the corresponding bit is silently
ignored.
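A rough sketch of the rule quoted above, assuming the caller has already
worked out which buffers exist in the read and draw framebuffers as bit
sets. The function name is hypothetical, and the bit values are defined
locally (matching the GL headers) so the example stands alone:

```c
/* Same values as the GL_*_BUFFER_BIT enums, redefined for a standalone sketch. */
#define COLOR_BUFFER_BIT   0x00004000u
#define DEPTH_BUFFER_BIT   0x00000100u
#define STENCIL_BUFFER_BIT 0x00000400u

/* Keep only the bits whose buffer exists in both framebuffers;
 * the others are dropped silently rather than raising an error. */
unsigned
filter_blit_mask(unsigned mask, unsigned read_bufs, unsigned draw_bufs)
{
   return mask & read_bufs & draw_bufs;
}
```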
Using GL_NONE as DataType of Z32_FLOAT_X24S8, not sure what I should put there.
The spec says the type is n/a.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code was missing GL_DEPTH_COMPONENT32, resulting in it
wrongly returning the color buffer instead of the depth buffer.
Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls
CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
format, which (prior to this patch) resulted in an attempt to copy
ARGB8888 to X8_Z24.
Instead of adding the missing enumeration directly, convert the code to
use _mesa_is_depth_format() and _mesa_is_depthstencil_format() as these
should catch any newly added depth formats in the future.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was tricky. We were doing a use-before-initialize of
grf_reg_count, but the value usually got overwritten anyway -- when we
didn't have to do a relocation (typical), or on gen5 when we didn't
have relocations at all.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b46dc45cee claimed that
NEW_POLYGONSTIPPLE is gratuitous, but somehow just changed comments
and whitespace instead of actually removing the flag.
While we're at it, 3DSTATE_PS doesn't appear to need NEW_LINE or
NEW_POLYGON either (those are in 3DSTATE_WM). Also, 3DSTATE_WM
doesn't appear to need BRW_NEW_NR_WM_SURFACES or BRW_NEW_CURBE_OFFSETS
either (those are in 3DSTATE_PS).
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When switching channels with xine it sometimes happens that xine
destroys the drawable before we get a chance to call
DRI2DestroyDrawable, resulting in an x error.
SUB & LRP instructions should toggle NEG bit instead of setting it,
otherwise e.g. "SUB a,b,-1" is translated as "ADD a,b,-1"
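To illustrate the difference, here is a small sketch with a hypothetical
SRC_NEG flag (not the driver's real operand encoding). Setting the bit loses
the sign of an operand that was already negated; toggling it does not:

```c
#define SRC_NEG 0x1u   /* hypothetical "negate this source" modifier bit */

/* Buggy variant: an already-negated operand stays negated. */
static unsigned
negate_by_setting(unsigned flags)
{
   return flags | SRC_NEG;
}

/* Fixed variant: negating twice yields the original operand. */
static unsigned
negate_by_toggling(unsigned flags)
{
   return flags ^ SRC_NEG;
}
```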
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
For 0^0 case result of "LOG_CLAMPED ...,0" is -MAX_FLOAT, and then result of
"MUL_LIT ...,0,-MAX_FLOAT,..." is -MAX_FLOAT instead of 0 because of special
src1 checks for -MAX_FLOAT. So swap src0/1:
"MUL_LIT ...,-MAX_FLOAT,0,..." to get expected 0, then result of
"EXP_IEEE ...,0" is 1 as expected for LIT.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Create a new GLX drawable struct to track client related info, and add a
wrap counter to it, tracking it as we receive events. This
allows us to support the full 64 bits of the event structure we pass to
the client even though the server only gives us a 32 bit count.
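A minimal sketch of the wrap-counter idea, assuming events arrive in order
and less than 2^32 counts apart. The struct and function names are
hypothetical, not the ones used in the GLX code:

```c
#include <stdint.h>

/* Hypothetical per-drawable bookkeeping. */
struct drawable_counter {
   uint32_t last;    /* last 32-bit count received from the server */
   uint32_t wraps;   /* how many times the 32-bit count has wrapped */
};

/* Extend the server's 32-bit count to 64 bits for the client event. */
uint64_t
extend_count(struct drawable_counter *c, uint32_t count)
{
   if (count < c->last)
      c->wraps++;            /* the counter went backwards, so it wrapped */
   c->last = count;
   return ((uint64_t) c->wraps << 32) | count;
}
```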
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Normally lower_jumps.cpp doesn't need to lower a break instruction
that occurs at the end of a loop, because all back-ends can produce
proper GPU instructions for a break instruction in this "canonical"
location. However, if other break instructions within the loop are
already being lowered, then a break instruction at the end of the loop
needs to be lowered too, since after the optimization is complete a
new conditional break will be inserted at the end of the loop.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_breaks_6.
Previously, lower_jumps.cpp would break out of its loop after lowering
a jump instruction in just the then- or else-branch of a conditional,
and it would fail to lower a jump instruction occurring in the other
branch.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_returns_4.
The visitor class in lower_jumps.cpp never removes or replaces the
instruction being visited, but it frequently alters or removes the
instructions that follow it. Therefore, to make sure the altered IR
is visited, it needs to iterate through exec_lists using foreach_list
rather than visit_exec_list().
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Also, certain invariants assumed by lower_jumps.cpp may fail to hold,
causing assertion failures.
Fixes unit tests test_lower_pulled_out_jump,
test_lower_unified_returns, test_lower_guarded_conditional_break,
test_lower_return_non_void_at_end_of_loop, and test_lower_returns_3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, lower_jumps.cpp would only lower return and continue
statements that appeared inside conditionals. This patch makes it
lower unconditional returns and continue statements that occur inside
a loop.
Such unconditional flow control statements would be unlikely to be
explicitly coded by a reasonable user, however they might arise as a
result of other optimizations.
Without this patch, lower_jumps.cpp might not lower certain return and
continue statements, causing some backends to fail.
Fixes unit tests test_lower_return_void_at_end_of_loop and
test_remove_continue_at_end_of_loop.
Previously, lower_jumps.cpp only lowered return statements that
appeared inside of an if statement.
Without this patch, lower_jumps.cpp might not lower certain return
statements, causing some back-ends to fail (as in bug #36669).
Fixes unit test test_lower_returns_1.
Previously, do_lower_jumps.cpp determined whether to lower return
statements in ir_lower_jumps_visitor::should_lower_jumps(). Moved
this logic to ir_lower_jumps_visitor::visit(ir_function_signature *),
so that it can be used in determining whether to lower a return
statement at the end of a function.
Previously ir_reader was only able to handle return of non-void.
This patch is necessary in order to allow optimization passes to be
tested in isolation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn changes pretty rapidly. The change in
Target->createMCInstPrinter() signature which inspired commits
40ae214067 and
92e29dc5b0 has been reverted.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Otherwise PIPE_FORMAT_X8B8G8R8_UNORM and friends would fail.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
When the state tracker adds a front buffer, nothing triggers a validate
drawable call, since the state tracker manager is never notified.
Force a validate drawable call by invalidating the framebuffer's stamp, so
that the window system's renderbuffer (if any) is picked up.
This fixes bug 38988
https://bugs.freedesktop.org/show_bug.cgi?id=38988
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes segfault when running cubemap demo on i945. This happened
when intel_region_reference() was called in i915_set_draw_region()
with depth_region=NULL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Even if we don't have a current context, if we're freeing the rb we
should free its region (and BO). The renderbuffer unreference checks
appear to be just cargo-cult from the region unreference code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30217
Reviewed-by: Chad Versace <chad@chad-versace.us>
As a result of this cleanup, a bug in
intel_process_dri2_buffer_no_separate_stencil() became quite apparent.
We were associating the NULL pointer after an unreference with the
STENCIL attachment -- clarify the logic and attach the right region.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This should help us avoid leaking regions in region reference code by
making the API more predictable.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This prevents developer surprise at seeing a GL_DEPTH_COMPONENT
texture have stencil bits, and avoids the metaops path accidentally
copying stencil bits around in glCopyTexImage(GL_DEPTH_COMPONENT) (and
being broken because swrast's glReadPixels(GL_UNSIGNED_INT_24_8) is
broken).
Acked-by: Chad Versace <chad@chad-versace.us>
We simply emit these using OUT_BATCH and bitshifting, as it results in
better compiled code than packed structures. Since our documentation
is public, it's not terribly useful to keep these around for reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also rename it from CMD_STATE_INSN_POINTER to CMD_STATE_SIP to match the
documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a little different from most because it's a single DWord;
there's no length field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm not sure about this one. The current code actually follows the spec, but
considering the spec is supposed to be written against GL 3.2 I'd say the spec
is broken. I filled out a spec feedback form over a month ago, but either the
form is broken, or nobody cares.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is probably nicer if the array size ever changes.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The total number of units used by a shader is limited to MAX_TEXTURE_UNITS,
but the actual indices are only limited by MAX_COMBINED_TEXTURE_IMAGE_UNITS,
since they're shared between vertex and fragment shaders.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not supposed to do conversion, but st sometimes asks us to.
Sometimes conversion is even wrong (e.g. between UNORM and SRGB).
This should now include all formats the 2D engine supports.
Fixes an assertion failure in the piglit out-01.frag
ARB_explicit_attrib_location test. The locations set via the layout
qualifier in fragment shader were not being applied to the shader
outputs. As a result all of these variables still had a location of
-1 set.
This may need some more work for pre-3.0 contexts. The problem is
dealing with generic outputs that lack a layout qualifier. There is
no way for the application to specify a location
(glBindFragDataLocation is not supported) or query the location
assigned by the linker (glGetFragDataLocation is not supported).
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
And don't delete them. Let ralloc clean them up. Deleting the
temporary IR leaves dangling references in the prog_instruction. That
results in a bad dereference when printing the IR with MESA_GLSL=dump.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38584
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using GLuint pointers worked when the pixel size was four bytes
or the row stride was a multiple of four but was otherwise broken.
Fixes failures found with the piglit fbo-stencil test.
This helps to fix https://bugs.freedesktop.org/show_bug.cgi?id=38729
NOTE: This is a candidate for the 7.11 branch.
The existing error result doesn't appear in the GL 2.1 or 3.2
compatibility specs, and triggers an unexpected GL error in Intel's
oglconform when it tries to reset the feedback state after usage so
that the "diff the state at error time vs. context init time" code
doesn't generate spurious diffs. The unexpected GL error then
translates into testcase failure. Brian wants the safety check on
buffer = NULL, though, so that people can't as easily set up a broken
buffer.
Fixes a bug caught by oglconform, and now piglit
ARB_vertex_program/getenv4d-with-error. The wrapping of an existing
GL function made it so that we couldn't distinguish an error in
looking up our arguments from an existing error. Instead, make a
helper function to choose the param, and use it from multiple callers.
v2: Move the success case line into the conditional, use COPY_4V more.
Commit 6750226e6d bumped the base MRF to
m2 instead of m0, but failed to adjust inst->mlen, which was being set
to the highest MRF. Subtracting the base MRF solves the issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This fixes a regression introduced with commit
"st-api: Rework how drawables are invalidated v3"
where the glx state tracker manager would invalidate a drawable each time it
checks the drawable dimensions, even during a validate call, which
resulted in an endless loop, since the state tracker would immediately
detect the new invalidation and rerun the validate...
This change marks the drawable invalid only if the drawable dimensions actually
changed during the validate, which will result in at most a single
unnecessary validate by the context running a validate during which the
dimensions changed.
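The change can be pictured roughly like this, with a hypothetical drawable
struct and a plain counter standing in for the framebuffer stamp. The
invalidation (stamp bump) happens only when the size really differs:

```c
#include <stdbool.h>

/* Hypothetical drawable state; not the actual glx state tracker structs. */
struct drawable {
   unsigned width, height;
   unsigned stamp;   /* bumped whenever the drawable is invalidated */
};

/* Called on every dimension check; returns true if the drawable
 * was invalidated because its size changed. */
bool
check_dimensions(struct drawable *d, unsigned width, unsigned height)
{
   if (d->width == width && d->height == height)
      return false;          /* unchanged: do not trigger another validate */
   d->width = width;
   d->height = height;
   d->stamp++;
   return true;
}
```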
To avoid unnecessary validates altogether, we need to implement yet another
st-api change: Returning the current time stamp from the validate function,
as suggested by Chia-I Wu. The glx state tracker manager could then return
the stamp resulting from the last drawable dimension check.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
It makes things too random, as settings for temporary trials get stored
permanently, and it makes it difficult to build several platforms from the
same tree.
So disable it, again.
If a user-buffer was referenced twice by a draw command, the affected ranges
were uploaded separately, with only the last one being referenced by the
hardware. Make sure we upload only a single range.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
We currently always treat contents of user-buffers as volatile so
we don't need to take any particular action when the state tracker
announces that the contents have changed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Viewperf uses some unusual vertex arrays where the stride is less
than the element size. In this case, the stride was 4 while the
element size was 12. The difference of 8 bytes causes us to miss
uploading the tail bit of the array data.
Typically the stride is >= the element size so there was no problem
with other apps.
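As a worked example of the numbers above: the bytes touched by an array are
determined by the start of the last element plus one full element, not by
count * stride. The helper below is a hypothetical sketch, not the svga
driver's code:

```c
#include <stddef.h>

/* Number of bytes spanned by `count` elements of `element_size` bytes
 * placed `stride` bytes apart. Correct even when stride < element_size. */
size_t
vertex_array_bytes(size_t count, size_t stride, size_t element_size)
{
   if (count == 0)
      return 0;
   return (count - 1) * stride + element_size;
}
```

With stride 4 and element size 12, this is always 8 bytes more than
count * stride, which is the tail that was being missed.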
Stream user buffer contents rather than trying to maintain persistent
host / hardware copies.
Resulting negative array offsets are not allowed by the hardware,
(well, at least not according to header files), so adjust index bias
to make all array offsets positive.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Make sure that the upload manager doesn't upload data that's not
dirty. This speeds up the viewperf test proe-04/1 by a factor of 5 or so on svga.
Also introduce an u_upload_unmap() function that can be used
instead of u_upload_flush() so that we can pack
even more data in upload buffers. With this we can basically reuse the
upload buffer across flushes.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
libdrm is used in multiple places. Always check for it and set
have_libdrm. Each user can then check the variable.
This is useful when only EGL and DRI drivers are needed.
The idea is that the DRI drivers, libGL and libOSMesa are libraries that can
be independently enabled, yet --with-driver makes that difficult, if not
impossible. This also matches what --enable-{egl,xorg,d3d1x} do for their
respective libraries.
There are two libGL providers: Xlib-based and DRI-based. They cannot
coexist. To be able to choose between them, --enable-xlib-glx is also
added.
With this commit, --with-driver=dri can be replaced by
$ ./configure --enable-dri --enable-glx --disable-osmesa
--with-driver=xlib can be replaced by
$ ./configure --disable-dri --enable-glx --enable-osmesa \
--enable-xlib-glx
and --with-driver=osmesa can be replaced by
$ ./configure --disable-dri --disable-glx --enable-osmesa
Some combinations that cannot be supported with --with-driver will
produce errors at the moment. But in the future, we would like to
support, for example,
$ ./configure --enable-dri --disable-glx --enable-egl
(build libEGL and DRI drivers, but not libGL)
Note that this commit still keeps --with-driver for transitional
purposes.
- Copy i915c's support for phases, that should allow us to run a couple more shaders.
- Fix the error messages.
- Still try to proceed when we get a shader that's too long.
MOD_TO_FRACT was designed to lower the GLSL 1.20 mod() function, which
operates on floating point values. However, we also use ir_binop_mod
for GLSL 1.30's % operator, which operates on integers.
For now, make MOD_TO_FRACT only apply to floating-point mod operations.
In the future, we may want to add a lowering pass for integer-based mod.
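For reference, the floating-point identity behind this lowering is
mod(x, y) = x - y * floor(x / y) = y * fract(x / y). The sketch below shows
it in plain C. The function names are assumptions, and floor_small() is a
simplified stand-in for floor() that is only valid for values that fit in
a long:

```c
/* Simplified floor for small magnitudes; avoids depending on libm. */
static float
floor_small(float a)
{
   float t = (float) (long) a;   /* truncates toward zero */
   return (t > a) ? t - 1.0f : t;
}

static float
fract(float a)
{
   return a - floor_small(a);
}

/* GLSL-style mod() for floats, expressed through fract(). */
float
mod_via_fract(float x, float y)
{
   return y * fract(x / y);
}
```

This only makes sense for floating-point operands, which is why integer %
has to be excluded from the lowering.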
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, it would simply say "type error" in three different cases:
- The LHS is not an integer
- The RHS is not an integer
- The LHS and RHS have different base types (int vs. uint)
Now the error messages state the specific problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, ir_function::matching_signature had a fatal bug: if a
function had more than one non-exact match, it would simply return NULL.
This occurred, for example, when looking for max(uvec3, uvec3):
- max(vec3, vec3) -> score 1 (found first)
- max(ivec3, ivec3) -> score 1 (found second...used to return NULL here)
- max(uvec3, uvec3) -> score 0 (exact match...the right answer)
This did not occur for max(ivec3, ivec3) since the second match found
was an exact match.
The new behavior is to return a match with the lowest score. If there
is an exact match, that will be returned. Otherwise, a match with the
least number of implicit conversions is chosen.
Fixes piglit tests max-uvec3.vert and glsl-inexact-overloads.shader_test.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No MOV is necessary since signed/unsigned integers share the same
bit-representation; it's simply a question of interpretation. In
particular, the fs_reg::imm union shouldn't need updating.
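The same point in plain C, as a small sketch (the function name is
hypothetical): converting between 32-bit signed and unsigned integers
leaves the bits untouched, so no data has to move.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a signed 32-bit integer as unsigned.
 * memcpy makes the "same bits, different interpretation" point explicit. */
uint32_t
reinterpret_i2u(int32_t i)
{
   uint32_t u;
   memcpy(&u, &i, sizeof u);
   return u;
}
```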
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mesa IR actually stores all numbers as floating point, so this is
totally a farce, but we may as well keep it going.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reverts commit f41e1db327
"fix conversions from uint to bool and from float/bool to uint"
f2i, b2i, and b2i should not accept uint types. Use i2u and u2i.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are necessary to handle int/uint constructor conversions. For
example, the following code currently results in a type mismatch:
int x = 7;
uint y = uint(x);
In particular, uint(x) still has type int.
This commit simply adds the new operations; it does not generate them,
nor does it add backend support for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We almost never want to specify a condition, and when we do we're
already thinking about it (because we're writing a lowering pass
generating the condition), so a default argument should make the code
more pleasant to read.
NOTE: This is a candidate for the 7.11 branch (we want to be able to
cherry-pick future code).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our copy propagation tends to be bad at handling the later array
accesses of the matrix argument we moved to a temporary. Generally we
don't need to move it to a temporary, though, so this avoids needing
more copy propagation complexity.
Reduces instruction count of some Unigine Tropics and Sanctuary
fragment shaders that do operations on uniform matrix arrays by 5.9%
on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were constrained to using temporaries because we were assuming
variables all over. This simplifies things a bit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This awkward typing was to avoid shadowing the function argument (the
matrix) with the temporary deref (the column) before the
get_column()/get_element()s were moved into the expression/assignment
constructors. They're about to become not-variables, so the current
names had to go. This change is almost mechanical (other than
column_expr), so it should make the next diff clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think this makes the code more obvious by moving the declarations to
their single usage (now that we aren't using them to get at the ->type
field for expression constructors).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the definition of M_PI (for the benefit of <math.h> implementations
which do not define it) to before its first use.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 1a339b6c (st/mesa: prefer native texture formats when possible)
introduced two new arguments to the st_choose_format() functions.
This patch fixes the order and passes the correct internal_target
rather than GL_NONE.
NOTE: This is a candidate for the 7.11 branch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The api and the state tracker manager code as well as the state tracker code
assumed that only a single context could be bound to a drawable. That is not
a valid assumption, since multiple contexts can bind to the same drawable.
Fix this by making it the state tracker's responsibility to update all
contexts binding to a drawable.
Note that the state trackers themselves don't use atomic stamps on
frame-buffers. Multiple contexts rendering to the same drawable should
be protected by the application.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Instead of using a chain of manually maintained if/else blocks to
handle "#extension" directives, we now consult a table that specifies,
for each extension, the circumstances under which it is available, and
what flags in _mesa_glsl_parse_state need to be set in order to
activate it.
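A table-driven handler along these lines might look like the sketch below.
The struct layout, the field names and the "supported" flag are assumptions
made for illustration; they are not the real _mesa_glsl_parse_state or the
table in the GLSL compiler. Flags are plain bools so they can be reached by
offset:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser state. */
struct parse_state {
   bool ARB_draw_buffers_enable;
   bool ARB_draw_buffers_warn;
   bool OES_texture_3D_enable;
   bool OES_texture_3D_warn;
};

struct extension_entry {
   const char *name;
   bool supported;         /* in real code this would come from the driver */
   size_t enable_offset;   /* offset of the "_enable" flag in parse_state */
   size_t warn_offset;     /* offset of the "_warn" flag in parse_state */
};

#define EXT(name, supported) \
   { #name, supported, \
     offsetof(struct parse_state, name##_enable), \
     offsetof(struct parse_state, name##_warn) }

static const struct extension_entry extension_table[] = {
   EXT(ARB_draw_buffers, true),
   EXT(OES_texture_3D, false),
};

/* Returns false for unknown or unsupported extensions and leaves the
 * state untouched in that case. */
bool
process_extension(struct parse_state *state, const char *name,
                  bool enable, bool warn)
{
   size_t n = sizeof(extension_table) / sizeof(extension_table[0]);

   for (size_t i = 0; i < n; i++) {
      const struct extension_entry *e = &extension_table[i];
      if (strcmp(e->name, name) != 0)
         continue;
      if (!e->supported)
         return false;     /* check support before setting any flag */
      *(bool *) ((char *) state + e->enable_offset) = enable;
      *(bool *) ((char *) state + e->warn_offset) = warn;
      return true;
   }
   return false;
}
```

Adding an extension then means adding one table row instead of another
if/else branch.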
This makes it easier to add new GLSL extensions in the future, and
fixes the following bugs:
- Previously, _mesa_glsl_process_extension would sometimes set the
"_enable" and "_warn" flags for an extension before checking whether
the extension was supported by the driver; as a result, specifying
"enable" behavior for an unsupported extension would sometimes cause
front-end support for that extension to be switched on in spite of
the fact that back-end support was not available, leading to strange
failures, such as those in
https://bugs.freedesktop.org/show_bug.cgi?id=38015.
- "#extension all: warn" and "#extension all: disable" had no effect.
Notes:
- All extensions are currently marked as unavailable in geometry
shaders. This should not have any adverse effects since geometry
shaders aren't supported yet. When we return to working on geometry
shader support, we'll need to update the table for those extensions
that are available in geometry shaders.
- Previous to this commit, if a shader mentioned
ARB_shader_texture_lod, extension ARB_texture_rectangle would be
automatically turned on in order to ensure that the types
sampler2DRect and sampler2DRectShadow would be defined. This was
unnecessary, because (a) ARB_shader_texture_lod works perfectly well
without those types provided that the builtin functions that
reference them are not called, and (b) ARB_texture_rectangle is
enabled by default in non-ES contexts anyway. I eliminated this
unnecessary behavior in order to make the behavior of all extensions
consistent.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These were previously 1-bit-wide bitfields. Changing them to bools
has a negligible performance impact, and allows them to be accessed by
offset as well as by direct structure access.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
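As a small illustration of why bools help here: C does not allow
offsetof() or taking the address of a bitfield member, but both work
for a plain bool. The struct and helper below are hypothetical, not
the real parse state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag block; each flag is a full bool, not a 1-bit field. */
struct ext_flags {
    bool ARB_texture_rectangle_enable;
    bool ARB_texture_rectangle_warn;
};

/* Access a flag by its byte offset within the struct. */
bool *flag_at(struct ext_flags *flags, size_t offset)
{
    return (bool *)((char *)flags + offset);
}
```

A table can then store offsetof(struct ext_flags, ...) values and use
flag_at() to set the flag, while other code keeps using direct member
access.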
From the OpenGL docs for GL_ARB_explicit_attrib_location:
This extension provides a method to pre-assign attribute locations to
named vertex shader inputs and color numbers to named fragment shader
outputs.
This was accidentally implemented for fragment shader inputs. This
patch fixes it to apply to fragment shader outputs.
Fixes piglit tests
spec/ARB_explicit_attrib_location/1.{10,20}/compiler/layout-{01,03,06,07,08,09,10}.frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
The scissor state was incorrectly in a .prepare function instead of
.emit, so the packet would end up in the batch before the
STATE_BASE_ADDRESS. It appears that this doesn't actually hurt, as
the scissor address gets dereferenced according to the current SBA at
draw time.
All it's going to do is generate lots and lots and lots of
'warning: visibility attribute not supported in this configuration; ignored'
warnings
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Considering fbdev as an in-kernel window system,
- opening a device opens a connection
- there is only one window: the framebuffer
- fb_var_screeninfo decides window position, size, and even color format
- there is no pixmap
Now EGL is built on top of this window system. So we should have
- the fd as the handle of the native display
- reject all but one native window: NULL
- no pixmap support
modeset support is still around, but it should be removed soon.
The system routine requires m0 be reserved for saving off architectural
state. Moved the allocation to start at 2 instead of 0.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if max_depth were 1, the following code would see the
first if-statement (correctly) not get flattened, but the second
if-statement would (incorrectly) get flattened:
void main()
{
if (a)
gl_Position = vec4(0);
if (b)
gl_Position = vec4(1);
}
This is because the visit_leave(ir_if*) method would not decrement the
depth before returning on the first if-statement.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
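A minimal sketch of the depth bookkeeping, assuming a visitor that
increments a counter on entering an if-statement and decides on leave
whether to flatten. The names walker, enter_if and leave_if are
hypothetical; the real pass is a C++ IR visitor.

```c
#include <stdbool.h>

struct walker {
    unsigned depth;
    unsigned max_depth;
};

void enter_if(struct walker *w)
{
    w->depth++;
}

/* Returns true when the if-statement being left should be flattened.
 * The post-decrement compares the current depth and then always pops
 * the level, including on the "do not flatten" path. */
bool leave_if(struct walker *w)
{
    return w->depth-- > w->max_depth;
}
```

With max_depth == 1, two sibling if-statements both see depth 1 and
neither is flattened; only an if nested inside another reaches depth 2.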
Typically this was done by having a surface creation function fail if
the format was not supported.
However, in some situations when changing hardware surface formats,
it's desirable to do this check before attempting costly readback operations.
Also updated the surface_redefine interface.
Bump minor.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Blending and maybe even alpha-test don't work with those formats.
Only supporting RGBA, BGRA, RGBX, BGRX.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Remove set_event_handler() and pass the event handler with
native_get_XXX_platform(). Add init_screen() so that the pipe screen is
created later. This way we don't need to pass user_data to
create_display().
If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
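The index arithmetic can be sketched as below. The struct and function
name are hypothetical; only the register numbers (3 and 4) and the
four-components-per-register layout come from the description above.

```c
/* Where a clip distance lands: message register number and xyzw
 * component (0 = x, 1 = y, 2 = z, 3 = w). */
struct clip_slot {
    unsigned mrf;
    unsigned component;
};

struct clip_slot clip_distance_slot(unsigned index)
{
    struct clip_slot slot;
    slot.mrf = 3 + (index >> 2);  /* distances 0-3 -> m3, 4-7 -> m4 */
    slot.component = index & 3;   /* lower 2 bits, not 3 */
    return slot;
}
```

Masking with 7 instead of 3 would give component values 4-7 for the
second register, which do not name any of x, y, z or w.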
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_s8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
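One common way to avoid this kind of per-call stack initialization is
to give the lookup table static storage, so it is built once at
compile time. The sketch below assumes that remedy; the function name
and table values are placeholders, not the code touched by this
commit.

```c
#include <stddef.h>

/* Hypothetical format translation; the values are placeholders. */
unsigned translate_format(unsigned api_format)
{
    /* 'static const' places the table in read-only data, so no copy
     * is built on the stack each time the function runs. */
    static const unsigned hw_format[] = { 0x0c0, 0x0c1, 0x0e9, 0x144 };
    const size_t count = sizeof(hw_format) / sizeof(hw_format[0]);

    return api_format < count ? hw_format[api_format] : 0;
}
```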
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
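The matching idea can be sketched as a cache that compares the
compiled bytes before adding a new entry. Everything here
(program_cache, cache_find_or_add, the fixed capacity) is a
hypothetical simplification, not the driver's state cache.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PROGRAMS 16

struct cached_program {
    const void *data; /* caller keeps the bytes alive in this sketch */
    size_t size;
};

struct program_cache {
    struct cached_program items[MAX_PROGRAMS];
    size_t count;
};

/* Returns the index of an existing entry with identical bytes, or of
 * a newly added entry, or -1 if the cache is full. */
int cache_find_or_add(struct program_cache *cache, const void *data, size_t size)
{
    for (size_t i = 0; i < cache->count; i++) {
        const struct cached_program *p = &cache->items[i];
        if (p->size == size && memcmp(p->data, data, size) == 0)
            return (int)i;
    }

    if (cache->count == MAX_PROGRAMS)
        return -1;

    cache->items[cache->count].data = data;
    cache->items[cache->count].size = size;
    return (int)cache->count++;
}
```

Two linked programs whose fragment shaders compile to the same bytes
then share one entry, so switching between them need not flag new
program state.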
This patch adds support for 24bpp to the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The cause of this behaviour is that r600_is_format_supported checks
only against the r600_state_inline.h tables, not the evergreen ones.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accommodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This drops a bunch of unnecessary checks (i.e. ones that should be
trapped at the gallium level), and also removes the switch statement in
favour of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets this instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
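A rough sketch of the key handling, with hypothetical names
(wm_prog_key_sketch, populate_key) standing in for the real
brw_wm_prog_key code: the comparison function enters the key only for
samplers in compare mode, so non-shadow samplers never force a
recompile.

```c
#include <string.h>

#define MAX_SAMPLERS 16

enum compare_mode { COMPARE_NONE, COMPARE_R_TO_TEXTURE };

struct sampler_state {
    enum compare_mode mode;
    unsigned func; /* comparison function token */
};

struct wm_prog_key_sketch {
    unsigned compare_funcs[MAX_SAMPLERS];
};

void populate_key(struct wm_prog_key_sketch *key,
                  const struct sampler_state *samplers, unsigned count)
{
    /* Zero everything so keys can be compared with memcmp(). */
    memset(key, 0, sizeof(*key));

    for (unsigned i = 0; i < count && i < MAX_SAMPLERS; i++) {
        /* Only shadow samplers contribute to the key. */
        if (samplers[i].mode == COMPARE_R_TO_TEXTURE)
            key->compare_funcs[i] = samplers[i].func;
    }
}
```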
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
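The refcounting approach can be sketched as follows. The struct and
function names are hypothetical and the "released" flag stands in for
actually freeing the drawable.

```c
#include <stdbool.h>

struct drawable_sketch {
    unsigned refcount;
    bool released;
};

/* Called when a context binds the drawable. */
void drawable_bind(struct drawable_sketch *d)
{
    d->refcount++;
}

/* Called when a context unbinds it; the drawable is released only
 * when the last reference goes away. */
void drawable_unbind(struct drawable_sketch *d)
{
    if (d->refcount > 0 && --d->refcount == 0)
        d->released = true;
}
```

Unbinding one context leaves the drawable alive while another context
still holds it, and the last unbind releases it without waiting for
process exit.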
Throttle pretty hard in order to prioritize user-space interactivity over
3D application speed. May revisit this later.
Signed-off-by: Thomas <thellstrom@vmware.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occurred when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476, which I only noticed now. I'll add the bits of his work that I
didn't think to add to mine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test.
The AMD shader analyzer seems to multiply the V component by an
unspecified constant value; no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extensions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
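A minimal sketch of the NoOp-filling idea, using a hypothetical flat
table of void(void) entry points rather than the real glapi dispatch
table. Unimplemented slots get a stub that could log the call; here it
only counts calls.

```c
#include <stddef.h>

typedef void (*dispatch_fn)(void);

static unsigned noop_calls;

/* Stub installed for unimplemented entry points. */
static void noop_stub(void)
{
    noop_calls++; /* a debug build might print a warning here */
}

unsigned noop_call_count(void)
{
    return noop_calls;
}

/* Copy implemented entry points and fill every gap with the stub, so
 * no slot is ever NULL. */
void fill_dispatch_table(dispatch_fn *table, const dispatch_fn *impls, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i] = impls[i] ? impls[i] : noop_stub;
}
```

A client calling through an unimplemented slot then hits the stub
instead of jumping through a NULL pointer.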
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the latter.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
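The tightened test can be sketched like this, assuming each array is
described by a byte offset and a stride. The struct and function are
hypothetical, and using the absolute difference from the 0th array is
an assumption about how "difference" is measured.

```c
#include <stdbool.h>
#include <stddef.h>

struct vertex_array_sketch {
    size_t offset; /* byte offset (or pointer value) of the first element */
    size_t stride;
};

/* Arrays count as interleaved only if they share a stride and every
 * array starts within one stride of the 0th array. */
bool arrays_are_interleaved(const struct vertex_array_sketch *arrays, size_t count)
{
    if (count == 0)
        return false;

    const size_t first = arrays[0].offset;
    const size_t stride = arrays[0].stride;

    for (size_t i = 1; i < count; i++) {
        size_t diff = arrays[i].offset > first ? arrays[i].offset - first
                                               : first - arrays[i].offset;
        if (arrays[i].stride != stride || diff >= stride)
            return false;
    }
    return true;
}
```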
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment; the follow-on commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
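For reference, "round up when dividing by 4" is the usual
ceiling-division idiom. The helper name below is hypothetical.

```c
/* Number of 4-texel blocks needed to cover 'texels' texels. */
unsigned blocks_for(unsigned texels)
{
    return (texels + 3) / 4;
}
```

Plain truncating division gives 0 blocks for a 1-, 2- or 3-texel
dimension and drops any partial block at the edge of larger sizes.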
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
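As a sketch of the two conversions, assuming the usual GL rules
(glReadPixels sums the color channels into luminance, while texture
image specification takes luminance from red); the function names are
hypothetical:

```c
/* glReadPixels-style: L = R + G + B, clamped to [0, 1]. */
float luminance_readpixels(float r, float g, float b)
{
    float l = r + g + b;
    return l > 1.0f ? 1.0f : l;
}

/* glTexImage-style, as wanted for glCopyTexImage: L = R. */
float luminance_teximage(float r, float g, float b)
{
    (void)g;
    (void)b;
    return r;
}
```

For a mid-gray source pixel the two rules give visibly different
luminance values, which is the mismatch the test caught.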
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for "error: format not a string literal and no format
arguments" when compiling with the -Werror=format-security flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
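The rule can be sketched as a range check on the register's byte
offset. The macro and function names are hypothetical, and treating
0xac00 as an exclusive upper bound is an assumption; only the range
boundaries themselves come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_REG_START  0x8000u
#define CONFIG_REG_END    0xac00u  /* assumed exclusive */
#define CONTEXT_REG_START 0x28000u

/* CONFIG registers are single state: changing one requires the
 * pipeline to be flushed and the hardware idle. */
bool reg_write_needs_flush(uint32_t offset)
{
    return offset >= CONFIG_REG_START && offset < CONFIG_REG_END;
}

/* CONTEXT registers are multi-state and need no flush. */
bool reg_is_context(uint32_t offset)
{
    return offset >= CONTEXT_REG_START;
}
```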
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" are:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
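The criteria above can be sketched in C roughly as follows. This is only an illustration: the enum, struct, and function names are hypothetical and are not the driver's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state: what is currently known about the X driver. */
enum x_separate_stencil {
    X_SEPARATE_STENCIL_UNKNOWN,
    X_SEPARATE_STENCIL_FALSE,
    X_SEPARATE_STENCIL_TRUE
};

/* Hypothetical summary of the inputs named in the commit message. */
struct stencil_inputs {
    int depth_bits;                 /* from the GLX config */
    int stencil_bits;               /* from the GLX config */
    bool hw_has_separate_stencil;   /* hardware capability */
    enum x_separate_stencil x_support;
};

/* True when requesting DRI2BufferStencil is "sensible" per the list above. */
static bool should_request_separate_stencil(const struct stencil_inputs *in)
{
    return in->depth_bits > 0 && in->stencil_bits > 0 &&
           in->hw_has_separate_stencil &&
           in->x_support != X_SEPARATE_STENCIL_FALSE;
}
```

Once a request shows that the X driver lacks support, the caller would record X_SEPARATE_STENCIL_FALSE so the check fails from then on.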
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
of the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merge it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the
vbo_save_playback_vertex_list() call. That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the latter case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework.
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path. It also takes into account that texture and vertex resources
have different numbers of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
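A minimal sketch of this mask filtering, assuming a simplified view of the framebuffers. The struct and function names are hypothetical; the bit values match the standard GL buffer bits but are defined locally so the example needs no GL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins with the same values as the GL_*_BUFFER_BIT enums. */
#define COLOR_BIT   0x00004000u
#define DEPTH_BIT   0x00000100u
#define STENCIL_BIT 0x00000400u

/* Hypothetical summary of which buffers a framebuffer has. */
struct fb_attachments {
    bool color;
    bool depth;
    bool stencil;
};

/* Drop every bit whose buffer is missing from either framebuffer. */
static unsigned filter_blit_mask(unsigned mask,
                                 const struct fb_attachments *read_fb,
                                 const struct fb_attachments *draw_fb)
{
    if (!(read_fb->color && draw_fb->color))
        mask &= ~COLOR_BIT;
    if (!(read_fb->depth && draw_fb->depth))
        mask &= ~DEPTH_BIT;
    if (!(read_fb->stencil && draw_fb->stencil))
        mask &= ~STENCIL_BIT;
    return mask;
}
```

If the filtered mask ends up zero, there is nothing left to blit and the caller can return without raising an error.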
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility is supported), the
framebuffer can be complete with some attachments missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There is no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
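The three steps can be sketched as below. This is an assumption-laden illustration, not the driver's code: the function name is hypothetical, and it assumes each 128-bit slot is stored as four 32-bit dwords in little-endian order, so bit 63 is the top bit of dword 1 and bit 127 is the top bit of dword 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DWORDS_PER_DB 4u  /* 128 bits per possible DB */

/*
 * buf must hold max_dbs * DWORDS_PER_DB dwords.
 * Every slot is cleared; slots for DBs that do not exist on this asic
 * (index >= num_dbs) get bits 63 and 127 set.
 */
static void init_query_slots(uint32_t *buf, unsigned max_dbs, unsigned num_dbs)
{
    memset(buf, 0, (size_t)max_dbs * DWORDS_PER_DB * sizeof(uint32_t));

    for (unsigned db = num_dbs; db < max_dbs; db++) {
        buf[db * DWORDS_PER_DB + 1] |= 0x80000000u;  /* bit 63 (assumed layout) */
        buf[db * DWORDS_PER_DB + 3] |= 0x80000000u;  /* bit 127 (assumed layout) */
    }
}
```

Note that the clear runs unconditionally, which is the behavioural change the message describes.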
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coords to map those to. To fix this, we remap fs inputs to [0..8].
This lets us run many more GLSL programs.
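One way to picture the remapping is a simple compaction pass. The sketch below is hypothetical (names and the slot limit are assumptions; it treats the eight slots as numbered 0 through 7) and is not the driver's implementation.

```c
#include <assert.h>

#define MAX_TEXCOORD_SLOTS 8  /* assumed number of available slots */

/*
 * used[i] is nonzero if fragment shader input i is actually read.
 * remap[i] receives the compact slot for input i, or -1 if unused.
 * Returns the number of slots consumed, or -1 if they do not fit.
 */
static int remap_fs_inputs(const int *used, int num_inputs, int *remap)
{
    int next = 0;

    for (int i = 0; i < num_inputs; i++) {
        if (!used[i]) {
            remap[i] = -1;      /* hole: consumes no slot */
            continue;
        }
        if (next == MAX_TEXCOORD_SLOTS)
            return -1;          /* more live inputs than slots */
        remap[i] = next++;
    }
    return next;
}
```

For a shader that reads inputs 0, 3 and 5, the holes at 1, 2 and 4 take no slot and only three slots are used.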
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources;
we aren't going to be getting resource registers at weird offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Average Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialiased lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. This manifested as a crash in point-sprite.
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and moves the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We should only end up initialising once and updating with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long; the 75th dword changes, the first 60 or so
don't. Split the block so it is emitted less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname
FRAMEBUFFER_ATTACHMENT_OBJECT_NAME will return zero, and all other queries
will generate an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
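To illustrate the shift-and-mask style, here is a small sketch. The macros and the dword layout are hypothetical stand-ins written for this example; they are not the driver's INTEL_MASK definition or a real surface state layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: build a mask for bits high:low, and place a value in it. */
#define FIELD_MASK(high, low) \
    ((uint32_t)(((1ull << ((high) - (low) + 1)) - 1) << (low)))
#define SET_FIELD(value, high, low) \
    (((uint32_t)(value) << (low)) & FIELD_MASK(high, low))

/*
 * Made-up dword layout for illustration only:
 *   bits 31:29  surface type
 *   bits 26:18  surface format
 *   bit  0      enable
 * The bit ranges appear literally in the code, so they can be compared
 * against a spec table line by line.
 */
static uint32_t pack_surface_dword0(unsigned type, unsigned format, unsigned enable)
{
    return SET_FIELD(type, 31, 29) |
           SET_FIELD(format, 26, 18) |
           SET_FIELD(enable, 0, 0);
}
```

Each state dword is built with a few ORs on a plain 32-bit value, rather than a series of bitfield stores into a struct.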
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h.
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but we have this piglit test,
useprogram-inside-begin, that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Based upon a patch from Pali Rohár <pali.rohar@gmail.com>.
This seems to get at least YUV->RGB conversion working.
So a simple "mplayer -vo vdpau" now seems to work fine.
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
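The INTEL_HIZ and INTEL_SEPARATE_STENCIL overrides described above could be read along these lines. This is a hedged sketch: the function names are hypothetical, and falling back to the default for any value other than "0" or "1" is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Interpret an override string; only "0" and "1" are treated as valid. */
static bool parse_flag_override(const char *value, bool default_value)
{
    if (value == NULL)
        return default_value;          /* variable not set */
    if (strcmp(value, "0") == 0)
        return false;
    if (strcmp(value, "1") == 0)
        return true;
    return default_value;              /* assumption: ignore invalid values */
}

/* Look up an environment variable and apply the override, if any. */
static bool flag_from_env(const char *name, bool default_value)
{
    return parse_flag_override(getenv(name), default_value);
}
```

A caller would then write something like `has_hiz = flag_from_env("INTEL_HIZ", false);` while the feature is still under development.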
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
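The table-driven approach might look like the following sketch. The enums and table entries are invented for illustration and do not correspond to the real GL or pipe format lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GL internal formats and pipe formats. */
enum gl_fmt   { GLF_RGBA8 = 1, GLF_RGB8, GLF_DEPTH24, GLF_UNLISTED };
enum pipe_fmt { PF_NONE = 0, PF_RGBA8, PF_RGBX8, PF_Z24 };

struct format_mapping {
    enum gl_fmt gl;
    enum pipe_fmt pipe;
};

/* One row per supported format, replacing one switch case each. */
static const struct format_mapping format_map[] = {
    { GLF_RGBA8,   PF_RGBA8 },
    { GLF_RGB8,    PF_RGBX8 },
    { GLF_DEPTH24, PF_Z24   },
};

/* Linear search; returns PF_NONE when the format is not in the table. */
static enum pipe_fmt find_pipe_format(enum gl_fmt gl)
{
    for (size_t i = 0; i < sizeof(format_map) / sizeof(format_map[0]); i++) {
        if (format_map[i].gl == gl)
            return format_map[i].pipe;
    }
    return PF_NONE;
}
```

Because the lookup is isolated in one function, swapping the linear scan for a sorted table with bsearch() or a hash would not affect callers.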
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
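The hazard can be reproduced in isolation. The two helpers below are hypothetical and differ only in the type of bpp; the example assumes a 32-bit int and unsigned, as on the usual desktop platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Signed bpp: the whole expression is evaluated in int, then widened. */
static int64_t offset_with_int_bpp(int x, int stride, int y, int bpp)
{
    return (x * stride + y) * bpp;
}

/*
 * Unsigned bpp: the usual arithmetic conversions turn (x * stride + y)
 * into unsigned before the multiply, so a negative value wraps around
 * to a large positive one.
 */
static int64_t offset_with_unsigned_bpp(int x, int stride, int y, unsigned bpp)
{
    return (x * stride + y) * bpp;
}
```

With x = 1, stride = -16, y = 0 and bpp = 4, the first helper yields -64 while the second yields UINT_MAX - 63. Added to a 64-bit base pointer, that is a jump of roughly 4 GiB forward instead of 64 bytes back.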
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we can't
look up visuals, as we don't receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, we always did unorm8->float/nonlinear-to-linear conversion (using a
lookup table), then converted back to nonlinear (using the expensive math
func pow among others), and finally converted back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptimal in some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time the number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in the normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Average Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets every time this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
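To illustrate what the extra fractional bits mean, here is a minimal sketch of a float to signed fixed-point conversion. The helper name float_to_sfixed and its rounding and clamping policy are assumptions made for illustration; they are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical helper: convert a float to signed fixed point with
 * `int_bits` integer bits and `frac_bits` fractional bits (S4.8 means
 * int_bits = 4, frac_bits = 8). Rounds to nearest and clamps to the
 * representable range. NaN inputs are not handled in this sketch. */
static int32_t float_to_sfixed(float value, unsigned int_bits,
                               unsigned frac_bits)
{
    const float scale = (float)(1u << frac_bits);
    const int32_t max = (int32_t)(1u << (int_bits + frac_bits)) - 1;
    const int32_t min = -max - 1;
    float scaled = value * scale;

    if (scaled >= (float)max)
        return max;
    if (scaled <= (float)min)
        return min;
    return (int32_t)(scaled < 0.0f ? scaled - 0.5f : scaled + 0.5f);
}
```

With these assumptions, a bias of 1.5 encodes as 96 in S4.6 and as 384 in S4.8, so a value encoded with the old scale would be read as a quarter of the intended bias.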
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
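The register remapping described above can be sketched roughly as follows. The struct, enum, and the MRF_HACK_START constant are hypothetical names for illustration; the base of 111 is inferred only from the "m0 ends up in g111" remark in this message.

```c
/* Hypothetical register description, not the driver's struct brw_reg. */
enum reg_file { FILE_GRF, FILE_MRF };

struct reg {
    enum reg_file file;
    unsigned nr;
};

/* Assumption: base GRF number, taken from "m0 -> g111" above. */
#define MRF_HACK_START 111

/* Rewrite an MRF reference into the GRF that stands in for it;
 * registers in any other file pass through unchanged. */
static struct reg resolve_mrf(struct reg r)
{
    if (r.file == FILE_MRF) {
        r.file = FILE_GRF;
        r.nr += MRF_HACK_START;
    }
    return r;
}
```

Calling such a helper from each operand setter would keep the existing MRF-based callers untouched, which appears to be the point of the hack.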
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly between the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
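A simplified model of the centralized back-patching might look like the sketch below. All types and function names here are hypothetical, jump targets are stored as plain instruction indices, and the SPF conversion to ADD is not modeled; it only shows how ENDIF can patch both the IF and the optional ELSE from one stack.

```c
enum opcode { OP_IF, OP_ELSE, OP_ENDIF, OP_OTHER };

struct insn {
    enum opcode op;
    int jip;   /* jump target index, -1 until patched */
    int uip;   /* ENDIF index for IF instructions, -1 until patched */
};

#define MAX_INSNS 64
#define MAX_DEPTH 16

/* No bounds checking: this is a sketch, not production code. */
struct program {
    struct insn insns[MAX_INSNS];
    int count;
    int if_stack[MAX_DEPTH];   /* indices of pending IF/ELSE instructions */
    int if_depth;
};

static int emit(struct program *p, enum opcode op)
{
    int idx = p->count++;
    p->insns[idx].op = op;
    p->insns[idx].jip = -1;
    p->insns[idx].uip = -1;
    return idx;
}

static void emit_if(struct program *p)
{
    p->if_stack[p->if_depth++] = emit(p, OP_IF);
}

static void emit_else(struct program *p)
{
    p->if_stack[p->if_depth++] = emit(p, OP_ELSE);
}

/* ENDIF pops the optional ELSE and the IF, then patches both. */
static void emit_endif(struct program *p)
{
    int endif_idx = emit(p, OP_ENDIF);
    int else_idx = -1;
    int if_idx = p->if_stack[--p->if_depth];

    if (p->insns[if_idx].op == OP_ELSE) {
        else_idx = if_idx;
        if_idx = p->if_stack[--p->if_depth];
    }

    if (else_idx >= 0) {
        p->insns[if_idx].jip = else_idx + 1;   /* first insn of ELSE block */
        p->insns[else_idx].jip = endif_idx;
    } else {
        p->insns[if_idx].jip = endif_idx;
    }
    p->insns[if_idx].uip = endif_idx;
}
```

Because IF and ELSE stay distinguishable on the stack, the ENDIF step can decide how to patch each one, which is what emitting real IF/ELSE opcodes (instead of ADDs) makes possible.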
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit's fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering it's probably loaded as a library
and you don't want the threads interfering with the main thread's signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE" in a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geometry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the
hash_size/shift in context in favour of a define, which should make
doing the math a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
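The map-on-first-use idea can be sketched as below. The struct and function names are hypothetical, and calloc() merely stands in for the real kernel mapping call so that the example is self-contained.

```c
#include <stdlib.h>

/* Hypothetical buffer object: `data` stays NULL until someone asks
 * for a CPU pointer. */
struct lazy_bo {
    size_t size;
    void *data;
    unsigned map_calls;   /* counts how often a real mapping was made */
};

/* Return a CPU pointer, creating the mapping only on first use.
 * Assumption: calloc() stands in for the actual map operation. */
static void *lazy_bo_map(struct lazy_bo *bo)
{
    if (bo->data == NULL) {
        bo->data = calloc(1, bo->size);
        if (bo->data != NULL)
            bo->map_calls++;
    }
    return bo->data;
}

/* Release the mapping, if one was ever created. */
static void lazy_bo_destroy(struct lazy_bo *bo)
{
    free(bo->data);
    bo->data = NULL;
}
```

Buffers that are never touched by the CPU never pay for a mapping, and any code that reads the data pointer directly has to check it for NULL first.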
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
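In C, an empty parameter list declares a function with unspecified arguments, which is what triggers the warning; spelling the list as (void) makes it a prototype. A minimal sketch, assuming that is the shape of the fix (the fallback macro definitions and the callback are for illustration only):

```c
#include <stddef.h>

/* Fallback definitions so the sketch compiles outside the GL headers. */
#ifndef GLAPIENTRY
#define GLAPIENTRY
#endif
#ifndef GLAPIENTRYP
#define GLAPIENTRYP GLAPIENTRY *
#endif

/* Prototype form: takes no arguments, so no warning is emitted. */
typedef void (GLAPIENTRYP _GLUfuncptr)(void);

/* Hypothetical callback used to show the typedef in use. */
static int callback_runs = 0;

static void GLAPIENTRY count_callback(void)
{
    callback_runs++;
}
```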
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes, which some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used nowhere else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
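The identity can be checked numerically with a small sketch. The LN2 and LOG2E names are local stand-ins for the usual math.h constants, written out here so the example does not depend on M_LOG2E being available.

```c
/* Local stand-ins for M_LN2 and M_LOG2E. */
#define LN2   0.69314718055994530942   /* ln(2) */
#define LOG2E 1.4426950408889634074    /* log2(e) */

/* Define one constant in terms of the other so they cannot drift apart. */
#define ONE_DIV_LN2 LOG2E

/* Compute 1/ln(2) the long way, for comparison. */
static double one_div_ln2_by_division(void)
{
    return 1.0 / LN2;
}

/* Compare two doubles with a small absolute tolerance. */
static int nearly_equal(double a, double b)
{
    double d = a - b;
    return d < 1e-12 && d > -1e-12;
}
```

Keeping a single definition removes the chance of two hand-typed literals disagreeing in their low digits.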
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
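The conditions above can be read as the predicate sketched below. The enums, struct, and function name are hypothetical, and the way the conditions combine (all three required on R600/RV6xx, either format class sufficient on R7xx/evergreen) is an interpretation of the list rather than something stated explicitly.

```c
#include <stdbool.h>

enum chip_class { CHIP_R600, CHIP_R700, CHIP_EVERGREEN };
enum num_type { TYPE_UNORM, TYPE_SNORM, TYPE_SRGB, TYPE_FLOAT, TYPE_OTHER };

/* Hypothetical description of a color buffer format. */
struct cb_format {
    enum num_type type;
    unsigned max_channel_bits;   /* widest channel, in bits */
};

/* Decide whether SOURCE_FORMAT may be set to EXPORT_NORM. */
static bool can_use_export_norm(enum chip_class chip, struct cb_format fmt,
                                bool blend_clamp, bool blend_float32)
{
    bool small_norm = fmt.max_channel_bits <= 11 &&
                      (fmt.type == TYPE_UNORM ||
                       fmt.type == TYPE_SNORM ||
                       fmt.type == TYPE_SRGB);
    bool small_float = fmt.type == TYPE_FLOAT &&
                       fmt.max_channel_bits <= 16;

    if (chip == CHIP_R600)
        return blend_clamp && !blend_float32 && small_norm;

    /* R7xx and evergreen: blend state does not matter in this reading. */
    return small_norm || small_float;
}
```

Returning false simply leaves SOURCE_FORMAT at 0, which the message says works in all cases but is slower.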
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea what's actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the Windows maximum window size, allowing
arbitrarily large windows to be created.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
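The truncation is easy to reproduce with plain C bitfields. The struct and function names below are hypothetical; a 3-bit field is used because that is the width at which a register number of 8 wraps to 0.

```c
/* Tightly packed layout: register numbers above 7 do not fit. */
struct packed_payload {
    unsigned source_w_reg:3;
};

/* Relaxed layout: 8 bits cover register numbers up to 255. */
struct wide_payload {
    unsigned source_w_reg:8;
};

/* Store a register number in the 3-bit field and read it back. */
static unsigned store_packed(unsigned reg)
{
    struct packed_payload p;
    p.source_w_reg = reg;   /* silently keeps only the low 3 bits */
    return p.source_w_reg;
}

/* Store a register number in the 8-bit field and read it back. */
static unsigned store_wide(unsigned reg)
{
    struct wide_payload p;
    p.source_w_reg = reg;
    return p.source_w_reg;
}
```

Storing 8 in the 3-bit field reads back as 0, while the 8-bit field preserves both 8 and register numbers in the 60+ range.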
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assignment in lp_setup_begin_query() avoids that.
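The ordering issue can be shown with a stripped-down model. The types and function names here are hypothetical simplifications, not the llvmpipe structures; the only point is that the pointer is recorded before any early return.

```c
#include <stdbool.h>
#include <stddef.h>

struct query { int id; };

/* Hypothetical, heavily simplified setup context. */
struct setup_context {
    struct query *active_query;
    bool bin_memory_available;
};

static bool setup_begin_query(struct setup_context *setup, struct query *pq)
{
    /* Record the query first so the early return below cannot skip it. */
    setup->active_query = pq;

    if (!setup->bin_memory_available)
        return false;   /* out of bin memory: bail out early */

    /* ... the begin-query command would be binned here ... */
    return true;
}

/* Returns whether the query being ended matches the active one, which
 * is the condition the assertion in the real code checks. */
static bool setup_end_query(struct setup_context *setup, struct query *pq)
{
    bool matched = (setup->active_query == pq);
    setup->active_query = NULL;
    return matched;
}
```

In this model the end-query check passes even when begin-query ran out of memory, because the assignment no longer sits behind the early return.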
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sorts of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
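The fallback amounts to a guarded empty definition, roughly as sketched here. The declared function is a hypothetical stand-in for the GL entry points that carry the WINGDAPI prefix.

```c
/* If the platform headers did not define WINGDAPI (for example because
 * NOGDI suppressed wingdi.h), define it to nothing so declarations
 * that use it still parse. */
#ifndef WINGDAPI
#define WINGDAPI
#endif

/* Hypothetical declaration using the prefix, as the GL headers do. */
WINGDAPI int wingdapi_demo(void);

int wingdapi_demo(void)
{
    return 1;
}
```

Because of the #ifndef guard, a real definition from the platform headers still takes precedence when it is present.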
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises things for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
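A minimal sketch of the difference between masking and clamping described
above. The helper names are hypothetical and are not Mesa functions; the
sketch only assumes 8-bit channels summed into a wider integer.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not Mesa functions. */

/* Sum the color channels the way a LUMINANCE readback combines R+G+B. */
static unsigned luminance_sum(uint8_t r, uint8_t g, uint8_t b)
{
   return (unsigned)r + (unsigned)g + (unsigned)b;
}

/* Storing without clamping: only the low 8 bits survive. */
static uint8_t store_masked(unsigned v)
{
   return (uint8_t)(v & 0xff);
}

/* Storing with clamping: saturate at the largest UNSIGNED_BYTE value. */
static uint8_t store_clamped(unsigned v)
{
   return (uint8_t)(v > 0xff ? 0xff : v);
}
```

With R=200, G=100, B=50 the sum is 350. Masking stores 94, which is the
bogus result; clamping stores 255.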
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
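A minimal sketch of the kind of size check described above, assuming each
level is expected to be the base size halved per level with a floor of 1.
The function names are hypothetical and not taken from the state tracker.

```c
/* Hypothetical helpers; names are not taken from the Mesa source. */

/* Size of one dimension at a mipmap level: halve per level, never below 1.
 * Assumes level is non-negative and smaller than the bit width of int. */
static int minified_size(int base, int level)
{
   int size = base >> level;
   return size > 0 ? size : 1;
}

/* Nonzero if a level's dimensions match what the base level implies. */
static int level_size_is_consistent(int base_w, int base_h, int level,
                                    int w, int h)
{
   return w == minified_size(base_w, level) &&
          h == minified_size(base_h, level);
}
```

For the example above, a 32x32 base implies 16x16 at level 1, so the 8x8
image fails the check and would not be copied.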
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up to 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
This needs revisiting; we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, which seems to have caused a regression
reported on IRC with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dword blocks in the SPI, of which we only
modify the first dwords.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are not. However, certain registers need to always
be written, so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
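A minimal sketch of the dirty-check idea, assuming a flat register table.
The struct layout, flag names and function name are hypothetical and do not
reflect the actual r600g data structures.

```c
#include <stdint.h>

/* Hypothetical flags and layout; not the real r600g definitions. */
#define REG_FLAG_DIRTY       (1u << 0)  /* value changed since last emit */
#define REG_FLAG_ALWAYS_EMIT (1u << 1)  /* must be written whenever used */

struct reg_entry {
   uint32_t offset;
   uint32_t value;
   uint32_t flags;
};

/* Copy into 'out' only the values that are dirty or flagged as always
 * emitted, clear the dirty bits, and return the number of dwords written.
 * 'out' must have room for at least 'count' dwords. */
static unsigned emit_changed_regs(struct reg_entry *regs, unsigned count,
                                  uint32_t *out)
{
   unsigned emitted = 0;
   unsigned i;

   for (i = 0; i < count; i++) {
      if (regs[i].flags & (REG_FLAG_DIRTY | REG_FLAG_ALWAYS_EMIT)) {
         out[emitted++] = regs[i].value;
         regs[i].flags &= ~REG_FLAG_DIRTY;
      }
   }
   return emitted;
}
```

Emitting the same table twice in a row writes only the always-emit entries
the second time, which is where the dword savings would come from.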
It also moves the need_bo to be a flag on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, because the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
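A minimal sketch of the rule, assuming the comma expression has been
flattened into an array of operands with a precomputed side-effect bit.
The struct and function are hypothetical and are not the GLSL compiler's
actual AST types.

```c
#include <stddef.h>

/* Hypothetical flattened view of a comma expression's operands. */
struct operand {
   const char *text;
   int has_side_effect;
};

/* Count the warnings that would be issued: every operand except the
 * right-most one is checked, and those without side effects are flagged. */
static size_t count_no_effect_warnings(const struct operand *ops, size_t n)
{
   size_t warnings = 0;
   size_t i;

   for (i = 0; i + 1 < n; i++) {
      if (!ops[i].has_side_effect)
         warnings++;
   }
   return warnings;
}
```

For "x = y, 5, 3" the operands are "x = y" (has a side effect), "5" (no
effect) and "3" (right-most, never checked), giving the single warning
shown above.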
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accommodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following input:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes than this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
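A minimal sketch of the "save the state, pop only back to it" idea,
assuming the active list is a singly linked stack. The node type and
function name are hypothetical and are not the glcpp parser's own.

```c
#include <stddef.h>

/* Hypothetical active-list node; the real glcpp types differ. */
struct active_node {
   const char *name;
   struct active_node *next;
};

/* Pop entries pushed by this invocation only: stop as soon as the head
 * is the node that was on top when the invocation started. Passing NULL
 * as 'saved' means the list was empty on entry, so everything is popped. */
static struct active_node *pop_to_saved(struct active_node *active,
                                        struct active_node *saved)
{
   while (active != NULL && active != saved)
      active = active->next;
   return active;
}
```

In the failing case, "C" is already on the list when the argument "0" is
expanded, so the saved head is the "C" node and it stays on the list for
the expansion of the second argument.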
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly: util_format_is_supported returns boolean, which is typedef'ed
to uchar, but a function without a prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass converts CMP T0, T1, T2, T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly in which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptable
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.