Choose MESA_FORMAT_ARGB2101010 when storing
GL_RGBA + GL_UNSIGNED_INT_2_10_10_10_REV or
GL_RGB + GL_UNSIGNED_INT_2_10_10_10_REV.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Mesa core's copyteximage calls the driver with format/type==GL_NONE
to "Allocate texture memory". In this case, we shouldn't call
_mesa_store_teximage.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
In ES or GL+GL_ARB_ES2_compatibility, the usage of
format = IMPLEMENTATION_COLOR_READ_FORMAT +
type = IMPLEMENTATION_COLOR_READ_TYPE
can function, even if the src/dst int vs. non-int types
differ.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
If the source read buffer is integer based, and the the read
pixels type is RGBA/UNSIGNED_BYTE, then use the integer pixel
conversion path.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This function checks for ES3 compatible
format/type/internalFormat/dimension combinations.
[jordan.l.justen@intel.com: additional tweaks for gles3-gtf]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
gles3conform expects than when converting from a signed
int to an unsigned byte, the output will be clamped at a
max of 0x7f. This impacts conversion from
int16_t => uint8_t and int32_t => uint8_t.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This should be squashed into the earlier patch when mailing it out for
review or merging it to master.
The error path was missing a "return" like all the other error paths.
Also, we may as well call it glDrawBuffers in the error message since
the ARB suffix doesn't exist in ES 3.
Fixes error EGL_BAD_ATTRIBUTE in the tests below on Intel Sandybridge:
* piglit egl-create-context-verify-gl-flavor, testcase OpenGL ES 3.0
* gles3conform, revision 19700, when runnning GL3Tests with -fbo
This plumbing is added in order to comply with the EGL_KHR_create_context
spec. According to the EGL_KHR_create_context spec, it is illegal to call
eglCreateContext(EGL_CONTEXT_MAJOR_VERSION_KHR=3) with a config whose
EGL_RENDERABLE_TYPE does not contain the EGL_OPENGL_ES3_BIT_KHR. The
pertinent
portion of the spec is quoted below; the key word is "respectively".
* If <config> is not a valid EGLConfig, or does not support the
requested client API, then an EGL_BAD_CONFIG error is generated
(this includes requesting creation of an OpenGL ES 1.x, 2.0, or
3.0 context when the EGL_RENDERABLE_TYPE attribute of <config>
does not contain EGL_OPENGL_ES_BIT, EGL_OPENGL_ES2_BIT, or
EGL_OPENGL_ES3_BIT_KHR respectively).
To create this patch, I searched for all the ES2 bit plumbing by calling
`git grep "ES2_BIT\|DRI_API_GLES2" src/egl`, and then at each location
added a case for ES3.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the hardware/driver combo supports GLES3, then set the GLES3 bit in
intel_screen's bitmask of supported DRI API's. Neither the EGL nor GLX
layer uses the bit yet.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This enum corresponds to EGL_OPENGL_ES3_BIT_KHR.
Neither the GLX nor EGL layer use the enum yet.
I don't like the GLES bits. I'd prefer that all GLES APIs be exposed
through a single API bit, as is done in GLX_EXT_create_context_es_profile.
But, we need this GLES3 enum in order to do the plumbing necessary to
correctly support EGL_OPENGL_ES3_BIT_KHR as required by the
EGL_KHR_create_context spec.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Each driver (i830, i915, i965) used independent but similar code to
validate the requested context version. With the rececnt arrival of GLES3,
that logic has needed an update. Rather than apply identical updates to
each drivers validation code, let's just move the validation into the
shared routine intelInitContext.
This refactor required some incidental changes to functions
i830CreateContext and intelInitContext. For each function, this patch:
- Adds context version parameters to the signature.
- Adds a DRI_CTX_ERROR out param to the signature.
- Sets the DRI_CTX_ERROR at each early return.
Tested against gen6 with piglit egl-create-context-verify-gl-flavor.
Verified that this patch does not change the set of exposed EGL context
flavors.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before this patch, intelInitScreen2 set DRIScreen::api_mask with the hacky
heuristic below:
if (gen >= 3)
api_mask = GL | GLES1 | GLES2;
else
api_mask = 0;
This hack was likely broken on gen2 (i830), but I don't care enough to
properly investigate. It appears that every EGLConfig on i830 has
EGL_RENDERABLE_TYPE=0, and thus eglCreateContext will never succeed.
Anyway, moving on to living drivers...
With the arrival of EGL_OPENGL_ES3_BIT_KHR, this heuristic is now
insufficient. We must enable the GLES3 bit if and only if the driver is
capable of creating a GLES3 context. This requires us to determine the
maximum supported context version supported by the hardware/driver for
each api *during initialization of intel_screen*.
Therefore, this patch adds four new fields to intel_screen which indicate
the maximum supported context version for each api:
max_gl_core_version
max_gl_compat_version
max_gl_es1_version
max_gl_es2_version
The api mask is now correctly set as:
api_mask = GL;
if (max_gl_es1_version > 0)
api_mask |= GLES1;
if (max_gl_es2_version > 0)
api_mask |= GLES2;
Tested against gen6 with piglit egl-create-context-verify-gl-flavor.
Verified that this patch does not change the set of exposed EGL context
flavors.
v2:
- Replace the if-tree on gen with a switch, for Ian.
- Unconditionally enable the DRI_API_OPENGL bit, for Ian.
v3:
- Drop max gl version to 1.4 on gen3 if !has_occlusion_query,
because occlusion queries entered core in 1.5. For Ian.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick.intel.com>
Since patch "i965: Validate requested GLES context version in
brwCreateContext", we have been able to create ES 3.0 contexts due to the
max version check. So...bump the max version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL ES 3.0 spec (Section 12.17) says:
"GLSL ES 1.00 removed token pasting and other functionality."
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Simply emitting a nicely-formatted error message if any undefined macro is
encountered in a parser context expecting an expression.
With this commit, the following piglit test now passes:
spec/glsl-es-3.00/compiler/undefined-macro.vert
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This can be triggered either by creation of a GLES context (with
api == API_OPENGLES2) or else by a #version directive with version
value 100 or with a string of "es" following the version value.
There's no behavioral change with this commit—just preparation for ES-specific
behavior in the preprocessor in the future.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I'm not sure if this is the correct fix. The
_mesa_es_error_check_format_and_type function (used above in the ES 1
and 2 cases) was originally added for glTexImage checking and allows
GL_DEPTH_STENCIL/GL_UNSIGNED_INT_24_8 combinations. Using it in ES 3
causes other tests to regress.
Fixes es3conform's packed_depth_stencil_error test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
INVALID_ENUM is for when the type is simply not known.
Fixes part of es3conform's packed_depth_stencil_error test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ES 3 specifies some formats as texture-only (i.e., not available for
renderbuffers).
See the "Required Texture Formats" section (pg 126) of the ES 3 spec.
Fixes es3conform's color_buffer_unsupported_format test.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
According to both the GL 3.0 and ES 3.0 specifications (table 2.7 for GL
and table 2.8 for ES), the default value of BUFFER_ACCESS_FLAGS is
supposed to be zero.
Note that there are two related quantities: the obsolete BUFFER_ACCESS
enum and the new BUFFER_ACCESS_FLAGS bitfield.
BUFFER_ACCESS can only be GL_READ_ONLY, GL_WRITE_ONLY, or GL_READ_WRITE;
BUFFER_ACCESS_FLAGS can easily represent all three via GL_MAP_WRITE_BIT,
GL_MAP_READ_BIT, and their logical or. It also supports more flags.
Thus, Mesa only stores the bitfield, and simply computes the old enum
when queried, via simplified_access_mode(bufObj->AccessFlags).
The tricky part is that, while BUFFER_ACCESS_FLAGS defaults to 0,
BUFFER_ACCESS defaults to GL_READ_WRITE for desktop [GL 3.0, table 2.8]
and GL_WRITE_ONLY_OES for ES [the GL_EXT_map_buffer_range extension].
Mesa tried to implement this by setting the default AccessFlags to
GL_MAP_READ_BIT | GL_MAP_WRITE_BIT on desktop, and GL_MAP_WRITE_BIT on
ES. But in all specifications, it needs to be 0.
This patch moves that logic into simplified_access_mode(): when
AccessFlags == 0, it now returns GL_READ_WRITE for desktop and
GL_WRITE_ONLY for ES 1/2. (BUFFER_ACCESS doesn't exist on ES 3.0,
so it's irrelevant there.)
With that in place, it changes the AccessFlags default to 0.
Fixes three es3conform tsets:
- copy_buffer_defaults
- map_buffer_range_modify_indices
- pixel_buffer_object_default_parameters
Perhaps most importantly, this patch adds comments quoting the relevant
spec paragraphs above each error condition.
It also makes three changes:
- For FBOs, GL_COLOR_ATTACHMENTm where m >= MaxDrawBuffers is supposed
to generate INVALID_OPERATION (not INVALID_ENUM).
- Constants that refer to multiple buffers (such as FRONT, BACK, LEFT,
RIGHT, and FRONT_AND_BACK) are supposed to generate INVALID_OPERATION,
not INVALID_ENUM.
- In ES 3.0, for FBOs, buffers[i] must be NONE or GL_COLOR_ATTACHMENTi
or else INVALID_OPERATION occurs. (This is a new restriction.)
Fixes es3conform's draw-buffers-api test.
This requires some derived state. The cut vertex used is either the
value specified by glPrimitiveRestartIndex or it's hard-coded to ~0.
The derived state gl_array_attrib::_RestartIndex captures this value.
In addition, the derived state gl_array_attrib::_PrimitiveRestart is set
whenever either gl_array_attrib::PrimitiveRestart or
gl_array_attrib::PrimitiveRestartFixedIndex is set.
v2: Use _mesa_is_gles3.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named badly and wasn't in the dispatch table,
making it hard to find.
Fixes transform_feedback2_states and gets a few other transform
feedback tests closer to working in es3conform.
Reviewed-by Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For glGetIntegerv, add support for the following in an OpenGL ES 3.0
context:
GL_MAJOR_VERSION
GL_MINOR_VERSION
GL_NUM_EXTENSIONS
See Table 6.29 of the OpenGL ES 3.0 spec.
Fixes error GL_INVALID_ENUM in piglit egl-create-context-verify-gl-flavor,
testcase for OpenGL ES 3.0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The ES 3 spec says that the minumum allowable value is 2^24-1, but the
GL 4.3 and ARB_ES3_compatibility specs require 2^32-1, so return 2^32-1.
Fixes es3conform's element_index_uint_constants test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From GL/GLES/GL_CORE and GLES2 -> GL/GL_CORE/GLES2.
Yes, we really were exposing ES2_compatibility queries on ES 1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the transform_feedback2_init_defaults test from es3conform.
The ES 3 spec lists these as TRANSFORM_FEEDBACK_PAUSED and
TRANSFORM_FEEDBACK_ACTIVE.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This code now lives in an external tree.
For the next Mesa release fetch the code from the master branch
of this LLVM repo:
http://cgit.freedesktop.org/~tstellar/llvm/
For all subsequent Mesa releases, fetch the code from the official LLVM
project:
www.llvm.org
- skip the vertex buffer reallocation in flush and just use
the unsynchronized flag to get new memory.
- remove the cruft needed to get around the issues with the vertex buffer
reallocation in flush
- use pb_buffer instead of pipe_resource
This patch fixes intel_miptree_unmap_etc() (which decompresses ETC
textures to linear) to pay attention to map->x and map->y when writing
to the destination image. Previously these values were ignored,
causing the xoffset and yoffset parameters passed to
glCompressedTexSubImage2D() to be ignored.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- We should use a 3D transfer of size Width x 1 x NumLayers.
- We should use layer_stride instead of stride.
(even though they are likely to be equal with 1D array textures)
Reviewed-by: Brian Paul <brianp@vmware.com>
There was the fast path based on _mesa_format_matches_format_and_type
for GetTexImage, but it never worked, because the Mesa format we were testing
there was always compressed. Further testing showed that the fast path
had been completely broken.
In this commit, the somewhat limited helper util_create_rgba_texture is
no longer used and instead, custom code for the texture creation is added,
which tries to find the best matching RGBA8 format, so that we can hit
the fast path *always* if the read format is a variant of RGBA8 and supported
by the driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Usage with pipe_context:
pipe->flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
Usage with st_context_iface:
st->flush(st, ST_FLUSH_END_OF_FRAME, NULL);
The flag is only a hint for drivers. Radeon will use it for buffer eviction
heuristics in the kernel (e.g. for queries like how many frames have passed
since a buffer was used).
The flag is currently only generated by st/dri on SwapBuffers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Automake 1.13 creates a bunch of new build artefacts:
- bin/test-driver, a script for running tests.
- *.trs files for every "make check" test result.
- *.log files containing the output of every test run by "make check".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Every generation except Gen7 creates SURFACE_STATE entries via a
uint32_t array. Only Gen7 uses the older bitfield structure, which we
moved away from because it was less efficient. Convert it for
consistency.
This reduces the compiled size of gen7_wm_surface_state.o by 2.86% in a
release build.
v2: Fix accidental use of BRW_SURFACE_WIDTH/HEIGHT in brw_state_dump.c;
switch back to gen7_set_surface_mcs_info setting surf[6] directly
(both per Eric's review comments).
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since wl_display_dispatch_queue() returns the number of processed events
or -1 on error, only cancel the roundtrip if an -1 is returned.
This also fixes a potential memory corruption bug happening when the
roundtrip does an early return and the callback later writes to the then
out of scope stack allocated `done' parameter.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In Jelly Bean, the interface to ANativeWindow changed. The change included
adding a new parameter the queueBuffer and dequeueBuffer methods,
removing the lockBuffer method, and requiring libsync.
v2:
- s/fence_fd == -1/fence_fd != -1/
- Fix leak. Close the fence_fd.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Define the following Make variables:
MESA_ANDROID_MAJOR_VERSION
MESA_ANDROID_MINOR_VERSION
MESA_ANDROID_VERSION
These variable will allow us to make version-dependent decisions on
library dependencies. In particular, building Mesa against JellyBean will
require libsync.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Commit f22d49de added the SamplerParamter* functions but only used
ASSERT_OUTSIDE_BEGIN_END inside the -f and -fv versions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds functionality to Mesa to upload compressed
2-dimensional array textures, using the glCompressedTexImage3D and
glCompressedTexSubImage3D calls.
Fixes piglit tests "EXT_texture_array/compressed *" and "!OpenGL ES
3.0/ext_texture_array-compressed_gles3 *". Also partially fixes GLES3
conformance test "CoverageES30.test".
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The old error reporting was completely bogus, passing _mesa_error() a
format string that didn't even match the remaining arguments. Also,
in many cases the number of dimensions in the TexImage call was not
preserved in the error message (e.g. an error in glTexImage2D was
reported simply as an error in glTexImage).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If the call fails, we should return NULL from XMesaCreateVisual().
This was found when Waffle tried to create a visual with depth/stencil
bits = -1. That's an illegal value for glXChooseFBConfig() and we should
return NULL in that situation.
Note: This is a candidate for the stable branches.
Dungeon Defenders hits TexImage()'s try_pbo_upload() path where
image->Width == 2, which doesn't meet intelEmitCopyBlit's requirement
that the pitch needs to be a multiple of 4.
Since intelEmitCopyBlit can already fail for a myriad of other reasons,
and it's not clear that other callers are immune to this failure mode,
simply make it return false rather than assert.
Fixes Dungeon Defenders on i965/Ivybridge. Now playable (aside from
having to work around the EXT_bindable_uniform issue).
NOTE: This is probably a candidate for the 9.0 branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't need them now that our set of parameter pointers points at the
GL core storage for them. This should save memory/bandwidth/overhead in
uniform updates.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NumParameters used to be an upper bound on the number of vec4s to be
uploaded, which was basically safe (unless your buffer was bound near
the top of address space *and* you array indexed outside the buffer, in
which case I think you might GPU hang). As I migrate the driver away
from ParameterValues[], this is no longer true.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Like in the FS, there's no reason to use an external copy if the
ParameterValues[] relayout of it isn't the layout we need.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If adding scale parameters during program compile caused a realloc of
ParameterValues, then the driver uniform storage set up by
_mesa_associate_uniform_storage() would point to potentially freed
memory.
Note that this uses TexturesUsed, which may change at runtime for GLSL
when sampler uniforms change. This is a flaw in our handling of texrect
in general, and not one I'm fixing currently.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have native hardware support for these, so they get promoted to
RGBA, in which case we don't have hardware dealing with the channel
swizzling for us.
Fixes piglit EXT_texture_snorm/texwrap formats bordercolor (-swizzled).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I had left this out for a long time because it regressed some
depthstencil-render-miplevels cases when it was enabled. Now that the
bugs causing those are fixed, there's nothing stopping us.
Improves glbenchmark 2.1 offscreen performance by 7.3% +/- 2.8% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This worked out before because the parent was always 4 bytes so it
didn't affect the layout, but now we want to support Z16 too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixing these rendering bugs has been implicated in performance
regressions (which may be unfixable), but at least knowing that it's
happening should help diagnose those regressions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The ETC1 changes failed at this, so let's make sure it will be caught in
testing next time.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was caught by the assertion in the next commit. It fixes the
remaining piglit depthstencil-render-miplevels cases, probably by
avoiding broken stencil copies in the validation path.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Relayout is expensive, so it's something developers (both us and others)
should know about when it happens.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rename existing _Used flag to EverBound.
The GL 4.3 and ES 3.0 specs say
These names are marked as used, for the purposes of GenVertexArrays
only, but they do not acquire array state until they are first bound.
This also affects Apple VAOs, which is fine since the
APPLE_vertex_array_object spec says
A vertex array object is created by binding an unused name. This
binding is accomplished by calling BindVertexArrayAPPLE with id set
to the name of the new vertex array object.
Fixes arb_vertex_array_object_isvertexarray.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL 4.3 an ES 3.0 specs say
A transform feedback object is created by binding a name returned by
GenTransformFeedbacks with the command
void BindTransformFeedback( enum target, uint id );
Fixes arb_transform_feedback2-istransformfeedback and part of
es3conform's CoverageES30.test.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i.e. we have to allocate a temporary tiled resource if dst isn't tiled.
This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.
NOTE: This is a candidate for the stable branches.
No piglit regressions and now passes glsl-uniform-out-of-bounds-2.
validate_uniform_parameters now checks that the array index is
valid. This means if an index is out of bounds, glGetUniform* now
fails with GL_INVALID_OPERATION, as it should.
_mesa_uniform and _mesa_uniform_matrix also call
validate_uniform_parameters so the bounds checks there became
redundant and were removed.
The test in glGetUniformLocation is modified to check array bounds
so it now returns GL_INVALID_INDEX (-1) if you ask for the location
of a non-existent array element, as it should.
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
It's a build time option you need to set R600_TRACE_CS to 1 and it
will print to stderr all cs along as cs trace point value which
gave last offset into a cs process by the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
htile is used for HiZ and HiS support and fast Z/S clears.
This commit just adds the htile setup and Fast Z clear.
We don't take full advantage of HiS with that patch.
v2 really use fast clear, still random issue with some tiles
need to try more flush combination, fix depth/stencil
texture decompression
v3 fix random issue on r6xx/r7xx
v4 rebase on top of lastest mesa, disable CB export when clearing
htile surface to avoid wasting bandwidth
v5 resummarize htile surface when uploading z value. Fix z/stencil
decompression, the custom blitter with custom dsa is no longer
needed.
v6 Reorganize render control/override update mecanism, fixing more
issues in the process.
v7 Add nop after depth surface base update to work around some htile
flushing issue. For htile to 8x8 on r6xx/r7xx as other combination
have issue. Do not enable hyperz when flushing/uncompressing
depth buffer.
v8 Fix htile surface, preload and prefetch setup. Only set preload
and prefetch on htile surface clear like fglrx. Record depth
clear value per level. Support several level for the htile
surface. First depth clear can't be a fast clear.
v9 Fix comments, properly account new register in emit function,
disable fast zclear if clearing different layer of texture
array to different value
v10 Disable hyperz for texture array making test simpler. Force
db_misc_state update when no depth buffer is bound. Remove
unused variable, rename depth_clearstencil to depth_clear.
Don't allocate htile surface for flushed depth. Something
broken the cliprect change, this need to be investigated.
v11 Rebase on top of newer mesa
v12 Rebase on top of newer mesa
v13 Rebase on top of newer mesa, htile surface need to be initialized
to zero, somehow special casing first clear to not use fast clear
and thus initialize the htile surface with proper value does not
work in all case.
v14 Use resource not texture for htile buffer make the htile buffer
size computation easier and simpler. Disable preload on evergreen
as its still troublesome in some case
v15 Cleanup some comment and remove some left over
v16 Define name for bit 20 of CP_COHER_CNTL
Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This bring r600g allmost inline with closed source driver when
it comes to flushing and synchronization pattern.
v2-v4: history lost somewhere in outer space
v5: Fix compute size of flushing, use define for flags, update
worst case cs size requirement for flush, treat rs780 and
newer as r7xx when it comes to streamout.
v6: Fix num dw computation for framebuffer state, remove dead
code, use define instead of hardcoded value.
v7: Remove dead code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Previously, Mesa code assumed that glReadBuffer(GL_NONE) was only
valid for user-created framebuffer objects. However, the spec is
quite clear that is should also be valid for the default framebuffer.
From section 18.2.1 ("Obtaining Pixels from the Framebuffer") of the
GL 4.3 spec:
"When READ_FRAMEBUFFER_BINDING is zero, i.e. the default
framebuffer, src must be one of the values listed in table 17.4,
including NONE."
Similar language exists in the GLES 3.0 spec, and in desktop GL all
the way back to ARB_framebuffer_object.
Partially fixes GLES3 conformance test "CoverageES30.test".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was slightly wrong: we were computing the longest duration of
the query among all the rasterizer tasks.
Regardless, for tile-based implementations such as llvmpipe, time differences
will never be very useful, because rendering before/during/after the query
is all interleaved. And this is expected, see ARB_timer_query spec, issue 10.
In particular, piglit ext_timer_query-time-elapsed still fails, because
it makes assumptions that don't hold true in in tiled architectures. Not
sure how to fix that though.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB/EXT_timer_query's definition of GL_TIME_ELAPSED match precisely the
subtraction of two GL_TIMESTAMP queries.
And for a lot of drivers, that's precisely how they have to implement
internally -- by emitting two hardware timestamp queries.
So, to simplify driver implementation, simply allow doing so in the state
tracker.
Eventually if no driver implements PIPE_QUERY_TIME_ELAPSED then we could
retire it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The burst was incorrectly used, because ELEM_SIZE was always 0.
I don't know if the burst works, because I don't know of any test
which uses it.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The old call to tgsi_exec_machine_bind_shader() in
softpipe_delete_fs_state() was never called since the shader's original
tokens are never passed to the tgsi interpreter (only shader _variant_
tokens are). Now, unbind the variant's tokens from the tgsi interpreter
when we free the variant.
This doesn't fix any known bugs but it's the right thing to do.
Note: This is a candidate for the stable branches.
In exec_prepare() we were comparing pointers to see if the fragment
shader variant had changed before calling tgsi_exec_machine_bind_shader().
This didn't work reliably when there was a lot of shader token malloc/
freeing going on because the memory might get reused.
Instead, bind the shader variant during regular state validation.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=40404
(fixes a couple of piglit's glsl-max-varyings test)
Note: This is a candidate for the stable branches.
This force surface allocated from ddx to be consider as height
aligned on 8 and fix 1D->2D tiling transition that result from
this.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
brw_emit_vertices contains special case logic to handle the case where
a vertex shader doesn't read any inputs. This special case logic was
incorrectly activating in the case were the only vertex input is
gl_VertexID. As a result, if a shader used gl_VertexID but used no
other inputs, then all vertices got a gl_VertexID of zero.
Fixes oglconform test "ubo-usage advanced.transform_feedback".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The rather unweildy logic for determining this condition was repeated
in a large number of places. This patch consolidates it to a single
inline function.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch implements the following behaviours, which are mandated by
the GL 4.3 and GLES3 specs.
1. Regarding the GL_TRANSFORM_FEEDBACK_BUFFER_SIZE query: "If the
... size was not specified when the buffer object was bound
(e.g. if it was bound with BindBufferBase), ... zero is returned."
(GL 4.3 section 6.7.1 "Indexed Buffer Object Limits and Binding
Queries").
2. "BindBufferBase binds the entire buffer, even when the size of the
buffer is changed after the binding is established. It is
equivalent to calling BindBufferRange with offset zero, while size
is determined by the size of the bound buffer at the time the
binding is used." (GL 4.3 section 6.1.1 "Binding Buffer Objects to
Indexed Targets"). I interpret "at the time the binding is used"
to mean "at the time of the call to glBeginTransformFeedback".
3. "Regardless of the size specified with BindBufferRange, or
indirectly with BindBufferBase, the GL will never read or write
beyond the end of a bound buffer. In some cases this constraint may
result in visibly different behavior when a buffer overflow would
otherwise result, such as described for transform feedback
operations in section 13.2.2." (GL 4.3 section 6.1.1 "Binding
Buffer Objects to Indexed Targets").
Item 1 has been part of the spec all the way back to the inception of
the EXT_transform_feedback extension. Items 2 and 3 were added in GL
4.2 and GLES 3.
Prior to GL 4.2, in place of items 2 and 3, the spec simply said
"BindBufferBase is equivalent to calling BindBufferRange with offset
zero and size equal to the size of buffer." For transform feedback,
Mesa behaved as though this meant "...equal to the size of buffer at
the time of the call to BindBufferBase". However, this was
problematic because it left it ambiguous what to do if the buffer is
shrunk between the call to BindBuffer{Base,Range} and the call to
BeginTransformFeedback. Prior to this patch, Mesa's behaviour was to
try to write beyond the end of the buffer, likely resulting in memory
corruption. In light of this, I'm interpreting the spec change as a
clarification, not an intended behavioural change, so I'm making the
change apply regardless of API version.
Fixes GLES3 conformance test transform_feedback2_pause_resume.test.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In desktop GL, if a draw call would cause transform feedback buffers
to overflow, the draw call should succeed, and the extra primitives
should simply not be recorded in the transform feedback buffers.
In GLES3, however, if a draw call would cause transform feedback
buffers to overflow, the draw call is supposed to produce an
INVALID_OPERATION error and no drawing should occur.
This patch implements the GLES3-required behaviour.
Fixes GLES3 conformance test "transform_feedback_overflow.test".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In GLES3, only glDrawArrays() and glDrawArraysInstanced() calls are
allowed when transform feedback is active.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, the i965 driver contained code to compute the maximum
number of vertices that could be written without overflowing any
transform feedback buffers. This code wasn't driver-specific, and for
GLES3 support we're going to need to use it in core mesa. So this
patch moves the code into a core mesa function,
_mesa_compute_max_transform_feedback_vertices().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Eliminate C++-style variable declarations, since these won't work
with MSVC.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
No functional change--this simply paves the way to allow futures
patches to call vbo_count_tessellated_primitives() during error
checking, before the _mesa_prim struct has been constructed.
This will be needed for GLES3, which requires draw calls to fail if
there is not enough space available in transform feedback buffers to
accommodate the primitives to be drawn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the idea is to just expand or shrink the bit width but not otherwise do
conversion we also need to adjust the sign bit according to src, otherwise
the conversion code will incorrectly clamp the values. (Since this only works
for casting to ordinary floats the norm and fixed bits should always be fine.)
This fixes the remaining piglit attribs GL3 failures.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
a460aea3f1 wasn't entirely correct,
since all coords are already ints hence need to skip the iround.
Passes piglit texelFetch with sampler1DArray/sampler2DArray.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Make sure drivers initialize the version before:
* _mesa_initialize_exec_table is called
* _mesa_initialize_exec_table_vbo is called
* A context is made current
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver should call _mesa_initialize_vbo_vtxfmt after
computing the context version.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers must compute the context version, and then call
_mesa_initialize_exec_table themselves.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a future patch the exec functions will no longer set up
by _mesa_initialize_context and _vbo_CreateContext.
Therefore we must call _mesa_initialize_exec_table and
_mesa_initialize_exec_table_vbo.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This change forces the context version to be computed before
initilizing the exec dispatch tables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In glapi/gl_genexec.py:
* Remove _mesa_alloc_dispatch_table call
In glapi/gl_genexec.py and api_exec.h:
* Rename _mesa_create_exec_table to _mesa_initialize_exec_table
In context.c:
* Call _mesa_alloc_dispatch_table instead of _mesa_create_exec_table
* Call _mesa_initialize_exec_table (this is temporary)
Once all drivers have been modified to call
_mesa_initialize_exec_table, then the call to
_mesa_initialize_context can be removed from context.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is used by st_BlitFramebuffer() / r600_blit(), and ARB_fbo allows
overlapped blits, even though the result is undefined. No piglit regressions
on r600g / CYPRESS.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
struct brw_instruction and the related instruction emitting code won't
be useful on Gen8+, as the instruction encoding changed. However, the
struct brw_reg code is still extremely valuable.
While we're at it, fix up some style points:
- s/GLuint/unsigned/g
- s/GLint/int/g
- s/GLshort/int16_t/g
- s/GLushort/uint16_t/g
- s/INLINE/inline/g
- Replace tabs with spaces
- Put return types on a separate line from the function name/parameters
- Remove trailing whitespace
- Remove extraneous whitespace around function parameters
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the extensions + the tex buffer support for checking
the formats.
There is a piglit test enhancement sent to that list.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not sure what was going on here, but running piglit with debug builds
might be a good plan :-)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No statistically significant performance difference on glbenchmark 2.7
(n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8%
(n=5), but that's only about .3% of all cycles spent according to the
fixed shader_time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The way our visitor works, scalar expression/swizzle results that get
stored in channels other than .x will have an intermediate MOV from
their result in the .x channel to the real .y (or whatever) channel, and
similarly for vec2/vec3 results.
By knowing how to adjust DP4-type instructions for optimizing out a
swizzled MOV, we can reduce instructions in common matrix multiplication
cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compute-to-mrf code is really twitchy, and it's hard to construct
GLSL testcases for it. This unit test is also really hard to work with
(for example, if your instruction is removed by dead code elimination,
you end up inspecting something irrelevant), but I did use it for
debugging some of the commits to follow.
I called it test_vec4_register_coalesce because the compute-to-mrf code
is about to morph into that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The final halt of the fragment shader turns off the remaining channels,
then jumps such that everything is turned back on. So, we can have our
last ENDIF of the shader point at that directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the Ivybridge PRM, Volume 4, Part 3, section 6.24 (page 172):
"The endif instruction is also used to hop out of nested conditionals by
jumping to the end of the next outer conditional block when all
channels are disabled."
Also:
"Pseudocode:
Evaluate(WrEn);
if ( WrEn == 0 ) { // all channels false
Jump(IP + JIP);
}"
First, ENDIF re-enables any channels that were disabled because they
didn't match the conditional. If any channels are active, it proceeds
to the next instruction (IP + 16). However, if they're all disabled,
there's no point in walking through all of the instructions that have no
effect---it can jump to the next instruction that might re-enable some
channels (an ELSE, ENDIF, or WHILE).
Previously, we always set JIP on ENDIF instructions to 2 (which is
measured in 8-byte units). This made it do Jump(IP + 16), which just
meant it would go to the next instruction even if all channels were off.
It turns out that walking over instructions while all the channels are
disabled like this is worse than just instruction dispatch overhead: if
there are texturing messages, it still costs a couple hundred cycles to
not-actually-read from the texture results.
This patch finds the next instruction that could re-enable channels and
sets JIP accordingly.
Reviewed-by: Eric Anholt <eric@anholt.net>
V3: Put enable in an existing block rather than making a new
one for no good reason.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V2: Moved up into emit(ir_texture *) to avoid duplication and fix
ordering for Gen7; Gen6 math quirks moved into previous patches.
Tested on Gen6 only; passes all the cube_map_array piglits.
V3: Fixed weird whitespace
V4: Use sampler->type; otherwise broken on arrays of samplers.
v5: Minor style fixes (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V4: Fix various style nits as pointed out by Eric, and expand IMM
operands on both Gen6 and Gen7.
v5: minor style nits (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V3: Fixed weird whitespace
V4: Use sampler's type rather than variable's type; otherwise broken
with arrays of samplers. (Thanks Eric)
v5: Fix a couple more style nits (by anholt)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This causes immediate values to get moved to a temp on gen7, which is needed
for an upcoming change but hadn't happened in the visitor until then.
v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by
anholt).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Actually switch on the other math instructions mentioned in the
comment.
v3: Add timing data for textureSize(), and clean up some long comment
lines.
Testing shader_time of fs16 shaders on a few frames of various apps:
nexuiz improved by 2.9% +/- 1.5% (n=10)
no difference on GLB2.5 (n=36, outliers removed)
no difference on GLB2.7 (n=25)
etqw improved by 2.6% +/- 2.2% (n=25)
no difference on lightsmark (n=25)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
I've tested this to be true with various ALU ops on gen7 (with the
exception of MADs, which go at either 3 or 4 cycles per dispatch).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This gives the instruction scheduler a chance to schedule between the
loads, whereas before it was restricted due to the dependencies between
the MRFs for setting them up.
For one shader in gles3conform, it goes from getting stuck in register
allocation for as long as anybody's bothered to leave it running down
to 23 seconds, thanks to the LIFO scheduling.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This came from an idea by Ben Segovia. 16-wide pixel shaders are very
important for latency hiding on i965, so we want to try really hard to
get them. If scheduling an instruction makes some set of instructions
available, those are probably the ones that make the instruction's
result dead. By choosing those first, we'll have a tendency to reduce
the amount of live data as opposed to creating more.
Previously, we were sometimes getting this behavior out of the
scheduler, which was what produced the scheduler's original performance
wins on lightsmark. Unfortunately, that was mostly an accident of the
lame instruction latency information that I had, which made it
impossible to fix the actual scheduling for performance. Now that we've
fixed the scheduling for setup for register allocation, we can safely
update the latency parameters for the final schedule.
In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4
shaders that were spilling change how many registers spill, for a
reduction of 70/3899 instructions.
v2: Simplify the new loop.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes I've got a patch for a performance optimization that's not
showing a statistically significant performance difference on reported
FPS, but still seems like a good idea because it ought to reduce time
spent in the shader. If I can see the total number of cycles spent in
the shader stage being optimized, it may show that the patch is still
worthwhile (or point out that it's actually broken in some way).
Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.
Note that will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.
I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.
This patch implements varying packing between varyings.
Previously, each varying occupied components 0 through N-1 of its
assigned varying slot, so there was no way to pack two varyings into
the same slot. For example, if the varyings were a float, a vec2, a
vec3, and another vec2, they would be stored as follows:
<----slot1----> <----slot2----> <----slot3----> <----slot4----> slots
* * * * * * * * * * * * * * * *
flt x x x <vec2-> x x <--vec3---> x <vec2-> x x varyings
(Each * represents a varying component, and the "x"s represent wasted
space).
This change packs the varyings together to eliminate wasted space
between varyings, like so:
<----slot1----> <----slot2----> <----slot3----> <----slot4----> slots
* * * * * * * * * * * * * * * *
<vec2-> <vec2-> flt <--vec3---> x x x x x x x x varyings
Note that we take advantage of the sort order introduced in previous
patches (vec4's first, then vec2's, then scalars, then vec3's) to
minimize how often a varying is "double parked" (split across varying
slots).
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.
This patch implements varying packing within varyings that are
composed of multiple vectors of size less than 4 (e.g. arrays of
vec2's, or matrices with height less than 4).
Previously, such varyings used up a full 4-wide varying slot for each
constituent vector, meaning that some of the components of each
varying slot went unused. For example, a mat4x3 would be stored as
follows:
<----slot1----> <----slot2----> <----slot3----> <----slot4----> slots
* * * * * * * * * * * * * * * *
<-column1-> x <-column2-> x <-column3-> x <-column4-> x matrix
(Each * represents a varying component, and the "x"s represent wasted
space). In addition to wasting precious varying components, this
layout complicated transform feedback, since the constituents of the
varying are expected to be output to the transform feedback buffer
contiguously (e.g. without gaps between the columns, in the case of a
matrix).
This change packs the constituents of each varying together so that
all wasted space is at the end. For the mat4x3 example, this looks
like so:
<----slot1----> <----slot2----> <----slot3----> <----slot4----> slots
* * * * * * * * * * * * * * * *
<-column1-> <-column2-> <-column3-> <-column4-> x x x x matrix
Note that matrix columns 2 and 3 now cross a boundary between varying
slots (a characteristic I call "double parking" of a varying).
We don't bother trying to eliminate the wasted space at the end of the
varying, since the patch that follows will take care of that.
Since compiler back-ends don't (yet) support this packed layout, the
lower_packed_varyings function is used to rewrite the shader into a
form where each varying occupies a full varying slot. Later, if we
add native back-end support for varying packing, we can make this
lowering pass optional.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.
On hardware that supports a limited number of texture indirections,
varying packing will comsume an extra texture indirection, since ALU
operations are needed in the fragment shader to unpack the varyings
before any texturing can be done.
This patch introduces a new driver option,
ctx->Const.DisableVaryingPacking, which can be used by a driver to opt
out of varying packing if the extra texture indirection is costly
enough to outweigh the advantages of packing varyings.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.
No functional change--the lowering pass is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly. Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.
This patch paves the way for varying packing by adding a sorting step
before varying assignment, which sorts the varyings into an order that
increases the likelihood of being able to find an efficient packing.
First, varyings are sorted into "packing classes" by considering
attributes that can't be mixed during varying packing--at the moment
this includes base type (float/int/uint/bool) and interpolation mode
(smooth/noperspective/flat/centroid), though later we will hopefully
be able to relax some of these restrictions. The number of packing
classes places an upper limit on the amount of space that must be
wasted by varying packing, since in theory a shader might nave 4n+1
components worth of varyings in each of m packing classes, resulting
in 3m components worth of wasted space.
Then, within each packing class, varyings are sorted by vector size,
with vec4's coming first, then vec2's, then scalars, and then finally
vec3's. The motivation for this order is that it ensures that the
only vectors that might be "double parked" (with part of the vector in
one varying slot and the remainder in another) are vec3's.
Note that the varyings aren't actually packed yet, merely placed in an
order that will facilitate packing.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch further subdivides the loop that assigns varying locations
into two phases: one phase to match up the varyings between shader
stages, and one phase to assign them varying locations.
In between the two phases the matched varyings are stored in a new
data structure called varying_matches. This will free us to be able
to assign varying locations in any order, which will pave the way for
packing varyings.
Note that the new varying_matches::assign_locations() function returns
the number of varying slots that were used; this return value will be
used in a future patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch subdivides the loop that assigns varying locations into two
phases: one phase to match up varyings between shader stages (and
assign them varying locations), and a second phase to record the
varying assignments for use by transform feedback.
This paves the way for varying packing, which will require us to
further subdivide the first phase.
In addition, it lets us avoid a clumsy O(n^2) algorithm, since we can
now record the locations of all transform feedback varyings in a
single pass through the tfeedback_decls array, rather than have to
iterate through the array after assigning each varying.
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, the location of each varying is recorded in ir_variable as
a multiple of the size of a vec4. In order to pack varyings, we need
to be able to record, e.g. that a vec2 is stored in the second half of
a varying slot rather than the first half.
This patch introduces a field ir_variable::location_frac, which
represents the offset within a vec4 where a varying's value is stored.
Varyings that are not subject to packing will always have a
location_frac value of zero.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the linker used a value of -1 in ir_variable::location to
denote a generic input or output of the shader that had not yet been
matched up to a variable in another pipeline stage.
This patch introduces a new ir_variable field,
is_unmatched_generic_inout, for that purpose.
In future patches, this will allow us to separate the process of
matching varyings between shader stages from the processes of
assigning locations to those varying. That will in turn pave the way
for packing varyings.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, link_invalidate_variable_locations() was only called
during assign_attribute_or_color_locations() and
assign_varying_locations(). This meant that in the corner case when
there was only a vertex shader, and varyings were being captured by
transform feedback, link_invalidate_variable_locations() wasn't being
called for the varyings.
This patch migrates the calls to link_invalidate_variable_locations()
to link_shaders(), so that they will be called in all circumstances.
In addition, it modifies the call semantics so that
link_invalidate_variable_locations() need only be called once per
shader stage (rather than once for inputs and once for outputs).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the clip distance lowering pass so that the new
symbol it generates (glClipDistanceMESA) is added to the shader's
symbol table.
This will allow a later patch to modify the linker so that it finds
transform feedback varyings using the symbol table rather than having
to iterate through all the declarations in the shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This builds on the previous draw/softpipe patch.
So llvmpipe does streamout calls after clip/viewport stages,
but we have the pre-clip position stored for later use, so
when we are doing transform feedback, and its the position vertex
grab the vertex from the stored pre clip position.
The perfect fix is too probably add a codegen transform feedback
stage in between shader and clip stages, but this is good enough
for now.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to draw for the new features of transform feedback.
a) fix count_from_stream_output, using max_index+1 for now but it looks
like it should be valid as its derived from the vertex elements/vbo.
b) fix striding and dst offsets in output buffers - was just wrong before.
c) fix crash if tfb is suspended (so.num_targets == 0)
This also enables the new features on softpipe. It should be possible
to enable them on llvmpipe as well after this commit, but would need
to schedule piglit runs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Every call to _cl_program::build() was erasing the binaries and logs for
every device associated with the program. This is incorrect because
it is possible to build a program for only a subset of devices and so
any device not being build should not have this information erased.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Override the cross_compiling and ac_tool_prefix variables by reassigning
to them instead of redefining the macros. Redefining them will actually
cause the variable names to be replaced instead of their content.
Furthermore push the definition of CPPFLAGS before running the checks
for the build tools to avoid the host CPPFLAGS from leaking into the
build CPPFLAGS.
While at it drop the redefinition of AC_TRY_COMPILER which hasn't been
used since autoconf 2.50 and make sure that all definitions are properly
popped when done (LDFLAGS, ac_cv_prog_CPP, ac_cv_prog_CXXCPP).
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Since we don't call lp_build_sample_common() in the texel fetch path we missed
the layer fixup code. If someone would have tried to do texelFetch with array
textures it would have crashed for sure.
Not really tested (can't run the piglit test being able to use texelFetch with
array samplers for now with llvmpipe).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, if the client program didn't specify a stride when setting
up a vertex attribute, we used _mesa_sizeof_type() to compute the size
of the type, and multiplied it by the number of components.
This didn't work for the 2_10_10_10 formats, since _mesa_sizeof_type()
returns -1 for those types, resulting in all kinds of havoc, since it
was causing the hardware to be programmed with a negative stride
value.
This patch adds a new function _mesa_bytes_per_vertex_attrib(), which
is similar to the existing function _mesa_bytes_per_pixel(), but which
computes the size of a vertex attribute based on the type and the
number of formats. For packed formats (currently only the 2_10_10_10
formats), it verifies that the number of components is correct and
returns the size of the packed format. For unpacked formats, it
returns the size of the type times the number of components.
In addition, this patch adds an assertion so that if we ever forget to
update _mesa_bytes_per_vertex_attrib() when adding a new vertex
format, we'll see the problem quickly rather than having to debug a
subtle conformance test failure.
Fixes GLES3 conformance tests
vertex_type_2_10_10_10_rev_{conversion,divisor,stride_pointer}.test.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL 3.1 and ES 3.0 specs say of glGetActiveUniformsiv:
"If an error occurs, nothing will be written to params."
So, make a pass through the indices and check that they're valid before
the pass that actually writes to params. Checking pname happens on the
first iteration of the second loop.
Fixes es3conform's getactiveuniformsiv_for_nonexistent_uniform_indices
test.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise messages say silly things like
glGetActiveUniformBlockiv(block index -1 >= 0)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
try_rewrite_rhs_to_dst is a quick optimization to avoid generating new
temporaries (and MOVs from those temporaries to the dest) for every
expression tree we visit. By generating better code in simple cases, we
reduce the burden on later optimization passes like register coalescing.
Previously, we compared inst->regs_written() to lhs->vector_elements
to make sure the instruction generating our value wrote the same number
of components as our destination register.
However, this fails in some cases. One example is texturing (which
produces a vec4) into gl_FragData[i]. Technically, gl_FragData[i] is
also a vec4. However, the destination VGRF actually has size 4n (where
n is the size of the array).
split_virtual_grfs() can't split VGRFs that are used by SEND messages
which require contiguous destination registers (like texturing), and
register allocation needs all VGRFs to have sizes between 1 and 4.
Amnesia: The Dark Descent hits this case: a texturing instruction
(4 components) gets rewritten to the gl_FragData output register
(which was 4*3 = 12 components), causing the register allocator to
hit the "we rely on split_virtual_grfs" assertion.
This makes it possible to play Amnesia.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is redundant since we're calling draw_bind_fragment_shader()
which already does a flush.
v2: the redundant flush in llvmpipe_set_constant_buffer() has
already been removed by commit 3427466e6d
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fetch shaders are usually destroyed at the context destruction by the state
tracker, so we can put them all in a large buffer without wasting memory.
This reduces the number of relocations sent to the kernel a little bit.
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Instead of having a 4-byte buffer for each streamout target, we suballocate
each dword from a 4K buffer.
This further reduces the overall number of relocations.
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
u_upload_mgr suballocates memory from a large buffer and maps the allocated
range (unsychronized), which is perfect for short-lived staging buffers.
This reduces the number of relocations sent to the kernel.
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
There are 2 ways. I prefer the former:
GALLIUM_MSAA=n
__GL_FSAA_MODE=n
Tested with ETQW, which doesn't support MSAA on Linux. This is
the only way to get MSAA there.
Reviewed-by: Brian Paul <brianp@vmware.com>
There are only 2 possible usages: render target and depth stencil.
Both can be derived from the surface format, so the flag is redundant.
And it's going away...
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds seamless sampling for cubemap boundaries if requested.
The corner case averaging is messy but seems like it should be spec
compliant.
The face direction stuff is also a bit messy, I've no idea if that could
or should be simpler, or even if all my directions are fully correct!
v1.1: update comments, drop unneeded seamless calls for nearest, fix
if statement layout.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This follow the code from the i965 driver, and emits the structs
and arrays recursively.
This fixes an assert in the two UBO tests
fs-struct-copy-complicated and
vs-struct-copy-complicated
These tests now pass on softpipe, with no regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes some use-after-free issues. I haven't measured any real
performance difference with a handful of Mesa demos.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before this we only supported user-based constant buffers.
First, we basically plumb pipe_constant_buffer objects through llvmpipe
rather than pipe_resource objects.
Second, update llvmpipe_set_constant_buffer() and try_update_scene_state()
so they understand both resource- and user-based constant buffers.
The problem with user constant buffers is the potential for use-after-free,
as seen in some WebGL tests. The next patch will flip the switch for
resource-based const buffers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I had tried this in the past, but ran into trouble with applications
that sample from undiscarded pixels in the same subspan. To fix that
issue, only jump to the end for an entire subspan at a time.
Improves GLbenchmark 2.7 (1024x768) performance by 7.9 +/- 1.5% (n=8).
v2: Drop the br variable in the jump instruction -- if I ever do jumps
pre-gen6, it'll be a different code block anyway since we don't have
HALT until gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes much more sense on gen6+, and will also prove useful for
early exit of shaders on discard.
v2: fix up a stale comment from before converting gen4-5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're going to redo discard handling to track discards in the other flag
subregister, saving instructions in the discard and allowing predicated
jumps out to the end of the shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes our output more consistent with other disasm tools, and
will be necessary when we start using f0.1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've been calling it a register number, it's actually the subregister,
and things will get confusing once we start using it if it isn't fixed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's a flag subreg nr field in bits2 next to src0.vertstride, but
there shouldn't be anything in bits3 next to src1.vertstride.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes compiler warning:
drm/native_drm.c: In function ‘native_create_display’:
drm/native_drm.c:180:21: warning: ‘device’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drm/native_drm.c:157:24: note: ‘device’ was declared here
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a number of crashes on r600g due to the fact that
lp_build_mul assumes vector types when optimizing mul to bit shifts.
This bug was uncovered by 0ad1fefd69
Noticed would fail, we were doing two things wrong
a) 1d arrays require the layers in height
b) minifying the layers field.
v2: don't change height code, fixup completely inside txq
as suggested by Roland.
v3: just add minify before texture array size
v1: Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
command mistakenly used vector instead of scalar emit (the more or less
identical code in radeon is already correct).
Seems like it would be broken ever since kms probably.
Should fix bugs 22576, 26809.
I noticed the texelFetch offset test failed on 2D rect samplers
with GLSL 1.40. This is because I wrote the immediate->offset
translation wrong.
Fixed the translation to actually use the ureg info to set the
offsets up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports over from the dri2 code to the drisw bits. It means 3.1
core contexts now work for softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is needed to compute render_to_fbo. It even has the comment.
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch does two things:
1. Constant buffer state changes were broken (but happened to work by
dumb luck). The problem is we weren't calling draw_do_flush() in
draw_set_mapped_constant_buffer() when we changed that state. All the
other draw_set_foo() functions were calling draw_do_flush() already.
2. Use a simpler state validation step when we're changing light-weight
parameter state such as constant buffers, viewport dims or clip planes.
There's no need to revalidate the whole pipeline when changing state
like that. The new validation method is called bind_parameters()
and is called instead of the prepare() method. A new
DRAW_FLUSH_PARAMETER_CHANGE flag is used to signal these light-weight
state changes. This results in a modest but measurable increase in
FPS for many Mesa demos.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When one function is changed, also look at the other.
Presently, there are some differences with respect to geometry
shaders and instanced drawing...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
this adds UBO support to the state tracker, it works with softpipe
as-is.
It uses UARL + CONST[x][ADDR[0].x] type constructs.
v2: don't disable UBOs if geom shaders don't exist (me)
rename upload to bind (calim)
fix 12 -> 13 comparison as comment (calim + brianp)
fix signed->unsigned (Brian)
remove assert (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the necessary changes to the st to allow texture buffer object
support if the driver advertises it.
v1.1: remove extra blank line and whitespace
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch enables support for ETC2 compressed textures on
all intel hardware. At present, ETC2 texture decoding is not
available on intel hardware. So, compressed ETC2 texture data
is decoded in software and stored in a suitable uncompressed
MESA_FORMAT at the time of glCompressedTexImage2D. Currently,
ETC2 formats are only exposed in OpenGL ES 3.0.
V2: Use single etc_wraps variable for both etc1 and etc2.
V3: Remove redundant code and use just one intel_miptree_map_etc()
and intel_miptree_unmap_etc() function.
Choose MESA_FORMAT_SIGNED_{R16, GR1616} for ETC2 signed-{r11, rg11}
formats
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Data in GL_COMPRESSED_SRGB8_PUNCHTHROUGH_ALPHA1_ETC2 format is decoded and stored
in MESA_FORMAT_SARGB.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGB8_PUNCHTHROUGH_ALPHA1_ETC2 format is decoded and stored
in MESA_FORMAT_RGBA8888_REV.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SIGNED_RG11_EAC format is decoded and stored in
MESA_FORMAT_SIGNED_GR1616.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SIGNED_R11_EAC format is decoded and stored in
MESA_FORMAT_SIGNED_R16.
v2:
16 bit signed data is converted to 16 bit unsigned data by
adding 2 ^ 15 and stored in an unsigned texture format.
v3:
1. Handle a corner case when base code word value is -128. As per
OpenGL ES 3.0 specification -128 is not an allowed value and should
be truncated to -127.
2. Converting a decoded 16 bit signed data to 16 bit unsigned data by
adding 2 ^ 15 gives us an output which matches the decompressed image
(.ppm) generated by ericsson's etcpack tool. ericsson is also doing this
conversion in their tool because .ppm image files don't support signed
data. But gles 3.0 specification doesn't suggest this conversion. We
need to keep the decoded data in signed format. Both signed format
tests in gles3 conformance pass with these changes.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RG11_EAC format is decoded and stored in
MESA_FORMAT_RG1616.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_R11_EAC format is decoded and stored in
MESA_FORMAT_R16.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC format is decoded and stored
in MESA_FORMAT_SARGB8.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGBA8_ETC2_EAC format is decoded and stored
in MESA_FORMAT_RGBA8888_REV.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_SRGB8_ETC2 format is decoded and stored
in MESA_FORMAT_SARGB8.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Data in GL_COMPRESSED_RGB8_ETC2 format is decoded and stored in
MESA_FORMAT_RGBX8888_REV.
v2: Use CLAMP macro and stdbool.h
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes nonlinear_to_linear() function to non static inline
and makes it available outside format_unpack.c. Also, removes the
duplicate copies in other files.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is required by OpenGL ES 3.0 to support ETC2 textures.
This patch adds new MESA_FORMATs for following etc2 texture
formats:
GL_COMPRESSED_RGB8_ETC2
GL_COMPRESSED_SRGB8_ETC2
GL_COMPRESSED_RGBA8_ETC2_EAC
GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC
GL_COMPRESSED_R11_EAC
GL_COMPRESSED_RG11_EAC
GL_COMPRESSED_SIGNED_R11_EAC
GL_COMPRESSED_SIGNED_RG11_EAC
MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1
MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1
Above formats are currently available in only gles 3.0.
v2: Add entries in texfetch_funcs[] array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v3 (Paul Berry <stereotype441@gmail.com>): comment out symbols that
are not implemented yet, so that this commit compiles on its own;
future commits will uncomment the symbols as they become available.
Removes a collision of the object file name for main/hash_table
and program/hash_table.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The ES 3 conformance suite unbinds buffers (by binding buffer 0) and
passes zero for the size and offset, which the spec explicitly
disallows. Otherwise, this seems like a reasonable thing to do.
Khronos will be changing the spec to allow this (bug 9765). Fixes
es3conform's transform_feedback_init_defaults test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just enough for draw module to work ok.
This improves "piglit attribs GL3", though something fishy is still
happening with certain unsigned integer values.
Reviewed-by: Brian Paul <brianp@vmware.com>
The ADDR file is cumbersome for native integer capable drivers. We
should consider deprecating it eventually, but this just adds support
for indirection from TEMP registers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Support 16 (defined in LP_MAX_TGSI_CONST_BUFFERS) as opposed to 32 (as
defined by PIPE_MAX_CONSTANT_BUFFERS) because that would make the jit
context become unnecessarily large.
v2: Bump limit from 4 to 16 to cover ARB_uniform_buffer_object needs,
per Dave Airlie.
Reviewed-by: Brian Paul <brianp@vmware.com>
All MSAA buffers are allocated privately and resolved into the DRI-provided
back and front buffers.
If an MSAA visual is chosen, the buffers st/mesa receives are all
multi-sample. st/mesa doesn't have access to the single-sample buffers
in that case.
This makes MSAA work in games like Nexuiz.
Reviewed-by: Brian Paul <brianp@vmware.com>
- We can use a single loop for adding new configs.
- The useless parameter depth_bits is removed.
- The maximum number of samples is bumped to 32.
- We can support Z16_UNORM and Z32_UNORM unconditionally since the zbuffers
are private.
Reviewed-by: Brian Paul <brianp@vmware.com>
This disables DRI2 sharing of zbuffers. The window zbuffer is allocated just
like any other texture - through resource_create.
The idea of allocating a zbuffer through DRI2 isn't very useful with MSAA,
where a single-sample zbuffer is useless.
IIRC, the Intel driver does the same thing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Just use pipe->blit, which can do resolve, flipping, and format conversions.
The util_blit_pixels codepath is still there for the cases where we have to
force alpha to 1.
This also turns on acceleration for copying GL_DEPTH_STENCIL.
This may not be strictly necessary, but every other rule in the grammar ends
with a semicolon. It also appears that this was supposed to be commited with
the original patch that changed this rule, but the wrong version of the patch
was accidentally pushed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that while 'packed' is a reserved word in GLSL ES, row_major is not.
This means that we have to use the string-based matching for that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Nearly all of the builtin functions in GLSL 3.00 ES are already
implemented in Mesa; this patch enables them.
A few functions are not implemented yet; those have been commented
out, with a FIXME comment to act as a reminder of what still needs to
be implemented. Here is the complete list: packSnorm2x16,
unpackSnorm2x16, packUnorm2x16, unpackUnorm2x16, packHalf2x16,
unpackHalf2x16.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
These functions are defined in GLSL 1.50 and GLES 3.00 ES.
The formulas have been extracted from the existing implementation of
inverse().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch also adds assertions so that when we add new GLSL versions,
we'll notice that we need to update the builtin variables.
[v2, idr]: s/Frab/Frag/ Noticed by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch implements all of the built-in types for GLSL 3.00 ES.
This is almost exactly the same as the set of built-in types for GLSL
1.30, except ate 1D samplers are skipped, and samplerCubeShadow is
added.
This patch also addes an assertion so that when we add new GLSL
versions, we'll notice that we need to update the types.
In review, Eric noted:
"This change looks correct. The overall interaction of profiles is
getting ugly, though. I'm imagining a restructure of the symbol
table population so that there's a big list of types, and each
#version has a nice list of strings of type names copy and pasted
out of its spec."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
This patch updates the following linker checks to do the right thing
in GLSL 3.00 ES:
- Failing to write to gl_Position is allowed in GLSL 1.40+ as well as
GLSL 3.00 ES.
- It is an error to write to both gl_ClipVertex and gl_ClipDistance in
GLSL 1.30+. This does not apply to GLSL 3.00 ES.
- GLSL 3.00 ES uses the same varying counting rules as GLSL 1.00 ES.
- In GLSL 1.30 and GLSL 3.00 ES, "discard" terminates the shader.
- In GLSL 1.00 ES and GLSL 3.00 ES, both a fragment and a vertex
shader must be present.
[v2, idr]: Fix minro typo in a comment. Noticed by Ken.
[v3, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously we recorded just the GLSL version (or the max version, if
GLSL 1.10 and GLSL 1.20 programs were linked together).
[v2, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, we prohibited mixing of shading language versions if
min_version == 100 or max_version >= 130. This was technically
correct (since desktop GLSL 1.30 and beyond prohibit mixing of shading
language versions, as does GLSL 1.00 ES), but it was confusing. Also,
we asserted that all shading language versions were between 1.00 and
1.40, which was unnecessary (since the parser already checks shading
language versions) and doesn't work for GLSL 3.00 ES.
This patch changes the code to explicitly check that (a) ES shaders
aren't mixed with desktop shaders, (b) shaders aren't mixed between ES
versions, and (c) shaders aren't mixed between desktop GLSL versions
when at least one shader is GLSL 1.30 or greater. Also, it removes
the unnecessary assertion.
[v2, idr]: Slightly tweak the is_es_prog detection to occur outside the loop
instead of doing something special on the first loop iteration. Suggested by
Ken.
[v3, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously we recorded just the GLSL version, with the knowledge that
100 means GLSL 1.00 ES. With the advent of GLSL 3.00 ES, this is
going to get more complex, and eventually will probably become
ambiguous (GLSL 4.00 already exists, and GLSL 4.00 ES is likely to be
created some day).
To reduce confusion, this patch simply records whether the shader is
GLSL ES as an explicit boolean.
[v2, idr]: s/IsEs(Shader|Prog)/IsES/ Suggested by Ken and Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Carl Worth <cworth@cworth.org>
Note that GLSL 1.00 is selected using "#version 100", so "#version 100
es" is prohibited.
v2: Check for GLES3 before allowing '#version 300 es'
v3: Make sure a correct language_version is set in
_mesa_glsl_parse_state::process_version_directive.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Version directive handling is going to have to be used within two
parser rules, one for desktop-style version directives (e.g. "#version
130") and one for the new ES-style version directive (e.g. "#version
300 es"), so this patch moves it to a function that can be called from
both rules.
No functional change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Version directive handling is going to have to be used within two
parser rules, one for desktop-style version directives (e.g. "#version
130") and one for the new ES-style version directive (e.g. "#version
300 es"), so this patch moves it to a function that can be called from
both rules.
No functional change.
[mattst88] v2: Use intmax_t instead of int for version argument. Would
otherwise write garbage after #version since PRIiMAX was reading 64-bits
instead of 32.
[idr] v3: A later commit fixes the caller of
_glcpp_parser_handle_version_declaration to pass the correct number of
parameters. Fix it in the patch that changes the interface instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
This patch turns on the following features for GLSL ES 3.00:
- Array constructors, whole array assignment, and array comparisons.
- Second and third operands of ?: may be arrays.
- Use of "in" and "out" qualifiers on globals.
- Bitwise and modulus operators.
- Integral vertex shader inputs.
- Range-checking of literal integers.
- array.length method.
- Function calls may be constant expressions.
- Integral varyings must be qualified with "flat".
- Interpolation and centroid qualifiers may not be applied to vertex
shader inputs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
GLSL ES 3.00 adds the following keywords over GLSL 1.00: uint,
uvec[2-4], matNxM, centroid, flat, smooth, various samplers, layout,
switch, default, and case.
Additionally, it reserves a large number of keywords, some of which
were already reserved in versions of desktop GL that Mesa supports,
some of which are new to Mesa.
A few of the reserved keywords in GLSL ES 3.00 are keywords that are
supported in all other versions of GLSL: attribute, varying,
sampler1D, sampler1DShador, sampler2DRect, and sampler2DRectShadow.
This patch updates the lexer to handle all of the new keywords
correctly when the language being parsed is GLSL 3.00 ES.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
This patch expands the lexer KEYWORD macro to take two additional
arguments: the GLSL ES versions in which the given keyword was first
reserved, and supported, respectively. This will allow us to
trivially add support for GLSL 3.00 ES keywords, even though the set
of GLSL 3.00 ES keywords is neither a subset or a superset of the
keywords corresponding to any desktop GLSL version.
The new KEYWORD macro makes use of the
_mesa_glsl_parse_state::is_version() function, so it accepts 0 as
meaning "unsupported" (rather than 999, which we used previously).
Note that a few keywords ("packed" and "row_major") are supported
*either* when GLSL 1.40 is in use or when ARB_uniform_buffer_obj
support is enabled. Previously, we handled these by cleverly taking
advantage of the fact that the KEYWORD macro didn't parenthesize its
arguments in the usual way. Now they are handled more
straightforwardly, with a new macro, KEYWORD_WITH_ALT.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previous to this patch, we were not very consistent about the errors
we generate when a shader tried to use a feature that is prohibited in
the current GLSL version. Some error messages failed to mention the
GLSL version currently in use (or did so inaccurately), and some error
messages failed to mention the first GLSL version in which the given
feature is allowed.
This patch reworks all of the error checks to use the check_version()
function, which produces error messages in a standard form
(approximately "$FEATURE forbidden in $CURRENT_GLSL_VERSION
($REQUIRED_GLSL_VERSION required).").
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
With the advent of GLSL 3.00 ES, the version checks we perform in the
GLSL compiler (to determine which language features are present) will
become more complicated. To reduce the complexity, this patch adds
functions check_version() and is_version() to _mesa_glsl_parse_state.
These functions take two version numbers: a desktop GLSL version and a
GLSL ES version, and return a boolean indicating whether the GLSL
version being compiled is at least the required version. So, for
example, is_version(130, 300) returns true if the GLSL version being
compiled is at least desktop GLSL 1.30 or GLSL 3.00.
The check_version() function additionally produces an error message if
the version check fails, informing the user of which GLSL version(s)
support the given feature.
[v2, idr]: Add PRINTFLIKE annotation to the new method. The numbering of th
parameters is correct because GCC is silly.
[v3, idr]: Fix copy-and-paste error in the comment before
_mesa_glsl_parse_state::is_version. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Fixes a bug where version_string would be left uninitialized if no
GLSL "#version" directive was used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
This will be useful in generating more helpful error messages,
especially with the addition of GLSL 3.00 ES support.
[v2, idr]: Rename ctx parameter to mem_ctx
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, we stored the GLSL language version in the
glsl_symbol_table struct. But this was unnecessary--all
glsl_symbol_table needs to know is whether functions and variables
have separate namespaces (they do in GLSL 1.10 only).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Adding this now makes it easier to develop and test GLES3 features, since we
can do initial development and testing using desktop GL. Later GLSL compiler
patches check for either ctx->Extensions.ARB_ES3_compatibility or
_mesa_is_gles3 to allow certain features (i.e., "#version 300 es").
[v2, idr]: Just edits to the commit message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Carl Worth <cworth@cworth.org>
Previously, the user could send in a pointer that was not created
by mesa. When we dereferenced that pointer, there would be an
exception.
Now we keep a set of pointers and verify that the pointer
exists in that set before dereferencing it.
Note: This fixes several crashing gles3conform tests.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Note: The GL/GLES3 web man pages don't seem to properly
document glWaitSync's error when the sync object is invalid.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From: git://people.freedesktop.org/~anholt/hash_table
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: minor rework for mesa]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
u_rect.h said these should move to a different file, and u_surface seems
a better home.
Leave #include "util/u_surface.h" to avoid having to touch thousand of
files.
Reviewed-by: Brian Paul <brianp@vmware.com>
- Re-implement os_time_get in terms of os_time_get_nano() for consistency
- Use CLOCK_MONOTONIC as recommended
- Only use clock_gettime on Linux for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise the driver announces 4096 vertex shader constants and other
way too high limits.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GL/gl.h provides some definitions (GL_FALSE, GL_ONE, etc) that have
the same value as other gl headers but are represented differently
(0 vs 0x0 and 1 vs 0x1).
This causes compiler warnings about redefining such definitions when
including GL/gl.h with other gl headers.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=57802
Signed-off-by: Brian Paul <brianp@vmware.com>
Several issues actually:
- Fix a regression in unsigned normalized in the rescaling
[0, 255] to [0, 256]
- Ensure we use signed shifts where appropriate (instead of
unsigned shifts)
- Refactor the code slightly -- move all the logic inside
lp_build_lerp_simple().
This change, plus an adjustment in the tolerance of signed normalized
results in piglit fbo-blending-formats fixes bug 57903
Reviewed-by: Brian Paul <brianp@vmware.com>
They need to be converted to the native integer type to prevent garbage
in higher order bits from being printed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the draw_vs_set_constants() and draw_gs_set_constants()
functions and the draw->vs.aligned_constants,
draw->vs.aligned_constant_storage and draw->vs.const_storage_size
fields. None of it was used.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Commit 4097308 fixed the build in a questionable way. It worked at the
time, but, as Ian pointed out, the fix would likely fail at a future
commit due to the indeterminism of parallel builds. And that's exactly
what happened; the fix no longer works. `mm -j4` on Fedora 17 fails for
me.
The problem is that there is no rule for program_parse.tab.h. To fix that,
this patch adds a rule that makes program_parse.tab.c depend on
program_parse.tab.h. Technically, the c file does not depend on the
h file. However, because the two files are generated together by a single
invocation of Bison, any rule that forces execution of Bison is
sufficient.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I'd written most of this ages ago, but never finished it off.
This passes 115/130 piglit tests so far. I'll look into the
others as time permits.
v1.1: fix calloc return check as suggested by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.
Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
v2: Check the timestamp reset flag in the VS, which is apparently
getting set fairly regularly in the range we watch, resulting in
negative numbers getting added to our 32-bit counter, and thus large
values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
impossible to satisfy. Add a comment to the AOP defs from Ken's
review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
Fixes flat shading for AA lines. demos/src/trivial/line-smooth is a
test case which hits this.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
x11_screen.c includes xf86drm.h, which comes from libdrm-dev.
This patch fixes this build error.
Compiling src/gallium/state_trackers/egl/x11/x11_screen.c ...
src/gallium/state_trackers/egl/x11/x11_screen.c:30:21: fatal error: xf86drm.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Serious Sam 3 had a shader hitting this path, but it's used rarely so it
didn't show a significant performance difference (n=7). It does reduce
compile time massively, though -- one shader goes from 14s compile time
and 11723 instructions generated to .44s and 499 instructions.
Note that some shaders lose 16-wide mode because we don't support
16-wide and pull constants at the moment (generally, things looping over
a few-element array where the loop isn't getting unrolled). Given that
those shaders are being generated with 15-20% fewer instructions, it
probably outweighs the loss of 16-wide.
The gen7 send-from-GRF path is sufficiently different from the perspective of
IR generation and optimization that I just made it a separate opcode.
v2: fix whitespace, rebase on Ken's recent refactor.
As of gen7, we can skip the header on some messages, and this can make
optimization on those messages much nicer when you've got GRFs instead of MRFs
as the source.
This is a temporary hack. I believe the only way of properly fixing this
is to check buffer overflow just before fetching based on addresses,
instead of number of vertices/instances. This change simply allows tests
that stress buffer overflows to complete without asserting, and should
not affect valid rendering.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to clamp vertex buffer fetch based on its size, not based on the
user specified max index hint.
This matches draw_pt_fetch_run() above.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
A single vertex size is chosen for the whole pipeline. So the number of
geometry shader outputs must also be taken in consideration.
Reviewed-by: Brian Paul <brianp@vmware.com>
There is more work necessary to properly support buffers in shaders, but
this gets things a bit further along.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes fdo bug 57755 and most of the failures of piglit fbo-blending-formats
GL_EXT_texture_snorm.
GL_INTENSITY_SNORM is still failing, but problem is probably elsewhere,
as GL_R8_SNORM works fine.
Now that _mesa_BindFramebuffer does the right thing in ES contexts when the
gl_extensions::ARB_framebuffer_object bit is set, the Intel driver doesn't
need this hack.
No piglit or GLES2 conformance regressions observed on IVB, and this
patch (and the previous) fix es3conform's framebuffer_srgb_draw and
transform_feedback_misc tests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Desktop OpenGL implementations that support either
GL_ARB_framebuffer_object or OpenGL 3.0 must require names from
glGenFramebuffers for glBindFramebuffer. We have enforced this rule for
quite some time. However, OpenGL ES 1.0, 2.0, and 3.0 implementations
are required to allow user-defined names (e.g., not from
glGenFramebuffers{OES,}).
The Intel drivers have hacked around this by not enabling
GL_ARB_framebuffer_object in an ES context. Instead, just pick the
correct behavior in _mesa_BindFramebuffer based on the context API.
Chad pointed out in a review e-mail:
"I'd like to point out, though, that glBindFramebufferEXT and
glBindRenderbufferEXT are still broken on desktop GL because they
don't accept user-genned names. But that fix belongs to a different
series."
Currently glBindFramebufferEXT is an alias for glBindFramebuffer.
Unalising two functions presents some difficulty, so we'll have to
revisit this eventually.
v2: Perform same check in _mesa_BindRenderbuffer too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
The NV formulation of primitive restart is turned on/off with
glEnableClientState/glDisableClientState. These two functions don't
exist in core contexts, which mean that GL_NV_primitive_restart is
essentially useless...even broken.
However, leaving it on causes oglconform's primitive-restart-nv tests to
run in OpenGL 3.1 contexts, which results in them all failing. This
patch causes 29 subtests to go from "fail" to "not run".
NOTE: This is a candidate for stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I keep accidentally trying to use it. "fs" is a sensible name for
fragment shader debugging, and "wm" is...not. It's also more symmetric
with "vs".
Leave INTEL_DEBUG=wm because old habits die hard.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also remove the recently added and overloaded LLVM_CXXFLAGS from CXXFLAGS.
Note: This is a candidate for the stable branches.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
After walking our IR instructions (Mesa or GLSL), we don't want to also
mark the start of the FB/URB writes or whatever as being that IR. This
can end up being misleading when the end of the IR visit got copy
propagated out to a later instruction in the URB writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VP generation doesn't set up the output reg strings, so if you
didn't happen to get these values as 0 on the stack, you'd lose.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bison -o parameter expects a .c file.
The corresponding .h filename is obtained
by removing the extension of the initial .c.
This was breaking compilation on Ubuntu 12.04
libmesa_dricore_intermediates/libmesa_dricore.a(program_parse.tab.o): In
function `_mesa_parse_arb_program':
external/mesa/src/mesa/program/program_parse.y:2682: multiple definition
of `_mesa_parse_arb_program'
libmesa_dricore_intermediates/libmesa_dricore.a(lex.yy.o):external/mesa/src/mesa/program/program_parse.y:2682:
first defined here
Signed-off-by: Adrian Marius Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
In my testing I haven't found any cases where we get a null context
pointer, but it might still be possible. Check for null just to be safe.
Note: This is a candidate for the stable branches.
Only fail if GLX_SAMPLE_BUFFERS_ARB or GLX_SAMPLES_ARB are non-zero.
We were already doing this in the older swrast/glx code.
This fixes a piglit/waffle problem where we'd always fail to get a
visual/config and report the test as "skip".
Note: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were warning when there was no current context and we're about
to delete a renderbuffer, but that happens fairly often and isn't
really a problem.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=57754
Note: This is a candidate for the stable branches.
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
This required an update for the query storage in llvmpipe, there
can now be an active query per query type, so an occlusion query
can run at the same time as a time elapsed query.
Based on PIPE_QUERY_TIME_ELAPSED patch from Dave Airlie.
v2: fix up piglits for timers (also from Dave Airlie)
a) if we don't render anything the result is 0, so just
return the current time
b) add missing screen get_timestamp callback.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
we need to rely on util code for fetching those, just like before
9f06061d50.
Fixes bugs 57699 and 57756.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Tell LLVM the exact alignment we can guarantee, based on the fs block
dimensions, pixel format, and the alignment of the resource base pointer
and stride.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fs shader now depends on the color buffer formats. The shader key was
extended to accommodate this, but llvmpipe_update_derived needs to be
updated to check the framebuffer dirty flag.
This fixes bug 57674.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I fixed the only known bugs on r500 with 0222b2bd41.
Now there are no piglit regressions with Hyper-Z and all apps I tested seem
to work.
To summarize how it works:
- Only one process can use it at a time. This is a hardware limitation.
- The first process to clear a zbuffer gets the exclusive access to use
Hyper-Z.
- Compositors don't use any zbuffer, so they won't steal it, but some web
browsers do, so make sure there's no web browser running if you want your
game to use Hyper-Z.
- There's no need to restart an app which couldn't get the access to Hyper-Z.
Just quit the app which took it, the driver can turn it on for the other app
in the middle of rendering.
- If an app gets the access to Hyper-Z, it prints "radeon: Acquired Hyper-Z"
to stdout.
r300-r400:
Hyper-Z will be enabled by default on r300-r400 once sufficient testing is
done with piglit and Lightsmark at least.
Be sure to set the env var RADEON_HYPERZ and run piglit with parameters: -c 0
This fixes wrong rendering in Lightsmark and
the piglit/depthstencil-render-miplevels.
I think I fixed Hyper-Z. So far every app seems to work like a charm.
The following commit broke the i965 build:
commit 4a486f8bf2
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Nov 23 18:31:42 2012 +0100
glx/dri2: add and use new driver hook flush_with_flags
That commit added a forward declaration of enum __DRI2throttleReason to
dri_interface.h. C++ 98 does not allow forward declarations of enums.
The fix: Move the enum's definition to earlier in the file.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The border clamping code is unnecessary, since we don't care if a wrapped
coord value is -1 or <-1 (same for length vs. >length), in either case the
border handling code will mask out the offset and replace the texel value with
the border color.
Note that technically this is not entirely correct. Omitting clamping on the
float coords means that flt->int conversion may result in undefined values for
values of very large magnitude.
However there's no reason we should honor this here since:
a) we don't care for that for ordinary wrap modes in the aos code when
converting coords and the problem is worse there (as we've got only
effectively 24 instead of 32bits)
b) at least in some cases the clamping was done already in int space hence
doing nothing to fix that problem.
c) with sse2 flt->int conversion with such values results in 0x80000000 which
is just perfect (for clamp to border - not so much for the ordinary clamp to
edge).
Reviewed-by: Brian Paul <brianp@vmware.com>
Coverity pointed out this uninitialised class member.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
coverity pointed out this field was being used uninitialised.
Note: This is a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
all unsigneds are >= 0 :-)
There may be an argument for leaving this in, in case someone
changes min_lod to an integer, so feel free to apply or drop.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reported by coverity scan.
v2: fix second case
Note: This is a candidate for stable branches.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I haven't confirmed this is doing the correct thing, but at
least this might make someone review it!
Reported by internal RH coverity scan.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
the critical error would use driverName.
Found by internal RH coverity scan.
Note: This is a candidate for stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit 962a1c07b4.
Further testing revealed that this commit can cause the pre-processor to enter
infinite loops. For now, simply revert this code until a cleaner,
better-tested version is available.
Previously, we were only supporting line-continuation backslash characters
within lines of pre-processor directives, (as per the specification). With
OpenGL 4.2 and GLES3, line continuations are now supported anywhere within a
shader.
While changing this, also fix a bug where the preprocessor was ignoring
line continuation characters when a line ended in multiple backslash
characters.
The new code is also more efficient than the old. Previously, we would
perform a ralloc copy at each newline. We now perform copies only at each
occurrence of a line-continuation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a client frame callback is executed and the client starts rendering
again, the egl event queue might not have been dispatched so that the
buffer release event for the previous frame hasn't been processed. In
that case a third buffer is allocated, even though it would be possible
to reuse the buffer that was just released.
The wl_display_dispatch_queue_pending() entry point is available from
wayland-client 1.0.2, so require that in configure.ac. Also, just
let the pkg-config macro throw its own error, which will show what version
we were looking for and failed to find.
Note: This is a candidate for stable branches.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Commit ca3ed3e024 fixed the problem where
eglMakeCurrent would trigger a getbuffer callback that then breaks the
following wl_egl_window_resize() call. However, we still need to
invalidate buffers in eglSwapBuffers, since in wayland we always swap
buffers, so the dri driver needs to come out and ask us for the next buffer
after each swapbuffer.
Note: this is a candidate for stable branches.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
These helper macros save you from writing nasty expressions like:
if ((inst->src[1].type == BRW_REGISTER_TYPE_F &&
inst->src[1].imm.f == 1.0) ||
((inst->src[1].type == BRW_REGISTER_TYPE_D ||
inst->src[1].type == BRW_REGISTER_TYPE_UD) &&
inst->src[1].imm.u == 1)) {
Instead, you simply get to write inst->src[1].is_one(). Simple.
Also, this makes the FS backend match the VS backend (which has these).
This patch also converts opt_algebraic to use the new helper functions.
As a consequence, it will now also optimize integer-typed expressions.
Reviewed-by: Eric Anholt <eric@anholt.net>
The use-after-free happened when the renderbuffer was shared by multiple
contexts and we tried to delete the renderbuffer using a context which
was previously deleted.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To fix a pipe_context::surface_destroy() use-after-free problem.
We previously added pipe_sampler_view_release() for similar reasons.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We sometimes need a rendering context when deleting renderbuffers.
Pass it explicitly instead of trying to grab a current context
(which might be NULL). The next patch will make use of this.
Note: this is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We used to invalidate the drawable after a call to eglSwapBuffers(),
so that a wl_egl_window_resize() would take effect for the next frame.
However, that leads to calling dri2_get_buffers() when eglMakeCurrent()
is called with the current context and surface, and a later call to
wl_egl_window_resize() would not take effect until the next buffer
swap.
Instead, add a callback from wl_egl_window_resize() back to the wayland
egl platform, and invalidate the drawable only when it is resized.
This solves a bug on wayland clients when going back to windowed mode
from fullscreen when clicking a pop up menu, where the window size
after this would be the fullscreen size.
Note: this is a candidate for stable branches.
CC: wayland-devel@lists.freedesktop.org
Completely forgot about updating Makefile when removing it. Stephane
already fixed the make build, but there were a few mentions of
lp_tile_soa left in the tree.
The gen4 simd16 workaround looks at ir->type to determine how much
storage to allocate for the simd16 value. In fragment programs,
texturing only ever returns float vec4s (unlike GLSL, which can also
have scalar floats or vector integers), so this is the right type.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to rebase colors (ex: set G=B=0) when getting GL_LUMINANCE
textures in following cases:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
A similar fix was pushed by Brian Paul for uncompressed textures
in commit: f5d0ced.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=47220
Observed no regressions in piglit and ogles2conform due to this fix.
This patch will cause failures in intel oglconform pxconv-gettex,
pxstore-gettex and pxtrans-gettex test cases. The cause of failures
is a bug in test cases. Expected luminance value is calculted
incorrectly in test cases: L = R+G+B.
V2: Set G = 0 when getting a RG texture but returning luminance.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Drop these from the known limitations list since support was recently added
for these.
Also, fix a typo while in the area, (and the oddly missing final newline).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This test file is very similar to test 113-line-and-file-macros but uses token
pasting for cleaner quiz answers (without spaces between the digits). This
test passes thanks to the recent addition of support for pasting INTEGER
tokens, (but would have failed without that).
(Note that this test is distinct from test 059-token-pasting-integer which
pastes integers parsed from the source. Those are parsed to INTEGER_STRING
tokens and are already pasted correctly as verified by that test. The only way
to generate the INTEGER tokens which currently fail to paste is with an
internal define such as __LINE__ that results in an integer.)
Reviewed-by: Matt Turner <mattst88@gmail.com>
As recently tested in the additions to the invalid paste test, it is illegal
to paste a non-digit sequence onto the end of an integer.
The 082-invalid-paste test should now pass again.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The current code lets a few invalid pastes through, such as an string pasted
onto the end of an integer. Extend the invalid-paste test to catch some of
these.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This time creating a new _token_list_create_with_one_integer function
modeled after the existing _token_list_create_with_one_space function
(both implemented with new _token_list_create_with_one_ival).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This function is getting a little long too read. Simplify it by pulling
up one assignment from every condition.
Reviewed-by: Matt Turner <mattst88@gmail.com>
These tokens are easy to expand by just looking at the current, tracked
location values, (and no need to look anything up in the hash table).
Add a test which verifies __LINE__ with several values, (and verifies __FILE__
for the single value of 0). Our testing framework isn't sophisticated enough
here to have a test with multiple file inputs.
This commit fixes part of es3conform's preprocess16_frag test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This should help avoid confusion now that we're using the gl_api enum
to distinguishing between core and compatibility API's. The
corresponding enum value for core API's is API_OPENGL_CORE.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The current implementation was close by not fully correct: several
operations that should be done in floating point were being done in
integer.
Fixes piglit fbo-clear-formats GL_ARB_texture_float
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
untested (couldn't get the piglit test to run even with version overrides)
but seemed blatantly wrong.
In any case it would only affect an error case which when it would happen
probably all hope is lost anyway.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This adds array (1d,2d) texture support to llvmpipe.
Though probably should do something about 1d array textures requiring gobs
of memory (this issue is not strictly limited to arrays but it is probably
worse there).
Initial code by Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Support 1d and 2d array textures (including shadow samplers),
and (as a side effect mostly) also shadow cube samplers.
Seems to pass the relevant piglit tests both for sampling and rendering
to (though some require version overrides).
Since we don't support render target indices rendering to array textures
is still restricted to a single layer at a time.
Also, the min/max layer in the sampler view (which is unnecessary for GL)
is ignored (always use all layers).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now dead code.
Also had to remove the show_tiles/show_subtiles because now the color
buffers are always stored in their native format, so there is no longer
an easy way to paint the tile sizes.
Depth-stencil buffers are still swizzled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Update llvmpipe_is_format_supported and llvmpipe_is_format_unswizzled
so that only the formats that we can render without swizzling are
advertised.
We can still render all D3D10 required formats except
PIPE_FORMAT_R11G11B10_FLOAT, which needs to be implemented in a future
opportunity.
Removal of rendertarget swizzling will be done in a subsequent change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is buggy (it was giving wrong results for some of the formats with
padding), and util_format_description::is_array already does precisely
what's intended.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is what we want in practice.
The only change is in PIPE_FORMAT_R8SG8SB8UX8U_NORM, which no longer is
considered an array format.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes various format manipulation for big-endian
architectures.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes various format manipulation for big-endian
architectures.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds two more functions in type conversions header:
* lp_build_bswap: construct a call to llvm.bswap intrinsic for an
element
* lp_build_bswap_vec: byte swap every element in a vector base on the
input and output types.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes the vector constant generation used for vector shuffle
for big-endian machines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch enforces the clear of NJ bit in VSCR Altivec register so
denormal numbers are handles as expected by IEEE standards.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds Altivec intrinsics for float vector types. It changes
the SSE specific definitions to a platform neutral and adds the calls
to Altivec intrinsic builder.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch add correct vector addition and substraction intrisics when
using Altivec with PPC. Current code uses default path and LLVM backend
ends up issuing carry-out arithmetic instruction while it is expected
saturated ones.
It also includes a fix for PowerPC where char are unsigned by default,
resulting in bogus values for vector shifting.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds PPC Altivec support for pack/unpack operations using Altivec
supported vector type (8xi8, 16xi16, 4xi32, 4xf32).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages doesn't
need them.
This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Compiling shaders requires several main steps:
1. Generating VS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code
This patch splits out step 4 into a separate class named "vec4_generator."
There are several reasons for doing so:
1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace vec4_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.
2. It reduces the size of the vec4_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")
3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
vec4_visitor and create IR, while ADD() in vec4_generator() can
create brw_instructions. (Patches for this upcoming.)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Final code generation should never fail. This is a bug, and there
should be no user-triggerable cases where this could occur.
Also, we're not going to have a fail() method after the split.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_vs_prog is very generic: it just calls out to
get a compiled program and then uploads it.
This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
During compilation, we allocate a bunch of things: the IR needs to last
at least until code generation...and then the program store needs to
last until after we upload the program.
For simplicity's sake, just keep it all around until we upload the
program. After that, it can all be freed.
This will also save a lot of headaches during the upcoming refactoring.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We used to steal it out of the brw_compile struct, but that won't be
initialized in time soon (and is eventually going away).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We used to steal it out of the brw_compile struct...but vec4_visitor
isn't going to have one of those in the future.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This leaves only the final code generation stage in brw_vec4_emit.cpp,
moving the payload setup, run(), and brw_vs_emit functions to brw_vec4.cpp.
The fragment shader backend puts these functions in brw_fs.cpp, so this
patch also helps with consistency.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I ran across this while running a glGenerateMipmap() test.
_meta_GenerateMipmap sets MESA_META_TRANSFORM, which causes
_mesa_meta_begin to try and set a default orthographic projection.
Unfortunately, if the drawbuffer isn't set up, ctx->DrawBuffer->Width
and Height are 0, which just causes an GL_INVALID_VALUE error.
Fixes oglconform's fbo/mipmap.automatic, mipmap.manual, and
mipmap.manualIterateTexTargets.
Reviewed-by: Brian Paul <brianp@vmware.com>
The rest of the plumbing was in place already.
I have tested this by turning on all GL 3.1 features.
The drivers not supporting GL 3.1 will fail to create a core profile
as they should.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add a DEBUG_FREED_MEMORY option to help catch use-after-free errors.
Add debug_memory_check() function which can be periodically called to
check that all known blocks are good.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Will allow formats with padding, e.g. RGBX.
Will now allow swizzled formats as long as the alpha is channel 3.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And add test cases to ensure that this works
- 110 verifies that glcpp rejects #elif<digits> which glcpp
previously accepted.
- 111 verifies that glcpp accepts #if followed immediately by
(, +, -, !, or ~.
- 112 does the same as 111 but for #elif.
See 17f9beb6 for #if change.
Reviewed-by: Carl Worth <cworth@cworth.org>
radeonsi now supports Z16 and doesn't fail these assertions anymore.
This partially reverts commit 7bba4879bb, but
leaves the error messages in place to allow diagnosing such problems even with
non-debugging builds.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes this SCons build error on Mac OS X if X11 is found.
NameError: name 'ws_xlib' is not defined:
File "SConstruct", line 144:
duplicate = 0 # http://www.scons.org/doc/0.97/HTML/scons-user/x2261.html
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/SConscript", line 34:
SConscript('gallium/SConscript')
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/gallium/SConscript", line 135:
'targets/libgl-xlib/SConscript',
File "scons-2.2.0/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
File "scons-2.2.0/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "src/gallium/targets/graw-xlib/SConscript", line 9:
ws_xlib,
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the ARB_vertex_type_2_10_10_10_rev specification:
"The error INVALID_ENUM is generated by VertexP*, NormalP*,
TexCoordP*, MultiTexCoordP*, ColorP*, or SecondaryColorP if <type>
is not UNSIGNED_INT_2_10_10_10_REV or INT_2_10_10_10_REV."
Fixes 7 subcases of oglconform's packed-vertex test.
v2: Add "gl" prefix to error messages (pointed out by Brian).
Also rebase atop the ctx plumbing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Traditionally, OpenGL has had two separate equations for converting from
signed normalized fixed-point data to floating point data. One was used
primarily for vertex data, while the other was primarily for texturing
and framebuffer data.
However, ES 3.0 and GL 4.2 change this, declaring there's only one
equation to be used in all cases. Unfortunately, it's the other one.
v2: Correctly convert 0b10 to -1.0, as pointed out by Chris Forbes.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The rules for converting these values actually depend on the current
context API and version. The next patch will implement those changes.
v2: Mark ctx as const, as suggested by Brian.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Perform this count the same way as elsewhere in this file, per
Brian Paul's review.
Fixes part of es3conform's transform_feedback_init_defaults test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously this function would assert if the format didn't fit an expected 4 channel format size.
Now will work with any format type with any amount of channels.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages doesn't
need them.
This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.
v2: rzalloc p, as suggested by Eric.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Compiling shaders requires several main steps:
1. Generating FS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code
This patch splits out step 4 into a separate class named "fs_generator."
There are several reasons for doing so:
1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace fs_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.
2. It reduces the size of the fs_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")
3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
fs_visitor and create IR, while ADD() in fs_generator() can
create brw_instructions. (Patches for this upcoming.)
Furthermore, this patch changes the order of operations slightly.
Rather than doing steps 1-4 for SIMD8, then 1-4 for SIMD16, we now:
- Do steps 1-3 for SIMD8, then repeat 1-3 for SIMD16
- Generate final assembly code for both modes together
This is because the frontend work can be done independently, but final
assembly generation needs to pack both into a single program store to
feed the GPU.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Final code generation should never fail. This is a bug, and there
should be no user-triggerable cases where this could occur.
Also, we're not going to have a fail() method in a moment.
v2: Just abort() rather than assert, to cover the NDEBUG case
(suggested by Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
All we really need is a memory context and the instruction list; passing
a backend_visitor is just convenient at times.
This will be necessary two patches from now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_wm_prog is very generic: it just calls out to
get a compiled program and then uploads it.
This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We used to steal it out of the brw_compile struct...but fs_visitor
isn't going to have one of those in the future.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Also change it from a brw_fragment_program to a gl_fragment_program,
since that seems to be what everything wants anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We can easily recover it from prog, and this makes it clear that we
aren't passing additional information in.
v2: Use an if-statement rather than the ?: operator (suggested by Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Also, rather than having brw_wm_fs_emit poke at it directly, make it a
parameter to the fs_visitor constructor.
All other changes generated by search and replace (with occasional
whitespace fixup).
v2: Make dispatch_width const (as suggested by Paul); fix doxygen
mistake (pointed out by Eric); update for rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we only have the one backend, there's no real point in keeping
this separate. Moving it should allow some future simplifications.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Everybody determines this by checking if fp's OutputsWritten field
contains the FRAG_RESULT_DEPTH bit. Rather than having payload setup
check this and set the computes_depth flag, we can just do the check in
the only place that actually used it: emit_fb_writes().
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No longer have to split fetching into quads dynamically if mip levels
are not the same for all quads (aos sampling still always splits due
to performance reasons).
Instead handle multiple mip levels further down, minification etc. takes
this into account.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also adds some code to handle per-quad lods for more than 4-wide fetches,
because otherwise I'd have to integrate the texelFetch function into
the splitting stuff... (but it is not used yet outside texelFetch).
passes piglit fs-texelFetch-2D, fails fs-texelFetchOffset-2D due to I believe
a test error (results are undefined for out-of-bounds fetches, we return
whatever is at offset 0, whereas the test expects [0,0,0,1]).
Texel offsets are only handled by texelFetch for now, though the interface
can handle it for everything.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
v2 (Kayden): Move the enable into an existing intel->gen >= 4 block
(as suggested by Ian).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Implements BGRA swizzle, sign recovery, and normalization
as required by ARB_vertex_type_10_10_10_2_rev.
V2: Ported to the new VS backend, since that's all that's left;
fixed normalization.
V3: Moved fixups out of the GLSL-only path, so it works for FF/VP too.
V4 (Kayden): Rework ES3 normalization, don't heap allocate registers;
tidy comments.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flag the need for various workarounds to be applied by
the vertex shader.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Next few patches build on this to add other workarounds
for packed formats.
V2: rename BRW_ATTRIB_WA_COMPONENTS to BRW_ATTRIB_WA_COMPONENT_MASK;
V3 (Kayden): remove separate bit for ES3 signed normalization
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Always use R10G10B10A2_UINT; Most of the other formats we'd like
don't actually work on the hardware. Will emit w/a for scaling,
sign recovery and BGRA swizzle in the VS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until we have proper 'make dist' this is an improvement of the current
situation, because each time some old Makefiles got converted to automake
we had to update the tarballs target.
NOTE: This is a candidate for the 9.0 branch.
Cc: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
We can't support IF statements in 16-wide on these. To get back to 16-wide
for these shaders, we need to support predicate on discard instructions in the
backend IR, which is something we've sort of got on the list to do anyway.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55828
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 774fb90db3 introduced a ralloc context to
each user of struct brw_compile, but for this one a NULL context was used,
causing the later ralloc_free(mem_ctx) to not do anything.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55175
NOTE: This is a candidate for the stable branches.
We have a special case where non-shadow comparison with LOD requires using a
SIMD16 vec4 in an 8-wide shader, which appears in the register allocator as a
size 8 vgrf.
Fixes assertions in various piglit tests and webgl conformance.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56521
Since a signed 2-bit integer can only represent -1, 0, or 1, it is
tempting to simply to convert it directly to a float. This maps it
onto the correct range of [-1.0, 1.0]. However, it gives different
values compared to the usual equation:
(2.0 * 1.0 + 1.0) * (1.0 / 3.0) = +1.0 (same)
(2.0 * 0.0 + 1.0) * (1.0 / 3.0) = +0.33333333... (different)
(2.0 * -1.0 + 1.0) * (1.0 / 3.0) = -0.33333333... (different)
According to the GL_ARB_vertex_type_2_10_10_10_rev extension, signed
normalization is performed using equation 2.2 from the GL 3.2
specification, which is:
f = (2c + 1)/(2^b - 1). (2.2)
Comments below that equation state: "In general, this representation is
used for signed normalized fixed-point parameters in GL commands, such
as vertex attribute values." Which is what we're doing here.
The 3.2 specification goes on to declare an alternate formula:
f = max{c/(2^(b-1) - 1), -1.0} (2.3)
which is closer to the existing code, and maps the end points to exactly
-1.0 and 1.0. Comments below the equation state: "In general, this
representation is used for signed normalized fixed-point texture or
framebuffer values." Which is *not* what we're doing here.
It then states: "Everywhere that signed normalized fixed-point
values are converted, the equation used is specified." This is the real
clincher: the extension explicitly specifies that we must use equation
2.2, not 2.3. So we need to do (2x + 1) / 3.
This matches the behavior expected by oglconform's packed-vertex test,
and is correct for desktop GL (pre-4.2). It's not correct for ES 3.0,
but a future patch will correct that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
For the 10-bit components, the divisor was incorrect. A 10-bit signed
integer can represent -2^9 through 2^9 - 1, which leads to the following
ranges:
(float)value.x -> [ -512, 511]
2.0F * (float)value.x -> [-1024, 1022]
2.0F * (float)value.x + 1.0F -> [-1023, 1023]
So dividing by 511 would incorrectly scale it to approximately:
[-2.001956947, 2.001956947]. To correctly scale to [-1.0, 1.0], we need
to divide by 1023.
This correctly implements the desktop GL rules. ES 3.0 has different
rules, but those will be implemented in a separate patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
The bug was found by Coverity.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives us checking of our arguments (no more passing 1 operand to
BRW_OPCODE_MUL!), at the cost of a couple of extra parens.
v2: Rebase on gen6-if fix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a regression in the brw_fs_fp.cpp change. We just need to return
something good enough to get the IR generation to the end without crashing,
but ir->type isn't initialized and we wanted something of the coordinate's
type anyway.
Fixes around 30 piglit cases on my ilk system in drawpixels and framebuffer
blit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The theory of the guardband is that you extend the clip volume to avoid
expensive clipping computation, and just let fragments outside the viewport
get clipped by the drawable's bounds. But if a smaller-than-window-size
viewport is set, and we don't also happen to have a scissor set, then
rendering could incorrectly extend outside of the viewport when it should have
been clipped to the viewport.
Fixes the new piglit triangle-guardband-viewport test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.0 branch.
When you're comparing to the spec, you're trying to immediately see what
numbered dword of the packet your bit ends up in.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.0 branch.
All Intel code is compiled with -std=c99. There is no excuse to not use
designated initializers.
As a nice benefit, the code is now more friendly to grep. Without
designated initializers, psychic prowess is required to find the
initialization of DRI extension function pointers with grep. I have
observed several people, when they first encounter the DRI code, fail at
statically chasing the DRI function pointers due to this problem.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The dri directory is compiled with -std=c99. There is no excuse to not use
designated initializers.
As a nice benefit, the code is now more friendly to grep. Without
designated initializers, psychic prowess is required to find the
initialization of DRI extension function pointers with grep. I have
observed several people, when they first encounter the DRI code, fail at
statically chasing the DRI function pointers due to this problem.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For a packed depth/stencil buffer on separate stencil hardware, the
separate depth miptree is set up with alignment of 4,4 and the separate
stencil miptree is setup with alignment of 8,8. We can't just use the
irb->draw_{x,y} offsets for stencil, since that is the offset in the
depth miptree.
Fixes 12 piglit depthstencil testcases on ivb.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Given that we have the mask information here (assuming the rebase is to
the same tiling, which is safe), we can just save a set of miptrees and
offsets and the global intra-tile offset in the context and cut out a
bunch of logic. This will also save emitting the next fix I need to do
twice.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Fixes a theoretical problem where we had an aligned depth buffer and a
misaligned stencil buffer with a matching tile offset, so we would fail
to rebase depth even after the needed tile offset changed due to the
rebase of stencil.
It should also fix double-rebase of a misaligned packed depth/stencil
renderbuffer, which may have been a performance issue.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
We were always passing 0 for one of the two fields, and the code just used
whichever one wasn't 0.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I noticed these in the next patch where these paths were using the Face
of a teximage but didn't have array handling.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Apparently this was accidentally marked as unimplemented, and thus not
put in the dispatch table.
Fixes 7 es3conform tests:
- copy_buffer_parameters
- copy_buffer_data
- copy_buffer_usage
- pixel_buffer_object_bind
- pixel_buffer_object_parameteriv
- pixel_buffer_object_texture_read
- pixel_buffer_object_usage
v2: Also update the DispatchSanity test for this change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only legacy OpenGL allows the use of non-gen'd names. Core profiles
and ES 3 both require the use of glGenQueries().
Note that BeginQuery doesn't exist in ES 1 or ES 2.
Fixes es3conform's occlusion_query_invalid_beginquery test.
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
GL_READ_FRAMEBUFFER and GL_DRAW_FRAMEBUFFER are valid targets in ES 3.
Fixes 23 es3conform framebuffer_blit tests. Two more go from fail to
crash, but that appears to be because they actually run now.
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Calling glTexParameteri() with pname GL_TEXTURE_MAX_LEVEL and either a
target of GL_TEXTURE_RECTANGLE or a negative value previously generated
GL_INVALID_OPERATION. However, GL_INVALID_VALUE seems more appropriate.
Fixes oglconform's api-error/negative.glTexParameter and es3conform's
sgis_texture_lod_basic_error.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
The new brw_reg always had type BRW_REGISTER_TYPE_F, rather than
inheriting the original type of the ATTR file register.
In the past, this hasn't been a problem since we only execute this code
when fixing up GL_FIXED attributes, which always have float types.
However, we'll soon be using it for ARB_vertex_type_10_10_10_2 support,
which uses D and UD types.
Reviewed-by: Eric Anholt <eric@anholt.net>
For GLES1 and GLES2, brwCreateContext neglected to validate the requested
context version received from the DRI layer. If DRI requested an OpenGL
ES2 context with version 3.9, we provided it one.
Before this fix, the switch statement that validated the requested GL
context flavor was an ugly #ifdef copy-paste mess. Instead of reproducing
the copy-past-mess for GLES1 and GLES2, I first refactored it. Now the
switch statement is readable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It seems that -NDEBUG and other flags might still be leaked through
those variables, so strip those off there as well.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
In addition to registers used by instructions, fs_visitor maintains
direct references to certain "special" values used for inputs/outputs.
When I added VGRF compaction, I overlooked these, believing that these
direct references weren't used once instructions were generated. That
was wrong. For example, pixel_x/y are used in virtual_grf_interferes(),
which is called by optimization passes and register allocation.
This patch treats all of them as used and patches them after compacting.
While it's not strictly necessary to patch all of them (as some aren't
used after emitting code), it seems safer to simply fix them all.
Fixes oglconform's textureswizzle/advanced.shader.targets, piglit's
glsl-fs-lots-of-tex, and glean's texCombine on pre-Gen6 hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56790
Reviewed-by: Eric Anholt <eric@anholt.net>
The goal of that change was to skip counting things that aren't actually
outputs from the VS to the FS. However, explicit_location isn't set in
the case of linker-assigned locations (the common case), so basically
varying component counting got disabled. At this stage of the linker,
we've already ensured that var->location is set, so we can just look at
it without worrying.
Fixes i965 assertion failure with the new
piglit glsl-max-varyings --exceed-limits.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51545
Reviewed-by: Brian Paul <brianp@vmware.com>
The diff looks funny, but it's moving the integer vs non-integer check
below the _mesa_source_buffer_exists() check that ensures
_ColorReadBuffer is non-null, so we get a GL_INVALID_OPERATION instead
of a segfault. This looks like it had regressed in the
_mesa_error_check_format_and_type() changes, which removed the first of
the two duplicated checks for the source buffer. Fixes segfault in the
new piglit ARB_framebuffer_object/negative-readpixels-no-rb.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45877
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we're using the new backend, we may actually put things into push
constants if you have too many uniform values uploaded. Also, correctly
account for texture rectangle params and drop the old special case for the
0.0/1.0 params from the old backend.
MinGW has snprintf.
The patch fixes these warnings with the MinGW SCons build.
src/gallium/auxiliary/util/u_snprintf.c:459:1: warning: no previous prototype for ‘util_vsnprintf’ [-Wmissing-prototypes]
src/gallium/auxiliary/util/u_snprintf.c:1436:1: warning: no previous prototype for ‘util_snprintf’ [-Wmissing-prototypes]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Brian Paul <brianp@vmware.com>
If an instruction reads from a constant register that contains
immediates using an invalid swizzle, we can avoid generating MOV
instructions to fix up the swizzle by loading the immediates into a
different constant register that can be read using a valid swizzle.
This only affects r300 and r400 cards.
For example:
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
MAD temp[4].xy, const[0].xy__, const[1].xz__, input[0].xy__;
========== Before this change would be lowered to: =========
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
MOV temp[0].x, const[1].x___;
MOV temp[0].y, const[1]._z__;
MAD temp[4].xy, const[0].xy__, temp[0].xy__, input[0].xy__;
========== After this change is lowered to: ===============
CONST[1] = { -3.5000 3.5000 2.5000 1.5000 }
CONST[2] = { 0.0000 -3.5000 2.5000 0.0000 }
MAD temp[4].xy, const[0].xy__, const[2].yz__, input[0].xy__;
============================================================
This change reduces one of the Lightsmark shaders from 133 to 91
instructions.
v2:
- Fix crash caused by swizzles with only inline constants.
Use per asic golden values.
Programming this register doesn't seem to be strictly
necessary on SI, but programming it wrong leads to
rendering issues or reduced performance so just
go ahead and program the golden values explicitly
to avoid any potential problems down the road.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For precise lts support I had to do some magic with the library names, which works fine
as long as the libraries from pkg-config are used.
The parts with src/gallium/targets/va-*/Makefile will not apply on the master branch,
but do apply to the 9.0 branch.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Matt Turner <mattst88@gmail.com>
fixes regression introduced in 9078441072
Targets for making lex.yy.c program_parse.tab.c and program_parse.tab.h
got moved into its own Makefile
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was added in version 22 of the GL_ARB_sync spec.
Fixes gles3conform's sync_error_waitsync_timeout test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the other range checks on index already return the proper error,
INVALID_VALUE.
Fixes gles3conform's instanced_arrays_invalid test.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_optimize.c's brw_opcodes table was a copy of brw_disasm.c's
opcode_descs table, but with an additional field: is_arith. Now that
I've deleted that, the two are identical. Keep the one in brw_disasm.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
All users of basic block analysis simply create their own local
variables. Nobody uses the visitor-wide field.
Reviewed-by: Eric Anholt <eric@anholt.net>
The old brw_remove_grf_to_mrf_moves() pass is obsolete and replaced by
fs_visitor::compute_to_mrf().
The old brw_remove_duplicate_mrf_moves() pass is obsolete and replaced
by fs_visitor::remove_duplicate_mrf_writes().
The remaining pass, brw_set_dp4_dependency_control(), is currently
unused, but could be, so I'm leaving it for now.
Reviewed-by: Eric Anholt <eric@anholt.net>
At this point, it's just gl_shader_program. Nobody even uses it; even
the program that creates them only returns gl_shader_program pointers.
Reviewed-by: Eric Anholt <eric@anholt.net>
The passthrough pipeline needs to check index values (which might be passed
through) as they can be invalid (which causes crashes and various assertion
failures if the clip code runs). Obviously, rendering won't be well-defined,
but those bogus indices might come directly from apps.
There were already debug printfs which reported the out-of-bounds indices but
we really ought to not crash.
While checking at that point doesn't seem like the most efficient solution,
it seems there isn't really another appropriate function to do it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assert the the CB format is valid and default to
the INVALID hw format rather than ~0U when the format
doesn't match for non-debug builds.
v2: use INVALID hw format rather than ~0U
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Assert that the DB format is valid and default to
the INVALID hw format rather than ~0U when the format
doesn't match for non-debug builds.
v2: use INVALID hw format rather than ~0U
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is necessary for backwards compatibility with pre-SI for stencil.
Fixes a number of stencil related piglit tests, and real apps using stencil.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
On Gen6-7, we don't compact clip planes, and nr_userclip_plane_consts
is the last bit set, so iterating from i = 0..nr_userclip_plane_consts
covers all active clip planes and is the right thing to do.
works and is the right thing to do.
However, that doesn't work at all on Gen4-5. Since we don't compact
clip planes, we skip over ones which aren't active (via the continue
statement). We also set set nr_userclip_plane_consts to the number of
active clip planes, which means that we end the loop after checking that
many bits. If the set of clip planes wasn't contiguous, this means we'd
fail to find the last few.
By changing the iteration to MAX_CLIP_PLANES, we correctly find all of
the active clip planes.
Fixes regressions since 66c8473e02 (replacing the old VS backend) in
Piglit's spec/glsl-1.20/execution/clipping/fixed-clip-enables and
oglconform's mustpass(basic.clip) and userclip(basic.allCases).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56791
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no compaction, so we can drop that code and simply use 'i'.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen4-5 compacts clip planes and Gen6-7 doesn't, it makes sense to
split them into separate code paths. This patch simply copies the code
to both halves; the next commits will simplify it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous 1023-entry chaining hash table never resized, so it was very
inefficient when there were many objects live. While one could have an even
more efficient implementation than this (keep an array for genned names with
packed IDs, or take advantage of the fact that key == hash or key ==
*(uint32_t *)data to store less data), this is fairly fast, and I want a nice
replacement hash table for other parts of Mesa, too.
It improves Minecraft performance 12.3% +/- 1.4% (n=9), dropping hash lookups
from 8% of the profile to 0.5%.
I also tested cairo-gl, which should be a pessimal workload for this hash
table: around 247000 FBOs created and destroyed, only around 65 live at any
time, and few lookups of them between creation and destruction. No
statistically significant performance difference at n=76 (mean 20.3/20.4
seconds, sd 2.8/3.2 seconds). If I remove the >20 seconds outliers that
appear to be due to thermal throttling, there's possibly a .97% +/- 0.31%
performance win (n=61/59). The choice of cutoff for outliers feels a lot like
cooking the data, but I've gone through this process 3 times for minor
iterations of the code with the same conclusion each time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Mesa's chaining hash table for object names is slow, and this should be much
faster. I namespaced the functions under _mesa_*, to avoid visibility
troubles that we may have had before with hash_table_* functions.
v2: Move .c file to main/, const a few things, clean up loop conditions,
add/extend some comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
sparc/clip.c got moved to sparc/sparc-clip.c to avoid doing this workaround in
the parent directory.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
While simplifying mesa/Makefile.am, the more important feature of this commit
is allowing a file with the same name to appear in both main/ and program/.
v2: [chadv] Add changes to Android makefiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com> (v2)
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch move the rules for libmesa_st_mesa.a from Android.mk to
Android.libmesa_st_mesa.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch move the rules for libmesa_dricore.a from Android.mk to
Android.libmesa_dricore.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch move the rules for host executable mesa_gen_matypes from
Android.mk to Android.mesa_gen_matypes.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The pair of files src/mesa/Android.mk and src/mesa/Android.gen.mk are too
long and complex to be easily understood. This patch belongs to a series
that decomposes them into several easily digestible makefiles.
This patch move the rules for the host and target libmesa_glsl_utils.a
from Android.mk to Android.libmesa_glsl_utils.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
They were always used with the corresponding *_FILES variables now that
automake handles rule generation.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Array textures were broken.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It was pretty broken with array textures, where the array size (height or
depth depending on the target) shouldn't be magnified.
The guessing also doesn't fail with 1D and cube textures.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
MaxLog2 led to bugs, because it didn't work well with 1D and 3D textures.
NOTE: This is a candidate for the stable branches.
v2: correct the comment at MaxNumlevels
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This might have a slight overhead but handling mip offsets more like
the width (and image) strides should make some things easier (mip level
being just part of the offset calculation) later.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is preparation work for using mip level offsets + base_ptr for texture
sampling instead of per-mip pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
I can never remember what "AB" means, and having to constantly consult
the docs is annoying. Just add comments to the top which explain each
of the abbreviations.
Previously, we used these XML annotations to make the code generation
scripts aware of any instances where the Mesa implementation of a
function had a prefix other than "_mesa_". Now that all of the mesa
implementation functions have been renamed to match the XML, we only
need to handle exec="skip", exec="dynamic", and the default case of
exec="mesa".
Acked-by: Brian Paul <brianp@vmware.com>
Previously, we used the mesa_name XML attribute to make the code
generation scripts aware of any instances where the Mesa
implementation of a function had a different function name suffix than
the primary name in the XML. Now that all of the Mesa implementation
functions have been renamed to match the XML, this attribute is no
longer necessary.
Acked-by: Brian Paul <brianp@vmware.com>
This patch changes the use of const in the type signatures of
_mesa_ShaderSource() and _mesa_TransformFeedbackVaryings(), to match
the type signatures in the GL spec. This avoids warnings when
building the code-generated api_exec.c file.
Note: previously we avoided the build warnings because these functions
were being type-checked against ShaderSourceARB and
TransformFeedbackVaryingsEXT; those functions are semantically
equivalent, but have fewer const qualifiers in their type signatures.
Acked-by: Brian Paul <brianp@vmware.com>
Vector indexing on matrixes generates several copy of the
constant matrix, for instance vec=mat4[i][j] generates :
vec=mat4[i].x;
vec=(j==1)?mat4[i].y;
vec=(j==2)?mat4[i].z;
vec=(j==3)?mat4[i].w;
In the case of constant matrixes, the mat4[i] expression generates
copy of the 16 elements of the matrix 4 times ; indirect addressing
also prevents some conservative CSE algorithms (like the one in LLVM)
from factoring the mat4[i] expression.
This patch will make the vec_index_to_cond pass generates :
temp = mat4[i];
vec=temp.x;
vec=(j==1)?temp.y;
vec=(j==2)?temp.z;
vec=(j==3)?temp.w;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were accidentally setting bit 14 in DWord 2 (which is Reserved/MBZ)
rather than bit 14 in DWord 3 (which is AA Line Distance Mode).
There's also no reason to ever set it to legacy mode; the bit is only
used when drawing antialiased lines anyway. Set it unconditionally.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The parameters and operation of this function changed, but I didn't
bother to change the prologue comment.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This reverts commit 0d61f879a1.
Assigning the FS inputs to the 12 bit field is fine since we don't care
about the higher FS inputs. Maybe I'll revisit silencing the compiler
warning another day.
By moving the HASH_LINE rule out of control_line: and into line:, we avoid
adding control_line's additional \n (as seen in the first hunk).
mattst88: Carl and I determined independently of Fabian that the 091
test needed to be modified identically to this, and our patch to fix the
test was more complicated.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51506
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we were accepting garbage after #else and #endif tokens when
the previous preprocessor conditional evaluated to false (eg, #if 0).
When the preprocessor hits a false conditional, it switches the lexer
into the SKIP state, in which it ignores non-control tokens. The parser
pops the SKIP state off the stack when it reaches the associated #elif,
#else, or #endif. Unfortunately, that meant that it only left the SKIP
state after the lexing the entire line containing the #token and thus
would accept garbage after the #token.
To fix this we use a mid-rule, which is executed immediately after the
#token is parsed.
NOTE: This is a candidate for the stable branch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56442
Fixes: preprocess17_frag.test from oglconform
Reviewed-by: Carl Worth <cworth@cworth.org> (glcpp-parse.y)
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian reported seeing:
r600_texture.c: In function ‘r600_texture_create_object’:
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
this should wrap over them fine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This contains the evergreen support.
Support is possible on rv670 upwards and the code in here
should work, but it doesn't and I haven't debugged it to
figure out why.
Beyond just adding support for the cube map array sampling,
r600 resinfo isn't conformant with the GL specification,
which states the number of layers should be returned for
the textureSize, so we have to track in an external
constant buffer the layers for each sampler if we need
them in the shader.
v2: only update the sampler constants if the sampler views have changed,
as suggested by Marek.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
draw_delete_geometry_shader() seems to be the real one.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
util_pack_z_stencil was being unconditionally invoked for all formats,
causing an assertion failure for Z32_FLOAT_S8X24_UINT.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Alpha is also 1 for formats like R32G32_FLOAT.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We must multiply the factor against the destination, not the source.
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For drivers with native integer / SM4 support this is just an hindrance.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GL_POINT_BIT state attribute GL_POINT_SPRITE_COORD_ORIGIN
is only supported on OpenGL-2.0 or later. Prevent glPopAttrib()
from trying to restore it on OpenGL-1.4 implementations which
support GL_ARB_POINT_SPRITE, as otherwise the sequence...
glPushAttrib(GL_POINT_BIT);
glPopAttrib();
throws an GL_INVALID_ENUM error in glPopAttrib().
See also commit f778174ea1
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since cf438f5375e242, we store actual integers for the attribute data.
We just need to reinterpret the GLfloat array as a GLint/GLuint array
so we can read the proper data.
Fixes oglconform's glsl-vertex-attrib/basic.VertexAttribI[1234][u]i
subtests (after fixing an unrelated bug in those test cases).
v2: Use the COPY_4V macro to be concise.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com> [v1]
This adds support to the softpipe texture sampler and tgsi exec.
In order to handle the extra input to the texture sampling,
I've had to expand the interfaces to take a c1 value for storing
the texture compare value for the TEX2 case.
v1.1: add comments (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds mesa state tracker support for the new extension,
along with glsl->tgsi conversion to use the new opcodes
where appropriate.
v2: fix assert found running textureSize tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the texture target and capability along
with 3 new opcodes required to support this extension.
As this extension requires some texture opcodes with samp + 5 args,
we need to use another src register, this is only required
for TEX, TXL and TXB opcodes to implement this spec.
TEX2 is required for shadow cube map arrays
TXL2 is required for cube map array sampler + explicit lod
TXB2 is required for cube map array sampler + lod bias
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds all the new builtins + the new sampler types,
and hooks them up if the extension is supported.
v2: fix missing signatures for grad/lod
fix missing textureSize clarifications
fix compare vs starts with usage
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the mesa core + texture + fbo support for the
texture cube map array extension.
v2:
add comment to _mesa_num_tex_faces related to cube map arrays (Brian)
drop wrong comment cut-n-paste (Brian)
fix / 6 maximum check issue (Kenneth)
coalsece some array case statements (Kenneth)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While developing cube map array support I found that we didn't
support this properly, also piglit didn't test for it at all.
I've submitted a test to piglit to check for this, and this
fixes explicit lod and lod bias with cube maps.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For cube map arrays I'll need another driver private constant
buffer, and looking forward to UBOs. So clean up with some
defines, that can be modified when adding cube map array and ubos
later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is common for complicated shaders, particularly code-generated ones, to
have a big array of uniforms or attributes, and a prologue in the shader that
dereferences from the big array to more informatively-named local variables.
Then there will be some small control flow operation (like a ? : statement),
and then use of those informatively-named variables. We were emitting extra
MOVs in these cases, because copy propagation couldn't reach across control
flow.
Instead, implement dataflow analysis on the output of the first copy
propagation pass and re-run it to propagate those extra MOVs out.
On one future Steam release, reduces VS+FS instruction count from 42837 to
41437. No statistically significant performance difference (n=48), though, at
least at the low resolution I'm running it at.
shader-db results:
total instructions in shared programs: 722170 -> 702545 (-2.72%)
instructions in affected programs: 260618 -> 240993 (-7.53%)
Some shaders do get hurt by up to 2 instructions, because a choice to copy
propagate instead of coalesce or something like that results in a dead write
sticking around. Given that we already have instances of those instructions
in the affected programs (particularly unigine), we should just improve dead
code elimination to fix the problem.
I've no idea why there isn't a piglit that triggers this behaviour,
but while enabling TBOs for softpipe and r600g, I noticed all the
integer tests failed. I tracked it back to the TXF returning a float
when it should be returning an int. This fixed it and I haven't
seen any regressions in a full piglit run on softpipe.
http://bugs.freedesktop.org/55010
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a frame callback is not destroyed when destroying a surface, its
handler function will be invoked if the surface was destroyed after the
callback was requested but before it was invoked, causing a write on
free:ed memory.
This can happen if eglDestroySurface() is called shortly after
eglSwapBuffers().
Note: This is a candidate for stable branches.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These were only used for geometry shader support back in the days before
the new GLSL compiler. Future geometry shader support will not use
these.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes a uninitialized pointer read defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The new code-generated version of _mesa_create_exec_table() populates
the entire dispatch table (except for dynamic functions) by itself; it
no longer calls separate functions to initialize parts of the dispatch
table. This patch removes those no-longer-needed functions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch adjusts makefiles to cause src/mesa/main/api_exec.c to be
generated using src/mapi/glapi/gen/gl_genexec.py. There should be no
functional change.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This script generates the file api_exec.c, which contains just the
function _mesa_create_exec_table(), based on the XML files in
src/mapi/glapi/gen.
The following XML attributes, in particular, are used:
- "es1" indicates functions that should be available in ES1 contexts.
- "es2" indicates functions that should be available in ES2/ES3
contexts.
- "exec" indicates which Mesa function should be dispatched to. E.g.
if the GL function is glFoo(), then:
- exec="mesa" (the default) dispatches to _mesa_Foo().
- exec="check" dispatches to _check_Foo().
- exec="es" dispatches to _es_Foo().
- exec="loopback" dispatches to loopback_Foo().
- exec="skip" or exec="dynamic" causes this function to be skipped;
either it is not yet supported ("skip"), or its dispatch table
entry will be dynamically populated based on GL state ("dynamic").
- "desktop" indicates functions that should be available in desktop GL
(non-ES) contexts.
- "deprecated" indicates functions that should not be available in
core contexts.
- "mesa_name" indicates functions whose implementation in Mesa has a
different suffix than the corresponding GL function name.
The generated code looks roughly like this (showing just a single
statement in each block for brevity):
struct _glapi_table *
_mesa_create_exec_table(struct gl_context *ctx)
{
struct _glapi_table *exec;
exec = _mesa_alloc_dispatch_table(_gloffset_COUNT);
if (exec == NULL)
return NULL;
if (_mesa_is_desktop_gl(ctx)) {
SET_ActiveProgramEXT(exec, _mesa_ActiveProgramEXT);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || _mesa_is_gles3(ctx)) {
SET_BeginQueryARB(exec, _mesa_BeginQueryARB);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES) {
SET_GetPointerv(exec, _mesa_GetPointerv);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES || ctx->API == API_OPENGLES2) {
SET_ActiveTextureARB(exec, _mesa_ActiveTextureARB);
/* other functions not shown */
}
if (_mesa_is_desktop_gl(ctx) || ctx->API == API_OPENGLES2) {
SET_AttachShader(exec, _mesa_AttachShader);
/* other functions not shown */
}
if (ctx->API == API_OPENGL) {
SET_Accum(exec, _mesa_Accum);
/* other functions not shown */
}
if (ctx->API == API_OPENGL || ctx->API == API_OPENGLES) {
SET_AlphaFunc(exec, _mesa_AlphaFunc);
/* other functions not shown */
}
if (ctx->API == API_OPENGLES) {
SET_AlphaFuncxOES(exec, _es_AlphaFuncx);
/* other functions not shown */
}
return exec;
}
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch updates gl_XML.py to parse the new XML attributes "exec",
"desktop", "deprecated", and "mesa_name", which will be needed to code
generate _mesa_create_exec_table().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_XML.py's gl_function class keeps track of an entry_point_api_map
property that tracks, for each set of aliased functions, which ES1 or
ES2 version the given function name first appeared in.
This patch aggregates that information together across aliased
functions, into an easier-to-use api_map property.
Future patches will use this information when code generating
_mesa_create_exec_table(), to determine which set of dispatch table
entries should be populated based on the API.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some of the functions that we store in the dispatch table are declared
as non-static in their .c files and are inserted into the dispatch
table directly by _mesa_create_exec_table(). Other functions are
declared as static, and are inserted into the dispatch table by a
dedicated function that lives in the same .c file
(e.g. _mesa_loopback_init_api_table() in api_loopback.c).
This patch makes all of these functions non-static, and creates
appropriate prototypes for them, so that in future patches we can
populate the entire dispatch table using a single code-generated
function.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When the XML lists one or more GL api functions as aliases for another
GL function, the mesa function that implements the functionality is
usually named after the canonical version of the function (the one
that is the target of the aliases). For example, FogCoordd is listed
as an alias of FogCoorddEXT, and the Mesa function implementing the
functionality is called loopback_FogCoorddEXT.
However, there are exceptions. For example, Enablei is listed as an
alias of EnableIndexedEXT, but the Mesa function implementing the
functionality is called _mesa_EnableIndexed.
To account for these anomalies, this patch annotates the XML with
"mesa_name" attributes, which describe how to adjust the function name
to find the corresponding Mesa function.
For example:
<function name="EnableIndexedEXT" mesa_name="-EXT">...</function>
<function name="IsProgramNV" mesa_name="-NV+ARB">...</function>
means that EnableIndexedEXT is implemented by a Mesa function called
_mesa_EnableIndexed, and IsProgramNV is implemented by a Mesa function
called _mesa_IsProgramARB.
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine the name of the Mesa function
that should be stored in each dispatch table entry.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped when the API is desktop GL.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
dispatched to ES-specific implementations. exec="es" indicates that
the ES-specific implementation has a name beginning with "_es_"
(e.g. _es_QueryMatrixxOES), and exec="check" indicates that the
ES-specific implementation has a name beginning with "_check_"
(e.g. _check_GetTexGenxvOES).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
dispatched to functions in api_loopback.c.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped because Mesa dispatches them differently depending on GL
state.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped because they aren't implemented by Mesa.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Future patches will use this annotation when code generating
_mesa_create_exec_table(), to determine which functions should be
skipped in core contexts.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were already doing this for some GLX extensions, but not others.
This patch makes our use of window_system="glX" consistent.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch standardizes the category names used in the glapi XML files
to begin each extension name with the prefix "GL_" or "GLX_". There
is no functional change, because these category names are not used in
the generated code.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows the GLES1.1 dispatch sanity test to be run on all builds,
even builds that do not include GLES1 support.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
fragprog_inputs_read is a 12-bit bitfield so check the assigned value.
MSVC warns on the assignment. Not easy to fix but let's do a sanity check.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The decompression is done in-place and only the compressed tiles are
decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
The texture unit is programmed to use non-displayable tiling and depth
ordering of samples, so that it can fetch the texture in the native DB format.
The latest version of the libdrm surface allocator is required for stencil
texturing to work. The old one didn't create the mipmap tree correctly.
We need a separate mipmap tree for stencil, because the stencil mipmap
offsets are not really depth offsets/4.
There are still some known bugs, but this should save some memory and it also
improves performance a little bit in Lightsmark (especially with low
resolutions; tested with Radeon HD 5000).
The DB->CB copy is still used for transfers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The functions were broken, because they converted ints to floats.
Now we can finally advertise OpenGL 3.0. ;)
In this commit, the vbo module also tracks the type for each attrib
in addition to the size. It can be one of FLOAT, INT, UNSIGNED_INT.
The little ugliness is the vertex attribs are declared as floats even though
there may be integer values. The code just copies integer values into them
without any conversion.
This implementation passes the glVertexAttribI piglit test which I am going
to commit in piglit soon. The test covers vertex arrays, immediate mode and
display lists.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: cosmetic changes as suggested by Brian
Integer textures generate invalid operation in glGenerateMipmap.
So, the code related to integer textures is now redundant.
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Khronos has reached a conclusion and disallowed following texture formats in
glGenerateMipMap():
(a) ASTC textures
(b) integer internal formats (e.g., RGBA8UI, RG16I)
(c) textures with stencil formats (e.g., STENCIL_INDEX8)
(d) textures with packed depth/stencil formats (e.g, DEPTH24_STENCIL8)
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9471
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing gl-3.1/genned-names.
v2: Fix a missing return value.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's usually forced to 1 by the surface format, but sometimes we actually have
alpha present because it's the only format available.
Fixes piglit texwrap bordercolor tests for OpenGL 1.1, GL_EXT_texture_sRGB and
GL_ARB_texture_float.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the index buffer is full of values like "0 1 2 3", but basevertex is 4, we
need to upload at least vertex data for elements 4 5 6 7. Whether we also
upload 0 1 2 3 is a question of whether there are VBOs present or not -- see
the code setting start_vertex_bias in brw_draw_upload.c.
Fixes piglit draw-elements*base-vertex user_varrays
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, if we had a set of prims passed in with a num_instances varying
between them, we wouldn't upload enough (or too much!) from user vertex
arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_draw_upload.c start_vertex_bias code has support for doing the rebase
without rewriting the index buffer by applying a basevertex. It looks like
vbo_rebase_prims() is not equipped to handle basevertex.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We haven't been only tracking raw GRF-GRF moves since the constant propagation
merge, and also the extension for source modifiers and uniforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Given that we handle similarly-regioned GRFs registers for our copy
propagation from our UNIFORM file, there's no reason not to allow it.
The shader-db impact is negligible -- +90 instructions total, 2 shaders helped
and 7 hurt (slightly increased register pressure increased spilling), but this
is to prevent regression in other shaders when fixing copy_propagation to
reduce register pressure in the shaders that are hurt here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we put the register coalescing in between the two, then we end up with code
sequences involving dead writes that the dead code elimination doesn't know
how to remove. In place of making dead code elimination smart (which we
should do, too), make it less important for the moment.
shader-db results:
total instructions in shared programs: 722240 -> 721275 (-0.13%)
instructions in affected programs: 50573 -> 49608 (-1.91%)
(no shaders regressed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During code generation, we create tons of temporary variables, many of
which get immediately killed and are never used. Later optimization and
analysis passes, such as compute_live_intervals, loop over all the
virtual GRFs. By compacting them, we can save a lot of overhead.
Reduces compilation time in L4D2's largest fragment shader from 10.2
seconds to 5.2 seconds (50%). Drops compute_live_variables() from
10-12% of another game's startup time to 8%.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The function list was generated from glcorearb.h for GL 4.3.
Note that many GL 4.X functions are commented out, and indicate
that they need to be added to Mesa's XML.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We also no longer call _swrast_CreateContext, _tnl_CreateContext
or _swsetup_CreateContext when creating the context.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
If a GL function was introduced in a later GL version than the
context we are testing, then it is okay if it is set to the
_mesa_generic_nop function.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This will be used by GL CORE contexts to differentiate functions that
can be set to nop from functions that are required for a particular
context version.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This function can be re-added with an actual implementation
when ARB_geometry_shader4 is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This function can be re-added with an actual implementation
when ARB_geometry_shader4 is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
ProgramParameteri will be required for ARB_geometry_shader4
or GLES3. Don't enable this function until either of those
is supported.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These functions are part in GL 4.3. Moving this will allow
ProgramParameteriARB to alias ProgramParameteri.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These EXT_separate_shader_objects function will no longer be
enabled for CORE profiles:
* UseShaderProgramEXT
* ActiveProgramEXT
* CreateShaderProgramEXT
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes the Android build after the move of builtin_stubs.cpp into
the builtin_compiler subdirectory. This patch is untested.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note this by itself is not enough to fix scons build -- it will fail
until you remove:
rm -rf build/*/glsl/builtin_compiler
because that node was a filei before, but it will be now a directory.
This also means that bisecting across this change will require wiping
the build directory..
The builtin_compiler binary is used during the build process to generate
code for the builtin GLSL functions. Since this binary needs to be run
on the build host, it must not be cross-compiled.
This patch fixes the build system to compile a second version of the
source files and the builtin_compiler binary itself for the build
system. It does so by defining the CC_FOR_BUILD and CXX_FOR_BUILD
variables, which are searched for by the configure script and point to
the location of native C and C++ compilers.
In order for this to work properly, builtin_function.cpp is removed
from BUILT_SOURCES, otherwise the build system would try to generate it
before having had a chance to descend into the builtin_compiler
subdirectory. With the builtin_compiler and glsl_compiler now being
generated at different stages, the build instructions for glsl_compiler
can be simplified a bit.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Variable indexing of non-uniform arrays only exists in GLSL. Likewise,
OPCODE_CAL/OPCODE_RET only existed to try and support GLSL's function
calls. We don't use Mesa IR for GLSL, and these features are explicitly
disallowed by ARB_vertex_program/ARB_fragment_program and never
generated by ffvertex_prog.c.
Since they'll never happen, there's no need to check for them, which
saves us from walking through all the Mesa IR instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Rather than having two separate backends, just create a small layer that
translates the subset of Mesa IR used for ARB_vertex_program and fixed
function programs to the Vec4 IR. This allows us to use the same
optimization passes, code generator, register allocator as for GLSL.
v2: Incorporate Eric's review comments.
- Fix use of uninitialized src_swiz[] values in the SWIZZLE_ZERO/ONE
case: just initialize it to 0 (.x) since the value doesn't matter
(those channels get writemasked out anyway).
- Properly reswizzle source register's swizzles, rather than overwriting
the swizzle.
- Port the old brw_vs_emit code for computing .x of the EXP2 opcode.
- Update comments, removing mention of NV_vertex_program, etc.
- Delete remaining #warning lines and debug comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Properly use "conditionalmod" pre-Gen6, rather than the incorrectly
copy-and-pasted "BRW_CONDITIONAL_G".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will become necessary once we start supporting ARB programs and
fixed function in this backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch removes the generated files api_exec_es1.c,
api_exec_es1_dispatch.h, and api_exec_es1_remap_helper.h (and the
source files and build rules used to generate them), since they are no
longer used. GLES1 now uses the same dispatch table layout as all the
other APIs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies context creation code for GLES1 to use
_mesa_create_exec_table() (which is used for all other APIs) instead
of the GLES1-specific _mesa_create_exec_table_es1().
There is a slight change in functionality. As a result of a mistake
in the code generation of _mesa_create_exec_table_es1(), it does not
include glFlushMappedBufferRangeEXT or glMapBufferRangeEXT (this is
because when support for those two functions was added in commit
762d9ac, src/mesa/main/APIspec.xml wasn't updated). With this patch,
glFlushMappedBufferRangeEXT and glMapBufferRangeEXT are properly
included in the dispatch table. Accordingly, dispatch_sanity.cpp is
modified to expect these two functions to be present.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Leave GLES1.1 dispatch sanity test disabled when not building
GLES1 support.
Currently, _mesa_create_exec_table() (in api_exec.c) is used for all
APIs except GLES1. In GLES1, _mesa_create_exec_table_es1() (a code
generated function) is used instead.
In principle, this shouldn't be necessary. It should be possible for
api_exec.c to contain the logic for populating the dispatch table for
all API's.
This patch paves the way for using _mesa_create_exec_table() instead
of _mesa_create_exec_table_es1(), by making _mesa_create_exec_table()
(and the functions it calls) expose the correct subset of desktop GL
functions for GLES1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch creates a header querymatrix.h, to allow functions defined
in querymatrix.c to be used from other .c files. It also switches
from the nonstandard GL_APIENTRY to GLAPIENTRY.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Don't declare _mesa_Get{Integer,Float}v in querymatrix.c.
Instead, just include main/get.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch adds the usual boilerplate (copyright notice and guards
against redundant inclusion) to es1_conversion.h. It also moves the
definition of GL_APIENTRY from es1_conversion.c.
This allows es1_conversion.h to be safely included from other .c files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use copyright notice from src/mesa/main/es_generator.py (the
script that used to generate this file).
Previously dispatch table-related code was generated from gl_API.xml,
so it did not include slots for GLES1-only functions (such as those
taking fixed-point arguments).
This patch generates dispatch table-related code from
gl_and_es_API.xml, so that GLES1-only functions are included. This
paves the way for future patches that will unify the GLES1 dispatch
table with the dispatch tables for the other APIs.
The following generated files are affected:
- glapi_x86.S
- glapi_x86-64.S
- glapi_sparc.S
- glprocs.h
- glapitemp.h
- glapitable.h
- glapi_gentable.c
- dispatch.h
- remap_helper.h
Since this change affects makefiles, a full rebuild is required.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Adjust dependencies to ensure that generated files will be rebuilt
whenever any ES-related XML source files are changed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when code-generating aliased functions in glapitemp.h, we
weren't consistent about which function alias we used to obtain the
parameter names, with the risk that we would generate incorrect code
like this:
KEYWORD1 void KEYWORD2 NAME(Foo)(GLint x)
{
(void) x;
DISPATCH(Foo, (x), (F, "glFoo(%d);\n", x));
}
KEYWORD1 void KEYWORD2 NAME(FooEXT)(GLint y)
{
(void) x;
DISPATCH(Foo, (x), (F, "glFooEXT(%d);\n", x));
}
At the moment there are no aliased functions with mismatched parameter
names, so this isn't the problem. But when we introduce GLES1
functions into the dispatch table, there will be
(MapBufferRange/MapBufferRangeEXT). This patch paves the way for that
by fixing the code generation script to handle the mismatch correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This ensures that GLES1-only typedefs are available in these files.
In a future patch, this will allow us to expand the dispatch table to
include GLES1-only functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In commits bad96f6 and e7dd2e5 I added the following aliases:
- ClampColor -> ClampColorARB
- VertexAttribDivisor -> VertexAttribDivisorARB
But I neglected to update check_table.cpp, causing "make check" to
fail for non-shared-glapi builds.
This patch removes the functions that are now aliased from
check_table.cpp, so that "make check" works correctly again.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When copy_array_to_vbo_array encountered an array with src_stride == 0
and dst_stride != 0, we would replicate out the single element to the
whole size (max - min + 1). This is unnecessary: we can simply upload
one copy and set the buffer's stride to 0.
Decreases vertex upload overhead in an upcoming Steam for Linux title.
Prior to this patch, copy_array_to_vbo_array appeared very high in the
profile (Eric quoted 20%). After the patch, it disappeared completely.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This essentially reverts the following:
commit c625aa19cb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Feb 18 10:37:43 2011 +0000
intel: extend current vertex buffers
While working on optimizing an upcoming Steam title, I broke this code.
Eric expressed his doubts about this optimization, and noted that the
original commit offered no performance data.
I ran before and after benchmarks on Xonotic and Citybench, and found
that this code made no difference. So, remove it to reduce complexity
and make future work simpler.
Reviewed-by: Eric Anholt <eric@anholt.net>
The problem was we set VRAM|GTT for relocations of STATIC resources.
Setting just VRAM increases the framerate 4 times on my machine.
I rewrote the switch statement and adjusted the domains for window
framebuffers too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
By removing the array size, the static assertion to check for missing
elements can do its job properly. This will catch cases where a new
Mesa format is added but the swrast texfetch code isn't updated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On r6xx/r7xx shader resource management need to make sure that the
shader does not goes over the gpr register limit. Each specific
asic has a maxmimum register that can be split btw shader stage.
For each stage the shader must not use more register than the
limit programmed.
v2: Print an error message when discarding draw. Don't add another
boolean to context structure, but rather propagate the discard
boolean through the call chain.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is a regression since b3921e1f53.
The array stores VS outputs, not FS inputs.
Now llvmpipe can do 32 varyings too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
For Intel, expose it only if gen >= 4.
For Gallium, expose it only if PIPE_CAP_SM3 is advertised.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: update relnotes-9.1
v3: use align_malloc and align_free for malloced buffers in r300g
v4: document the new CAP in the docs
This allows updating only a subrange of buffer bindings.
set_vertex_buffers(pipe, start_slot, count, NULL) unbinds buffers in that
range. Binding NULL resources unbinds buffers too (both buffer and user_buffer
must be NULL).
The meta ops are adapted to only save, change, and restore the single slot
they use. The cso_context can save and restore only one vertex buffer slot.
The clients can query which one it is using cso_get_aux_vertex_buffer_slot.
It's currently set to 0. (the Draw module breaks if it's set to non-zero)
It should decrease the CPU overhead when using a lot of meta ops, but
the drivers must be able to treat each vertex buffer slot as a separate
state (only r600g does so at the moment).
I can imagine this also being useful for optimizing some OpenGL use cases.
Reviewed-by: Brian Paul <brianp@vmware.com>
It was defined as an empty function since Nov 2010 and was ultimately
removed completely.
See xserver commit 1cb0261
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
With the explit NUM_TEXTURE_TARGETS array size, the assertion that
Elements(targets) == NUM_TEXTURE_TARGETS would pass even if elements
were missing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Patch adds additional singlesample config with 565 color buffer,
24 bit depth and 8 bit stencil buffer. This makes Quadrant benchmark
work on Android. Tested with Sandybridge and Ivybridge machines.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is instead of the pair of GLenums for format and type that were
previously used. This is necessary for the Intel drivers to expose sRGB
framebuffer formats.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
There is no gl_format in Mesa that corresponds to this arrangement, so I
have a very hard time believing that this works.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, if the server didn't send a GLX_FRAMEBUFFER_SRGB_CAPABLE_EXT
tag, it would still be set to GLX_DONT_CARE (which is -1). Set it to
GL_FALSE instead.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Maciej Wieczorek <maciej.t.wieczorek@intel.com>
This fixes an issue where glsl_to_tgsi_visior::get_opcode() would emit the
wrong opcode because the register type was GLSL_TYPE_ARRAY/STRUCT instead of
GLSL_TYPE_FLOAT/INT/UINT/BOOL, so the function would use the float opcodes for
operations on integer or boolean values dereferenced from an array or
structure. Assertions have been added to get_opcode() to prevent this bug
from reappearing in the future.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
The 2x and 4x MSAA cases are completely broken. The lfdptr instruction returns
garbage there.
The 8x MSAA case is broken on Cayman, though at least the result looks somewhat
correct.
Only the 8x MSAA case works on Evergreen and is enabled.
llvm-3.2svn r166772 no longer requires RTTI for lib/Support.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
LLVM 3.1+ haven't more "extern unsigned llvm::StackAlignmentOverride"
and friends for configuring code generation options, like stack
alignment.
So I restrict assiging of lvm::StackAlignmentOverride and other
variables to LLVM 3.0 only, and wrote similiar code using
TargetOptions.
This patch fix segfaulting of WINE using llvmpipe built with LLVM 3.1
Signed-off-by: Alexander V. Nikolaev <avn@daemon.hole.ru>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
This is a leftover from when we had to split those two functions due to
the separate BO validation step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"Active" is an already-used term for the query being between
glBeginQuery() and glEndQuery(), while this is tracking whether the
start of the packet pair for emitting state has been inserted into the
current batchbuffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Put the back face colour right after the front face colour in the LDS parameter
space.
Fixes 18 piglit tests related to two sided lighting.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
It's required. The CP uses this to properly allocate new
contexts. Also do a CS partial flush since we are updating
CONFIG regs which are single state.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Use printf instead of debug_printf to be consistent with print
statements in rest of unit tests.
This also fixes the lack of print output with the MinGW build of
u_format_compatible_test.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Global initializers using the ?: operator with at least one non-constant
operand generate ir_if statements. For example,
float foo = some_boolean ? 0.0 : 1.0;
becomes:
(declare (temporary) float conditional_tmp)
(if (var_ref some_boolean)
((assign (x) (var_ref conditional_tmp) (constant float (0.0))))
((assign (x) (var_ref conditional_tmp) (constant float (1.0)))))
This pattern is necessary because the second or third arguments could be
function calls, which create statements (not expressions).
The linker moves these global initializers into the main() function.
However, it incorrectly had an assertion that global initializer
statements were only assignments, calls, or temporary variable
declarations. As demonstrated above, they can be if statements too.
Other than the assertion, everything works fine. So remove it.
Fixes new Piglit test condition-08.vert, as well as an upcoming
game that will be released on Steam.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider the following code, which reinterprets a register as a
different type:
mov(8) g6<1>F g1.4<0,4,1>.xF
and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD
Copy propagation would notice that we can replace the use of g6 with
g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD
type, incorrectly generating:
and(8) g5<1>.xUD g6<4,4,1>.xF 0x7fffffffUD
Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode
test with my new Mesa IR -> Vec4 IR translator.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider the following code sequence:
mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mov.sat(8) m1<1>.xyF g4<4,4,1>F
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions. Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination. While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.
When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source. It ensures that all
necessary channels are available (possibly written by several
instructions). In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.
This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code. However, it should be possible
with GLSL programs as well.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we used lookahead patterns to differentiate:
#define FOO(x) function macro
#define FOO (x) object macro
Unfortunately, our rule for function macros:
{HASH}define{HSPACE}+/{IDENTIFIER}"("
relies on infinite lookahead, and apparently triggers a Flex bug where
the generated code overflows a state buffer (see YY_STATE_BUF_SIZE).
There's no need to use infinite lookahead. We can simply change state,
match the identifier, and use a single character lookahead for the '('.
This apparently makes Flex not generate the giant state array, which
avoids the buffer overflow, and should be more efficient anyway.
Fixes piglit test 17000-consecutive-chars-identifier.frag.
NOTE: This is a candidate for every release branch ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
While copying the values into the batch space, we advance the param
pointer. The debug code then tries to iterate over all the uploaded
values, starting at param...which is now the end of the uploaded data,
rather than the start.
This patch saves a pointer to the start of push constant space before
it gets altered and switches the debug code to use that.
Tested by uncommenting the code and examining the output of
glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since ES3.0 is backward compatible with 2.0, we check that all the 2.0
functions and additional 3.0 functions exist.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we just printed the dispatch table index and the user had
to convert it to a function name. That was a pain because when
FEATURE_remap_table is defined, the assignment of functions to
dispatch table entries is done at run time.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously this function was only implemented for non-shared-glapi
builds. Since the function is only intended for debugging purposes we
use a simple O(n) algorithm.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When specifying per-target CFLAGS (e.g., ralloc_test_CFLAGS) AM_CFLAGS
are not used. AM_CPPFLAGS should be used for includes anyway.
Fixes a build problem since 41b14d125:
CC ralloc_test-ralloc.o
In file included from ../../../src/glsl/ralloc.c:42:0:
../../../src/glsl/ralloc.h:57:27: fatal error: main/compiler.h: No such file or directory
Acked-by: Paul Berry <stereotype441@gmail.com>
Catches problems such as (in the gles3 branch)
glcpp-parse.y: In function '_glcpp_parser_handle_version_declaration':
glcpp-parse.y:1990:39: warning: format '%lli' expects argument of type
'long long int', but argument 4 has type 'int' [-Wformat]
As a side-effect, remove ralloc.c's likely/unlikely macros and just use
the ones from main/compiler.h.
NOTE: This is a candidate for the release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the problem where configure from the tarball would report missing
files:
$ ./configure
configure: error: cannot find install-sh, install.sh, or shtool in bin
NOTE: This is a candidate for the 9.0 branch.
4bits and 3bits quantitization values differ significantly for
values other than 0 and 1.
Fixes piglit draw-pixels for softpipe/llvmpipe.
NOTE: Probably a candidate for stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This fixes an issue where glsl_to_tgsi_visior::get_opcode() would emit the
wrong opcode because the register type was GLSL_TYPE_ARRAY/STRUCT instead of
GLSL_TYPE_FLOAT/INT/UINT/BOOL, so the function would use the float opcodes for
operations on integer or boolean values dereferenced from an array or
structure. Assertions have been added to get_opcode() to prevent this bug
from reappearing in the future.
This silences a zillion GCC warnings like:
../../../src/mesa/main/pack.c: In function '_mesa_pack_rgba_span_from_uints':
../../../src/mesa/main/pack.c:560:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The layer dimension of array textures is not subject to mipmap minification.
OTOH we were missing an assertion for the depth dimension.
Fixes assertion failures with piglit {f,v}s-textureSize-sampler1DArrayShadow.
For some reason, they only resulted in piglit 'warn' results for me, not
failures.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56211
NOTE: This is a candidate for the stable branches.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
cuts down the while loop iterations from 4600 to 380 commits at the
moment
NOTE: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function is only useful for the ARB_{vertex,fragment}_program
extensions, which we don't expose in core contexts.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
glGetPointerv was de-deprecated in GL 4.3, because GL 4.3 adds
functionality from KHR_debug and ARB_debug_output, which require
glGetPointerv.
This patch modifies _mesa_create_exec_table() to populate
glGetPointerv in the dispatch table for core contexts.
Technically this is not in compliance with the spec--what we really
ought to do for core contexts is expose glGetPointerv only when a GL
4.3 context is in use or one of the two extensions is present.
However, it seems silly to go to that extra work, since the only
client-visible effect would be for glGetPointerv to raise an
INVALID_OPERATION error instead of an INVALID_ENUM error. Besides,
the other functions set up by _mesa_create_exec_table() only depend on
the API in use, not on the GL version or extensions supported.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no reason to have separate slots in the dispatch table for
these two functions, since they are synonymous.
Note: previous to this patch, we never populated the dispatch table
slot for VertexAttribDivisor, which was ok, since it is not required
until 3.3. After this patch, both functions will be usable provided
that the ARB_instanced_arrays extension is present.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no reason to have separate slots in the dispatch table for
these two functions, since they are synonymous.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the previous two commits, this fixes piglit
GL_ARB_occlusion_query2/api.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's a similar test below, but it's not the same: that one checks whether
this query object is already active (potentially on another target).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We should use the later since we're freeing the memory with free(),
not the gallium FREE() macro.
This fixes a mismatch when using the gallium debug memory functions.
NOTE: This is a candidate for the 9.0 branch.
We need to create bos suitable for cursor usage that we can map and
write data into. The kms dumb ioctls is all we need for this, so drop
the dependency on libkms.
Given the usecase we have of trying to measure timestamps across individual
draw calls, flushing will totally mess up what people are trying to measure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The theory I had when I wrote the code was that you wanted to minimize latency
on your queries because the app was going to ask soon. Only, it turns out
that everybody batches up their queries and asks for the results later (often
after the next SwapBuffers!), so this was a pessimization.
Until now, I had no workload where it mattered enough to benchmark. Recently
I started playing some Minecraft, which uses tons of queries to decide whether
to render chunks of the terrain. For that app, avoiding the flush in the
query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace
capture of it (confirmed in game by watching the fps meter found by pressing
F3, 15/16 -> 20/21 fps).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
otherwise some compilers will throw error
"error: format not a string literal and no format arguments"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If GL_BASE_LEVEL==0 and GL_MAX_LEVEL==0 that's a pretty good hint that
there'll be a single mipmap level in the texture.
Google Earth sets the texture's state this way before the first glTexImage
call. This saves a bit of texture memory.
Fixes piglit tests "unpack-teximage2d --pbo=* --format=GL_BGRA" on
Sandybridge+.
The fastpath was checking an incomplete set of pixel unpack state. This
patch adds checks for all the fields of gl_pixelstore_attrib that affect
2D texture uploads. Also, it begins permitting the case where
GL_UNPACK_ROW_LENGTH is 0.
Ideally, we would just ask a unicorn to JIT this fastpath for us in
a way that safely handles the unpacking state. Until then, it's safer if
only a small set of situations activate the fastpath.
v2: Use _mesa_is_bufferobj(), per Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It doesn't provide the cross-process buffer sharing that a window system
pixmap could otherwise support and we don't have anything left that uses
this type of surface.
The 0.99.0 Wayland release changes the event API to provide a thread-safe
mechanism for receiving events specific to a subsystem (such as EGL) and
we need to use it in the EGL platform.
The Wayland protocol now also requires a commit request to make changes
take effect, issue that from eglSwapBuffers.
Now that we've replaced all the variable settings other than reg_width, it's
easy to hang on to this (the expensive part of setting up the allocator).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
scenes to have all other scenes have 0 duration -- includes additional
rendering of scene description text that normally doesn't appear in that
scene)
v2: Allow allocation of all but g0/g1 of the payload.
v3: Pull count_to_loop_end() out to a helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, recommended v3)
Based on split_virtual_grfs(), we choose the same set every time, so set it in
stone. This will help us avoid regenerating the somewhat expensive
class/register set setup every compile.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is derived from the FS visitor code for the same, but tracks each channel
separately (otherwise, some typical fill-a-channel-at-a-time patterns would
produce excessive live intervals across loops and cause spilling).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375
(crash -> failure, can turn into pass by forcing unrolling still)
These messages always have m0 = g0 and m1 = offset, and write has m2 = data.
Avoids regression in opt_compute_to_mrf() with a change to scratch writes to
set up the data as an MRF write in the IR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_bblock_link -> bblock_link
fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock)
fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes confusion by the upcoming live variable analysis which saw e.g. use
of temp.w when only temp.xyz were initialized in the basic block, and
concluded that temp.w must have come from outside of the block (even though it
was never initialized anywhere).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to a string mismatch, INTEL_swap_event wasn't listed among GLX
extensions for the connection, even when present on both client and
server. That is, glXQueryServerString and glXGetClientString reported the
extension, but glXQueryExtensionsString did not.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56057
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Per commentary and direction in the LLVM community, support for ppc64 is
going into MCJIT rather than the old JIT. There is no existing support
in prior llvm versions, so no need to specify LLVM version numbers.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The GCC c99 standard on Cygwin sets __STRICT_ANSI__ and symbols such as
strdup are not available.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Note that we are missing the ARB_internalformat_query extension, which
provides the glGetInternalformativ function needed by GL ES 3.0.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The relevant ES2 code is always in Mesa. Always building the tests
ensures that things aren't accidentally broken when people don't build
with --enable-es2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code is twisty, and the comment before most of the blocks was actually
giving me the opposite impression from its intention: We want to apply as much
of our offset as possible through coarse tile-aligned adjustment, since we can
do so independently per buffer, and apply the minimum we can through
fine-grained drawing offset x/y, since it has to agree between all buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are a number of places where some obscure piece of the code is not
currently worth fixing, and we have some workaround behavior available. It's
nicer for users to do some lame workaround than to just assert, but without
asserts we never knew when the workaround was at fault.
This should give us a nice compromise: Execute the workaround, but mention
that the obscure workaround was hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note: mapi_abi can consume API information from either XML or a .csv
file. A side effect of this change is that the ES1 and ES2 API
printers can only be used with XML input now. That's ok, since the
.csv input format is only used for the OpenVG API.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the ES1, ES2, and shared GLAPI printers passed a list of
function names to the base class constructor, which was used by the
_override_for_api() function to loop over all the API functions and
adjust their 'hidden' and 'handcode' attributes as appropriate for the
API flavour being code-generated.
This patch lifts the loop from _override_for_api() into its caller,
and makes it into a polymorphic function, so that the derived classes
can customize its behaviour directly. In a future patch, this will
allow us to override the 'hidden' and 'handcode' attributes based on
information from the XML rather than a list of functions.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, _get_api_entries() would make a deep copy of each element
in the entries table before modifying the 'hidden' and 'handcode'
attributes. This was unnecessary, since the entries aren't used again
after this function. Removing the copy simplifies the code, because
it is no longer necessary to adjust the alias pointers to point to the
copied entries.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently mapi_abi.py uses hardcoded lists of function names (in
gles_api.py) to determine which functions need to be included in the
GLES 1 or GLES 2 API. This patch removes a sanity check which
verified that all GLES functions listed in the hardcoded lists were
actually present in the XML.
Later patches in this series will modify mapi_abi.py to determine
which functions need to be included in the GLES 1 or GLES 2 API based
directly on the XML. Once that is done, the sanity check will be
redundant. Removing the sanity check now will simplify the patches to
come.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, the set of functions which exist in GLES1 or GLES2 is
determined by hardcoded lists of function names in gles_api.py. This
patch encodes that information into the XML files using new
attributes, es1 and es2.
The es1 attribute denotes the first version of GLES 1 in which the
function exists (e.g. es1="1.1" means the function exists in GLES 1.1
but not GLES 1.0). "none" (the default) means the function is not
available in any version of GLES 1.
The es2 attribute denotes the first version of GLES 2/3 in which the
function exists (e.g. es2="2.0" means the function exists in both GLES
2.0 and GLES 3.0). "none" (the default) means the function is not
available in any version of GLES 2 or GLES 3.
Note that since GLES 3 is a strict superset of GLES 2, there is no
need for a separate attribute for it; instead, 'es2="3.0"' should be
used to denote functions that are present in GLES 3 but not GLES 2.
This patch only adds information about GLES versions 1.0, 1.1, and
2.0.
Later patches will modify the python code generation scripts to use
this information rather than the hardcoded lists in gles_api.py.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
An unfortunate quirk of Python 2 is that there are two types of
classes: "classic" classes (which are backward compatible with some
unfortunate design decisions made early in Python's history), and
"new-style" classes. Classic classes have a number of limitations
(for example they don't support super()) and are unavailable in Python
3. There's really no reason to use classic classes, except in
unmaintained legacy code. For more information see
http://www.python.org/download/releases/2.2.3/descrintro/.
This patch upgrades the Python code in src/mapi/glapi/gen to use
exclusively new-style classes.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that ARB programs and fixed function are routed through the new
backend, shader might be NULL. Don't do INTEL_DEBUG=perf support in
that case, since it relies on shader->compiled_once.
Since INTEL_DEBUG=perf wasn't previously supported, this maintains the
status quo. It might be nice to support it someday, however.
This could be moved to brw_shader_program instead of brw_shader, but
it appears even prog can be NULL in that case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
MaxIfDepth of 0 means "flatten all the time", not "never flatten".
This is only desirable on hardware that can't support control flow;
software rasterization and most hardware drivers want this.
This alters behavior for swrast as well as i915. Tested on i915.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The previous patch removed the producer of things in this file.
Since there aren't any, we can remove it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All flags are now gone, so we can stop storing and passing this around.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nobody ever set the flag, which makes this dead code.
v2: Leave the ureg_DECL_fs_input_cyl function in place, even though it's
unused, since VMWare uses it for their internal projects.
Reviewed-by: Eric Anholt <eric@anholt.net>
GLSL doesn't use the program code anymore. Accordingly, there were no
consumers of these flags, so there's no need to define them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were only part of NV_fragment_program, so we can kill them.
The fact that PROGRAM_NAMED_PARAM appears in r200_vertprog.c is rather
comedic, but also demonstrates that people just spam the various types
of parameters everywhere because they're confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without NV programs, there's no need for the compatible_program_targets
function. A simple (non-)equality check will do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also remove a leftover remnant from NV_vertex_program.
v2: Update for Imre's get changes.
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Previously, Mesa used nvprogram.c's _mesa_GetVertexAttribPointervNV()
function to implement this GL call. There was also a second
implementation in varray.c, _mesa_GetVertexAttribPointervARB(), which
was entirely unused.
The varray.c variant has an additional assertion and checks the index
against ctx->Const.VertexProgram.MaxAttribs rather than
MAX_VERTEX_GENERIC_ATTRIBS. However, that variable is defined to the
same value, so it should be fine.
This will allow us to kill the duplicate function.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also kill the resulting dead code for display list handling.
v2: Also kill dlist's OPCODE_REQUEST_RESIDENT_PROGRAMS_NV.
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
The NamedParameter functions were introduced in NV_fragment_program, and
are not shared with any other extensions.
Although this patch appears to remove the LocalParameter functions, it
does not: the ARB_fragment_program section also set them up. Now we
simply initialize them a single time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No hardware drivers support this, it's obsolete, and unlikely to be
useful without NV_vertex_program, which is gone now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
dri2DrawableGetMSC(), dri2WaitForMSC() and dri2WaitForSBC() were
inadvertently changed to return 0 on success. This resulted in the callers
returning an error to the client.
Restore the previous behavior and also check that the reply pointers are
valid before accessing them.
Reviewed-by: Eric Anholt <eric@anholt.net>
Note that _mesa_GetVertexAttribPointervNV() is actually
glGetVertexAttribPointerv(), which operates on the generic attributes. The
geometry shader initialization looks like arbitrary cruft to me.
Reviewed-by: Brian Paul <brianp@vmware.com>
Note that the MAP2 getters were missing from the implementation. Neat.
v2: Rebase on top of get.c changes.
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
It wasn't supported in hardware, and the comments in the code indicated no
known uses (similar to my experience on Intel) and a possible intent to remove
it.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were holding on to this code because we were aware that NWN 1 had some
support for vertex programs -- no other linux programs I've come across would
use it (since other software also has ARB_vp or GLSL support). Only, it turns
out that NWN doesn't even give us any vertex programs. Given that we have
known issues where the extension has never been fully supported, just give up
on it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46795
Reviewed-by: Brian Paul <brianp@vmware.com>
- stopped using util_color
- reformatted to occupy less characters per line.
- used memcpy for the border color
- used pipe_color_union in the state structure
And the clear color too, though that may be an issue only with GL_RGB if it's
actually RGBA in the driver.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: The types of st_translate_color parameters were changed to gl_color_union
and pipe_color_union as per Brian's comment.
configure.ac would previously refuse to complete if libX11 wasn't
installed, even if we'd disabled GLX and weren't building an X11 EGL
platform. Make the check simply set the no_x variable that's used (but
never set) immediately below for what looks like this very case.
Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
commit a010215463 removed ES2 specific dispatch
table and remap_helper, since now we are using dispatch.h which is generated
from gl_and_es_API.xml we need to generate a matching remap_helper using the
same xml.
Note: This is a candidate for the 9.0 branch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
lp_build_rsqrt initially did not do any newton-raphson step. This meant that
precision was only ~11 bits, but this handled both input 0.0 and +infinity
correctly. It did not however handle input 1.0 accurately, and denormals
always generated infinity result.
Doing a newton-raphson step increased precision significantly (but notably
input 1.0 still doesn't give output 1.0), however this fails for inputs
0.0 and infinity (both result in NaNs).
Try to fix this up by using cmp/select but since this is all quite fishy
(and still doesn't handle denormals) disable for now. Note that even with
workarounds it should still have been faster since the fallback uses sqrt/div
(which both use the usually unpipelined and slow divider hw).
Also add some more test values to lp_test_arit and test lp_build_rcp() too while
there.
v2: based on José's feedback, avoid hacky infinity definition which doesn't
work with msvc (unfortunately using INFINITY won't cut it neither on non-c99
compilers) in lp_build_rsqrt, and while here fix up the input infinity case
too (it's disabled anyway). Only test infinity input case if we have c99,
and use float cast for calculating reference rsqrt value so we really get
what we expect.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
"get_transfer + transfer_map" becomes "transfer_map".
"transfer_unmap + transfer_destroy" becomes "transfer_unmap".
transfer_map must create and return the transfer object and transfer_unmap
must destroy it.
transfer_map is successful if the returned buffer pointer is not NULL.
If transfer_map fails, the pointer to the transfer object remains unchanged
(i.e. doesn't have to be NULL).
Acked-by: Brian Paul <brianp@vmware.com>
Only the first 'nr_cbufs' color buffers in the pipe_framebuffer_state are
valid. The rest of the color buffer pointers might be unitialized.
Fixes a regression in the piglit fbo-srgb-blit test since changes in the
gallium blitter code.
NOTE: This is a candidate for the 9.0 branch (just to be safe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This should improve our ability to register allocate without spilling.
Unfortuantely, due to the live variable analysis being ignorant of loops, we
still have register allocation failures on some programs.
v2: Add more context to the comment explaining the function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, we'd spill one reg, then continue on without actually register
allocating, then assertion fail when we tried to use a vgrf number as a
register number.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To validate this code, I ran piglit -t vs quick.tests with the "go spill
everything" debugging code enabled. There was only one regression:
glsl-vs-unroll-explosion simply ran out of registers. This should be
fine in the real world, since no one actually spills every single
register.
NOTE: This is a candidate for the 9.0 branch. Even if it proves to have
bugs, it's likely better than simply failing to compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_grf_array_access_to_scratch() calculates scratch buffer offsets in
bytes. However, emit_scratch_read/write() expects the base_offset
parameter to be measured in OWords.
As a result, a shader using a scratch read/write offset greater than
zero (in practice, a shader containing more than one variable in
scratch) would use too large an offset, frequently exceeding the
available scratch space.
This patch corrects the mismatch by removing spurious conversion from
OWords to bytes in move_grf_array_access_to_scratch().
This is based on a patch by Paul Berry.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Version 12 of the EGL_KHR_create_context spec changed this behavior.
NOTE: This is a candidate for the 9.0 branch
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This brings us into accordance with the official Python style guide
(http://www.python.org/dev/peps/pep-0008/#indentation).
To preserve the indentation of the c code that is generated by these
scripts, I've avoided re-indenting triple-quoted strings (unless those
strings appear to be docstrings).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should fix MSVC build, as windows.h also defines CONST.
CONST usage in get.c is not new, so probably this just appeared now due
to changes in the includes.
This got broken by:
7182a1f glapi: rename/move GL_POLYGON_OFFSET_BIAS to its extension
section
Fix it by appending the _EXT suffix to the enum in the test too.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
This will be needed by the next patch, which will switch to using
the parameter descriptor- and hash tables generated by the script.
The hash algorithm remains the same, the output parameter descriptor
table format changes slightly. There the TYPE_API_MASK entries are
removed and an invalid NULL entry is inserted at the beginning. This is
ok, as get.c:find_value() doesn't rely on TYPE_API_MASK any more to
detect an invalid enum.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The following enums used to be extensions but later became part of the
core specification. The _EXT/_ARB versions of these are not present in
in the current XML spec files, only defined in GL/glext.h
Later we'll need to look up these in a python script using the XML spec.
As a preparation for that remove the _EXT,_ARB suffix from these enums
and rename GL_DISTANCE_ATTENUATION_EXT to GL_POINT_DISTANCE_ATTENUATION.
Naturally, all enums keep their numerical values.
Note that similar renames shouldn't be necessary in the future: in case
of a new extension the XML spec is updated with the new _EXT/_ARB etc.
name and this name is added to the enum table in get.c. Later the
extension may become part of the core spec, at which point the name w/o
the _EXT/_ARB suffix is added to the XML spec and the table in get.c
remains the same.
GL_BLEND_DST_ALPHA_EXT
GL_BLEND_DST_RGB_EXT
GL_BLEND_SRC_ALPHA_EXT
GL_BLEND_SRC_RGB_EXT
GL_COLOR_SUM_EXT
GL_COMPRESSED_TEXTURE_FORMATS_ARB
GL_CURRENT_FOG_COORDINATE_EXT
GL_CURRENT_SECONDARY_COLOR_EXT
GL_DISTANCE_ATTENUATION_EXT
GL_FOG_COORDINATE_ARRAY_EXT
GL_FOG_COORDINATE_ARRAY_STRIDE_EXT
GL_FOG_COORDINATE_ARRAY_TYPE_EXT
GL_FOG_COORDINATE_SOURCE_EXT
GL_FRAGMENT_SHADER_DERIVATIVE_HINT_ARB
GL_PACK_IMAGE_HEIGHT_EXT
GL_PACK_SKIP_IMAGES_EXT
GL_SECONDARY_COLOR_ARRAY_EXT
GL_SECONDARY_COLOR_ARRAY_SIZE_EXT
GL_SECONDARY_COLOR_ARRAY_STRIDE_EXT
GL_SECONDARY_COLOR_ARRAY_TYPE_EXT
GL_UNPACK_IMAGE_HEIGHT_EXT
GL_UNPACK_SKIP_IMAGES_EXT
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
When traversing the hash table looking up an enum that is invalid we
eventually reach the first element in the descriptor array. By looking
at the type of that element, which is always TYPE_API_MASK, we know that
we can stop the search and return error. Since this element is always
the first it's enough to check for its index being 0 without looking at
its type.
Later in this patchset, when we generate the hash tables during build
time, this will allow us to remove the TYPE_API_MASK and related flags
completly.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The glGet hash was initialized only once for a single GL API, even if
the application later created a context for a different API. This
resulted in glGet failing for otherwise valid parameters in a context
if that parameter was invalid in another context created earlier.
Fix this by using a separate hash table for each API.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
This should be named GL_POLYGON_OFFSET_BIAS_EXT and listed under the
EXT_polygon_offset section. (Solution by Ian Romanick)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
POLY_OFFSET_DB_FMT_CNTL is moved to the framebuffer state, because it only
depends on the zbuffer format.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
This is not so trivial, because we disable blending if the dual src
blending is turned on and the number of color outputs is less than 2.
I decided to create 2 command buffers in the blend state object and just
switch between them when needed, because there are other states unrelated
to blending (like the color mask) and those shouldn't be changed
(the old code had it wrong).
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
r600_command_buffer is not an atom.
The "atoms" have evolved into state slots (or groups of state slots) where
you can bind states. There is a fixed amount of atoms (state slots)
in the context.
The command buffers are nothing like that. They represent states, not state
slots.
We could probably give r600_atom a better name someday.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The invalidate event support is a careful dance between driver and loader,
where both have to say they can handle it, and then the loader reports
invalidate events for the driver so the driver can do the optimization.
The EGL code doesn't report __DRIuseInvalidateExtension to the driver, so it
has no responsibility to call the driver's invalidate function, and the driver
is doing the glViewport hack because it assume. This is not
the only time invalidate would need to be called (we need it *any* time an
invalidate event comes down the pipe, but we don't watch for them), so just
stop calling the driver's function.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
This behavior mostly matches glx_dri2. It's slightly complicated in
comparison because EGL exposes the implementation limits in the EGL config.
Note that platform_x11 was the only one setting swap_available, so the move of
the MaxSwapInterval into it is appropriate.
Acked-by: Chad Versace <chad.versace@linux.intel.com>
It's been in place but never enabled since 2010. Note how one piece called a
DRI2 function, suggesting never being tested.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri_interface.h comes from our tree, so why litter our tree with ifdefs for
older versions of it?
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri_interface.h comes from our tree, so why litter our tree with ifdefs for
older versions of it?
I left in the DRI_TEX_BUFFER_VERSION ifdefs, which is broken and uncompiled
(the version wasn't bumped from 2 to 3 when the patch was landed), but I don't
know what should be done with it.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I'm going to transition a bunch of the protocol to using XCB so we can stop
rolling it ourselves.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The EGLNative* types are all defined to be pointers across all our EGL
implementations, but in the X11 platform they're actually just XIDs (32-bit
integers).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit 006c1a3c65 introduced a call to
clock_gettime, but failed to include <time.h>, breaking the build in
some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since df4a88ac, the check for compressed formats has been
unnecessary. And ever since cb72ec5f, the build has been broken with
FEATURE_ES. Remove it, as it does nothing.
Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Use a simple chaining hash table for the ACP. This is not really very good,
because we still do a full walk of the tree per destination write, but it
still reduces fp-long-alu runtime from 5.3 to 3.9s.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This means that we don't get constant prop across into the first block after a
BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it
across control flow at some point. More importantly, with the next commit it
will help avoid O(n^2) with instruction count runtime for shaders that have
many constant moves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes a giant pile of code newly dead. It also fixes TXB on newer
chipsets, which has been totally broken (I now have a piglit test for that).
It passes the same set of Ian's ARB_fragment_program tests. It also improves
high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
optimization and having 8-wide along with 16-wide shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I don't know of any programs that would need more than this. The larger
programs I've seen have neared 100 instructions. This prevent excessive
runtimes of automatic tests that attempt to test up to the exposed maximums
(like fp-long-alu).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_fp doesn't go through the GLSL optimizer, and these were things you see
frequently thanks to conditionals being lowered to SLT/SGE and MUL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be reused from the ARB_fp compiler. I touched up the pre-gen6 path
to not overwrite dst in the first instruction, which prevents the need for
aliasing checks (we'll need that in the ARB_fp compiler, but it actually
hasn't been needed in this codebase since the revert of the nasty old
MOV-avoidance code). I also made the conditional_mod between gen6 and
pre-gen6 consistent, which shouldn't matter except for denorm/(+/-)0
comparisons where the choice between left and right hand side of the
comparison changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll want to reuse this for ARB_fp handling.
v2: Fold the remaining bit of emit_texcoord back into visit(ir_texture).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Applications may destroy HDC at any time. So always get a HDC as needed.
Fixes lack of presents with Solidworks eDrawings when screen resolution is
changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
'#extension foo: enable' is harmless. The functionality is only
actually enabled if the extension is supported. The shader won't use
the functionality if it's not supported, so we're fine.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The diff looks weird, but this moves the code from the first 'if
(ctx->Const.GLSLVersion < 130)' block down into the second block. It
also moves some variable decalarations closer to their use.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When an occlusion query was active, the derived DB state wasn't changed
for u_blitter even though all the occlusion queries were suspended.
It's fixed by moving the state update into the emit functions, which are
called whenever queries are stopped or suspended.
pipe_resource can be shared between contexts, we shouldn't modify its
description. Instead, let's use the resource "views" (sampler views and
surfaces), where we can freely change almost any property of a resource.
The idea here is to not flag _NEW_VARYING_VP_INPUTS when shaders (either
GLSL or ARB vp/fp) are in use. If either TNL or TexEnv programs are
active, at least one stage is using fixed function.
On Pineview, fixes 20 Piglit, 60 oglconforms, and 7 ES 1.1 conformance
tests, as well as missing textures in Xonotic. These were all
regressions since commit fb4a34e60e.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49127
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54807
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When using u_blitter, the state was being saved from saved_*, but we
don't use that. So after u_blitter resumed we got some corrupted
state in.
So let's just remove the saved_* stuff. I thought it was weird but
harmless, it's actually broken.
This function is only present in GLES1 and in the OpenGL compatibility
profile.
Fixes the following "make check" failure:
[----------] 1 test from DispatchSanity_test
[ RUN ] DispatchSanity_test.GLES2
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn
compression/decompression unavailable
dispatch_sanity.cpp:122: Failure
Value of: table[i]
Actual: 0x4de54e
Expected: (_glapi_proc) _mesa_generic_nop
Which is: 0x41af72
i = 321
[ FAILED ] DispatchSanity_test.GLES2 (4 ms)
[----------] 1 test from DispatchSanity_test (4 ms total)
NOTE: This is a candidate for stable release branches.
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Tested-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since we started doing fixups for different render target formats,
this has been an issue. Instead just don't do anything, when the
program gets emitted later it'll get the correct fixup.
Fixes a bunch of piglit tests.
This simply avoids some failed assertions but there's no reason to
call the driver hooks for storing a tex image if its size is zero.
Note: This is a candidate for the stable branches.
413c49141 added an optimisation to improve the performance of teximage
under a limited set of circumstances. If GL_EXT_unpack_subimage has been
used then we we must also skip this optimisation since the optimised
codepath does not take the packing values into consideration.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I think libtool should be handling this for us, but the build fails for
Jordan because libdricommon (a static library, which uses expat) appears
before -lexpat on the linker command.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we considered all registers as candidates for spilling.
This was counterproductive--for any registers that have already been
removed from the interference graph, there is no benefit to spilling
them, since they don't contribute to register pressure.
This patch ensures that we will only try to spill registers that are
still in the interference graph after register allocation has failed.
This is consistent with the recommendations of the paper "Retargetable
Graph-Coloring Register Allocation for Irregular Architectures", on
which our register allocator is based.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Wine or a windows app changes fpucw to 0x7f, causing doubles to be equivalent
to floats, which broke the calculation of FPS.
We should be very careful about using doubles in Mesa.
Henri Verbeet adds:
For reference, this is done by for example d3d9 when a D3D device is
created without D3DCREATE_FPU_PRESERVE set. In the general case
applications can do all kinds of terrible things to the FPU control
word of course.
[ RUN ] EnumStrings.LookUpByNumber
enum_strings.cpp:43: Failure
Value of: _mesa_lookup_enum_by_nr(everything[i].value)
Actual: "GL_COMPRESSED_RGBA_S3TC_DXT3_ANGLE"
Expected: everything[i].name
Which is: "GL_COMPRESSED_RGBA_S3TC_DXT3_EXT"
enum_strings.cpp:43: Failure
Value of: _mesa_lookup_enum_by_nr(everything[i].value)
Actual: "GL_COMPRESSED_RGBA_S3TC_DXT5_ANGLE"
Expected: everything[i].name
Which is: "GL_COMPRESSED_RGBA_S3TC_DXT5_EXT"
[ FAILED ] EnumStrings.LookUpByNumber (2 ms)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55505
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
The EGL_NOK_swap_region2 spec states that the rectangles are specified
with a bottom-left origin within a surface coordinate space also with a
bottom left origin, so this patch ensures the rectangles are flipped
before passing them on to dri2_copy_region.
Fixes piglit's egl-nok-swap-region test.
Tested-by: Matt Turner <mattst88@gmail.com>
A compressed texture image size doesn't have to be a multiple of the
compressed block size (only sub-images do). Fixes issues when building
compressed mipmaps because we often wind up with non-block-size images
for the higher mipmap levels.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=55445
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Sven Arvidsson <sa@whiz.se>
We were previously using the TGSI input index, which can exceed the number of
parameters passed from the vertex shader via the parameter cache. Now we use
a separate index which only counts those parameters.
Prevents piglit regressions with the following fix.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Port the 'glcpp: fix abuse of yylex' commit to Android.mk
Also, since the Android.*.mk are sourced in a global namespace,
the local-y-to-c-and-h is prefixed with the LOCAL_MODULE name,
The initial fix commit is 53d46bc787
There's also a bugzilla for this: 54947
Signed-off-by: Negreanu Marius Adrian <adrian.m.negreanu@intel.com>
Reviewed-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Commit 8d9778589f added all-targets to the
LLVM_COMPONENTS list, but this component does not exist with LLVM 2.8.
Adding all-targets is not necessary for any drivers, and it seems to be
left over from earlier versions of the commit mentioned above.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
The items are ordered in the item list by their offsets, with the lowest
offset coming first in the list. The old code was assuming that new
items being added to the list would always have a greater offset than
the first item in the list, however this is not always the case.
This can be used to initialize the CB* registers for buffers without a
radeon_surface.
v2:
- Get correct group_bytes value from r600_screen
- Stop setting unnecessary fields
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This also fixes a lot tests, especially all the clip-and-scissor-blit MSAA
piglit tests.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The original blit function is extended and the otAher functions reuse it.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes this build error on Cygwin.
Explicit dependency `src/glsl/builtins/tools/texture_builtins.py' not
found, needed by target
`build/cygwin-x86-debug/glsl/builtin_function.cpp'.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Often, the original shader IR isn't terribly interesting because a lot
of crucial optimizations haven't been done (such as inlining built-ins).
ir_to_mesa used to print this out for us, but since we don't use it, we
have to do it ourselves.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The anonymous namespace should keep these private classes to file scope,
preventing clashes with other symbols of the same name elsewhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From SandyBridge PRM, volume 2 Part 1, section 12.2.3, BLEND_STATE:
DWord 1, Bit 30 (AlphaToOne Enable):
"If Dual Source Blending is enabled, this bit must be disabled"
Note: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Change the format to MAJOR.MINOR[FC]
For example: 2.1, 3.0FC, 3.1
The FC suffix indicates a forward compatible context, and
is only valid for versions >= 3.0.
Examples:
2.1: GL Legacy/Compatibility context
3.0: GL Legacy/Compatibility context
3.0FC: GL Core Profile context + Forward Compatible
3.1: GL Core Profile context
3.1FC: GL Core Profile context + Forward Compatible
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
intelDestroyContext will eventually be called, and it will clean things
up. The call to brwInitVtbl is moved earlier so that
intelDestroyContext can call the device-specific destructor. This also
makes the code look more like the i915 code.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54301
_glapi_table is a struct full of named function pointers, while the generated
code just wants to treat it as an array of function pointers. Cast to avoid
the compiler warning.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This test is only built when shared-glapi is used. Because of changes
elsewhere in the tree that were necessary to make shared-glapi work
correct with GLX, it's not feasible to make the test function both ways.
The list of expected functions originally came from the functions set by
api_exec_es2.c. This file no longer exists in Mesa (but api_exec_es1.c
is still generated). It was the generated file that configured the
dispatch table for ES2 contexts. This test verifies that all of the
functions set by the old api_exec_es2.c (with the recent addition of VAO
functions) are set in the dispatch table and everything else is a NOP.
When adding ES2 (or ES3) extensions that add new functions, this test
will need to be modified to expect dispatch functions for the new
extension functions.
v2: Expect VAO functions be non-NOP.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When building with shared-glapi, we can just use Mesa's _mesa_warning without
problems. stubs.cpp is only used when shared-glapi is not used.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Allow GL_ARB_shader_objects functions in core profile because we
still expose the extension string there. Don't allow
glBindFragDataLocation in GLES3 because it's not part of that API.
Based (mostly) on review comments from Eric Anholt.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This isn't used by this patch, but it will be necessary for several
follow-on patches. Separating this out will make it easier to reorder
patches later.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function is not the same as glGetProgramiv.
NOTE: This is a candidate for the 9.0 branch
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The most recent commit that touched this function,
commit b1d0fe022d
Author: Chad Versace <chad.versace@linux.intel.com>
Date: Wed Sep 26 11:05:12 2012 -0700
intel: Fix segfault in intel_texsubimage_tiled_memcpy
did fix the segfault, but introduced yet another bug. From Anholt: """You
need to still test format/type, because that's the incoming format (e.g.
GL_RGBA/GL_FLOAT) that you're trying to memcpy."""
This patch re-introduces the checks on the incoming format and type.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
In commit 091eb15b69, Jordan changed get_temp_image_type() to use
_mesa_get_format_datatype() instead of returning GL_FLOAT. That has
several possible return values: GL_FLOAT, GL_INT, GL_UNSIGNED_INT,
GL_SIGNED_NORMALIZED, and GL_UNSIGNED_NORMALIZED.
We do want to use GL_INT/GL_UNSIGNED_INT for integer formats. However,
we want to continue using GL_FLOAT for the normalized fixed-point types.
There isn't any code in pack.c to handle GL_(UN)SIGNED_NORMALIZED.
Fixes oglconform's fboarb advanced.blit.copypix, which was regressed by
commit 091eb15b69.
NOTE: This is a candidate for the 9.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes all gl_config's with swapMethod=GLX_SWAP_COPY_OML. When
page flipping, we are unable to comply with swap-copy semantics.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This failed when all the uploads to occur were uniform-type vertex data (like
glColor4f being active across a DrawArrays), because it would upload 1 element
instead of 1 element per vertex. There was no citation for how this code
helped any particular application, and it breaks ETQW, so just remove it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47170
NOTE: This is a candidate for the 9.0 and 8.0 branches.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
The only symbols that need to be public (those in intel_screen.c that the
loader looks for) are already marked public. Saves 100k of compiled driver
size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function segfaulted when a game called glTexSubImage2D on a texture
with internalformat/format/type = GL_SLUMINANCE8/GL_BGRA/GL_UNSIGNED_BYTE.
The function only supports MESA_FORMAT_ARGB8888 and returns early if it
detects an unsupported format. Clearly, its detection condition was
insufficient. This patch fixes it to explicity check for
MESA_FORMAT_ARGB8888.
Note: This is a candidate for the 9.0 branch (fixes 413c491).
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Haswell supports EXT_texture_swizzle and legacy DEPTH_TEXTURE_MODE
swizzling by setting SURFACE_STATE entries. This means we don't have to
bake the swizzle settings into the shader code by emitting MOV
instructions, and thus don't have to recompile shaders whenever the
swizzles change.
Unfortunately, we can't handle GL_ALPHA this way: unlike all the others,
which store the comparison result in the .r channel (and possibly others
as well), GL_ALPHA puts it in the .a channel. The GLSL 1.30+ style
functions which return a float always simply return the .r channel,
which would be zero if we handled this as a surface override. In this
case, fall back to doing it the old way. DEPTH_TEXTURE_MODE = GL_ALPHA
isn't an interesting performance path anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes valgrind errors in piglit test
oes_compressed_etc1_rgb8_texture-miptree: an invalid write in
_mesa_store_compressed_store_texsubimage() at line 4406 and invalid reads
in texcompress_etc_tmp.h:etc1_parse_block().
The calculation of the size of the temporary etc1 buffer allocated by
intel_miptree_map_etc1() was incorrect. Sometimes the allocated buffer was
too small, sometimes too large. This patch corrects the size to that
expected by _mesa_store_compressed_store_texsubimage().
Note: This is candidate for the 9.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Do all error checking of glTexSubImage, glCopyTexSubImage and
glCompressedTexSubImage's xoffset, yoffset, zoffset, width, height, and
depth params in one place.
If a subtexture region isn't aligned to the compressed block size,
return GL_INVALID_OPERATION, not gl_INVALID_VALUE.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of tracking the inferred state changes separately
just check if queued and emitted states are the same.
This patch just reworks the update of the SPI map between
vs and ps, but there are probably more cases like this.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
GLES 3 supports sRGB functionality, but it does not expose the
GL_FRAMEBUFFER_SRGB enable/disable bit. Instead the implementation
is expected to behave as though that bit is always enabled.
This patch ensures that ctx->Color.sRGBEnabled (the internal variable
tracking GL_FRAMEBUFFER_SRGB) is initially true in GLES 2/3 contexts,
and that it cannot be modified through the GLES 3 API.
This is safe for GLES 2, since ctx->Color.sRGBEnabled has no effect on
non-sRGB formats, and GLES 2 doesn't support any sRGB formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, meta logic was saving and restoring the value of
GL_FRAMEBUFFER_SRGB in an ad-hoc fashion. As a result, it was not
properly disabled and/or restored for some meta operations.
This patch causes GL_FRAMEBUFFER_SRGB to be saved/restored in the
conventional way of meta-ops (using _mesa_meta_begin() and
_mesa_meta_end()). It is now reliably saved/restored for
_mesa_meta_BlitFramebuffer, _mesa_meta_GenerateMipmap, and
decompress_texture_image, and preserved for all other meta ops.
Fixes piglit tests "ARB_framebuffer_sRGB/blit renderbuffer
{linear_to_srgb,srgb} scaled {disabled,enabled}".
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GLES3 supports sRGB formats, but it does not support the
GL_FRAMEBUFFER_SRGB enable/disable flag (instead it behaves as if this
flag is always enabled). Therefore, meta ops that need to disable
GL_FRAMEBUFFER_SRGB will need a backdoor mechanism to do so when the
API is GLES3.
We were already doing a similar thing for GL_MULTISAMPLE, which has
the same constraints.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch reduces the time spent in glTexImage and glTexSubImage by
over 5x on Sandybridge for the workload described below.
It adds a new fast path for glTexImage2D and glTexSubImage2D,
intel_texsubimage_tiled_memcpy, which is optimized for Google Chrome's
paint rectangles. The fast path is implemented only for 2D GL_BGRA
textures for chipsets with a LLC.
=== Performance Analysis ===
Workload description:
Personalize your google.com page with a wallpaper. Start chromium
with flags "--ignore-gpu-blacklist --enable-accelerated-painting
--force-compositing-mode". Start recording with chrome://tracing. Visit
google.com and wait for page to finish rendering. Measure the time spent
by process CrGpuMain in GLES2DecoderImpl::HandleTexImage2D and
HandleTexSubImage2D.
System config:
cpu: Sandybridge Mobile GT2+ (0x0126)
kernel 3.4.9 x86_64
chromium 21.0.1180.89 (154005)
Statistics:
| N Median Avg Stddev
--------------|-------------------------
before (msec) | 8 472.5 463.75 72.6
after (msec) | 8 78.0 79.6 5.7
Arithmetic difference at 95.0% confidence:
-384.1 +/- 55.2 msec
-82.8% +/- 11.9%
Ratio at 95.0% confidence:
5.81 +/- 0.119
v2:
- Replace check for `intel->gen >= 6` with `intel->has_llc`, per
danvet.
- Fix typo in comment, s/throuh/through/.
- Swap 'before' and 'after' rows in stat table.
v3:
- If the current batch references the bo, then flush batch before mapping
the bo. Found by Chris.
- Restrict supported texture images to level 0 of target
GL_TEXTURE_2D. This avoids an arithmetic bug in calculating image
offsets within the miptree, found by Paul. This restriction does not
diminish this patch's benefit to Chrome OS performance.
- Use less instructions for bit6 swizzling, suggested by Paul.
- Remove erroneous comment about Y-tiling, for Paul.
- Print perf_debug messages when flushing and stalling.
- Update stats in commit message; run workload under a release build
rather than a debug build.
Note: This is a candidate for the 9.0 branch.
Acked-by: Eric Anholt <eric@anholt.net>
CC: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A game we're working with leaves scissoring enabled, but frequently sets
the scissor rectangle to the size of the whole screen. In that case,
scissoring has no effect, so it's safe to go ahead with a fast clear.
Chad believe this should help with Oliver McFadden's "Dante" as well.
v2/Chad: Use the drawbuffer dimensions rather than the miptree slice
dimensions. The miptree slice may be slightly larger due to alignment
restrictions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-and-tested-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Fixes an assertion failure when compiling certain shaders that need both
pull constants and register spilling:
brw_eu_emit.c:204: validate_reg: Assertion `execsize >= width' failed.
NOTE: This is a candidate for release branches.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e2249e8c4d (i965/blorp: Add
support for blits between SRGB and linear formats) changed blorp to
always configure surface states for in linear format (even if the
underlying surface is sRGB). This allowed sRGB-to-linear and
linear-to-sRGB blits to occur without causing the image to be
inappropriately brightened or darkened.
However, it broke sRGB MSAA resolves, since they rely on the
destination buffer format being sRGB in order to ensure that samples
are averaged together in sRGB-correct fashion.
This patch fixes the problem by instead configuring the source buffer
to use the *same* format as the destination buffer. This ensures that
the image won't be brightened or darkened, but preserves proper sRGB
averaging.
Fixes piglit tests "EXT_framebuffer_multisample/accuracy srgb".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55265
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
There are a few automake files that reference $(X11_INCLUDES) such as
src/glx/Makefile.am but configure.ac wasn't declaring the variable for
substitution. This would break builds of glx if libxcb, for example, was
installed in its own prefix since AM_CFLAGS wouldn't coincidentally
list the needed include path in that case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch is a band-aid fix for a bug in commit 5fd67fa (i965/blorp:
Reduce alignment restrictions for stencil blits), which causes
multisampled stencil blits to work incorrectly on Sandy Bridge.
When blitting to or from a normal stencil buffer, we have to use a
coordinate transformation that swizzles coordinates to account for the
fact that stencil buffers use W tiling, but the most similar tiling
format available for textures and render targets is Y tiling. The
differences between W and Y tiling cause pixels to be scrambled within
a block of size 8x4 (width x height) as measured relative to a W tile,
or 16x2 as measured relative to a Y tile. So in order to make sure
that pixels at the edges of the blit aren't lost, we need to align the
rendering rectangle (and the buffer sizes) to multiples of the 8x4
block size. This alignment happens in the brw_blorp_blit_params
constructor, whereas the determination of how to swizzle the
coordinates happens during code generation, in the
brw_blorp_blit_program class.
When blitting to or from a multisampled stencil buffer, the coordinate
swizzling is more complex, because it has to account for the
interleaving pattern of samples, which uses 4x4 blocks for 4x MSAA and
8x4 blocks for 8x MSAA. The end result is that if multisampling is in
use, the 16x2 block size (relative so a Y tile) needs to be expanded
to 16x4, and the corresponding size relative to a W tile expands to
8x8.
The problem doesn't affect Ivy Bridge severely enough to crop up in
Piglit tests because on Ivy Bridge we have to disable multisampling
when blitting *to* a multisampled stencil buffer (the blorp compiler
generates code to compensate for the fact that multisampling is
disabled). However I suspect a bug is still present because we don't
disable multisampling when blitting *from* a multisampled stencil
buffer.
This patch fixes the problem by doubling the vertical alignment
requirement when blitting to or from a multisampled stencil buffer,
and multisampling has not been disabled.
In the long run I would like to rework the brw_blorp_blit_params
constructor--it's difficult to follow and has had several subtle bugs
like this one. However this band-aid fix should be suitable for
cherry-picking to release branches.
Fixes Piglit tests "unaligned-blit {2,4} stencil {msaa,upsample}" on
Sandy Bridge.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Recent version of GCC report a warning for the implicit conversion from
int to float:
ff_fragment_shader.cpp:897:3: warning: narrowing conversion of '(1 << ((int)rgb_shift))' from 'int' to 'float' inside { } is ill-formed in C++11 [-Wnarrowing]
This is because floats cannot precisely represent all possible 32-bit
integer values. However, texenv code is all expected to be floating
point, so this should not be a problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
From the OpenGL Registry:
"2012/08/13: specs named GL_ARB_debug_group, GL_ARB_debug_label, and
GL_ARB_debug_output2 were published in error during the initial OpenGL 4.3
release. All functionality in these documents was combined into
the extension GL_KHR_debug. They have been withdrawn from the registry,
and a few other extensions were renumbered to avoid holes in the numbering
scheme."
pipe_draw_info::indexed determines if it should be indexed and not
the presence of an index buffer.
This fixes crashes in r300g.
NOTE: This is a candidate for the stable branches.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A call to glGenerateMipmap() follows the generation of a relevant
shader program in setup_glsl_generate_mipmap().
To support all texture targets and to avoid compiling shaders
everytime, per target shader programs are compiled on demand
and saved for the next call.
Fixes float-texture(mipmap.manual):
See Comment 6: https://bugs.freedesktop.org/show_bug.cgi?id=54296
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Blorp has to convert rectangle coordinates from integers to floats in
order to send them down the GPU pipeline. Recent versions of GCC
issue a warning for this, since a float is not capable of precisely
representing all possible 32-bit integer values. Suppress the warning
with an explicit type cast in the case of blorp, since rectangle
coordinates will never be large enough to cause a loss of precision.
Reviewed-by: Eric Anholt <eric@anholt.net>
Given that it exists between a push/pop of instruction state, this call
can only affect the MOV or ADD instruction generated just below it.
Neither of those instructions are predicated, so it makes no sense to
ask for the inverse predicate.
This fixes grumblings from the simulator debugger, which was
complaining about an invalid predicate.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 42723d88d intended to override an S3TC internalFormat to a
generic compressed format when the application requested online
compression of uncompressed data. Unfortunately, it also broke
pre-compressed textures when libtxc_dxtn isn't installed but the
extensions are forced on.
Both glCompressedTexImage2D() and glTexImage2D() call teximage(), which
calls _mesa_choose_texture_format(), hitting this override code. If we
have actual S3TC source data, we can't treat it as any other format, and
need to avoid the override.
Since glCompressedTexImage2D() passes in a format of GL_NONE (which is
illegal for glTexImage), we can use that to detect the pre-compressed
case and avoid the overrides.
Fixes a regression since 42723d88d3.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes colorspace issues in L4D2 when multisampling is enabled (the
scene was far too dark, but the flashlight area was way too bright).
The nVidia and AMD binary drivers both allow this kind of blit.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
MSAA resolves and other blit-like operations ignore SRGB state anyway,
so we should be able to safely allow resolves between compatible
SRGB/linear formats like SRGBA8 and RGBA8888.
This matches the behavior of the nVidia and AMD binary drivers.
Fixes completely black rendering when using multisampling in L4D2.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use uint64_t for the total_size variable, per Jose.
Also add two earlier checks for exceeding the max texture size.
For example a 1K^3 RGBA volume would overflow the lpr->image_stride
variable.
Use simple algebra to avoid overflow in intermediate values.
So instead of "x * y > z" use "x > z / y".
This should work if we happen to be on a platform that doesn't have
64-bit types.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was already (correctly) supported for glGetSamplerParameter paths.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Initializing the regalloc state is expensive, and since it is always
the same for every compile we only need to initialize it once per
context. This should help improve shader compile times for the driver.
Compute shaders fetch data from vertex buffers via the texture cache, so
we need to make sure the texture cache is flushed.
v2:
- Fix rebase mistake
- Fix spelling in comment
Reviewed-by: Marek Olšák <maraeo@gmail.com>
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
For buffers (which is what is being used for RATs), the
COLOR*_DIM.WIDTH_MASK field needs to be set to the low 16-bits of the
buffer size, and the COLOR*_DIM.HEIEGHT_MAX needs to be set to the
high bits.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
- add OpenCL state tracker Clover
- add XvMC state tracker
- remove progs
directory got moved into its own repository mesa/demos
- remove vf
directory removed with abda64efce
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
If the gallium driver implements the can_create_resource() function, call
it to do proxy texture size checks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Used to implement proxy textures. If a gallium driver doesn't implement
this function we'll just continue to use the core Mesa fallback code.
Without this hook we really have no good way to implement OpenGL proxy
textures with gallium drivers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before, the limit was 8K. For 32-bit RGBA that would be require 1.5 GB
of memory (w/out mipmaps). That's well beyond the LP_MAX_TEXTURE_SIZE
of 1GB.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Simplify the code and make it more like the other glTexImage commands.
Call _mesa_legal_texture_dimensions() to validate width, height, depth.
Call ctx->Driver.TestProxyTexImage() to make sure texture is not too large.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are two aspects to texture image size checking:
1. Are the width, height, depth legal values (not negative, not larger
than the max size for the mipmap level, etc)?
2. Is the texture just too large to handle? For example, we might not be
able to really allocate memory for a 3D texture of maxSize x maxSize x
maxSize.
Previously, we did (1) via the ctx->Driver.TestProxyTextureImage() hook
but those tests are really device-independent. Now we do (2) via that
hook since the max texture memory and texture shape are device-dependent.
Also, (1) is now done outside the general texture parameter error checking
functions because of the special interaction with proxy textures. The
recently introduced PROXY_ERROR token is removed.
The teximage() and copyteximage() functions are bit simpler now (less
if-then nesting, etc.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, move the body into a new _mesa_legal_texture_dimensions() function.
More refactoring to come.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I can't see any reason this is global (unless for debugging)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds basic flow control support for If-Then-Else blocks using
predicates (stored in the EXEC register) and a predicate stack for
nested flow control.
No regressions found in the tests of opencl-example/run_tests.sh.
Signed-off-by: Xinya Zhang <zxy_thf@hotmail.com>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
As far as I can see, the intention of the requirement that we do so is to
prevent instruction prefetch from wandering out into either unmapped memory or
memory with a different caching type, and hanging the chip. The kernel makes
sure that the page after your BO has a valid page of the same caching type,
which meets this requirement, so there's no need to waste space between our
programs (and in instruction cache) on this.
Saves another 9kb instructions in l4d2 shaders.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces l4d2 program size from 1195kb to 919kb. Improves performance by 0.22%
+/- 0.11% (n=70).
v2: Rebase on compaction v2, fix up flag reg handling (by anholt).
v3: Fix uncompaction of the flag register number.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.
v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The first cut at instruction compaction won't compact things that
would change control flow jump distances, but we do need to still be
able to walk the instruction stream, which involves jumping by 8 or 16
bytes between instructions.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It's going to get more complicated when we do instruction compaction. This
also introduces putting the program offset in the output.
v2: Use next_insn_offset in brw_get_program(), too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To do unit testing of i965, we want to be able to link against the
driver's symbols and prod them. If we don't have a separate lib from
our loadable module, libtool gets super whiny.
Acked-by: Paul Berry <stereotype441@gmail.com>
This file is used to provide stubs for the link test in gallium dri drivers.
But the same stubs without the main can be used for making unit tests for code
in a dri driver.
Acked-by: Paul Berry <stereotype441@gmail.com>
I noticed in valgrind that p->single_program_flow was used while
uninitialized. Everything else zeroed out brw_compile, but this is better
API.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes glGetStringi(GL_EXTENSIONS,.. for core contexts. Previously,
all extension names returned would be NULL.
NOTE: This is a candidate for release branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL_TEXTURE_1D, GL_TEXTURE_3D, GL_TEXTURE_RECTANGLE, and
GL_TEXTURE_GEN_S/T/R/Q don't exist in ES 1 contexts, so any meta ops
that used _mesa_meta_begin with MESA_META_TEXTURE would trigger GL
errors. One such operation is _mesa_meta_Clear().
On ES 1, we want to disable GL_TEXTURE_GEN_STR_OES instead.
Fixes the ES1 conformance test miplin.c, which was regressed by commit
08be1d288f.
NOTE: This is a candidate for the 9.0 branch.
v2: Also blacklist GL_TEXTURE_3D, per Brian's comment.
v3: Disable GL_TEXTURE_GEN_STR_OES, per Ian's comment.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54297
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Just to make it consistent with the rest of vbo, since it would
be an exported symbol anyways.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I can't see any external users, and this is a global symbol,
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current code is duplicated in two places and relies on `uname` to
detect the flags. This is no good for cross-compiling, and the current
logic uses -m64 for the x32 ABI which breaks things.
Unify the code in one place, avoid `uname` completely, and add support
for the new x32 ABI.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This symbol with dricore escapes into the namespace, its too generic,
we should prefix it with something just to be nice.
Should be applied to stable + 9.0
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So glcpp tried to workaround yylex its own way, but failed,
do it properly.
This fixes another crash found after fixing the first crash.
this is a candidate for 9.0 and stable branches
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids us making a global yylex symbol which will interfere will
all sorts of apps.
with libdricore which can't do symbol visibility currently we pollute
the namespace with this.
This is a candidate for 9.0 & stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In commit 055093e (meta: remove call to _meta_in_progress(), fix
multisample enable/disable), we created a meta_set_enable() function
that could be used by meta ops to enable and disable GL_MULTISAMPLE
even when the GLES API was in use (the GLES API doesn't support
GL_MULTISAMPLE; it behaves as if it is always enabled). This created
some unfortunate code duplication between meta_set_enable() and the
existing _mesa_set_enable() function.
This patch eliminates the duplication by creating a
_mesa_set_multisample() function, which is used by both meta ops and
_mesa_set_enable() to enable/disable GL_MULTISAMPLE.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glsl version of _mesa_meta_GenerateMipmap() would require separate
shaders for glsl 120 and 130.
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glsl path of _mesa_meta_GenerateMipmap() function would require different fragment
shaders depending on the texture target. This patch adds the code to generate
appropriate fragment shader programs at run time.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54296
V2: Removed the code for integer textures as ARB is planning to
disallow automatic mipmap generation for integer textures.
Now using ralloc_asprintf in setup_glsl_generate_mipmap().
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Group vgt register together to avoid lockup
v3: Split multi primitive register and index bias register
v4: Bump R600_NUM_ATOMS
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Update only those sampler states which are changed in a shader stage,
instead of always updating all sampler states in the shader stage.
That requires keeping a bitmask of those states which are enabled, and those
states which are dirty at a given point (subset of enabled states).
This is similar to how sampler views, constant buffers, and vertex buffers
are handled.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Based on the patch called "simplify and fix flushing and synchronization"
by Jerome Glisse.
Rebased, removed unneded code, simplified more and cleaned up.
Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem
to need it). It's only used to flush constant buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Some of the old AMDIL code was hard-coding subreg indices when creating
the VBUILD node, which was making it difficult to match the
vector_insert patterns.
ARB fragment programs use texture unit numbers directly, unlike GLSL
which has an extra indirection. If a fragment program only uses one
texture assigned to GL_TEXTURE1, SamplersUsed will only contain a single
bit, which would make us only upload a single surface/sampler state
entry. However, it needs to be the second entry.
Using _mesa_fls() instead of _mesa_bitcount() solves this. For ARB
programs, this makes num_samplers the ID of the highest texture unit
used. Since GLSL uses consecutive integers assigned by the linker,
_mesa_fls() should give the same result as _mesa_bitcount()..
Fixes a regression since 85e8e9e000,
which caused GPU hangs in ETQW (and probably others), as well as
breaking piglit test fp-fragment-position.
v2: Add a comment, as suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54098
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54179
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: meng <mengmeng.meng@intel.com>
ffs() finds the least significant bit set; _mesa_fls() finds the /most/
significant bit.
v2: Make it an inline function in imports.h, per Brian's suggestion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes piglit test "framebuffer-blit-levels draw stencil".
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
Previously, we aligned all stencil blit operations to multiples of the
size of a tile, since stencil buffers use W-tiling, and blorp has to
approximate this by configuring the 3D pipeline for Y-tiling and
swizzling coordinates.
However, this was unnecessarily conservative; it turns out that the
differences between W-tiling and Y-tiling are confined to 32-byte
sub-tiles within the 4k tiling pattern; the layout of these 32-byte
sub-tiles within the larger 4k tile is the same (8 sub-tiles across by
16 sub-tiles down, in column-major order). Therefore we only need to
align stencil blit operations to multiples of the sub-tile size.
Note: although the performance improvement of this change is probably
quite small, the fact that W-tiling and Y-tiling formats only differ
within 32-byte sub-tiles will be essential in a future patch to ensure
that stencil blits work correctly between parts of the miptree other
than level/layer 0. Making this change provides handy documentation
(and validation) of this fact.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
When blitting to a stencil buffer, we need to align the rectangle we
send down the rendering pipeline, to account for the fact that the
stencil buffer uses a W-tiled layout, but we are configuring its
surface state as Y-tiled.
Previously, when the stencil buffer was multisampled, we assumed that
we could reduce the amount of alignment that was necessary, since each
pixel occupies a block of 2x2 or 4x2 samples in the stencil buffer.
That would have been correct if the coordinates we were adjusting were
measured in pixels. However, the conversion from pixel coordinates to
coordinates within the interleaved buffer has already been done;
therefore the full alignment restriction applies.
Note: the reason this mistake wasn't previously uncovered by piglit
tests is because it is being masked by another mistake: the blorp
engine is using overly conservative alignment restrictions when doing
stencil blits. The overly conservative alignment restrictions will be
removed in the patch that follows. Doing this fix now will prevent
the subsequent patch from introducing regressions.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
This patch modifies intel_region_get_aligned_offset() to make the
appropriate calculation when the blorp engine sets up a W-tiled
stencil buffer using a Y-tiled SURFACE_STATE.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
When the blorp engine is performing a blit from one stencil buffer to
another, it sets up the surface state for these buffers as Y-tiled, so
it needs to be able to force intel_region_get_tile_masks() to return
the appropriate masks for a Y-tiled region.
NOTE: This is a candidate for stable release branches.
Acked-by: Eric Anholt <eric@anholt.net>
Fixes piglit tests "framebuffer-blit-levels {read,draw} depth".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, when performing a blit using the blorp engine, we failed
to account for the level and layer of the source and destination. As
a result, all blits would occur between miplevel 0 and layer 0 of the
corresponding textures, regardless of which level/layer was bound to
the framebuffer.
This patch passes the correct level and layer through
brw_blorp_miptrees() into the brw_blorp_blit_params data structure.
Further patches in the series will adapt
gen{6,7}_blorp_emit_surface_state to make use of these parameters.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen{6,7}_blorp_emit_surface_state assumes that the src and
dst surfaces are mapped to miplevel 0 and layer 0 (thus no surface
offset is required). This is a bug, since the user might try to blit
to and from levels/layers other than 0.
To fix this bug, it will not be sufficient to have
gen6_{6,7}_blorp_emit_surface_state look up the surface offset at the
time they set up the surface state, since these offsets will need to
be tweaked when blitting stencil buffers (due to the fact that stencil
buffer blits have to swizzle between W and Y tiling formats).
So, to pave the way for the bug fix, this patch causes the x and y
offsets to be computed during blit setup and stored in
brw_blorp_mip_info.
As a result of this change, brw_blorp_mip_info doesn't need to store
the level and layer anymore.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, gen{6,7}_blorp_emit_surface_state would look up the width
and height of the surface at the time they set up the surface state,
and then tweak it if necessary (it's necessary when a W-tiled surface
is being mapped as Y-tiled). With this patch, we look up the width
and height when setting up the blit, and store them in
brw_blorp_mip_info. This allows us to do the necessary tweak in the
brw_blorp_blit_params constructor (where it makes more sense). It
also reduces the need to keep track of level and layer in
brw_blorp_mip_info, so that a future patch can eliminate them
entirely.
For consistency, this patch makes a similar change to the handling of
depth buffers when doing HiZ operations.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it more convenient for blorp functions to get access to
Intel-specific data inside the renderbuffer objects.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Also add a clarifying comment for why the width/height doesn't need
adjustment for Gen7.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen6+ stencil buffers use W-tiling (a tiling arrangement which
drm and the kernel are not aware of) we need to round up the width and
height of a stencil buffer to multiples of the W-tile size (64x64)
before allocating a stencil buffer. Previously, we rounded up the
size of the base miplevel, and then computed the miptree layout based
on the rounded up size. This was incorrect, because it meant that the
total size of the miptree would not be properly W-tile aligned, and
therefore we would not always allocate enough pages.
(Note: even though the GL API doesn't allow creation of mipmapped
stencil textures, it does allow mipmapping of a combined depth/stencil
texture, and on Gen6+, a combined depth/stencil texture is internally
implemented as a pair of separate depth and stencil buffers.)
For example, on Sandy Bridge, when allocating a mipmapped stencil
texture of size 128x128, we would first round up to the nearest
multiple of 64x64 (causing no change to the size), and then compute
the miptree layout (whose size worked out to 128x196). Then we would
request an allocation of 128*196 bytes (6.125 pages), causing 7 pages
to be allocated to the texture. However, the texture needs 8 pages,
since each W-tile occupies a page, and it takes 2 W-tiles to cover a
width of 128 and 4 W-tiles to cover a height of 196.
This patch changes the order of operations so that the miptree layout
is computed first and then the total size of the miptree is rounded up
to be W-tile aligned.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit shaders/glsl-fs-uniform-sampler-array and many other similar
tests.
In fact, I just completed a piglit quick-driver.tests run without any GPU
lockups or even VM protection faults. Yay!
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
The value was too small by 1 in some cases (non-first of several vertex
elements interleaved in a single buffer).
Fixes intermittent incorrect geometry in many apps, e.g. piglit
spec/EXT_texture_snorm/fbo-generatemipmap-formats.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
These enums are valid only in ES1 and ES2. So far they were marked valid
incorrectly, depending on the previous API mask in the enum list.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This is basically a follow-on to 1f5b1f9846.
Basically, generate GL errors for ordinary invalid parameters for proxy
targets the same as for non-proxy targets. Only texture size and OOM
errors should be handled specially for proxies.
Note: This is a candidate for the stable branches.
Turns out we weren't doing any format checking before. Now check
the internal format and, in particular, make sure that unsized internal
formats aren't accepted.
Note: This is a candidate for the stable branches.
From the GL 4.3 spec, section 18.3.1 "Blitting Pixel Rectangles":
If SAMPLE_BUFFERS for either the read framebuffer or draw
framebuffer is greater than zero, no copy is performed and an
INVALID_OPERATION error is generated if the dimensions of the
source and destination rectangles provided to BlitFramebuffer are
not identical, or if the formats of the read and draw framebuffers
are not identical.
It is not clear from the spec whether "dimensions" should mean both
sign and magnitude, or just magnitude.
Previously, Mesa interpreted "dimensions" as meaning both sign and
magnitude, so any multisampled blit that attempted to flip the image
in the X and/or Y direction would fail.
However, Y flips are likely to be commonplace in OpenGL applications
that have been ported from DirectX applications, as a result of the
fact that DirectX and OpenGL differ in their orientation of the Y
axis. Furthermore, at least one commercial driver (nVidia) permits Y
filps, and L4D2 relies on them being permitted. So it seems prudent
for Mesa to permit them.
This patch changes Mesa to allow both X and Y flips, since there is no
language in the spec to indicate that X and Y flips should be treated
differently.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The compiler needs to know which interpolation modes are enabled, so
it knows which values will be preloaded into the VGPRs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
At least one interpolation mode must be enable, but the code that checks
this was not checking for perspective center.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Previous command stream might have set any of the constant buffer
and the previous address might no longer be valid thus GPU might
preload constant from random invalid address and possibly triggering
lockup.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
* Handle arbitrary border colours.
* Use correct packing format for detecting special border colours.
Fixes piglit tex-border-1 and probably many other tests using border colours.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
According to the GLSL 4.30 specification, this is a compile time error.
Earlier specifications don't specify a behavior, but since 0 and 1 are
the only valid indices for dual source blending, it makes sense to
generate the error.
Fixes (the fixed version of) piglit's layout-12.frag.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes piglit spec/EXT_texture_snorm/fbo-generatemipmap-formats (except for
what seems like a random fluke).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Saves 96MB of wasted memory in the l4d2 demo.
v2: Rebase on compare func change, change brace style.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Saves 26.5MB of wasted memory allocation in the l4d2 demo.
v2: Rebase on compare func change, fix comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, this just avoids comparing all unused parts of param[] and
pull_param[], but it's a step toward getting rid of those giant statically
sized arrays.
v2: Actually use the new function instead of just looking at its
address. This required changing the args to const pointers.
(review by Kenneth)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't fully process the builtin uniforms, but at least
num_uniform_components reflects reality now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes an issue where the local 'table' variable was hiding the
function parameter name in glGetColorTable(..., void *table).
This should be OK as long as there's never a GL entrypoint that uses
'disp_table' as a parameter name.
Note: This is a candidate for the 9.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Haswell moved the "Cut Index Enable" bit from the INDEX_BUFFER packet to
a new 3DSTATE_VF packet, so we need to emit that. Also, it requires us
to specify the cut index rather than assuming it's 0xffffffff.
This adds a new Haswell-specific tracked state atom to gen7_atoms.
Normally, we would create a new generation-specific atom list, but since
there's only one difference over Ivybridge so far, I chose to simply
make it return without doing any work on non-Haswell systems.
Fixes five piglit tests:
- general/primitive-restart-DISABLE_VBO
- general/primitive-restart-VBO_COMBINED_VERTEX_AND_INDEX
- general/primitive-restart-VBO_INDEX_ONLY
- general/primitive-restart-VBO_SEPARATE_VERTEX_AND_INDEX
- general/primitive-restart-VBO_VERTEX_ONLY
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
To avoid GPU lockup registers must be emited in a specific order
(no kidding ...). This patch rework atom emission so order in which
atom are emited in respect to each other is always the same. We
don't have any informations on what is the correct order so order
will need to be infered from fglrx command stream.
v2: add comment warning that atom order should not be taken lightly
v3: rebase on top of alphatest atom fix
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
glGetStringi(GL_EXTENSIONS) failed to respect the context's API, and so
returned all internally enabled GLES extensions from a GL context.
Likewise, glGetIntegerv(GL_NUM_EXTENSIONS) also failed to repsect the
context's API.
Note: This is a candidate for the 8.0 and 9.0 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Such as
"llvmpipe (LLVM 3.1, 128 bits)"
or
"llvmpipe (LLVM 3.1, 256 bits)"
when leveraging AVX 8-wide registers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Should be at least mostly working now (with the corresponding fixes in
libdrm_radeon).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We can always use the offset and tiling mode from level 0 and restrict the
first and last mipmap level to be used in the sampler resource.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Same as earlier commit, except for "FREE"
This patch has been generated by the following Coccinelle semantic
patch:
// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it
@@
expression E;
@@
+ FREE (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- FREE(E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
type T;
@@
+ FREE ((T) E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- FREE((T) E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
@@
+ FREE (E);
- if (unlikely (E != NULL)) {
- FREE (E);
- }
@@
expression E;
type T;
@@
+ FREE ((T) E);
- if (unlikely (E != NULL)) {
- FREE ((T) E);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch has been generated by the following Coccinelle semantic
patch:
@@
expression E;
identifier I;
@@
- I = malloc(E);
+ I = calloc(1, E);
...
- memset(I, 0, sizeof *I);
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch has been generated by the following Coccinelle semantic
patch:
// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it
@@
expression E;
@@
+ free (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- free(E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
type T;
@@
+ free ((T) E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- free((T) E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
@@
+ free (E);
- if (unlikely (E != NULL)) {
- free (E);
- }
@@
expression E;
type T;
@@
+ free ((T) E);
- if (unlikely (E != NULL)) {
- free ((T) E);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch has been generated by the following Coccinelle semantic
patch:
// Don't cast the return value of malloc/realloc.
//
// Casting the return value of malloc/realloc only stands to hide
// errors.
@@
type T;
expression E1, E2;
@@
- (T)
(
_mesa_align_calloc(E1, E2)
|
_mesa_align_malloc(E1, E2)
|
calloc(E1, E2)
|
malloc(E1)
|
realloc(E1, E2)
)
These calls allowed Xlib to use a custom memory allocator, but Xlib has
used the standard C library functions since at least its initial import
into git in 2003. It seems unlikely that it will grow a custom memory
allocator. The functions now just add extra overhead. Replacing them
will make future Coccinelle patches simpler.
This patch has been generated by the following Coccinelle semantic
patch:
// Remove Xcalloc/Xmalloc/Xfree calls
@@ expression E1, E2; @@
- Xcalloc (E1, E2)
+ calloc (E1, E2)
@@ expression E; @@
- Xmalloc (E)
+ malloc (E)
@@ expression E; @@
- Xfree (E)
+ free (E)
@@ expression E; @@
- XFree (E)
+ free (E)
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a long-standing omission in Mesa's texture image size checking.
We need to take the mipmap level into consideration when checking if the
width, height and depth are too large.
Fixes the new piglit max-texture-size-level test.
Thanks to Stéphane Marchesin for finding this problem.
Note: This is a candidate for the stable branches.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
According to Eric, this shouldn't matter since we don't do precompiles
using the old backend. In other words, brw->fragment_program (the
currently active program) should equal c->fp (the program currently
being compiled).
However, it's just not a good idea to access brw->fragment_program
directly in compiler code. It's totally illegal in the new backend, so
let's just not do it here either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Paul Berry <stereotype441@gmail.com>
The CodeEmitter was not setting the VGPR bit for src0, because the
instruction definition had the VCC register in the src0 slot, instead of
the actual src0 register. This has been fixed by moving the VCC
register to the end of the operand list.
Looks like converting this to a macro, returning bool, caused us to
lose the high (31st) bit result. Fixes piglit fbo-1d test. Strange
that none of the other tests I ran caught this.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54365
Tested-by: Vinson Lee <vlee@freedesktop.org>
We were already defining sqrtf where we don't have the C99 version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use 1/256 for R6xx/7xx, 1/4096 for evergreen, instead of default 1/16.
Helps to pass some piglit tests (fbo, multisample).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I wonder if the better solution is to have _mesa_meta_GenerateMipmap not
use MESA_META_ALL for the GLSL path. Even on compatibility profiles
there is no reason to save and restore fog on this path.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54295
Looks like we have an alignment issue with NPOT textures
and mipmaps. So disable NPOT textures until we figure out
what is going wrong here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reading brw->fragment_program is nonsensical in compiler code: it
contains the currently active program (if any), not the one currently
being compiled. Attempting to access it may either lead to crashes
(null pointer dereference if no program is active) or wrong results.
Fixes piglit regressions since 9ef710575b
on pre-Sandybridge hardware. The actual bug was created in commit
7b1fbc6889.
NOTE: This is a candidate for the 9.0 and 8.0 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54183
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
From ARB_sync spec:
If the value of <timeout> is zero, then ClientWaitSync does not
block, but simply tests the current state of <sync>. TIMEOUT_EXPIRED
will be returned in this case if <sync> is not signaled, even though
no actual wait was performed.
Fixes random fails of the arb_sync-timeout-zero piglit test on r600g.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As discussed with Kristian on #wayland. Pushes the decision of components into
the dri driver giving it greater freedom to allow t to implement YUV samplers
in hardware, and which mode to use.
This interface will also allow drivers like SVGA to implement YUV surfaces
without the need to sub-allocate and instead send 3 seperate buffers for each
channel, currently not implemented.
I have tested these changes on Gallium Svga. Scott tested them on both intel
and Gallium Radeon. Kristan and Pekka tested them on intel.
v2: Fix typo in dri2_from_planar.
v3: Merge in intel changes.
Tested-by: Scott Moreau <oreaus@gmail.com>
Tested-by: Pekka Paalanen <ppaalanen@gmail.com>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Immediate operands were previously handled in the CodeEmitter, but that
code was buggy and very confusing. This commit adds a pass that simplifies
the handling of immediate operands by spliting the loading of the
immediate into a sperate insruction that is bundled with the original.
The relevant POINT_SIZE registers are being set using the
pipe_rasterizer_state, so we just need to tell the shader compiler which
export type to use.
This fixes several of the glean glsl tests.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
On Android we want to add only double buffered configs for visuals.
Earlier implementation set the SurfaceType as 0 for single buffered
configs but driver still exposed these configs that were not compatible
with any egl surface type. This caused Khronos conformance test runs to
fail on Android. This patch fixes the issue by skipping single buffered
configs earlier and not exposing them.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The CALLOC() macro only takes one argument so this was being treated
as a comma expression. Simply use calloc() instead.
A follow-on patch will replace all CALLOC() calls with calloc().
NOTE: This is a candidate for the 8.0 and 9.0 branches.
_mesa_delete_renderbuffer() should free the mutex (though that may be a
no-op) and then free the renderbuffer object itself. Subclasses of
gl_renderbuffer can use this function too.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that OpenGL 3.1 is supported by at least one driver, follow
tradition and bump the major version number.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The blend state is different and the resolve single-sample buffer must have
FMASK and CMASK enabled. I decided to have one CMASK and one FMASK
per context instead of per resource.
There are new FMASK and CMASK allocation helpers and a new buffer_create
helper for that.
The color resolve on r6xx needs PT_RECTLIST. Using conventional primitive
types (triangles and quads) produces an ugly line between two diagonally
opposite corners. I guess a rectangular point sprite would work too.
This partially reverts d638da23d2.
With gallium the meta code is not always built so the call to
_meta_in_progress() was unresolved. Simply special-case the
GL_MULTISAMPLE case in the meta code. There might be other special
cases in the future given all the differences between legacy GL,
core GL, GLES, etc.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=54234
and https://bugs.freedesktop.org/show_bug.cgi?id=54239
v2 (Paul Berry <stereotype441@gmail.com>): keep _meta_in_progress
function, since it's needed by the i965 driver, but don't call it from
core mesa.
Signed-off-by: Brian Paul <brianp@vmware.com>
Prior to commit 2f1869822, emit_fb_writes() looped from 0 to 3, writing
all four components of a vec4 color output. However, that broke for
smaller output types (float, vec2, or vec3). To fix that, I introduced
a new variable (output_components[]) containing the size of the output
type for each render target.
Unfortunately, I forgot to actually initialize it in the constructor,
which meant that unless a shader wrote to gl_FragColor, or the specific
output for each render target, output_components would contain a garbage
value, and we'd loop for a completely non-deterministic amount of time.
Not actually emitting any color writes seems like the right approach.
We may still need to emit a render target write (to terminate the
thread), but don't have to put in any sensible values (the shader didn't
write anything, after all).
Fixes a regression since 2f18698220.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54193
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Ian Romanick <idr@freedesktop.org>
It is possible to force S3TC extensions to be enabled. This is
generally done to support applications that will only supply
pre-compressed textures. This accounts for the vast majority of
applications.
However, there is still the possibility of an application asking for
on-line compression. In that case, generate a warning and substitute a
generic compressed format. The driver will either pick an uncompressed
format or a compressed format that Mesa can handle on-line (e.g., FXT1).
This should only cause problems for applications that request on-line
compression and read the compressed texture back. This is likely an
infinitesimal subset of an already infinitesimal subset.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix API_OPENGL_CORE handling when TEXTURE_FLOAT_ENABLED is not
defined. Based on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a purely software extension. The drivers don't need to do any
work to support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Page 407 (page 423 of the PDF) of the OpenGL 3.0 spec says (in the list
of deprecated functionality):
"Separate polygon draw mode - PolygonMode face values of FRONT and
BACK; polygons are always drawn in the same mode, no matter which
face is being rasterized."
Also modify meta to not use FRONT or BACK in a core context.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We were calling through a dispatch table entry that was NULL, since the apple
variant is only on legacy desktop. Just call the function we mean instead of
indirecting through the dispatch.
v2: Use API_OPENGL_CORE.
v3: Only require desktop GL. If a driver can't support TexBOs in a non-core
context, it should not enable them.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix completely broken condition around ClearColorIiEXT and
ClearColorIuiEXT.
v3: Add special VertexAttrib handling for ES2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The comment in the code even says this is the right thing to do.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All drivers in Mesa do. This allows a lot of extension checking code to be
gutted from the function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a bug that glGetMaterial[fx]v in ES1 contexts would (try to) allow
queries of GL_AMBIENT_AND_DIFFUSE. This enum can only be used in glMaterial,
not in the get.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Also handle glDisable, glIsEnabled, glEnableClientState, and
glDisableClientState.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow glGetVertexAttribfv(0, GL_CURRENT_VERTEX_ATTRIB_ARB, param) in
OpenGL 3.1, just like OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile filtering.
v3: Allow GL_SRC_ALPHA_SATURATE as a destination factor in GLES3. Based
on review feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Allow GL_RGB10_A2UI in GLES3 based on review feedback from Eric
Anholt.
v4: Arg. Reject unsized RED and RG enums on GLES. More feedback from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile, GLES1, and GLES3 filtering.
v3: Fix the GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME query when the
attachment type is GL_NONE on GLES3. Other cleanups. Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Fix a typo in GL_TEXTURE_2D_ARRAY checking.
v4: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Add proper core-profile and GLES3 filtering.
v3: Change !_mesa_is_desktop_gl tests to _mesa_is_gles test. The test
around GL_TEXTURE_2D_ARRAY got some other changes because that enum is
also available with GLES3 (which uses API_OPENGLES2). Based on review
feedback from Eric Anholt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The Common Subexpression Elimination pass will not operate on
instructions with physical register defs, so we end up with
several redundant copies to M0 when using interpolation.
Adding a register class that only contains the M0 register allows
use to use a virtual register to represent M0, and makes it possible
for the Common Subexpression Elimination pass to remove the extra
copies.
This reduces the overhead of using the fixed function internally
in the driver.
V2: Use setup_glsl_generate_mipmap() and setup_ff_generate_mipmap()
functions to avoid code duplication.
Use glsl version when ARB_{vertex, fragmet}_shader are present.
Remove redundant code.
V3: Remove redundant border related code leaving the assertion.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
the progs/util directory is now in mesa demos
replace glean with piglit
add ApiTrace
markup: replace the unordered list <ul> with a definition list <dl>
Signed-off-by: Brian Paul <brianp@vmware.com>
I've reviewed the code, and the swrast callsites remaining are all in
drawpixels/copypixels/bitmap/accum, or _swrast_BlitFramebuffer that shouldn't
be hit. A piglit run with the context setup disabled on legacy GL and GLES2
showed regressions only in the copypixels and drawpixels tests.
If the context type is forced, this reduces the shader_runner maximum heap
size for glsl-algebraic-add-add-1.shader_test from 15,137,496b to 4,165,376b.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The Fallback field of the context struct doesn't work that way on i965, and
it's the only caller of FALLBACK() in the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This code has been in the driver since the first commit. I think it was
trying to stop rendering from happening with a disabled position array. Core
mesa has since had changes to deal with disabled position arrays correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
But cap the size in bytes, to avoid depleting the whole system memory,
with humongus textures.
Tested with max-texture-size piglit test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We want to check whether there are bits set outside of the valid flags.
Fixes piglit test egl-create-context-invalid-flag-gl
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that it's on by default, we may as well make it obey the flag,
for consistency's sake if nothing else.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Precompiling the shader at link time often allows us to avoid compiling
it at the first use. This moves the expensive compilation and
optimization process to game or level load time, rather than at draw
time, where we really can't avoid any cycles and don't want to risk
stalling the GPU.
The downside is that we have to guess the non-orthagonal state the
program will have set when it draws with the shader. Previously, we
guessed wrong for nearly every shader, so it wasn't useful. With the
recent SamplerUnits rework and this series, we've either eliminated
state or made smarter guesses, and usually get it right now.
In the L4D2 time demo, I now have 39 fragment shader recompiles and no
vertex shader recompiles. Before this series and the SamplerUnits
rework, I had 206 fragment shader recompiles and 192 vertex shader
recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression since 76d1301e8e:
I began setting SWIZZLE_XYZW for unused sampler units in the actual
program keys, since this matched the FS precompile behavior. However,
the VS precompile was expecting zero, so that commit made essentially
every vertex shader (even those not using texturing) mismatch and need
to be recompiled.
Setting them in the VS precompile key solves the issue. It also is an
improvement over our old behavior: previously we guessed that vertex
shaders didn't use any textures at all. Now we actually look to see if
the VS had any sampler uniforms and guess based on that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric added support for WM key debugging. This adds it for the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Our previous assumption, SWIZZLE_XYZW, was completely bogus for depth
textures. There are no Y, Z, or W components.
DEPTH_TEXTURE_MODE has three options:
- GL_LUMINANCE: <X, X, X, 1>
- GL_INTENSITY: <X, X, X, X>
- GL_ALPHA: <0, 0, 0, X>
The default value is GL_LUMINANCE, and most applications don't seem to
alter DEPTH_TEXTURE_MODE. Make that our precompile guess.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that most things are based on the linker-assigned index, it makes
sense to convert the arrays in the VS/WM program key as well. It seems
silly to leave them indexed by texture unit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_wm_prog_key's proj_attrib_mask field is designed to enable an
optimization for fixed-function programs, letting us avoid projecting
attributes where the divisor is 1.0.
However, for shaders, this is not useful, and is pretty much impossible
to guess when building the FS precompile key. Turning it off for
shaders should allow the precompile to work and not lose much.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
We probably want to do something more sophisticated here, but this at
least makes it through L4D2 without dumping the program cache.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Do all pre-draw hiz resolves *after* the renderbuffers are resized by
intel_prepare_render. Otherwise, we may resolve buffers that are
immediately discarded afterwards.
Fixes the assertion failure below when resizing windows in KDE and under
some unknown circumstance in Chrome OS:
intel_resolve_map.c:46: intel_resolve_map_set: Assertion
`(*tail)->need == need' failed.
Also, remove the comment that "resolves must occur [...] before setting up
any hardware state". That was true when resolves were implemented with
meta-ops, but no longer with blorp.
v2:
- Keep brw_predraw_resolve_buffers in its current position, which is
before any brw_context bits are modified. Instead, move the call to
intel_prepare_render.
Note: This is a candiate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52252
Reported-by: Lu Hua <huax.lu@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
intel_renderbuffer_resolve_hiz checks if rb->mt is null, so there is no
need for the caller to do so.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This adds the FMASK and CMASK buffers. They share the same resource
with color data.
COMPRESSION and FAST_CLEAR are always enabled if both FMASK and CMASK are
allocated. We initialize the CMASK to a "compressed" state (not "fast cleared"),
so that we can keep FAST_CLEAR enabled all the time.
Both FMASK and CMASK must be present at the moment. If either one is missing,
the other one is not used.
v2: add cayman regs in the list
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The original samples positions took samples outside of the pixel boundary,
leading to dark pixels on the edge of the colorbuffer, among other things.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Drivers need to be able to communicate their actual number of bits populated
in the field in order for applications to be able to properly handle rollover.
There's a small behavior change here: Instead of reporting the
GL_SAMPLES_PASSED bits for GL_ANY_SAMPLES_PASSED (which would also be valid),
just return 1, because more bits don't make any sense.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
When faced with this sequence:
MOV R1, c[1];
MAD R0, R2, R1.x, R1.y;
we were concluding that the MOV of R1 set up our accumulator and so we could
just use the previous result. Only, it's got R1.xyzw in it instead of the
r1.y we're looking for.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46784
NOTE: This is a candidate for the 8.0 branch.
Support version 3 as well as 2, since that is only the new format query,
which Jesse added support for to st/dri when he added it to dri_inteface.h.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Since its not used by anything anymore and no release has gone out
where it was being used.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Uses libkms instead of dri image cursor. Since this is the only user of the
DRI cursor and write interface we can remove cursor surfaces entirely from
the DRI interface and as a consequence also from the Gallium interface as
well. Tho to make everybody happy with this it would probably should add a
kms_bo_write function, but that is probably wise in anyways.
The only downside is that it adds a dependancy on libkms, this could how ever
be replaced with the dumb_bo drm ioctl interface.
Tested-by: Scott Moreau <oreaus@gmail.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
We already changed the actual program key builder to only set these bits
on gen < 6; this patch just brings the precompile state back in line so
it doesn't mismatch every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When dumping differences in program keys, it printed messages of the
format:
[Name of thing that changed] [new]->[old]
This was terribly confusing: the right arrow implies "the value changed
from this to that", when in fact the message conveyed the opposite.
Except that some of the time, it didn't, since we accidentally swapped
the arguments to brw_debug_recompile_sampler_key. With two swaps, it
would often come out in the expected format.
This patch fixes it to properly print:
[Name of thing that changed] [old]->[new]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Gallium drivers and i965 don't require special notification when
sampler uniforms change. They simply see the _NEW_TEXTURE and adjust
their indirection tables. These drivers don't want ProgramStringNotify:
it simply causes pointless recompiles.
Unfortunately, i915 still requires shader recompiles and needs
ProgramStringNotify. Rather than trying to fix that, simply change the
hook to a new, more specific one: ShaderUniformChange. On i915, this
translates to ProgramStringNotify; others simply ignore it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When assigning uniform locations, the linker assigns each sampler
uniform a sequential numerical ID. gl_shader_program::SamplerUnits maps
these sampler variable IDs to the actual texture units they reference
(specified via glUniform1i).
Previously, we encoded this mapping in the SEND instruction encoding:
the "sampler" was the texture unit number, and the binding table index
was SURF_INDEX_TEXTURE(the texture unit number). This unfortunately
meant that whenever the application changed the value of a sampler
uniform, we had to recompile the shader to change the SEND instructions.
This was horrible for the game Cogs, which repeatedly switches between
using texture unit 0 and 1. It also made fragment shader precompiles
useless: we'd do the precompile at glLinkShader() time, before the
application called glUniform1i to set the sampler values. As soon as
it did that, we'd have to recompile, wasting time and space in the
program cache.
This patch encodes the SamplerUnits indirection in the binding table,
sampler state, and sampler default color tables. Instead of baking the
texture unit number into the shader, we bake in the sampler variable ID
assigned by the linker. Since those never change, we don't need to
recompile programs on uniform changes.
This does mean that the tables now depend on the linked shader program
being used for rendering, rather than simply representing all available
texture units. This could cause an increase in state emission.
Another plus is that the sampler state and sampler default color tables
are now compact: we only emit as many entries as there are sampler
uniforms, with no holes in the table since the new sampler IDs are
sequential. Previously we had to emit a full 16 entries every time,
since the tables tracked the state of all active texture units.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This represents the index into the sampler state table or sampler
default color table (the two are identical).
Right now, this is still the texture unit, but that will change shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we mirror the VS and WM binding tables' texture entries.
That may not continue to be true, so in preparation, pass in the binding
table and surface index as arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value our of sampler uniforms. Calling it
"texunit" clarifies this slightly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The number we're passing around is actually the ID of the texture unit,
as opposed to the numerical value our of sampler uniforms. Calling it
"texunit" clarifies this slightly.
Don't bother renaming fs_instruction::sampler. Although it's currently
the texture unit, this series will change that. No need for the churn.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we left the swizzle key field as zero for unused texture
units. The precompile sets all of them to SWIZZLE_NOOP, which meant
that we mismatched almost every time.
Since either works equally well, change it to SWIZZLE_NOOP to match
the precompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I can't actually understand what these mean, and they seem to
essentially say "we should simplify things", which is a nice goal but
not very specific.
Presumably things got cleaned up at some point.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes brw_shader.cpp:101:9: warning: converting to non-pointer type
'GLboolean {aka unsigned char}' from NULL [-Wconversion-null]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-with-great-enthusiasm-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering.
v3: *Really* add proper core-profile and GLES3 filtering based on review
feedback from Eric Anholt. It looks like previously there was some
rebase / merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Add proper core-profile and GLES3 filtering based on review feedback
from Eric Anholt. It looks like previously there was some rebase /
merge fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix handling of GL_INT and GL_UNSIGNED_INT types pre-ES3.0, and fix
handling of GL_INT_2_10_10_10_REV and GL_UNSIGNED_INT_2_10_10_10_REV in
ES3.0. Based on review comments by Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This consolidates the tests and makes the emitted error message
consistent.
v2: Rename _mesa_valid_element_type to valid_elements_type. Log the
enum string instead of the hex value in error messages. Based on review
comments from Brian Paul and Ken Graunke.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_generic_compressed_format_to_uncompressed_format() probably wins the
prize for longest function name in Mesa.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
See comments in the code for details.
Note: we only need to special-case the generic compressed formats since
specific texture formats are error-checked earlier to see if the compression
format is compatible with the texture type.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will let us choose the actual hardware format depending on the
type of texture.
v2: fixup radeon, nouveau, intel and swrast drivers too
Reviewed-by: Eric Anholt <eric@anholt.net>
'target' was used both as a parameter of type st_texture_type and then
re-used for GL_TEXTURE_x targets. Rename the function parameter and
add a new local 'GLenum target'.
And remove an extraneous break statement.
Patches changes mesa to use 'HAVE_DLOPEN' defined by configure and Android.mk
instead of _GNU_SOURCE for detecting dlopen capability. This makes dlopen to
work also on Android where _GNU_SOURCE is not defined.
[mattst88] v2: HAVE_DLOPEN is sufficient for including dlfcn.h, remove
mingw/blrts checks around dlfcn.h inclusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Previously, when performing a fast depth clear, we would also clear
the miptree's resolve map. This destroyed important information,
since the resolve map contains information about needed resolves for
all levels and layers of the miptree, whereas a depth clear only
applies to a single level/layer combination at a time. As a result,
resolves would sometimes fail to occur, leading to incorrect
rendering.
Fixes rendering artifacts with shadow maps in Unigine Heaven and
Unigine Sanctuary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50270
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
There are three possible resolve map states for each (level, layer) of
a depth miptree: "needs HiZ resolve", "needs depth resolve", and
"needs neither". When HiZ was first implemented on i965, any attempt
to directly transition between "needs HiZ resolve" and "needs depth
resolve" without passing through the "needs neither" state would have
been a bug indicating that a necessary resolve hadn't been performed.
Accordingly, intel_resolve_map_set() contained an assertion to verify
that no such direct transition happened.
However, now that we support fast depth clears, there is a valid
transition from the "needs HiZ resolve" to the "needs depth resolve"
state. When doing a fast depth clear, the old state of the buffer is
irrelevant, since we are completely replacing it with the clear value,
so it is not necessary to do any resolves before clearing--we can
transition, if necessary, directly from the "needs HiZ resolve" state
to the "needs depth resolve" state.
To avoid spurious assertions in this valid case, this patch just
removes the assertion.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Just use the functionality provided by the surface manager instead.
This fixes just another bunch of piglit tests.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Previously you could always glGetProgramiv one of the transform feedback
or geometry shader enums even if the extension wasn't supported.
In addtion, this reverts part of bda6ad27. I think the hunks involving
GL_PROGRAM_BINARY_LENGTH_OES were spurious. Mesa has no support for any
other part of GL_OES_get_program_binary.
v2: Remove redundant return in get_programiv based on review feedback
from Matt Turner.
v3: Correctly handle UBO related enums.
v4: Emit the bad enum in the _mesa_error call based on review feedback
from Brian Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix API functions for memory objects to accept CL_MEM_READ_WRITE flag.
Signed-off-by: Blaž Tomažič <blaz.tomazic@gmail.com>
[ Francisco Jerez: Drop incorrect change in clCreateSubBuffer. ]
Fix-up the texel fetch functions so that they handle 3D coords (as used for
array textures) and remove the "f_2d" part from their names.
Helps fix swrast crashes in piglit's copyteximage test. More to come.
There was a lot of similar or duplicated code before.
To minimize this patch's size, use a forward declaration for
compressed_texture_error_check(). Move the function in the next patch.
If a proxy texture call generates a regular GL error, we should not
clear the proxy image's width/height/depth/format fields. Use a new
PROXY_ERROR token to distinguish proxy errors from regular GL errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When calling glTexImage() with a proxy target most error conditions should
generate a GL error. We were erroneously doing the proxy-error behaviour
(where we zeroed-out the image's width/height/depth/format fields) in too
many places.
There's another issue with proxy textures, but that'll be fixed in the
next patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
draw->samplers(_views) now has PIPE_SHADER_TYPES elements, instead of
PIPE_MAX_SAMPLERS as before.
Also, shader_stage must be less than PIPE_SHADER_TYPES to prevent buffer
overflow.
Trivial.
Render Target Write message should include source zero alpha value when
sample-alpha-to-coverage is enabled for an FBO with multiple render targets.
Source zero alpha value is used as fragment coverage for all the render
targets.
This patch makes piglit tests draw-buffers-alpha-to-coverage and
alpha-to-coverage-no-draw-buffer-zero to pass on Sandybridge. No
regressions are observed with piglit all.tests.
V2: Revert all the changes made in emit_color_write() function to
include src0 alpha for targets > 0. Now handling this case in a if
block.
V3: Correctly calculate the instruction length for buffer zero.
Properly handle the case of dual_src_blend when alpha-to-coverage
is enabled.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When too may uniforms are used, the error will be caught in
check_resources (src/glsl/linker.cpp).
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Benoit Jacob <bjacob@mozilla.com>
Also validate glCopyTexImage border. This fixes a bug in the APIspec.
Previously glTexImage3DOES could be passed a non-zero border without error.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This also adds a missing extension (and API) check around
GL_TEXTURE_CROP_RECT_OES.
v2: Add proper core-profile, GLES1, and GLES3 filtering. GL_TEXTURE_MAX_LEVEL
is (incorrectly) accepted in ES contexts. A future patch will add
GL_APPLE_texture_max_level, and meta really needs this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixed the piglit test arb_texture_buffer_object-negative-unsupported.
NOTE: This is a candidate for stable release branches.
v2: Add proper core-profile and GLES3 filtering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This should take care of all the TexImage, TexSubImage, CopyTexImage,
CompressedTexImage3DOES, and CopyTexSubImage type paths.
v2: Add proper core-profile and GLES3 filtering.
v3: Squash the CompressedTexImage3DOES patch per review comment from
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is a bit of a hack. _mesa_meta_GenerateMipmap shouldn't even be
used in contexts where GL_GENERATE_MIPMAP doesn't exist (i.e., core
profile and ES2) because it uses fixed-function, and fixed-function
doesn't exist there either!
A GLSL-based _mesa_meta_GenerateMipmap should be available soon. When
that is available, this patch will be irrelevant and should be reverted.
v2: Change (ctx->API != API_OPENGLES2 && ctx->API != API_OPENGL_CORE) to
(ctx->API == API_OPENGL || ctx->API == API_OPENGLES) based on review
comment from Brian Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 77a3efc6b9 broke android build that
sets its own value for GLSL_SRCDIR before including Makefile.sources.
Patch moves overriding the value after include, this works as GLSL_SRCDIR
variable gets expanded only later.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The name is taken from the driver_descriptor, so it will be the same as
expected by driconf utility.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The segmentation fault occurs when DRI2 is not loaded up and
dri2_setup_screen() function deferences dri2_dpy->dri2 (since it's NULL
at this point).
This patch fixes the segmentation fault by checking if dri2 pointer is
not NULL before deferencing it.
Signed-off-by: Paulo Alcantara <pcacjr@profusion.mobi>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This new operand replaces the MachineOperand flags in LLVM, which
will be deprecated soon. Eventually all instructions should have a flag
operand, but for now this operand has only been added to instructions
that need it.
SRC_DIRS was overwritten (visible in the second hunk).
Also don't require mapi/shared-glapi to be built for GLES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to enable at least one interpolation mode,
otherwise the GPU will hang.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Disable blending when dual_src_blend is enabled and number of color exports
in the current fragment shader is less than 2.
Fixes lockups with ext_framebuffer_multisample-
alpha-to-coverage-dual-src-blend piglit test.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The generic texture formats should be accepted by the <internalformat>
parameter of TexImage1D, TexImage2D, TexImage3D, CopyTexImage1D, and
CopyTexImage2D functions. When the application specifies a generic
format, the driver is free to pick an uncompressed format.
This patch reverts the changes due to following commit:
commit a36581ccc0
mesa: do more teximage error checking for generic compressed formats
This patch fixes compressed texture format failures in intel oglconform
pxconv-gettex test case:
https://bugs.freedesktop.org/show_bug.cgi?id=47220
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't dereference NULL pointers, and if all views are NULL, don't generate an
invalid PM4 packet which locks up the GPU.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Mesa doesn't check the parameter passed to glMultiTexCoord*. It does,
however, mask the texture value to prevent out-of-bounds writes. This
patch will promote this non-conformant behavior to OpenGL ES 1. I don't
think anyone will care, and the gets some silly code out of a hot path.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is required to make some of llvm's api calls
thread save. In particular the PassRegistry, which is
implicitly accessed while compiling shader programs.
The PassRegistry uses a mutex that is only active if
the llvm_is_multithreaded() returns true.
Calling llvm_start_multithreading() makes this happen
and by calling this function we try to make sure that
we can savely compile shaders in paralell.
Since there is also a call llvm_stop_multithreading()
in the llvm api, we cannot guarantee that this does
not get switched off while we are relying on this being
set, but for the easier use cases this fixes a race with
the radeon llvm compiler we have as of today.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
In the past, when we called pipe::set_sampler_views(n) the drivers set
samplers [n..MAX] to NULL. We no longer do that. The state tracker
code was already trying to set unused sampler views to NULL to cover
that case, but the logic was broken and unnoticed until now. This patch
fixes it.
Strictly speaking, this patch shouldn't be necessary. Drivers should simply
ignore unused samplers and sampler views. But some drivers like llvmpipe (and
others?) count those things and they figure into state validation. That could
be fixed in the future.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53617
Reviewed-by: Marek Olšák <maraeo@gmail.com>
GL_INVALID_OPERATION is to be raised when querying a non-compressed
image/buffer. Since a buffer object can't have a compressed format this
query always generates an error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are gradually going to get whittled away and eventually folded into the
source files with the native type functions.
v2: Add (speculative) SConscript changes. These may be broken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the old backend, we looked at any FS attribute's proj_attrib_mask bits, not
just texcoords. Now that we have _mesa_vert_result_to_frag_attrib(), we can
fill in the other FS inputs with correct proj_attrib_mask info.
NOTE: This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46644
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL 3.1 specification explicitly allows this. Oddly, the
ARB_texture_buffer_object spec's issues section claims this isn't
allowed, but proceeds to explain that the extension simply doesn't edit
the underlying spec to allow it, and thus it didn't appear in the list
of legal texture targets.
Thus, this patch legalizes it only in 3.1+ contexts, but still returns
INVALID_ENUM in earlier contexts that expose ARB_texture_buffer_object.
Unfortunately, the behavior of the call is horrendously undefined.
Fixes oglconform's tbo/negative.textureParams test.
v2: Require desktop OpenGL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Move the _mesa_GetTexLevelParameter[iv] functions below the helper
function so the prototype is available.
This will be useful in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For cube maps, _mesa_generate_mipmap() calls this with
GL_TEXTURE_CUBE_MAP (the gl_texture_object's Target) rather than one
of the faces. This caused _mesa_max_texture_levels() to return 0, which
resulted in maxLevels == -1 and the next line's assertion to fail.
This function is called from seven places:
- fbobject.c: framebuffer_texture()
- mipmap.c: _mesa_generate_mipmap()
- texgetimage.c:
- getteximage_error_check()
- getcompressedteximage_error_check()
- texparam.c: _mesa_GetTexLevelParameteriv()
- texstorage.c: tex_storage_error_check()
All of these (or their callers) now explicitly check for invalid targets
already, so this shouldn't cause invalid targets to slip through.
(Technically _mesa_generate_mipmap() doesn't check for invalid targets,
but the API-facing _mesa_GenerateMipmapEXT() function does.)
+2 oglconforms (float-texture/mipmap.automatic and mipmap.manual)
In addition to fixing the mipmap bug, it should also cause glTexStorage
to accept GL_TEXTURE_CUBE_MAP, which is explicitly allowed by the spec.
v2: Drop alterations to callers; this is now in a patch series that adds
explicit checking to API functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven diferent places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels(), so
this patch should not introduce any change in behavior. Future patches
will cause the two to diverge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, they relied on _mesa_max_texture_levels() for texture target
error checking. This was somewhat dodgy, as _mesa_max_texture_levels()
is called in seven diferent places, not all of which necessarily accept
the same list of targets.
I copied the list of legal targets from _mesa_max_texture_levels() but
removed the proxy targets, as both functions explicitly rejected those
targets. This changes the order in which we check errors, which could
change whether we return INVALID_VALUE or INVALID_ENUM. However, it
shouldn't change the list of accepted targets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's possible for us to have an unused sampler bound when the fragment
shader itself doesn't use any samplers. So the assertion isn't valid.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53616
We aligned the dimensions to the blocksize, then divided by it
(in r600_blit.c), then minified, which was wrong.
The minification must be done first, not last.
This fixes piglit/fbo-generatemipmap-formats with S3TC and maybe
a bunch of other tests too. Tested on RV730.
This seems to be expected by the WebGL texture-mips test. The error makes
sense, but I haven't found (yet) any OpenGL documentation specifying this
error condition.
See http://bugs.freedesktop.org/show_bug.cgi?id=44912
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As with other recent changes, put the vertex and fragment sampler state
into arrays indexed by the shader type. This will let us easily add
support for other types of shaders in the future.
PIPE_MAX_SAMPLERS, PIPE_MAX_VERTEX_SAMPLERS and PIPE_MAX_GEOMETRY_SAMPLERS
were all defined to the same value (16).
In various places we're creating arrays such as
sampler_views[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS] so we were assuming
the same number of max samplers for all shader stages anyway.
Of course, drivers are still free to advertise different numbers of max
samplers for different shaders.
The previous test for result != NULL was kind of bogus since we dereferenced
the pointer earlier in the code. Now, check for result != NULL first, then
get the result->key info.
Also, remove the useless "offset +=" code at the end.
We'd end up re-using the old one and throwing away the new one anyway, but only
after a roundtrip to the kernel.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If a hole exactly matches the allocated size plus alignment, we would fail to
preserve the alignment as a hole. This would result in never being able to use
the alignment area for an allocation again.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'll likely end up with an ever increasing amount of ever smaller
holes.
Requires keeping the list ordered wrt offsets.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we'd wrap around after 32 bits. The kernel currently limits GPU
virtual address space to 4GB anyway, but that will probably change sooner or
later, and this would result in confusing error messages when running out of
virtual address space even now.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This adds support for having libGL pick a different driver for prime support.
DRI_PRIME env var is set to the value retrieved from the server randr
provider calls, by the calling process. (generally DRI_PRIME=1 will be
the right answer).
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this we can embed data for the shaders (like resource
descriptors) into the PM4 stream.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I was seeing some GPU hangs that seemed to be cause by ALU instructions
writing to the same register used as the source for VTX_READ. Adding
this constraint to the VTX_READ instructions avoids this situation.
The only allowed instructions are TXQ_LZ and TXF.
TXQ_LZ is like TXQ, but without the LOD parameter (which is always zero
with MSAA textures)
The 3rd or the 4th texcoord component in TXF should contain the sample index
for a 2D_MSAA or 2D_ARRAY_MSAA texture, respectively.
The problem was that the string matching succeeded e.g. for "2D" when there
was actually "2D_MSAA" and then failed parsing "_MSAA".
To prevent similar failures in the future, let's fix this kind of error
everywhere.
Rename _mesa_pack_rgba_span_int to _mesa_pack_rgba_span_from_uints.
Add _mesa_pack_rgba_span_from_ints.
These separate routines allow the integer clamping to be handled
properly for signed versus unsigned integers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to downsample before flushing BUFFER_FAKE_FRONT_LEFT to
BUFFER_FRONT_LEFT in intel_flush_front.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Stop repeating ourselves. Replace the 4 instances of
`driContext->driDrawablePriv` with `driDrawable`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move it from intel_screen.c to intel_context.c. Redeclare as non-static.
A future commit will use it in multiple files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unlike 1.x to 2.0, OpenGL ES 3.0 is backwards compatible with 2.0. Use the
same API flag for both. Applications that specifically want 3.0 will specify
this using the major / minor version attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just like in GLX, EGL_KHR_create_context requires DRI2 version >= 3, and
EGL_EXT_create_context_robustness requires both DRI2 version >= 3 and the
__DRI2_ROBUSTNESS extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extra block in dri2_create_context is to prevent extra white space noise
in the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add GL_ARB_invalidate_subdata to release notes at Brian's
suggestion.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Add comment explaining why minimum dimensions are set to 1 for some
texture targets. Add default case to switch statement to silence
compiler warnings and detect new texture targets. Both changes
suggested by Brian. Also use _mesa_is_desktop_gl as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata (but not OpenGL ES 3.0).
v2: Use _mesa_bufferobj_mapped instead of testing
gl_buffer_object::Pointer as suggested by Brian. Also use
_mesa_is_desktop_gl as suggested by Ken.
v3: Add a comment by the map subrange / discard range overlap test and
fix an off-by-one error noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this change _mesa_init_bufferobj_dispatch won't set function
pointers that don't exist in OpenGL ES.
v2: Use _mesa_is_desktop_gl and _mesa_is_gles3 as suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are part of GL_ARB_invalidate_subdata and OpenGL ES 3.0.
v2: Reject aux buffers in core context, and use _mesa_is_desktop_gl and
_mesa_is_gles3. Both suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is basically cut-and-paste from the swrast implementation, and it
could probably be (slightly) more optimal.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No driver supports this extension, and it seems unlikely than any driver
ever will. I think r300c may have supported it at one time, but that
driver has already been removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The final step of _mesa_unpack_depth_span is to take the temporary
GLfloat depth values and convert them to the desired format. When
converting to GL_UNSIGNED_INTEGER with depthMax > 0xffffff, we use
double-precision math to avoid overflow and precision problems.
Or at least that's the idea. Unfortunately
GLdouble z = depthValues[i] * (GLfloat) depthMax;
actually causes single-precision multiplication, since both operands are
GLfloats. Casting depthMax to GLdouble causes the scaling to be done
with double-precision math.
Fixes a regression in oglconform's depth-stencil basic.read.ds test
since c60ac7b179, where the expected and
actual values differed slightly. For example, 0xcfa7a6 vs. 0xcfa7a4.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49772
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use base-10 for versions like gl_context::Version. Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This forces the drivers to do at least some validation of context API
and version before creating the context. In r100 and r200 drivers, this
means that they don't do any post-hoc validation.
v2: Actually reject compatibility profile 3.2+ contexts. Thanks Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It may be possible to trim the list of extensions futher. These are
just the obvious extensions that add functionality that the core context
explicitly forbids. Apple's core-context extension list is *just* the
extensions on top of the core GL version. I'm not sure we want to go
that far, but removing some things that have been in core since 2.1 may
be okay.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add both top_srcdir and top_builddir to mesa asm include dirs.
These require both in-tree and build-time-generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Like in src/mesa, use GLSL_BUILDDIR/GLSL_SRCDIR to unambiguously
distinguish between in-tree and generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Also fix include paths for the generated headers.
v2: Switch to using self-explanatory BUILDDIR/SRCDIR defined from
top_builddir/top_srcdir rather than the ambiguous TOP.
v3: Add both top_builddir and top_srcdir to include flags for mesa asm.
These rely on both in-tree and build-time-generated includes.
v4: Rebased on top of 948c8f502a.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
After realizing that brw_finish_batch emitted some final PIPE_CONTROLs
to record occlusion queries, Chris noted that we probably hadn't
reserved enough space to actually emit them.
Reserving a full 60 bytes seems a bit harsh, since we only need that
much if occlusion queries are actually active. Plus, 28 bytes would be
sufficient for Gen7, and 24 for Gen4-5.
We could optimize this in the future, but it doesn't seem too critical.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen4+, brw_finish_batch() calls brw_emit_query_end(), which emits
some extra PIPE_CONTROLs to capture the current occlusion query data.
Unfortunately, it was being called *after* _intel_batchbuffer_flush
added the MI_BATCH_BUFFER_END, meaning those PIPE_CONTROLs didn't get
inside the batch.
Not only does this likely cause bogus occlusion query values, it can
also cause crashes: with the recent change to use 64-bit depth count
writes on Gen6+, we started emitting an odd-length PIPE_CONTROL, which
happened after the MI_NOOP padding. This resulted in an odd-length
batch buffer, which resulted in execbuf2 returning -EINVAL and the
application dying with an intel_do_flush_locked failure.
On older generations, finish_batch() doesn't emit any state, so this
change shouldn't have any effect.
Huge thanks to Chris Wilson for helping me figure this out.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53311
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
I want to introduce some more debug output for performance surprises that
includes fallbacks, but aren't necessarily software rasterization. Leave
INTEL_DEBUG=fall in place for those that have used that flag before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Avoid INVALID_OPERATION error if decompressing rectangle texture.
Setting mipmap level limits for those textures is error that must not be
hit by meta code to mislead user.
[v3/Kayden]: Resolve conflicts due to Eric picking a subset of Pauli's
original changes.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampler objects are perfect for meta operations.Sampler object
is separate state object that shadows the sampling state in texture
object. With sampler object mipmap can maintain same sampling state for
all subsequent generation requests.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampler queries are so far made only for enabled texture unit. But if
any code would query sampler before checking texture unit state that
would result to NULL deference.
Making the inline helper easier to use with NULL check makes a lot sense
because compiler is likely to combine the checks for the current texture.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In tune with previous patches. Again there is duplication of information
in function parameters that is good to remove.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Size and format information is always stored in gl_texture_image
structure. That makes it preferable to remove duplicate information from
parameters to make interface easier to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gl_texture_image structure always holds size and internal format before
TexImage driver hook is called. Those passing same information in
function parameters only duplicates information making the interface
harder to understand.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 6882381a2e added a dependency on a
newer version of xcb, but the version check wasn't added in all the
necessary places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 9f5a5d541d.
Fixes the following build error on GCC 4.2.3:
cc1plus: error: unrecognized command line option "-Wno-narrowing"
The GCC Manual incorrectly stated that commit 9f5a5d54 woulde be safe for
old versions of GCC.
Reported-by: Andy Furniss <andyqos@ukfsn.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The var!=softpipe->fs_variant assertion was failing because we weren't
nulling the softpipe->fs_variant pointer when binding a new shader.
Since softpipe->fs_variant depends on the current fs, it's of no use
when a new FS is bound.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53318
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
After we attach a new renderbuffer in this function we need to make
sure Mesa's update_framebuffer() gets called.
Fixes crash in WebGL conformance/textures/texture-attachment-formats.html,
but the test still fails for other reasons.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53316
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Add -Wno-narrowing to CXXFLAGS for gcc.
It is safe to add this flag even for versions of gcc that don't recognize
it. From the GCC Manual [1]: "[GCC] allows the use of new -Wno- options
with old compilers".
This removes warnings of the form
warning: narrowing conversion of X from 'int' to 'float' inside { } is
ill-formed in C++11 [-Wnarrowing]
in ff_fragment_shader.cpp and gen6_blorp.cpp of the form. When building
i965, I observed no other difference in the build output.
[1] http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes WebGL conformance/uniforms/uniform-default-values.html crash.
We need to check for the null view pointer before accessing view->texture.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53317
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Always downsample before mapping, even if the map mode contains
GL_MAP_INVALIDATE_RANGE_BIT. If we neglect to downsample when only
a subrect is mapped then the upsample in intel_miptree_unmap_multisample
may write garbage to the region outside the subrect.
(Eric gave my patch e88cfbb a conditional reviewed-by with the condition
that it always downsample before mapping. I forgot to make that change
before pushing the patch.)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes the glsl skinning demo regression since changing to the new GLSL
compiler, and is part of fixing piglit gl-2.0-edgeflag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50079
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there was an edge flag or a two-side-color pair present, we'd end up
mismatched and read values from earlier in the VUE for later FS inputs.
v2: Fix regression in gles2conform shaders generating point size. (change by
anholt)
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 branch.
If the application has requested reset notification, then
dri2_convert_glx_attribs will initialize this to the correct value.
Otherwise, it's supposed to initialize this to NO_NOTIFICATION, but
doesn't when num_attribs == 0. (The consensus seems to be that we
should make it do so, but that's more invasive, so I'm pushing this for
now.)
Fixes a regression since a8724d85f8
where trying to run OilRush_x86 or apitrace heaven_x64 would result in:
dri_util.c:221: dri2CreateContextAttribs: Assertion `!"Should not get
here."' failed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53076
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Patch changes i915 and i965 drivers to use fixed function version of
meta clear when running on ES 1.1. This fixes rendering errors seen with
Google Maps, Angry Birds and Gallery3D on Android platform.
Change 88128516d4 exposes all extensions
internally to be available independent of GL flavour, therefore check
against ARB_fragment_shader does not work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50333
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This removes the CS stall on Ivybridge.
On Sandybridge, the depth stall needs to be preceded by a non-zero
post-sync op, which requires a CS stall, which needs a stall at
scoreboard. Emit the full workaround.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
I don't know if it was possible to trigger this bug -- we don't merge
saturates into the math instruction because we're bad at coalescing currently,
and there's nothing generating these with predicates. Still, let's avoid
future bugs when we do smarter codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was ridiculous. We were ignoring the inst->header.saturate flag in the
case of math and only math. On gen4, we would leave inst->header.saturate in
place if it happened to be set, which would end up being applied to the
implicit mov and thus trash the first argument. On gen6, we would overwrite
inst->header.saturate with the saturate flag from the argument, which was not
set appropriately in brw_vec4_emit.cpp, and was only not a bug due to our
incompetence at coalescing saturate moves.
By ripping the argument out and making saturate work just like all the other
brw_eu_emit.c code generation, we can avoid both these classes of bugs.
Fixes piglit fog-modes, and the new specific fs-saturate-exp2 case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48628
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a chance for brw_wm_emit.c to screw up and pass (1 << 4) instead of
1, which would get converted to 0 when stored. Instead, use stdbool which
converts nonzero to true/1 like we want.
Otherwise, conditional rendering always takes the fallthrough "render it
anyway" case unless the application had itself done a check or wait on the
query.
Fixes intel oglconform's conditional_render advanced.nofbo.readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
I happened to notice this while looking at a blit pass in l4d2, which had an
optional push/pop around framebuffer srgb setting. It didn't matter in the
end, but the fix is sitting in my tree now.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
You can't practically have desktop OpenGL and OpenGL ES on the same system
without this. The benefits of not having it (e.g., a more compact dispatch
table) are irrelevant.
v2: Don't mark shared-glapi as experimental. Review suggestion by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These are largely based on the src/mapi/glapi/tests. However,
shared-glapi provides less external visibility into the dispatch table,
so there is less to test. Also, shared-glapi does not implement
_glapi_get_proc_name, so that test was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. Avoiding the use of glapitable.h avoids the lies. The only
entries used in this code are entries that are ABI. For these, the ABI
offset can be used directly.
Since this code is in src/glx, it can't use src/mesa/main/dispatch.h to
get the pretty names for these offsets.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When --enable-shared-glapi is used, all non-ABI entries in the table are
lies. There are two completely separate code generation paths used to
assign dispatch offset. Neither has any clue about the other.
Unsurprisingly, the can't agree on what offsets to assign.
This adds a bunch of overhead to __glXNewIndirectAPI, but this function
is called at most once.
The test ExtensionNopDispatch was removed. There was just no way to
make this test work with the information provided in shared-glapi.
Since indirect_glx.c uses _glapi_get_proc_offset now, it was also
impossible to make the tests work without shared-glapi. So much pain.
This fixes indirect rendering with shared-glapi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes 'make check' on with --enable-shared-glapi. This test cannot work
in that environment.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates the complexity in one place, which is important
because it's about to get even more complicated.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
PIPE_CONTROL has variable length, depending upon generation and whether
we want to do 32-bit or 64-bit data writes. Make it explicit, rather
than hiding a length of 4 in the #define for _3DSTATE_PIPE_CONTROL.
Generated by s/3DSTATE_PIPE_CONTROL/3DSTATE_PIPE_CONTROL | (4 - 2)/g.
This is equivalent since the #define used to have | 2 in it. A grep
through the sources shows that all instances have been converted, so
it's safe to remove the | 2 from the #define.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike the FS side in the previous commit, this does variable indexing just
fine, using the same code as we used for other variable-indexed pull
constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Variable array indexing isn't finished, because the lowering pass
turns it all into conditional moves of constant index accesses so I
can't test it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I wanted to add the surface index as a variable value for UBO support,
and a reg seemed like the obvious way to go. This exposes more of the
information to CSE, which we'll probably want to apply to pull
constant loads for UBOs eventually (you might access 4 floats in a
row, each of which would produce an oword block read of the same
block).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit GL_ARB_uniform_buffer_object/dlist.
v2: Use the .ui fields instead of .i for type consistency (review by Brian
Paul)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ARB spec lets you get away with the default block counting against the
blocks for combined size limits. The core spec says you need to be able to
support the maximum size of default block *and* the maximum size of each
uniform block. I see no reason that any driver would have a problem with
that.
Fixes gl 3.1/minmax (with an associated fix to the test)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only propagating it to the API when the variable was a matrix type,
but we were still tripping over it in lower_ubo_reference when it was set on a
vector.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were getting the base offset of a vec2, not of a vec2[2] like the quoted
spec text says we should.
v2: Fix swapped then/else cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were returning the index into the UniformBlocks of one of the
linked shaders, when it's supposed to be the program global index.
Fixes piglit getactiveuniformsiv-uniform_block_index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In between glGenBuffers() and glBindBuffer(), the buffer object points to this
dummy buffer with a name of 0, and a glBindBufferBase() would point to that.
It seems pretty clear, given that glBindBufferBase() only cares about the
current size of the buffer at render time, that it should bind up the buffer
that you passed in instead of pointing it at this useless dummy buffer.
However, what should glBindBufferRange() do? As of this patch, it will
promote the genned buffer to a proper buffer like it had been
glBindBuffer()ed, and then detect that the size is greater than the buffer's
current size of 0 and throw INVALID_VALUE. It seems like the most reasonable
answer here.
Note that this also changes the behavior of these two on non-glGenBuffers() bo
names. We haven't yet set up the error throwing for glBindBuffers() on gl
3.1+, and my assumption is that these two functions should inherit their
behavior on un-genned names from glBindBuffers().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping
the ir_variable as the variable part of the offset from handle_rvalue(),
and track the constant offsets from that with a plain old integer value,
avoiding a bunch of temporary variables in the array and struct handling.
Also, fix file description doxygen.
v3: Fix a row vs col typo, and fix spelling in a comment.
Reviewed-by: Eric Anholt <eric@anholt.net>
For the UBO lowering pass, I want to see the whole dereference chain for
replacing, not the innermost ir_dereference_variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers will probably want to be able to take UBO references in a
shader like:
uniform ubo1 {
float a;
float b;
float c;
float d;
}
void main() {
gl_FragColor = vec4(a, b, c, d);
}
and generate a single aligned vec4 load out of the UBO. For intel,
this involves recognizing the shared offset of the aligned loads and
CSEing them out. Obviously that involves breaking things down to
loads from an offset from a particular UBO first. Thus, the driver
doesn't want to see
variable_ref(ir_variable("a")),
and even more so does it not want to see
array_ref(record_ref(variable_ref(ir_variable("a")),
"field1"), variable_ref(ir_variable("i"))).
where a.field1[i] is a row_major matrix.
Instead, we're going to make a lowering pass to break UBO references
down to expressions that are obvious to codegen, and amenable to
merging through CSE.
v2: Fix some partial thoughts in the ir_binop comment (review by Kenneth)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting var->location from pointing at the program's UniformBlocks to
pointing at the linked shader's UniformBlocks, I missed this change. It
usually worked out in the end because the two lists happen to be the same in
many testcases.
Fixes a valgrind complaint on
oglconform ubo-compile.cpp advanced.std140.2stage
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As we get into supporting GL 3.x core, we come across more and more features
of the API that depend on the version number as opposed to just the extension
list. This will let us more sanely do version checks than "(VersionMajor == 3
&& VersionMinor >= 2) || VersionMajor >= 4".
v2: Fix a bad <= 30 check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This turns on window system MSAA.
This patch changes the id of many GLX visuals and configs, but that
couldn't be prevented. I attempted to preserve the id's of extant configs
by appending the multisample configs to the end of the extant ones. But
somewhere, perhaps in the X server, the configs are reordered with
multisample configs interspersed among the singlesample ones.
Test results:
Tested with xonotic and `glxgears -samples 1` on Ivybridge.
No piglit regressions on Ivybridge.
On Sandybridge, passes 68/70 of oglconform's
winsys multisample tests. The two failing tests are:
multisample(advanced.pixelmap.depth)
multisample(advanced.pixelmap.depthCopyPixels)
These tests hang the gpu (on kernel 3.4.6) due to
a glDrawPixels/glReadPixels pair on an MSAA depth buffer. I don't expect
realworld apps to do that, so I'm not too concerned about the hang.
On Ivybridge, passes 69/70. The failing case is
multisample(advanced.line.changeWidth).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This function felt sloppy, so this patch cleans it up a little bit.
- Rename `color` to `i`. It is not a color value, only an iterator int.
- Move `depth_bits[0] = 0` into the non-accum loop because that is where
it used. The accum loop later overwrites depth_bits[0].
- Rename `depth_factor` to `num_depth_stencil_bits`.
- Redefine `msaa_samples_array` as static const because it is never
modified. Rename to `singlesample_samples`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If either argument to driConcatConfigs(a, b) is null or the empty list,
then simply return the other argument as the resultant list.
All callers were accomplishing that same behavior anyway. And each caller
accopmplished it with the same pattern. So this patch moves that external
pattern into the function.
Reviewed-by: <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
DRI2 configs were constructed in intelInitScreen2. That function already
does too much, so move verbatim the code for creating configs to a new
function, intel_screen_make_configs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add two new functions: intel_miptree_{map,unmap}_multisample, to which
intel_miptree_{map,unmap} dispatch. Only mapping flat, renderbuffer-like
miptrees are supported.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the opencoded construction and destruction of intel_miptree_map into
new functions, intel_miptree_attach_map and intel_miptree_release_map.
This patch prevents code duplication in a future commit that adds support
for mapping multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the body of intel_miptree_map into a new function,
intel_miptree_map_singlesample. Now intel_miptree_map dispatches to the
new function. A future commit adds a multisample variant.
Ditto for intel_miptree_unmap.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add function intel_renderbuffer_set_needs_downsample. It is a no-op
except on multisample winsys buffers shared with DRI2.
Mark the needed downsamples with the new function at two locations:
- Immediately after drawing is complete.
- After blitting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Define a function, brw_blorp_blit_miptrees, that simply wraps
brw_blorp_blit_params + brw_blorp_exec with C calling conventions. This
enables intel_miptree.c, in a following commit, to perform blits with
blorp for the purpose of downsampling multisample miptrees.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Immediately after obtaining, with DRI2GetBuffersWithFormat, the DRM buffer
handle for a DRI2 buffer, we wrap that DRM buffer handle with a region and
a miptree. This patch additionally allocates an accompanying multisample
miptree if the DRI2 buffer is multisampled.
Since we do not yet advertise multisample GL configs, the code for
allocating the multisample miptree is currently inactive.
This patch adds the following fields to intel_mipmap_tree:
singlesample_mt
needs_downsample
and the following function stubs:
intel_miptree_downsample
intel_miptree_upsample
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the logic for creating the ancillary hiz and mcs miptress for winsys
and non-texture renderbuffers from intel_alloc_renderbuffer_storage to
intel_miptree_create_for_renderbuffer. Let's try to isolate complex
miptree logic to intel_mipmap_tree.c.
Without this refactor, code duplication would be required along the
intel_process_dri2_buffer codepath in order to create the mcs miptree.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new param, num_samples, to intel_create_renderbuffer and
intel_create_private_renderbuffer.
No multisample GL config is yet advertised, so the value of num_samples is
currently 0. For server-owned winsys buffers, gl_renderbuffer::NumSamples
is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rename quantize_num_samples to intel_quantize_num_samples and change the
first param from struct intel_context* to struct intel_screen*. The
function will later be used by intelCreateBuffer, which is not bound to
any context but is bound to a screen.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The comment referred to intel_tex_image_map/unmap, but should more
accurately refer to intel_miptree_map/unmap.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fixes uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
v2: Note that GLSL 4.3 has not been started, and that
ARB_compute_shader has been started in Gallium drivers.
Signed-off-by: Jason Wood <sandain@hotmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
KHR extension name is reserved for Khronos ratified extensions, and there is
no such thing as EGL_KHR_surfaceless_{gles1,gles2,opengl}. Replace these
three extensions with EGL_KHR_surfaceless_context since that extension
actually exists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Since support for swrast version 2 was added (f55d027a), it has also been
required. In swrast_driver_extensions, version 2 is set for __DRI_SWRAST
extension. Remove the spurious version checks sprinked through the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously an error would be generated if any attributes were specified when
creating a non-desktop OpenGL context. This was a mistake, and it will
prevent old drivers from working with new EGL libraries that add support for
the createContextAttribs interface. Instead, match the behavior of
EGL_KHR_create_context: allow versions that make sense, reject non-zero flags.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit f0cecd43d6 moved the VUE map computation to be only once, at
VS compile time. However, it did so in slightly the wrong place: it
made the one call to brw_vue_compute_map happen right before the
allocation of dummy slots for replaced point sprite coordinates, causing
a different VUE map to be generated (at least on Ironlake).
Fixes a regression in Piglit's point-sprite test on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46489
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Consider a texture call such as:
textureLod(s, coordinate, log2(...))
First, we begin setting up the sampler message by loading the texture
coordinates into MRFs, starting with m2. Then, we realize we need the
LOD, and go to compute it with:
ir->lod_info.lod->accept(this);
On Gen4-5, this will generate a SEND instruction to compute log2(),
loading the operand into m2, and clobbering our texcoord.
Similar issues exist on Gen6+. For example, nested texture calls:
textureLod(s1, c1, texture(s2, c2).x)
Any texturing call where evaluating the subexpression trees for LOD or
shadow comparitor would generate SEND instructions could potentially
break. In some cases (like register spilling), we get lucky and avoid
the issue by using non-overlapping MRF regions. But we shouldn't count
on that.
Fixes four Piglit test regressions on Gen4-5:
- glsl-fs-shadow2DGradARB-{01,04,07,cumulative}
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52129
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the textureRect support and GL_CLAMP workarounds, it's grown
sufficiently that it deserves its own function. Separating it out
makes the original function much more readable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Setting the texture offset bits in the message header involves very
specific hardware register descriptions. As such, I feel it's better
suited for the lower level "generate" layer that has direct access to
the weird register layouts, rather than at the fs_inst abstraction layer.
This also parallels the approach I took in the VS backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use atom for sampler state. Does not provide new functionality
or fix any bug. Just a step toward full atom base r600g.
v2: Split seamless on r6xx/r7xx into it's own atom. Make sure it's
emited after sampler and with a pipeline flush before otherwise
it does not take effect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
...to look like update_fragment_samplers() code, as with the previous
commit. The next step would be to merge the two functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Less code. And as with softpipe, if/when we consolidate the pipe_context
functions for binding sampler state, this will make the llvmpipe changes
trivial.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The functions for setting samplers and sampler views for vertex,
fragment and geometry shaders were nearly identical. Now they
use shared code.
In the future, if the pipe_context functions for setting samplers
and sampler views for vert/frag/geom/compute are combined, this
will make updating the softpipe driver a snap.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Combine separate arrays for vertex/fragment/geometry samplers, etc into
one array indexed by PIPE_SHADER_x.
This allows us to collapse separate code for vertex/fragment/geometry
state into loops over the shader stage. More to come.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes uninitialized pointer read defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes dereference before null check defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Merge the vertex/fragment versions of the cso_set/save/restore_samplers()
functions. Now we pass the shader stage (PIPE_SHADER_x) to the function
to indicate vertex/fragment/geometry samplers. For example:
cso_single_sampler(cso, PIPE_SHADER_FRAGMENT, unit, sampler);
This results in quite a bit of code reduction, fewer CSO functions and
support for geometry shaders.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Fixes uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_OES_mapbuffer extension is supported by OpenGL ES 1 and ES 2 so return
GL_MAP_WRITE_BIT for both ES versions, not just ES 1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before, the GLSL parser was getting rebuilt every time that scons was
run. The problem was scons was expecting a glsl_parser.hpp file but
we were generating a glsl_parser.h file.
Signed-off-by: Brian Paul <brianp@vmware.com>
Windowed speed is of course way to slow, but fullscreen
works like a charm now.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Using the writemask in the sampler results in packet
VGPRS. For now just sample all components and let
llvm chose the right one.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The backend is multiplying the offset by the numbers of
elements anyway, so doing it twice just makes everything
crash.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The patch makes the SCons build with Intel Compiler successful.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Framebuffer blit needs to setup texture sampling with no reference to the
user's texturing state, and a sampler object lets us avoid a bunch of changes
to the user's state setup.
We don't bother caching the sampler object since we're changing parameters in
it based on the filtering option to glBlitFramebuffer().
Fixes piglit GL_ARB_sampler_objects/framebufferblit and rendering in l4d2 (our
setting of srgb decode wasn't being respected due to the user's sampler object
being active).
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Sampler objects can be used to shadow texture object state without
modifying original application state. Decompression path feels a bit
like path where caching shouldn't happen. But as everything else is
cached already I decided to cache sampler state too.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To allow meta module to use sample objects mesa GL functions need to be
visible and linkable for meta module.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast needs to pass sampler object into all texture fetching functions
to use correct sampling state when sampler object is bound to the unit.
The changes were made using half manual regular expression replace.
v2: Fix NULL deref in _swrast_choose_triangle(), because the _Current
values aren't set yet, so we need to look at our texObj2D. (anholt)
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To allow meta acceleration operations to use sampler objects the
ARB_sampler_objects extension needs to be mandatory for all drivers.
Because the extension doesn't have any hardware dependencies it is
trivial to implement.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
CompareFailValue is part of Sampler state that needs to be read from
bound sampler object if present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixed function fragment shader generator was incorrectly read texture
sampling state directly from texture object. To make sure that
ARB_sampler_object works correctly shader generator has to use the
bound sampler if one exist.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Preparation for the mandatory support of ARB_sampler_objects. I have tested
this patch with rv280 only.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When I build tested radeon changes I noticed two warnings about format
size missmatch in 64bit. I decided to clean them to make relevant
compiler warnings easier to spot.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_sampler_objects is very simple software only extension to support. I want
to make it a mandatory extension for Mesa drivers to allow the meta module to
use it.
This patch add support for the extension to nouveau. It is completely untested
search and replace patch, except for flagging the texture state as needing to
be recomputed when a sampler object is present.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
sRGBDecode state is part of sampler object state but mesa was missing
handlers to access the state. This patch adds the support for required
state changes and queries.
GL_EXT_texture_sRGB_decode issue 4:
"4) Should we add forward-looking support for ARB_sampler_objects?
RESOLVED: YES
If ARB_sampler_objects exists in the implementation, the sampler
objects should also include this parameter per sampler."
Fixes piglit GL_ARB_sampler_objects/GL_EXT_texture_sRGB_decode.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GL_DEPTH_TEXTURE_MODE isn't meant to be part of sampler state based on
compatibility profile specifications.
OpenGL specification 4.1 compatibility 20100725 3.9.2:
"... The values accepted in the pname parameter
are TEXTURE_WRAP_S, TEXTURE_WRAP_T, TEXTURE_WRAP_R, TEXTURE_MIN_-
FILTER, TEXTURE_MAG_FILTER, TEXTURE_BORDER_COLOR, TEXTURE_MIN_-
LOD, TEXTURE_MAX_LOD, TEXTURE_LOD_BIAS, TEXTURE_COMPARE_MODE, and
TEXTURE_COMPARE_FUNC. Texture state listed in table 6.25 but not listed here and
in the sampler state in table 6.26 is not part of the sampler state, and remains in the
texture object."
The list of states is in Table 6.24 "Textures (state per texture
object)" instead of 6.25 mentioned in the specification text.
Same can be found from 3.3 compatibility specification.
Signed-off-by: Pauli Nieminen <pauli.nieminen@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch allows GL_SAMPLES to be set to either 0 or 1 on i965
platforms that don't support MSAA (those prior to Gen6). Setting
GL_SAMPLES=1 has the same effect as setting it to 0 on these platforms
(because MSAA is unsupported), but is distinguishable via the GL API.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50165
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
EXT_framebuffer_multisample is a required subpart of
ARB_framebuffer_object, which means that we must support it even on
platforms that don't support MSAA. Fortunately
EXT_framebuffer_multisample allows for this by allowing GL_MAX_SAMPLES
to be set to 1.
This leads to a tricky quirk in the GL spec: since
GlRenderbufferStorageMultisamples() accepts any value for its
"samples" parameter up to and including GL_MAX_SAMPLES, that means
that on platforms that don't support MSAA, GL_SAMPLES is allowed to be
set to either 0 or 1. On platforms that do support MSAA, GL_SAMPLES=1
is not used; 0 means no MSAA, and 2 or higher means MSAA.
In other words, GL_SAMPLES needs to be interpreted as follows:
=0 no MSAA (possible on all platforms)
=1 no MSAA (only possible on platforms where MSAA unsupported)
>1 MSAA (only possible on platforms where MSAA supported)
This patch modifies all MSAA-related code to choose between
multisampling and single-sampling based on the condition (GL_SAMPLES >
1) instead of (GL_SAMPLES > 0) so that GL_SAMPLES=1 will be treated as
"no MSAA".
Note that since GL_SAMPLES=1 implies GL_SAMPLE_BUFFERS=1, we can no
longer use GL_SAMPLE_BUFFERS to distinguish between MSAA and non-MSAA
rendering.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly the whole function body was contained in the 'else' branch. The
'if' branch did one thing: return early with an error. Clean things up by
moving all the code out of the 'else' branch. Decreases max nesting level
from 4 to 3.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After commit "intel: Convert to using private depth/stencil buffers", we
request from DRI2GetBuffersWithFormat only the front left and back left
buffers. We no longer request depth and stencil buffers.
Assert that in intelAllocateBuffer and remove the related dead code.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These assignments caused CFLAGS specified on the configure line to
appear twice in the final CFLAGS. Removing them makes the behavior
reasonable -- USER_CFLAGS are appended at the end of CFLAGS, allowing
the builder to override flags added by configure.ac like
-fno-strict-aliasing.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Even on s390{,x} where there's no video card, you still want this so GLX
protocol works.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This reverts commit 5d5af7d359.
It turns out the issue this was supposed to fix merely counter-acted
a bug in the hardware driver that I wasn't aware of.
The resource_resolve is not supposed to do sRGB conversion, period.
(This would violate the requirement that source and destination must
be of the same format).
no point in emitting aux scissor values if we
a) never enable them
b) never set the actual values
plus it is enough to have that aux scissor enable reg (which we never set to
enable) in one place not two.
There were several problems with these functions (which are a remnant
of dri1 hyperz mostly - should bring it back somehow someday).
First, it would always do a swrast clear if the buffer to clear was a fbo.
Second, for buffers we wouldn't handle the clear (I guess aux/accum?) we
would actually still have tried to clear that later even when we already
cleared it with swrast.
This addresses one issue raised in bug #51658 discovered by Eugene St Leger.
The assert is bogus since there's no problem with texture width/height being
2048 (the width/height programmed is width/height minus one).
OTOH though the programmed size for scissor rect should be width/height
minus one too otherwise bad things may happen (as it is inclusive, and there's
not enough bits for more than a value of 2047).
SI does not support 64-bit immediates natively, but llvm will generate
i64 immediates when indexing loads and stores (since SI has 64-bit
pointers). The i64 indices will always be small enough to fit into
32-bits (i.e. the high 32 bits will always be all zeros), so we can
treat these index values as 32-bits.
In tablegen, if two patterns match, the one that comes first in the file
is given preference. We want the SMRD IMM pattern to be given
preference, because it encodes the pointer offset in its immediate
field, which saves us an add instruction.
I ended up having to add rallocing of the ast_type_qualifier in order
to avoid pulling in ast.h for glsl_parser_extras.h, because I wanted
to track an ast_type_qualifier in the state.
Fixes piglit ARB_uniform_buffer_object/row-major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Yes, you get to say things like "layout(row_major, column_major)" and
get column major.
Part of fixing piglit ARB_uniform_buffer_object/row_major.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like a stripped-down version of glGetActiveUniform that just
returns the name, since the other return values (type and size) of
that function are now meant to be handled with
glGetActiveUniformsiv().
Fixes piglit ARB_uniform_buffer_object/getactiveuniformname
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous implementation required a flag in _mesa_glsl_parse_state
and line of code to initialize it for every version of the shading
language we intend to support. As we look to add 150, 330, 400, 410,
420, and beyond, this gets rather unwieldy.
This patch retains the switch statement (to reject, say, #version 111),
but removes all the bits. Code to check for ctx->API == API_OPENGL_CORE
could easily be added to the 110 and 120 cases to reject those.
v2: Use _mesa_is_desktop_gl to preserve the existing behavior in the
presence of the new API_OPENGL_CORE enumeration.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Fixes some failures in getteximage-formats.
v2: Remove stray include, and drop extra test for encoding == GL_SRGB --
_mesa_get_srgb_format_linear() returns the same format if it wasn't SRGB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48120
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 8.0 branch.
It was using state->Const.GLSL_100ES, which is set if the driver
supports ARB_ES2_compatibility or we're in ES2 mode. Instead, it should
use state->language_version, as that represents the actual GLSL version
of the shader being compiled.
Since the correct logic is < 120 && !100, just make it == 110.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will need to get refactored when we add support for core profiles
or forward-compatible contexts, but we may as well have it in the
meantime. This allows us to override the GLSL version and experiment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move installing osmesa.pc to drivers/osmesa, where it belongs better
This also restores the installation of gl.pc if we are building osmesa at the
same time as libGL, which was broken in commit 39785488 when the .pc
installation was converted to automake
v2:
Remove HAVE_OSMESA_DRIVER automake conditional, it's now pointless as we
will only be building in the drivers/osmesa directory if the condition it
checked was true.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this build failure with Intel Compiler.
src/gallium/auxiliary/util/u_format_tests.c(903): error: floating-point operation result is out of range
{PIPE_FORMAT_R16_FLOAT, PACKED_1x16(0xffff), PACKED_1x16(0x7c01), UNPACKED_1x1( NAN, 0.0, 0.0, 1.0)},
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that ir_quadop_vector exists, ir_last_binop and ir_last_opcode are
no longer the same. Only one place currently uses this enumeration, and
already handles ir_quadop_vector correctly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
It's more convenient to use shortcuts like glsl_type::bvec2_type than
the longwinded glsl_type::get_instance(GLSL_TYPE_BOOL, 2, 1).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
The hardware supports this format with no known quirks, so we may as
well enable it.
Alpha blending is not supported until Sandybridge, but as far as I can
tell, OpenGL doesn't require alpha blending on SNORM formats. Plus, we
already expose R8G8B8A8_SNORM which has a similar restriction.
Fixes 6 piglit texwrap-2D-*SNORM* cases,
gl-3.1/required-sized-texture-formats, and 10 oglconform snorm-textures
subcases
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: fix tiling for small pitches, that finally makes
glxgears and readPixSanity work
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The format member of pipe_surface may differ from that of the
pipe_resource, which is used to communicate, for instance, whether
sRGB encode should be enabled in the resolve operation or not.
Fixes resolve to sRGB surfaces in mesa/st when GL_FRAMEBUFFER_SRGB
is disabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
sRGBEnabled should affect both textures and renderbuffers, so we need
to check/update the pipe_surface format for both.
Fixes, for instance, rendering appearing too bright in wine applications
using sRGB multisample renderbuffers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the check for pixel transfer ops. If any RGB/depth scale/bias
is in effect, it'll be applied in the glTexImage step.
If drawing stencil pixels we need to disable pixel transfer so that
alpha scale/bias are not applied to the stencil data.
These issues were spotted by Roland.
Fixes Blender performance issues reported in
http://bugs.freedesktop.org/show_bug.cgi?id=47375
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Barto <mister.freeman@laposte.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No functional change. This patch modifies intel_miptree_alloc_mcs to
allocate the 4x MCS buffer using MESA_FORMAT_R8 instead of
MESA_FORMAT_A8. In principle it doesn't matter, since we only access
the buffer using MCS-specific hardware mechanisms, so all that's
important is to use a format with the correct size. However,
MESA_FORMAT_A8 has enough unusual behaviours that it seems prudent to
avoid it.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It seems reset is not required for setting the max_wm_threads to 80
on gen6 GT2.
Increases performance in the Counter-Strike: Source video stress test
by 7.18% (n=5).
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
The VCC register is tricky because the SALU views it as 64-bit, but the
VALU views it as 1-bit. In order to deal with this we've added some
special bitcast and binary operations to help convert from the 64-bit
SALU view to the 1-bit VALU view and vice versa.
If you want to change your compiler arguments, just set CFLAGS/CXXFLAGS.
Having Mesa have this separate variable is a great way to have your arguments
not thoroughly propagated to all compiler invocations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In all current uses, it was appended to CFLAGS, which already had -m32. If
you want to do some other flag supplied to compiler invocations, there's
CFLAGS/CXXFLAGS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change. This patch modifies brw_blorp_blit.cpp to use
the ROUND_DOWN_TO macro instead of open-coded bit manipulations, for
clarity.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The emit->key.fkey info is only valid if we're generating a fragment shader.
We should not look at it if we're generating a vertex shader.
When generating a vertex shader, the value of emit->key.fkey.num_textures was
garbage and the loop over num_textures would read invalid data. At best
this would cause us to emit an unused constant. At worse, we could segfault.
Just by dumb luck, fkey.num_textures was usually a smallish integer.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Recently more files were removed from control to be auto-generated
in the dricore library. Android build was not able to locate the
new files if they were not created beforehand.
LOCAL_SRC_FILES includes some of those files and Android.gen.mk
re-defines this variable by filtering out the auto-generated files.
Unfortunately for this variable it is not the same to have the SRCDIR
variable defined as the current directory.
By re-defining SRCDIR for the autotools build the Android build system
is happy again and the new files were actually removed from the sources
to use the auto generated versions.
Also patch d5c1801a01 was partially reverted as the files
can not be compiled to the LOCAL_PATH, instead they should live on the
intermediates folder so that a clean can wipe them out.
v3: [chad] Fix the definition of SRCDIR in libdricore/Makefile.am.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Daniel Charles <daniel.charles@intel.com>
XGetImage() will generate a BadMatch error if the source window isn't
visible. When that happens, create a new XImage. Fixes piglit 'select'
test failures with swrast/xlib driver.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Always allocate space for the inverse matrix in _math_matrix_ctr()
since we were always calling _math_matrix_alloc_inv() anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When computing a matrix inverse, if the determinant is too small we could hit
a divide by zero. There's a check to prevent this (we basically give up on
computing the inverse and return the identity matrix.) This patch loosens
this test to fix a lighting bug reported by Lars Henning Wendt.
v2: use abs(det) to handle negative values
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Lars Henning Wendt <lars.henning.wendt@gris.tu-darmstadt.de>
The sendc instruction causes the fragment shader thread to wait for
any dependent threads (i.e. threads rendering to overlapping pixels)
to complete before sending the message. We need to use sendc on the
first render target write in order to guarantee that fragment shader
outputs are written to the render target in the correct order.
Previously, we only used the "sendc" instruction when writing to
binding table index 0. This did the right thing for fragment shaders,
because our fragment shader back-ends always issue their first render
target write to binding table index 0. However, it did the wrong
thing for blorp, which performs its render target writes to binding
table index 1.
A more robust solution is to use sendc for all render target writes.
This should not produce any performance penalty, since after the first
sendc, all of the dependent threads will have completed.
For more information about sendc, see the Ivy Bridge PRM, Vol4 Part3
p218 (sendc - Conditional Send Message), and p54 (TDR Registers).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A lot of code was still differentiating between between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in the i915 and 965 drivers over to use _mesa_is_user_fbo()
and _mesa_is_winsys_fbo().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A lot of code was still differentiating between between winsys and
user fbos by testing the fbo's name against zero. This converts
everything in core mesa, the state tracker, and src/mesa/program over
to use _mesa_is_user_fbo() and _mesa_is_winsys_fbo().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The OpenGL(R) ES Shading Language
Version 1.00 Revision 17 (12 May, 2009)
> 4.6.1 The Invariant Qualifier
> ... To force all output variables to be invariant, use the pragma
> #pragma STDGL invariant(all)
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already provided these files on 'make install', but only created a
'libglapi.so' in the top-level lib/ convenience folder. We used to
create all three, but at some point in the build system churn, it broke.
Various applications (like the ES2 conformance suite) seem to link
against libglapi.so.0, so without these links, setting LD_LIBRARY_PATH
and LIBGL_DRIVERS_PATH can lead to using /usr/lib/libglapi.so.0 with
/home/whatever/libGL.so, which leads to API calls getting routed
incorrectly (i.e. glCompileShader -> _mesa_LinkProgramARB), which leads
to rage problems.
Preserve developer sanity...install links.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since osmesa now has been converted to Makefile.am, an appropriate install: rule
is generated to install the shared libary, so we no longer need to do that in
src/mesa/Makefile.old
This leaves nothing in src/mesa/Makefile.old but the tags: rule, so move that to
Makefile.am and remove Makefile.old
Also, nothing now uses OSMESA_LIB_GLOB anymore, so remove it
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 6c6803f28d removed xm_image.[ch], and removed
xm_image.c, but not xm_image.h from the Makefile, this was subsequently carried over
into Makefile.am
Remove xm_image.h from Makfile.am. This allows 'make dist' to succeed, even if it
doesn't do anything useful
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MCS buffers use 32 bits per pixel in 8x MSAA, and 8 bits per pixel in
4x MSAA. This patch adjusts the format we use to allocate the buffer
so that enough memory is set aside for 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code to emit 3DSTATE_SAMPLE_MASK was already correct for 8x
MSAA--this patch just removes an assertion that would have prevented
it from being used for 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch updates the blorp functions encode_msaa() and decode_msaa()
to properly handle the encoding of IMS MSAA buffers when
num_samples=8.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When operating in persample dispatch mode, the blorp engine would
previously assume that subspan N always represented sample N (this is
correct assuming 4x MSAA and a 16-wide dispatch). In order to support
8x MSAA, we must compute which sample is associated with each subspan,
using the "Starting Sample Pair Index" field in the thread payload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to an IMS MSAA surface on Gen7, blorp sets up the
rendering pipeline as though it were rendering to a single-sampled
surface; accordingly it must adjust the size of the primitive it sends
down the pipeline to account for the interleaving of samples in an IMS
surface.
This patch modifies the size adjustment code to properly handle 8x
MSAA, which makes room for the extra samples by using an interleaving
pattern that is twice as wide as 4x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a num_samples argument to the blorp function
manual_blend(), allowing it to be told how many samples need to be
blended together. Previously it assumed 4x MSAA, since that was all
we supported.
We also bump up LOG2_MAX_BLEND_SAMPLES from 2 to 3, so that
manual_blend() will be able to handle 8x MSAA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the client program uses glDrawBuffer() or glDrawBuffers() to
select more than one color buffer for drawing into, and then performs
a blit, we need to blit into every single enabled draw buffer.
+2 oglconforms.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50407
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch rearranges the order of steps performed by a blorp blit
from this:
- Sync up state of window system buffers.
- Find buffers.
- Find miptrees.
- Make sure buffer formats match.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
To this:
- Sync up state of window system buffers.
- Handle mirroring.
- Make sure width and height match.
- Handle clipping/scissoring.
- Account for window system origin conventions.
- Find buffers.
- Make sure buffer formats match.
- Find miptrees.
- Do depth resolves, if applicable.
- Do the blit.
- Record the need for a future HiZ resolve, if applicable.
The steps are the same, but they are now performed in an order that
will make it possible to implement correct DrawBuffers support. Note
that the last four steps are now in a separate function
(do_blorp_blit), since they will need to be executed repeatedly when
DrawBuffers support is added.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, the blorp engine would fall back to swrast if the source
or destination of a blit had no associated miptree. This was
unnecessary, since _mesa_BlitFramebufferEXT() already takes care of
making the blit silently succeed if there are no buffers bound, so the
fallback paths could never actually happen in practice.
Removing these fallback paths will simplify the implementation of
correct DrawBuffers support in blorp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch modifies the order of operations in the blorp engine so
that clipping and scissoring are performed before adjusting the
coordinates to account for the difference in origin convention between
window system buffers and framebuffer objects. Previously, we would
do clipping and scissoring after adjusting for origin conventions, so
we would get scissoring wrong in window system buffers.
Fixes Piglit test "fbo-scissor-blit window".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When checking that the source and destination dimensions match, we
don't need to store the width and height in variables; doing so just
risks confusion since right after the check, we do clipping and
scissoring, which may alter the width and height.
No functional change.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
On Gen6, multisampled null render targets don't seem to work
properly--they cause the GPU to hang. So, as a workaround, we render
into a dummy color buffer.
Fortunately this situation (multisampled rendering without a color
buffer) is rare, and we don't have to waste too much memory, because
we can give the workaround buffer a very small pitch.
Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4}
depth-computed *" on Gen6.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The HW docs say that the width and height of null render targets need
to match the width and height of the corresponding depth and/or
stencil buffers, and that they need to be marked as Y-tiled. Although
leaving these values at 0 doesn't seem to cause any ill effects, it
seems wise to follow the documented requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we used the number of samples in draw buffer 0 to
determine whether to set up the 3D pipeline for multisampling. Using
the visual is cleaner, and has the benefit of working properly when
there is no color buffer.
Fixes all piglit tests "EXT_framebuffer_multisample/no-color" on Gen7.
On Gen6, the "depth-computed" variants of these tests still fail; this
will be addresed in a later patch.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch ensures that Visual.samples and Visual.sampleBuffers are
set correctly even in the case where there is no color buffer.
Previously, these values would retain their default value of 0 in this
circumstance, even if the depth or stencil buffer was multisampled.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Mesa misses a few checks when compiling on a uclibc system
which cause it to fall back on glibc-ism. This patch
addresses those issues.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
The kernel streamout support was supposed to get into 3.3 along
the tiling change and thus use the same kernel version bump of
2.13 to report userspace that streamout register were supported.
This is not what happen. So as streamout kernel support did not
bump the kernel driver version, rely on kernel 2.14 version bump
to know if streamout is enabled or not. Which means you need at
least 3.4 kernel.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The error was being set on the non-error path, rather
than the error path.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For 'non-legacy' contexts we will want to generate an error
if an uninstalled function is called.
The effect of this change will be that we can avoid installing
legacy functions, and they will then generate an error as
needed for deprecated functions in GL >= 3.1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 2d4b77c7 (automake: Convert src/mesa/drivers/x11/Makefile to
automake, 2012-06-12) dropped the old Makefile, which used GL_LIB, and
replaced it with a Makefile.am hard-coding the name "GL". This broke
handling of --enable-mangling and --with-gl-lib-name options which
depend on GL_LIB to specify the GL library name.
Use "@GL_LIB@" in src/mesa/drivers/x11/Makefile.am to configure the
library name. Also use this approach to simplify src/glx/Makefile.am
and drop the HAVE_MANGLED_GL conditional. While at it, fix the
compatibility link we create in "lib" for the software-only driver to
use version GL_MAJOR instead of hard-coding "1".
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
This fixes the piglit EXT_framebuffer_multisample/bitmap tests.
Note that we must not rely on ctx->DrawBuffer when flushing the cache, because
that's already updated with a new framebuffer. We want to draw into the old
framebuffer where glBitmap was called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Testing shows that the standard JIT engine retrofited with AVX support is quite
stable and as capable to handle AVX instructions as MC-JIT is.
And the old JIT is much more memory efficient, as we don't need to
allocate one engine instance per shader, as we do for MC-JIT due to its
incompleteness.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When X is running it is neccesary for pipe_loader to authenticate with
DRM, in order to be able to use the device.
This makes it possible to run OpenCL programs while X is running.
v2:
- Fix C++ style comments
- Drop Xlib-xcb dependency
- Close the X connection when done
- Split auth code into separate function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Calling glDeleteShader() should mark shaders as pending for deletion,
but shouldn't decrement the refcount every time. Otherwise, repeated
glDeleteShader() is not safe.
This is particularly bad since glDeleteProgram() frees shaders: if you
first call glDeleteShader() on the shaders attached to the program (thus
decrementing the refcount), then called glDeleteProgram(), it would try
to free them again (decrementing the refcount another time), causing
a refcount > 0 assertion to fail.
Similar to commit d950a778.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
- a VRML parsing/display library with "lookat" - an example VRML browser
<li><ahref="http://plib.sourceforge.net/"target="_parent">PLIB</a> - A collection of portable games libraries, including an OpenGL GUI and a simple Scene Graph API
<li><ahref="ftp://ftp.troll.no/contest/Pryan-1.2.tar.gz"target="_parent">Pryan</a> - an OpenInventor-like toolkit
<li><ahref="http://starship.python.net:9673/crew/da/Code/PyOpenGL"target="_parent">PyOpenGL</a> - OpenGL interface for Python
<li><ahref="http://www.quesa.org/"target="_parent">Quesa</a> - QuickDraw3D-compatible library based on OpenGL, Mesa or Direct3D
<li><ahref="http://www.mesa3d.org/brianp/repgl.txt"target="_parent">repGL</a> - IRIS GL emulated with OpenGL
<li><ahref="http://www.scitechsoft.com/dp_mgl.html"target="_parent">SciTech MGL</a> - A multiplatform (Windows, Linux, OS/2, DOS, QNX, SMX, RT-Target & more) graphics library
<li><ahref="http://sgl.sourceforge.net/"target="_parent">SGL</a> - a 3D Scene Graph Library
<li><ahref="http://www.lal.in2p3.fr/SI/SoFree/"target="_parent">SoFree</a> - a free implementation of Open Inventor
<li><ahref="http://togl.sourceforge.net/"target="_parent">Togl</a> - Tcl/Tk widget for OpenGL
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=46644">Bug 46644</a> - Sandybridge Mobile: ARBfp TXP with coords from fragment.color doesn't apply W divide</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=46784">Bug 46784</a> - MAD using multiply written register fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=47375">Bug 47375</a> - Blender crash on startup after upgrade to mesa 8.0.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=48120">Bug 48120</a> - GL_EXT_texture_sRGB_decode still broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=52382">Bug 52382</a> - [ivb gt1] Severe image corruption and GPU Hang, too many PS threads</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=52563">Bug 52563</a> - build failure - struct radeon_renderbuffer has no member named Base</li>
libGLU has been moved into its own repository, found at <ahref="http://cgit.freedesktop.org/mesa/glu/">http://cgit.freedesktop.org/mesa/glu/</a>
</li>
</ul>
</div>
</body>
</html>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.