Without this, the build fails for me when trying to build from a generated tar
file after running just ./configure. (It's not clear to me why I didn't
encounter similar breakage with previous releases.)
This reverts commit fb3e55f898.
This commit was identified as causing the piglit
glx-create-context-current-no-framebuffer test to crash, (where, previously,
it merely failed without crashing).
Instead of checking width==height in four places, just do it in
_mesa_legal_texture_dimensions() where we do the other width, height,
depth checks. Similarly, move the check that cube map array depth is
a multiple of 6.
This change also fixes some missing cube dimension checks for the
glTexStorage[23]D() functions.
Remove width==height assertion in _mesa_get_tex_max_num_levels() since
that's called before the other size checks for glTexStorage.
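A minimal sketch of the consolidated cube checks described above. The helper name legal_cube_dimensions() and its parameters are hypothetical; the real _mesa_legal_texture_dimensions() signature and its other size checks are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical helper (not Mesa's actual function): validates only the
 * cube-specific rules described in the commit message. */
static bool
legal_cube_dimensions(int width, int height, int depth, bool is_array)
{
   /* Cube faces must be square. */
   if (width != height)
      return false;

   /* A cube map array stores six faces per cube, so its depth must be
    * a multiple of 6. */
   if (is_array && depth % 6 != 0)
      return false;

   return true;
}
```

Keeping both rules in one function means every caller, including the glTexStorage paths, gets the same answer.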
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fa9c702164)
For a few reasons.
1: In the (current) common case, these conditionals are never true. All
we're doing by checking them is slowing down MakeCurrent. The server
does these checks already anyway.
2: GL >= 3.0 contexts may legally be made current without a bound
framebuffer.
This does not fix piglit/glx-create-context-current-no-framebuffer, but
is a prerequisite for fixing it.
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit e166a58c43)
As we march over the source buffer we're uploading in pieces, we
need to memcpy from the current offset, not the start of the buffer.
Fixes graphical corruption when drawing very large vertex buffers.
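The offset bug can be illustrated with a small sketch. upload_in_pieces() is a hypothetical stand-in for the driver's staged upload path, and plain memory stands in for the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunked upload: copies `size` bytes from `src` to `dst`
 * in pieces of at most `chunk` bytes. */
static void
upload_in_pieces(unsigned char *dst, const unsigned char *src,
                 size_t size, size_t chunk)
{
   size_t offset = 0;

   if (chunk == 0)
      return;   /* nothing sensible to do with a zero-sized piece */

   while (offset < size) {
      size_t n = size - offset < chunk ? size - offset : chunk;

      /* Copy from src + offset. Copying from plain `src` here would
       * repeat the first piece for every chunk. */
      memcpy(dst + offset, src + offset, n);
      offset += n;
   }
}
```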
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
(cherry picked from commit a50c5f8d24)
We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so
GL_SHADER_BINARY_FORMATS should not write any data to the application
buffer.
Fixes piglit test 'arb_get_program_binary-overrun shader'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 0667e2c969)
Mesa provides the wayland-egl libs and the pkgconfig file, but the headers
originate from the wayland package. Ensure everything matches, by requiring
application builds to look at the wayland headers as well.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
(cherry picked from commit 3bc642cbf6)
_mesa_meta_begin() sets up an orthographic projection and initializes the
viewport based on the current drawbuffer's width and height. This is
likely the window size, since it occurs before the meta operation binds
any temporary buffers.
decompress_texture_image needs the viewport to be the size of the image
it's trying to draw. Otherwise, it may only draw part of the image.
v2: Actually set the projection properly too.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68250
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Mak Nazecic-Andrlon <owlberteinstein@gmail.com>
(cherry picked from commit 62411681da)
When used with a cube array in VS, failed assertion in ir_validate:
Assignment count of LHS write mask channels enabled not
matching RHS vector size (3 LHS, 4 RHS).
To fix this, swizzle the RHS correctly for the writemask.
This showed up in the ARB_texture_gather tests, which exercise cube
arrays in the VS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0d7fc10bcd)
The spec doesn't say GL_INVALID_VALUE should be raised for bufSize <= 0.
In any case, memcpy(len < 0) will lead to a crash, so don't allow it.
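A sketch of the guard, assuming a hypothetical helper copy_to_app_buffer(). The -1 return only marks the place where a GL entry point would record an error instead, and srcLen is assumed to be non-negative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy helper: returns the number of bytes copied, or -1
 * when bufSize is not positive. */
static int
copy_to_app_buffer(char *buf, int bufSize, const char *src, int srcLen)
{
   int n;

   /* Reject the size before it can reach memcpy: a negative int
    * converted to size_t becomes a huge length. */
   if (bufSize <= 0)
      return -1;

   n = srcLen < bufSize ? srcLen : bufSize;
   memcpy(buf, src, (size_t)n);
   return n;
}
```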
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 6659131be3)
The format of the window system framebuffer changed from ARGB8888 to
SARGB8, but we're still supposed to render to it the same as ARGB8888
unless the user flipped the GL_FRAMEBUFFER_SRGB switch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable branches.
(cherry picked from commit 48b9720272)
I believe this extension was enabled by accident. As far as I can tell,
there has never been any code in Mesa to actually support it. Not only
that, this extension is only useful in the common-lite profile, and Mesa
does the common profile.
This "fixes" the piglit test oes_matrix_get-api.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 3e1fdf3899)
For some reason that I don't yet fully understand, Glaze does not work with
libEGL unless libEGL is linked with -Bsymbolic.[*]
Beyond that specific reason, all of the reasons for which libGL.so is linked
with -Bsymbolic, (see the commit history), should also apply here.
[*] The specific behavior I am seeing is that when Glaze calls dlopen for
libEGL.so, ifunc resolvers within Glaze for EGL functions are called before
the dlopen returns. These resolvers cannot succeed, as they need the return
value from dlopen in order to find the functions to resolve to. I don't know
what's causing these resolvers to be called, but I have verified that linking
libEGL with -Bsymbolic causes this problematic behavior to stop.
CC: "9.1 and 9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 9baf35de5c)
In between the two appearances, it was reverted once.
Regardless, the two versions on master are the same, and we've already
cherry-picked one of them, so ignore the second.
Hardware requires the magnitude of the largest component to not exceed
1; brw_cubemap_normalize ensures that this is the case.
Unfortunately, we would previously multiply the array index for cube
arrays by the normalization factor. The incorrect array index would then
cause the sampler to attempt to access either the wrong cube, or memory
outside the cube surface entirely, resulting in garbage rendering or in
the worst case, hangs.
Alter the normalization pass to only multiply the .xyz components.
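As a rough illustration in scalar C (the real pass operates on GLSL IR, and cubemap_normalize_xyz() is a hypothetical name), only the first three components are scaled and the array index in .w is left alone:

```c
/* Hypothetical scalar model of the normalization: coord[0..2] is the
 * direction vector, coord[3] is the cube array index. */
static void
cubemap_normalize_xyz(float coord[4])
{
   float ax = coord[0] < 0.0f ? -coord[0] : coord[0];
   float ay = coord[1] < 0.0f ? -coord[1] : coord[1];
   float az = coord[2] < 0.0f ? -coord[2] : coord[2];
   float largest = ax > ay ? ax : ay;
   float scale;

   if (az > largest)
      largest = az;
   if (largest == 0.0f)
      return;   /* degenerate direction, nothing to scale */

   scale = 1.0f / largest;
   coord[0] *= scale;
   coord[1] *= scale;
   coord[2] *= scale;
   /* coord[3] is deliberately not multiplied. */
}
```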
Fixes broken rendering in the arb_texture_cube_map_array-cubemap piglit,
which was recently adjusted to provoke this behavior.
V2: Fix indent.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2" mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit fe2528c0b6)
The rescale_texcoord(), if it does something, will return just the
GLSL-sized coordinate, leaving out the 3rd and 4th components where we
were storing our projected shadow compare and the texture projector.
Deref the shadow compare before using the shared rescale-the-coordinate
code to fix the problem.
Fixes piglit tex-shadow2drect.shader_test and txp-shadow2drect.shader_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69525
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 938956ad52)
It seems a user app can get us into this state. I trigger the failure by
running fbo-maxsize inside virgl: it fails to create the backing
storage for the texture object, but then segfaults here when it
should fail the completeness test.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 2f508f244e)
Fixes FTBFS on kfreebsd-*
Debian GNU/kFreeBSD doesn't provide getprogname() since it uses stdlib.h
from glibc. Instead it provides program_invocation_short_name from glibc.
You can find the same order in src/mesa/drivers/dri/common/xmlconfig.c
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 32637f56a5)
Otherwise, coordinates with four components would result in a MOV
with a destination writemask that has no channels enabled:
mov(8) g115<1>.F 0D { align16 WE_normal NoDDChk 1Q };
At best, this is stupid: we emit code that shouldn't do anything.
Worse, it apparently causes GPU hangs (observable with Chris's
textureGather test on CubeArrays.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 6c3db2167c)
Fixes a bug where, if a uniform array is passed to a function, the accesses
to the array are not propagated. As a result, all but the first vector of the
uniform array are later removed in parcel_out_uniform_storage, resulting in
broken shaders and out-of-bounds array accesses in
brw::vec4_visitor::pack_uniform_registers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dominik Behr <dbehr@chromium.org>
(cherry picked from commit 0f6fce1585)
The old code in dri2_glx suffered from a typographical error that caused
the default version to be 2.1 instead of 1.2 (minimum required by the
Linux OpenGL ABI). drisw_glx had a similar error resulting in a default
version of 0.1.
Some driver/card combinations (r200/RV280, i915/915G) don't support
OpenGL 2.1. These create in some corner cases an indirect context
instead of a direct context when calling glXCreateContextAttribsARB().
This happens because of a bad default value. To avoid this, just use
the default value specified by the GLX_ARB_create_context specification:
"The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
case, implementations will typically return the most recent version
of OpenGL they support which is backwards compatible with OpenGL 1.0
(e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility
profile)"
Refactor all the default value setting to dri2_convert_glx_attribs, and
make sure the correct defaults are set in that one place.
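A sketch of setting the defaults in one place before walking the attribute list. convert_version_attribs() is a hypothetical, reduced stand-in for dri2_convert_glx_attribs (whose real signature is not shown here); the two token values are the ones listed in GLX_ARB_create_context.

```c
/* Token values from the GLX_ARB_create_context extension. */
#define GLX_CONTEXT_MAJOR_VERSION_ARB 0x2091
#define GLX_CONTEXT_MINOR_VERSION_ARB 0x2092

/* Hypothetical parser: `attribs` holds num_attribs (name, value) pairs. */
static void
convert_version_attribs(unsigned num_attribs, const unsigned *attribs,
                        unsigned *major, unsigned *minor)
{
   unsigned i;

   /* Defaults mandated by the specification: version 1.0. */
   *major = 1;
   *minor = 0;

   for (i = 0; i < num_attribs; i++) {
      switch (attribs[i * 2]) {
      case GLX_CONTEXT_MAJOR_VERSION_ARB:
         *major = attribs[i * 2 + 1];
         break;
      case GLX_CONTEXT_MINOR_VERSION_ARB:
         *minor = attribs[i * 2 + 1];
         break;
      default:
         break;   /* other attributes are ignored in this sketch */
      }
   }
}
```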
Signed-off-by: Rico Schüller <kgbricola@web.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: http://bugs.winehq.org/show_bug.cgi?id=34238
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8b302e1635)
Changes to the grammar for GL_ARB_shading_language_420pack (commit
6eec502) moved precision qualifiers out of the type_specifier production
chain. This caused declarations such as:
    struct S {
        lowp float f;
    };
to generate parse errors. Section 4.1.8 (Structures) of both the GLSL
ES 1.00 and GLSL 1.30 specs says:
"Member declarators may contain precision qualifiers, but may not
contain any other qualifiers."
So, it sure seems like we shouldn't generate a parse error. :)
Instead of type_specifier, use fully_specified_type in struct members.
However, fully_specified_type allows a lot of other qualifiers that are
not allowed on structure members, so explicitly disallow them.
Note, this makes struct_declaration look an awful lot like
member_declaration (used for interface blocks). We may want to
(somehow) unify these rules to reduce code duplication at some point.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68753
Reported-by: Aras Pranckevicius <aras@unity3d.com>
Cc: Aras Pranckevicius <aras@unity3d.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 87252bf97b)
Fixes broken rendering if these MRFs contained anything other than zero.
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f35dea05b1)
Commit b77316ad75
st/dri: always copy new DRI front and back buffers to corresponding MSAA buffers
introduced creating a pipe_context for every call to validate, which is not required
because the callers have a context anyway.
The only exception is egl_g3d_create_pbuffer_from_client_buffer. Can someone
test whether it still works with NULL passed as context for validate? From
examining the code I believe it does, but I didn't thoroughly test it.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b217d48364)
This aligns the gfx, compute, and dma IBs to 8 DW boundaries.
This aligns the IB to the fetch size of the CP for optimal
performance. Additionally, r6xx hardware requires at least 4
DW alignment to avoid a hw bug. This also aligns the DMA
IBs to 8 DW which is required for the DMA engine. This
alignment is already handled in the gallium driver, but that
patch can be removed now that it's done in the winsys.
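The padding arithmetic can be sketched as follows. pad_ib() and PAD_DW are hypothetical; the real winsys emits a hardware-specific NOP packet rather than the placeholder value used here.

```c
#include <stdint.h>

#define IB_ALIGN_DW 8u          /* alignment in dwords, per the text above */
#define PAD_DW      0x0u        /* placeholder, not a real NOP packet */

/* Hypothetical helper: appends padding dwords until the IB length is a
 * multiple of IB_ALIGN_DW, and returns the new length in dwords. */
static unsigned
pad_ib(uint32_t *ib, unsigned num_dw, unsigned capacity_dw)
{
   while ((num_dw % IB_ALIGN_DW) != 0 && num_dw < capacity_dw)
      ib[num_dw++] = PAD_DW;

   return num_dw;
}
```

An IB that is already a multiple of 8 dwords is left unchanged, so the helper is safe to call unconditionally before submission.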
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a81beee37e)
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF. For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for
the GRF.
To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.
Not observed to fix anything yet, but likely to. Parallels the bug fix
in the previous commit, which actually does fix known failures.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit a35b320250)
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the
GRF as src[1].
To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.
Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working. (The register
classes were actually sufficient, we were just interpreting an IMM as
a virtual GRF number.)
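The loop over all sources might look like the sketch below. The types and the mark_send_sources() name are hypothetical simplifications, not the i965 backend's real data structures.

```c
#include <stdbool.h>

/* Hypothetical, reduced register model. */
enum reg_file { FILE_IMM, FILE_GRF, FILE_UNIFORM };

struct src_reg {
   enum reg_file file;
   int reg;              /* virtual GRF number, or raw bits for an IMM */
};

struct inst {
   struct src_reg src[3];
};

/* Marks every virtual GRF used as a source of a SEND-from-GRF
 * instruction, instead of assuming the GRF lives in src[0]. */
static void
mark_send_sources(const struct inst *inst, bool *marked, int num_vgrfs)
{
   int i;

   for (i = 0; i < 3; i++) {
      const struct src_reg *src = &inst->src[i];

      /* Only GRF-file sources carry a virtual GRF number; an IMM's
       * value must not be used as an index. */
      if (src->file == FILE_GRF && src->reg >= 0 && src->reg < num_vgrfs)
         marked[src->reg] = true;
   }
}
```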
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4e3d1712a2)
If the app is asking us to do GL_COMPRESSED_RGBA, then the app obviously
doesn't have pre-compressed data to hand us. So don't choose a storage
format that we won't actually be able to compress and store.
Fixes black screen in warzone2100 when libtxc_dxtn is not present. Also
66 piglit tests.
NOTE: This is a candidate for the 9.2 branch.
Reported-by: Paul Wise <pabs@debian.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bdf3f50e9a)
You should only be flagging the formats as supported if you support them
anyway.
NOTE: This is a candidate for the 9.2 branch. (required for next commit)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b188467fdf)
GLSL 1.30 doesn't allow precision qualifiers on sampler types,
but in GLSL ES, sampler types are also allowed. This seems like
an oversight (since the intention of including these in GLSL 1.30
is to allow compatibility with ES shaders).
Currently, Mesa allows "default" precision qualifiers to be set for
sampler types in GLSL (commit d5948f2). This patch makes it follow
GLSL ES rules and also allow declaring sampler variables with a
precision qualifier in GLSL 1.30 (and later). e.g.
uniform lowp sampler2D sampler;
This fixes a shader compilation error in Khronos OpenGL conformance
test "depth_texture_mipmap".
V2: Update comments.
Signed-off-by: Ian Romanick <idr@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@lists.freedesktop.org>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9c0b7be964)
Two callers of brw_search_cache() weren't initializing that function's
inout_offset parameter: brw_blorp_const_color_params::get_wm_prog()
and brw_blorp_blit_params::get_wm_prog().
That's a benign problem, since the only effect of not initializing
inout_offset prior to calling brw_search_cache() is that the bit
corresponding to cache_id in brw->state.dirty.cache may not be set
reliably. This is ok, since the cache_id's used by
brw_blorp_const_color_params::get_wm_prog() and
brw_blorp_blit_params::get_wm_prog() (BRW_BLORP_CONST_COLOR_PROG and
BRW_BLORP_BLIT_PROG, respectively) correspond to dirty bits that are
not used.
However, failing to initialize this parameter causes valgrind to
complain. So let's go ahead and fix it to reduce valgrind noise.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66779
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b8f13fbb85)
glIsQuery is supposed to return false for names returned by glGenQueries
until their first use. BeginQuery is a use, but QueryCounter is also a
use.
From the ARB_timer_query spec:
"A timer query object is created with the command
void QueryCounter(uint id, enum target);
[...] If <id> is an unused query object name, the
name is marked as used [...]"
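The lifetime rule can be modelled with a tiny state sketch. The struct and function names are hypothetical and only mirror the GL entry points loosely.

```c
#include <stdbool.h>

/* Hypothetical model of one query object name. */
struct query_name {
   bool generated;   /* returned by a GenQueries-style call */
   bool used;        /* set on first use */
};

static void
model_gen_query(struct query_name *q)
{
   q->generated = true;
   q->used = false;
}

/* Both BeginQuery and QueryCounter count as a first use. */
static void
model_begin_query(struct query_name *q)
{
   q->used = true;
}

static void
model_query_counter(struct query_name *q)
{
   q->used = true;
}

/* IsQuery is false for names that were generated but never used. */
static bool
model_is_query(const struct query_name *q)
{
   return q->generated && q->used;
}
```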
Fixes Piglit's spec/ARB_timer_query/query-lifetime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7950315583)
As of "2f142d59 build: Add --enable-gallium-osmesa flag." the pkgconfig
file from classic osmesa is no longer installed when building gallium
osmesa, so copy it to gallium osmesa and install the copy instead.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit c811190430)
Enums were being converted twice, resulting in incorrect values.
The extra conversion has been removed, along with the now-redundant
assert.
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit f0072e3c6b)
The previous value of (GLuint64) ~0 has some problems:
GL_MAX_SERVER_WAIT_TIMEOUT is supposed to be a GLuint64 value, but has
to be queried via GetInteger64v(), which returns a GLint64. This means
that some applications are likely to treat it as a signed integer, where
~0 means -1. Negative values are nonsensical and problematic.
When interpreted correctly, ~0 nanoseconds translates to about 584 years,
which seems rather excessive.
This patch changes it to 0x1fff7fffffff, which is about 9.8 hours.
This is still plenty long, and is the same as both an int64 and uint64.
Applications that accidentally store it in a 32-bit int/unsigned also
get a non-negative value, which is again the same as both int and
unsigned. This value was suggested by Ian Romanick.
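The signed/unsigned properties claimed above can be checked with a short sketch. The helper names are hypothetical; only the constant comes from the commit message.

```c
#include <stdint.h>

/* The new limit from the commit message. */
static const uint64_t max_server_wait_timeout = 0x1fff7fffffffULL;

/* True when the value reads the same as a signed 64-bit integer,
 * i.e. the top bit is clear. */
static int
same_as_int64(uint64_t v)
{
   return v <= (uint64_t)INT64_MAX;
}

/* What an application sees if it stores only the low 32 bits. */
static uint32_t
low_32_bits(uint64_t v)
{
   return (uint32_t)(v & 0xffffffffu);
}

/* True when the truncated value is also non-negative as an int32. */
static int
low_32_bits_non_negative(uint64_t v)
{
   return low_32_bits(v) <= (uint32_t)INT32_MAX;
}
```

For the old value of ~0, same_as_int64() is false, which is the first problem the commit message describes.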
v2: Add the ULL suffix on the constant (suggested by Ian).
Fixes Piglit's spec/!OpenGL 3.2/get-integer-64v.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit a27180d0d8)
The variable means that UBO qualifiers are allowed in a particular
context (e.g., not allowed in a struct field declaration), rather than that a
particular set of UBO qualifiers is valid.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 1a45db9705)
Fixes inconsistent failure of gles2conform/GL2Tests/glUniform/glUniform.test
under gnome-shell. What follows is a description of the bug and its fix.
When intel_update_renderbuffers() allocates a miptree for a winsys
renderbuffer, it propagates the renderbuffer's format to become also the
miptree's format.
If the winsys color buffer format is SARGB, then, in the first call to
eglMakeCurrent, intel_gles3_srgb_workaround() changes the renderbuffer's
format to ARGB. That is, it changes the format from sRGB to non-sRGB.
However, it changes the renderbuffer's format *after*
intel_update_renderbuffers() has allocated the renderbuffer's miptree.
Therefore, when eglMakeCurrent returns, the miptree format (SARGB)
differs from the renderbuffer format (ARGB).
If the X server reallocates the color buffer,
intel_update_renderbuffers() will create a new miptree for the
renderbuffer. The new miptree's format (ARGB) will differ from the old
miptree's format (SARGB). This mismatch between old and new miptrees
causes bugs.
Fix the bug by moving intel_gles3_srgb_workaround() to occur *before*
intel_update_renderbuffers().
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67934
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit ce8639a766)
The Gallium implementation is apparently not ready for regular
consumption, so as much as I hate adding more build-time options, here's
another.
Acked-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 2f142d596f)
Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.
That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).
For example, bitcast_f2u(abs(float)) might result in a register
argument such as
(abs)g2.2<0,1,0>UD
which applies the abs modifier to the value after it has been retyped as UD.
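Why the order matters can be shown on the CPU. This sketch assumes 32-bit IEEE-754 floats and models the misapplied modifier as a two's-complement integer abs, which is an assumption about the effect rather than a description of the hardware.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned integer. */
static uint32_t
bitcast_f2u(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Intended order: abs on the float, then the bitcast. */
static uint32_t
abs_then_bitcast(float f)
{
   return bitcast_f2u(f < 0.0f ? -f : f);
}

/* Wrong order: bitcast first, then an integer abs on the raw bits. */
static uint32_t
bitcast_then_abs(float f)
{
   int32_t i;
   memcpy(&i, &f, sizeof(i));
   return (uint32_t)(i < 0 ? -(int64_t)i : (int64_t)i);
}
```

For -1.0f the two orders give different bit patterns, since negating the raw bits as an integer is not the same as clearing the float's sign bit.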
v2: Set interfered = true and break in register_coalesce instead of
returning false.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit 9c48ae751a)
Necessary to avoid combining a bitcast and a modifier into a single
operation. Otherwise if safe, the MOV should be removed by
copy-propagation or register coalescing.
With this and the next patch, there are only four changes in shader-db:
all a single extra instruction. The code does something like
mov a.w, -b.x
and copy propagation doesn't work because it only handles no-op
swizzles. Seems acceptable, given the known limitation of our copy
propagation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit 0ae9ca12a8)
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Brian Paul <brianp at vmware.com>
Reviewed-by: Brian Paul <brianp at vmware.com>
(cherry picked from commit 63ac68bae3)
The NVIDIA driver doesn't expose them, and piglit's
arb_texture_compression-invalid-formats expects them to not be there.
This, with the previous commit, fixes piglit
arb_texture_compression-invalid-formats.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f53b634807)
This is required by the spec, and it's a bit tricky because the default
precision is scoped. As a result, I'm slightly abusing the symbol
table.
Fixes piglit no-default-float-precision.frag tests and the piglit
default-precision-nested-scope-0[1234].frag tests that are currently on
the piglit mailing list for review.
On IRC I got confirmation from cwabbot that ARM (Mali T6xx and T400)
enforces this requirement and from kusma that NVIDIA (Tegra2) enforces
this requirement. We should be safe from regressing shipping
applications.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cabd45773b)
We never noticed this before because we previously didn't enforce GLSL ES
fragment shader requirements that precision be defined. There may also
have been some interaction here with the addition of
GL_ARB_shading_language_420pack, but it doesn't appear to me that it
added any new bugs (just perhaps uncovered some old ones).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 73e2d69792)
The rules were writing files to e.g. util/u_indices_gen.py, but in an
out-of-tree build this directory doesn't exist in the build directory. So,
create the directories just in case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ross Burton <ross.burton@intel.com>
(cherry picked from commit 76feef0823)
The LLVM R600 backend currently always uses separate VGPRs for these.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68162
(Centroid interpolation is identical to center interpolation without
multisampling, so the shader hardware was only pre-loading one set of
interpolation coefficients, and the pixel shader code was using
uninitialized values as the centroid interpolation coefficients)
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Laurent Carlier <lordheavym@gmail.com>
(cherry picked from commit be301f707e)
We recently proposed a new syntax for stable-patch nominations such as:
CC: "9.2 and 9.1" <mesa-stable@lists.freedesktop.org>
and this has already appeared in the wild.
So we extend the regular expression to pick this up as well.
(cherry picked from commit c6f3036179)
We recently adopted a new convention that patches can be nominated for the
stable branch by including a line in the commit message as follows:
CC: mesa-stable@lists.freedesktop.org
This is a convenient syntax as "git send-email" will notice this line and
automatically copy the resulting patch email to the mesa-stable mailing list.
Here we extend the regular expression in the get-pick-list.sh script to also
notice this pattern (as well as the traditional "NOTE: This patch is a
candidate..." form).
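A sketch of the kind of pattern involved, written against POSIX regex.h rather than the shell script itself. The expression below is an assumption made for illustration and is not copied from get-pick-list.sh.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical matcher: accepts the traditional NOTE form and any
 * CC line that mentions mesa-stable, with or without a quoted
 * version list in between. */
static bool
is_stable_nomination(const char *line)
{
   static const char pattern[] =
      "^[[:space:]]*(NOTE: .*[Cc]andidate|[Cc][Cc]:.*mesa-stable)";
   regex_t re;
   bool match;

   if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
      return false;

   match = regexec(&re, line, 0, NULL, 0) == 0;
   regfree(&re);
   return match;
}
```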
(cherry picked from commit 122d8d2f5a)
The first field of a record in a UBO has the alignment of the record
itself.
Fixes piglit vs-struct-pad, fs-struct-pad, and (with the patch posted to
the piglit list that extends the test) layout-std140.
NOTE: The bit of strangeness with the version of visit_field without the
record_type pointer is because that method is pure virtual in the base
class. The original implementation of the class did this to ensure
derived classes remembered to implement that flavor. Now they can
implement either flavor but not both. I don't know a C++ way to enforce
that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68195
Cc: "9.2 9.1" mesa-stable@lists.freedesktop.org
(cherry picked from commit 574e4843e9)
The outer-most record is passed into the visit_field method for
the first field. In other words, in the following structure:
    struct S1 {
        vec4 v;
        float f;
    };
    struct S {
        S1 s1;
        S1 s2;
    };
    uniform Ubo {
        S s;
    };
s.s1.v would get record_type = S (because s1.v is the first non-record
field in S), and s.s2.v would get record_type = S1. s.s1.f and s.s2.f
would get record_type = NULL because they aren't the first field of
anything.
This new overload isn't used yet, but the next patch will add several
uses.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: "9.2 9.1" mesa-stable@lists.freedesktop.org
(cherry picked from commit 5ac884fd9f)
This patch fixes a case of framebuffer blitting with renderbuffer
as color attachment and GL_LINEAR filter. Meta implementation of
glBlitFramebuffer() converts source color buffer to a texture and
uses it to do the scaled blitting in to destination buffer. Using
the exact source rectangle to create the texture does incorrect
linear filtering along the edges. This patch makes the changes to
extend the texture edges by one pixel in x, y directions. This
ensures correct linear filtering.
It fixes failing piglit fbo-attachments-blit-scaled-linear test.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit d944a6144f)
This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use a new mask with combined s_pass values and orig_mask values
for zpass/zfail stencil ops, otherwise both the sfail op and one of
zpass/zfail op are applied (probably not hit in most tests because
some of the ops tend to be KEEP usually).
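One way to picture the mask combination, using one bit per pixel lane. This is a simplified scalar model with hypothetical names; the actual code builds these masks as LLVM vector values.

```c
#include <stdint.h>

/* Per-operation lane masks (hypothetical struct). */
struct stencil_op_masks {
   uint32_t sfail;   /* lanes that failed the stencil test */
   uint32_t zfail;   /* lanes that passed stencil but failed depth */
   uint32_t zpass;   /* lanes that passed both tests */
};

static struct stencil_op_masks
split_stencil_masks(uint32_t orig_mask, uint32_t s_pass, uint32_t z_pass)
{
   struct stencil_op_masks m;

   /* The combined mask: live lanes that passed the stencil test. */
   uint32_t s_pass_mask = orig_mask & s_pass;

   m.sfail = orig_mask & ~s_pass;
   /* Using orig_mask here instead of s_pass_mask would apply a depth
    * op to lanes that already received the sfail op. */
   m.zfail = s_pass_mask & ~z_pass;
   m.zpass = s_pass_mask & z_pass;
   return m;
}
```

With the combined mask, the three lane sets are disjoint, so each live lane gets exactly one stencil op.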
Note: this is a candidate for the 9.2 branch.
Reviewed-by: Zack Rusin <zackr@vmware.com>
(cherry picked from commit abdd32dcd5)
Previously we would emit a warning for empty declarations like
float;
We would also emit the same warning for things like
highp float;
However, this second case is most likely the application trying to set
the default precision. This makes the compiler generate a stronger
warning with some suggestion of a fix.
It really seems like this should be an error. I'll bet that 100% of the
time someone writes 'highp float;' they actually meant 'precision highp
float;'. Alas, both AMD and NVIDIA accept this syntax, and the spec
doesn't explicitly forbid it.
This makes piglit's precision-05.vert generate the following warnings:
0:12(11): warning: empty declaration with precision qualifier, to set the default precision, use `precision lowp float;'
0:13(12): warning: empty declaration with precision qualifier, to set the default precision, use `precision mediump int;'
v2: Add { } around a one-line if body and fix a comment. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 830f4df993)
Since disabling denorms in draw_vbo() we require the util_cpu_caps to be
initialized there. Hence add another util_cpu_detect() call in
draw_create_context() which should ensure this.
(There is another call in draw_get_option_use_llvm() which only gets called
with x86 (not x86_64) but calling it always there wouldn't help since it most
likely wouldn't get called when compiling without llvm, so leave it alone
there.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=66806.
(Because util_cpu_caps wasn't initialized when first calling util_fpstate_get()
hence it returning zero, but it would later get initialized by rtasm translate
code hence when draw call returned it unmasked all exceptions by calling
util_fpstate_set(). This was happening only with DRAW_USE_LLVM=0 or not
compiling with llvm, otherwise the llvm init code was calling it on time too.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
GLSL ES does not allow unsized arrays, and GLSL ES 1.00 does not allow
array initializers. However, GLSL ES 3.00 allows array initializers,
and the initializer can explicitly size the array. The specification
even includes some examples of this:
float x[] = float[2] (1.0, 2.0); // declares an array of size 2
float y[] = float[] (1.0, 2.0, 3.0); // declares an array of size 3
float a[5];
float b[] = a;
Move the unsized array check to after the initializer has been
processed. If the array is still unsized, generate the error. This
should have no effect in GLSL ES 1.00 because, as previously mentioned,
array initializers are not allowed.
Fixes piglit "glsl-es-3.00 compiler array-sized-by-initializer.vert".
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 42624b1c81)
The functional change is that now invalidate_framebuffer is called if
the texture is actually detached from one of the currently bound FBOs.
Previously this was only done for renderbuffers.
The remaining changes make the texture delete path look more similar to
the renderbuffer delete path. This includes adding relevant spec
quotations to justify the behavior.
Fixes piglit fbo-incomplete "delete texture of bound FBO" test.
v2: Move 'fb->Attachment[i].Texture == att' check from previous patch to
this patch... where it was intended to be in the first place. Noticed
by Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ef83bd2b95)
Also add a return value indicating whether any work was done.
This will be used by the next patch.
v2: Move 'fb->Attachment[i].Texture == att' check to the next
patch... where it was intended to be in the first place. Noticed by
Chad.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 438cc6bc49)
libEGL was incorrectly exporting *all* symbols, public and private.
This patch adds -fvisibility=hidden to libEGL's linker flags to ensure
that only symbols annotated with __attribute__((visibility("default")))
get exported.
Sanity-checked with libEGL's builtin DRI2 driver and the i965 DRI driver
by running Piglit on X/EGL and by running weston-gears on Weston as an
X client.
Sanity-checked with libEGL's Gallium driver (which is not built-in) and
the swrast Gallium driver by running es2gears_x11.
Kristian reviewed the symbol diff in `nm libEGL.so`.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: Ian Romanick <idr@freedesktop.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
(cherry picked from commit 2c2e64edab)
Previously only the slice of a 3D texture was validated in the FBO
completeness check. This fixes the failure in the 'invalid layer of an
array texture' subtest of piglit's fbo-incomplete test.
v2: 1D_ARRAY textures have Depth == 1. Instead, compare against Height.
v3: Handle CUBE_MAP_ARRAY textures too. Noticed by Marek.
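A minimal sketch of the layer check described above, assuming a reduced,
hypothetical image struct and target enum (none of these names come from
Mesa). It shows only the point from the v2 note: a 1D array texture keeps
its layer count in Height, while the other layered targets use Depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of layered texture targets. */
enum tex_target { TEX_3D, TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_CUBE_MAP_ARRAY };

/* Hypothetical stand-in for the attached texture image. */
struct tex_image {
    enum tex_target target;
    unsigned height;
    unsigned depth;
};

/* Return true if 'layer' names an existing slice/layer of the image. */
static bool layer_is_valid(const struct tex_image *img, unsigned layer)
{
    assert(img != NULL);
    /* 1D array textures have depth == 1; their layers are counted in height. */
    const unsigned layers =
        (img->target == TEX_1D_ARRAY) ? img->height : img->depth;
    /* Strictly less than: layer == layers is already out of range. */
    return layer < layers;
}
```

With a 1D array of height 4 and depth 1, layers 0 through 3 pass and layer 4
is rejected, which is the case a Depth-only comparison would get wrong.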
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" mesa-stable@lists.freedesktop.org
(cherry picked from commit 25281fef0f)
This fixes the segfault in the 'invalid slice of 3D texture' and
'invalid layer of an array texture' subtests of piglit's fbo-incomplete
test.
The 'invalid layer of an array texture' subtest still fails.
v2: Fix off-by-one comparison error noticed by Chris Forbes. Also,
1D_ARRAY textures have Depth == 1. Instead, compare against Height.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Cc: "9.1 9.2" mesa-stable@lists.freedesktop.org
(cherry picked from commit 41485fea7c)
Allow user-generated names for glBindFramebufferEXT on desktop GL.
Disallow its use altogether for core profiles.
Names bound with glBindFramebuffer in desktop OpenGL are still
(incorrectly) shared across the share group instead of being
per-context. This gets us a bit closer to being strictly conformant.
v2: Disallow glBindFramebufferEXT in 3.1 by not installing it in the
dispatch table. Suggested by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4a9522a5a0)
Allow user-generated names for glBindRenderbufferEXT on desktop GL.
Disallow its use altogether for core profiles.
v2: Disallow glBindRenderbufferEXT in 3.1 by not installing it in the
dispatch table. Suggested by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 97965e87fc)
GL_EXT_framebuffer_object differs from GL_ARB_framebuffer_object in ways
that we can't and don't implement in core profiles. Exposing it is a
lie, so we shouldn't do that.
It's possible that some other GL_EXT_framebuffer_* extensions should be
disabled, but it's not quite so clear cut.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b55c1638ad)
vblank_mode is read by dri_util.c and falls under the "dri2" driver name,
which is not connected to the actual Mesa/Gallium driver in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 772070527f)
MP_TEMP_SIZE must be aligned to 0x8000, while TEMP_SIZE on NVE4_3D
must be aligned to 0x20000, so perform both alignments to be sure
we allocate enough space (actually the bo will most likely use 128
KiB pages and not aligning to that would be a waste anyway).
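A minimal sketch of the double alignment, assuming a hypothetical helper
named align_temp_size(); only the two alignment values come from the text
above. Because 0x20000 is a multiple of 0x8000 the second step subsumes the
first, but applying both keeps each requirement visible.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static uint64_t align_pot(uint64_t v, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Hypothetical helper: apply both alignment requirements in turn. */
static uint64_t align_temp_size(uint64_t size)
{
    size = align_pot(size, 0x8000);   /* MP_TEMP_SIZE requirement */
    size = align_pot(size, 0x20000);  /* TEMP_SIZE requirement on NVE4_3D */
    return size;
}
```

Any non-zero request therefore ends up as a multiple of 128 KiB, matching
the page size the bo is expected to use.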
Cc: "9.2" mesa-stable@lists.freedesktop.org
(cherry picked from commit ef6d5ee9f3)
YYLEX_PARAM is no longer supported as of Bison 3.0. Instead, the Bison
developers recommend using %lex-param.
%lex-param takes a type and variable name, similar to %parse-param,
so you can't pass an arbitrary expression like state->scanner. But Flex
insists on passing the actual scanner object, not an arbitrary object
like state.
To solve this, the parser defines a wrapper lex() function which accepts
"state," and calls Flex's lex() function with state->scanner.
Fixes the build with Bison 3.0. Also works with Bison 2.7.1.
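The wrapper idea can be sketched in plain C as follows. This is only an
illustration under stated assumptions: struct scanner, struct parse_state,
scanner_lex() and wrapper_lex() are hypothetical stand-ins, not the names or
signatures generated by Flex or Bison.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Flex's opaque scanner object (hypothetical). */
struct scanner {
    int next_token;
};

/* Parser state that owns the scanner, like "state" in the text above. */
struct parse_state {
    struct scanner *scanner;
};

/* Stand-in for the Flex-generated lexer, which wants the scanner itself. */
static int scanner_lex(int *lval, struct scanner *scanner)
{
    *lval = scanner->next_token;
    return scanner->next_token;
}

/* Wrapper with the shape %lex-param can express: it takes the plain
 * "state" variable and forwards state->scanner to the real lexer. */
static int wrapper_lex(int *lval, struct parse_state *state)
{
    assert(state != NULL && state->scanner != NULL);
    return scanner_lex(lval, state->scanner);
}
```

The parser only ever names "state"; the expression state->scanner lives
inside the wrapper, where Bison's restriction does not apply.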
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67354
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Cc: "9.2" mesa-stable@lists.freedesktop.org
(cherry picked from commit 6d2a9220b8)
YYLEX_PARAM is no longer supported as of Bison 3.0. Instead, the Bison
developers recommend using %lex-param.
%lex-param takes a type and variable name, similar to %parse-param,
so you can't pass an arbitrary expression like state->scanner. But Flex
insists on passing the actual scanner object, not an arbitrary object
like state.
To solve this, the parser defines a wrapper lex() function which accepts
"state," and calls Flex's lex() function with state->scanner.
Fixes the build with Bison 3.0. Also works with Bison 2.7.1.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67354
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Cc: "9.2" mesa-stable@lists.freedesktop.org
(cherry picked from commit f043381334)
I had removed it in commit 1e7776ca2b
because it was obviously wrong -- why do we care whether the server is a
version that emits events, if we're not watching for the server's events,
anyway? And why would you only invalidate on a server that emits
invalidate events, when the comment said to emit invalidates if the server
*doesn't*? Only, I missed that we otherwise don't flag that our buffers
might have changed at swap time at all, so the driver was only checking
for new buffers when triggered by the Viewport hack. Of course you don't
expect Viewport to be called after a swap.
So, this is effectively a revert of the previous commit, except that I
dropped the check for only emitting invalidates on a new server -- we
*always* need to invalidate if we're doing a SwapBuffers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63435
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 and 9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit eed0a80137)
The code that checks if some texture target is valid for
glGetTexLevelParameter*() was not programmed to check for multisampling
proxy textures. This made it impossible(?) to use the proxy textures
for their intended purpose as glGetTexLevelParameter*() would just fail
on you.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8624a514c2)
glTexStorage*() functions make textures immutable. This carries on to
proxy textures. Error checking in texture storage functions prevents
proxy textures from working after the first time because, internally,
they have become immutable.
This commit makes the error checking ignore the immutability flag when
working with proxy textures.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit e404105e7d)
When working with the glTexStorage*() functions, the error checking
checks that a non-default (i.e., non-zero) texture is currently bound.
However, this check made glTexStorage*() functions fail with proxy
textures when the default texture is bound. Proxy textures do not care
about the current texture bindings so for them this check should not
be done.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3f3f66fd94)
The function _mesa_get_tex_max_num_levels() is supposed to calculate
the number of mipmap levels but it was not written to handle proxy
textures, at best returning a maximum of 1 mipmap level. Because of
this, at least glTexStorage*() calls would incorrectly fail when used
with proxy textures with more than one mipmap level.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit de7e3741eb)
Free all our temporary buffers in one place at the end of the
function. Fixes memory leak detected by Coverity.
Note: This is a candidate for the 9.x branches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit e5f32a0b3a)
PP saves current states to cso_context and then util_blit_pixels does
the same. cso_context doesn't like that and the original state is not
correctly restored.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 4c89ec1f69)
The second 'const' says that the pointer itself is constant. This is
unenforceable in C++, so GCC emits a warning (see below) for each of
these functions in every file that includes glsl_types.h. It's a lot of
warning spam.
../../../src/glsl/glsl_types.h:176:58: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 803f755ede)
If any component used the ZERO or ONE swizzle, its corresponding member
in the `swizzle` array would never be initialized. We *mostly* got away
with this, except when that memory happened to contain a value that
clobbered another channel when combined using BRW_SWIZZLE4().
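A small sketch of why a stale value can clobber a neighbouring channel. The
SWIZZLE4() macro here is an assumption that mirrors the usual shape of
BRW_SWIZZLE4() (four 2-bit selectors, arguments not masked); get_swizzle()
and pack_with_slot0() are hypothetical helpers for illustration.

```c
#include <assert.h>

/* Pack four 2-bit channel selectors. The arguments are not masked, so a
 * value above 3 spills into the next channel's bits. */
#define SWIZZLE4(a, b, c, d) \
    (((a) << 0) | ((b) << 2) | ((c) << 4) | ((d) << 6))

/* Extract the selector for channel idx (0..3) from a packed swizzle. */
static unsigned get_swizzle(unsigned packed, unsigned idx)
{
    assert(idx < 4);
    return (packed >> (idx * 2)) & 0x3;
}

/* Hypothetical demo: channel 0 comes from a slot that may hold garbage,
 * while channels 1..3 are meant to select component 0. */
static unsigned pack_with_slot0(unsigned slot0)
{
    return SWIZZLE4(slot0, 0u, 0u, 0u);
}
```

If the uninitialized slot happens to hold 0 to 3, the other channels are
unaffected, which is why the bug mostly went unnoticed. A leftover value
such as 7 sets a bit that belongs to channel 1.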
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 124f567f1d)
This fixes the dri2 opening to check if DRI_PRIME is set and pick the
correct drm device path to open. Along with a change to libvdpau, this
allows vdpauinfo to work at least. Martin Peres tested with nouveau, and
there seems to be a further issue with final displaying (it only works
sometimes), but this patch is at least necessary to help debug further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67283
Tested-by: Armin K. <krejzi@email.com>
(cherry picked from commit 19338157c9)
This reverts commit c9db037dc9.
Eric believes that the viewport hacks are still necessary for EGL;
invalidate events aren't hooked up properly.
This commit caused a regression where EFL applications wouldn't show
anything other than window decorations; GLBenchmark also showed issues.
The revert had conflicts due to the intel_context/brw_context merge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66606
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0e9549e2bd)
The is_loop_terminator() function was asserting that the following
kind of if statement could never occur:
if (...) { } else { }
(presumably based on the assumption that such an if statement would be
eliminated by previous optimization stages). But that isn't the
case--it's possible that previous optimization stages might simplify
more complex code down to this empty if statement, in which case it
won't be eliminated until the next time through the optimization loop.
So is_loop_terminator() needs to handle it. Fortunately it's easy to
handle--it's not a loop terminator because it does nothing.
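A rough sketch of the change in behaviour, written in C with a reduced,
hypothetical if-statement struct. The conditions for a real terminator (a
lone break in the then-branch, no else-branch) are an assumption made for
this illustration; the point is only that the empty case returns false
instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of an IR if-statement. */
struct if_stmt {
    size_t then_count;        /* instructions in the then-branch */
    size_t else_count;        /* instructions in the else-branch */
    bool then_ends_in_break;  /* then-branch finishes with a break */
};

/* Decide whether the if-statement terminates the enclosing loop. */
static bool is_loop_terminator(const struct if_stmt *ir)
{
    assert(ir != NULL);

    /* "if (...) { } else { }" does nothing, so it cannot end the loop.
     * Report that instead of asserting the case is impossible. */
    if (ir->then_count == 0 && ir->else_count == 0)
        return false;

    return ir->then_count == 1 && ir->else_count == 0 &&
           ir->then_ends_in_break;
}
```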
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64330
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a5eecb246d)
Some videos specify mb_adaptive_frame_field_flag instead of
field_pic_flag. This implies that the pic height needs to be halved, and
this field needs to be passed to the VP engine.
Cc: "9.2" mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 8edb79f1ef)
Looks like a thinko, "Hey, constant buffers can be at most 64 KiB
in size, offset can't be larger." But it can, of course.
I think piglit lacks a test for UBO and BindBufferRange that
tests if it actually works.
Almost all of the functions between the ARB and the EXT share the same
GLX protocol because the functionality is, essentially, identical.
However, there are some differences between the extensions:
- In the ARB extension, names must come from glGenBuffers.
- In the ARB extension, framebuffer objects are not shared (but they are
in the EXT).
For these reasons, glBindFramebuffer and glBindRenderbuffer have
different GLX protocol opcodes than their EXT counterparts. Currently
these functions alias each other in the dispatch table. This makes it
impossible to be truly spec conformant.
This patch enables fixing the conformance issue by splitting
glBindFramebuffer / glBindFramebufferEXT and glBindRenderbuffer /
glBindRenderbufferEXT into separate dispatch table entries.
Patches will be available shortly to:
- Fix the conformance issue.
- Stop advertising the EXT in OpenGL 3.1 (or core profiles).
HOWEVER, this does represent a compatibility break between the loader
(libGL or the Xserver GLX module) and the driver. Mesa drivers compiled
without this change will request a single dispatch table entry for
glBindFramebuffer and glBindFramebufferEXT. Since the updated loader
has different entries for each, the request will fail, and the driver
will die in a fire.
Drivers built with the change should continue to load fine on loaders
without the change. In this case, the driver will separately ask for
entries for glBindFramebuffer and glBindFramebufferEXT, and the loader
will tell it the same location. Since the loader in the server's GLX
module is not (yet) updated, this should not be a problem. We also do
not advertise the ARB extension from the server, so, again, this should
not be a problem for the server.
HOWEVER, this means that DRI1 drivers (remember mga_dri.so?) will no
longer load with libGL built hereafter. That means this patch will need
to be back ported to the 8.0 branch.
v2 (idr): Added missing GLX protocol opcodes for the EXT functions and
corrected the opcodes for the ARB functions. Updated GLX indirect_api
unit test and dispatch sanity unit test.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Bartosz Zawistowski <bartosz.l.zawistowski@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Any driver that supports GLSL 1.30 should be able to handle this
extension, as it's entirely implemented in the GLSL compiler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
While all the work is in the shared GLSL compiler, this extension
requires GLSL 1.30, which is currently only supported on Gen6+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
layout(binding = N) is equivalent to calling glUniformBlockBinding(_,N).
This currently only handles the GLSL 1.40 case - no interface names, no
arrays of uniform blocks. This is okay since we don't yet support GLSL
1.50, and don't expose ARB_shading_language_420pack in ES 3.0.
v2: Move into the other function; use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Without an instance name, there is no ir_variable representing the
actual uniform block declaration. When the linker goes to set uniform
initializers, it only sees the members as ir_variables; never the block.
So, unfortunately, the members need to know about the binding.
There has to be a better way to do this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Normally, uniform array variables are initialized by array literals.
That is, val->type->array_elements >= storage->array_elements.
However, samplers are different. Consider a declaration such as:
layout(binding = 5) uniform sampler2D[3];
The initializer value is a single integer (5), while the storage has 3
array elements. The proper behavior here is to increment one for each
element; they should be initialized to 5, 6, and 7.
This patch introduces new code for sampler types which handles both
arrays of samplers and single samplers correctly.
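The increment-per-element rule can be sketched as below. The helper name
set_sampler_bindings() and its parameters are hypothetical; only the
behaviour (element i gets binding + i) is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: give each element of a sampler array its own
 * texture unit, starting at the value from layout(binding = N). A single
 * sampler is simply the array_elements == 1 case. */
static void set_sampler_bindings(int *storage, size_t array_elements,
                                 int binding)
{
    assert(storage != NULL);
    for (size_t i = 0; i < array_elements; i++)
        storage[i] = binding + (int)i;
}
```

For a three-element array with binding 5 this fills in 5, 6, and 7, as in
the example declaration.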
v2: Move into the other function; use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Sampler uniforms and uniform blocks do not have a var->constant_value.
Instead, they have an integer var->binding value.
This makes extending set_uniform_initializer() somewhat problematic: it
assumes that there is an ir_constant * which represents the initializer,
and that it's safe to dereference that without any NULL checks.
Instead, this patch creates an analogous function for binding
qualifiers, and calls one or the other as appropriate.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is existing code to handle sampler uniform initializers. Prior to
GLSL 4.20's "binding" keyword, sampler uniforms don't have initializers
at all, so this is somewhat surprising.
The existing code is broken into two cases: one where both the variable and
initializer are arrays, and a second where the variable and initializer are
scalars.
The first case should never occur, since array-typed initializers do not
exist for sampler uniforms. Even with the binding keyword, the
initializer is a single integer which represents the texture unit to use
for the first array element.
The second is apparently used for some fixed-function code.
v2: Rewrite the commit message - suggested by Paul.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
All compilation units need to agree on the binding point, if they
specify one at all.
v2: Use binding, not constant_value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Rather than creating a new "binding" field in ir_variable, we reuse
constant_value since the linker code for handling uniform initializers
uses that.
Since UBOs and samplers can't otherwise have initializers/constant
values, there shouldn't be a conflict.
v2: Propagate the new binding variable around too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These are not used yet, but they exist and are copied appropriately.
v2: Add an explicit "int binding" variable rather than reusing
constant_value, as suggested by Paul Berry.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The "binding" qualifier only applies to UBO blocks and samplers, along
with arrays of those types. (It would also apply to images and atomic
counters, but we don't support those yet.)
This also validates sampler bindings against the maximum number of
texture units, and UBO bindings against the number of uniform buffer
binding points.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Nothing actually uses this yet.
v2: Remove >= 0 checks. They'll be handled in later validation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The idea of this code is to disallow layout(...) sections with the
deprecated "varying" or "attribute" keywords, unless a few select
extensions are enabled which allow a more relaxed check.
In order to detect a layout(...) section, the code checks for a number
of layout qualifiers. However, it failed to check for all of them,
which could lead to layout(...) not being detected when it should.
By replacing this with has_layout(), we properly check for all layout
qualifiers, and also guarantee that new qualifiers added in the future
will not be forgotten.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were already semi-relaxed, since the storage qualifier rule
already skipped when 420pack was enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GL_ARB_shading_language_420pack extension/GLSL 4.20 split centroid
off into a new category, "auxiliary storage qualifiers," and allow these
to be placed anywhere in the series. So we have to stop recognizing
"centroid in"/"centroid out"/"centroid varying" in the grammar and get
more creative.
The same approach used before works here, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is necessary for the parser to be able to accept precision
qualifiers not immediately adjacent to the type, such as "const highp
inout float foo".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Currently, we store precision in ast_type_specifier, rather than
ast_type_qualifier. This works because precision is the last qualifier,
and immediately adjacent to the type.
Default precision statements (such as "precision highp float") are
represented as ast_type_specifier objects, with a boolean to indicate
that it's a default precision statement rather than an ordinary type.
ast_type_specifier::precision will be moving to ast_type_qualifier soon,
in order to support arbitrary qualifier ordering. However, we still
need to store a "this is a precision statement" flag /and/ the default
precision in ast_type_specifier.
This patch changes the boolean into a new field, default_precision.
If default_precision != ast_precision_none, it's a precision statement
with the specified precision. Otherwise, it's an ordinary type.
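A minimal sketch of replacing the boolean with an enum field. Only
default_precision and ast_precision_none are named in the text; the other
enumerators, the reduced struct, and is_precision_statement() are
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ast_precision_none is named in the text; the rest are assumed. */
enum ast_precision {
    ast_precision_none = 0,
    ast_precision_high,
    ast_precision_medium,
    ast_precision_low
};

/* Reduced, hypothetical stand-in for ast_type_specifier. */
struct type_specifier {
    const char *type_name;
    enum ast_precision default_precision;
};

/* One field answers both questions: whether this is a default precision
 * statement, and if so, which precision it sets. */
static bool is_precision_statement(const struct type_specifier *spec)
{
    assert(spec != NULL);
    return spec->default_precision != ast_precision_none;
}
```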
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes the compiler accept both "const in" and "in const".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will make it easy to support both "const in" and "in const", as
required by GLSL 4.20/ARB_shading_language_420pack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
"Parameter direction qualifier" is a new term I invented just now; it's
not part of any GLSL specification.
This paves the way for handling multiple parameter qualifiers, in any order,
as required by GLSL 4.20/ARB_shading_language_420pack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of ast_type_qualifier is simply a bitfield (represented as a
structure of unsigned:1 bits in a union with an unsigned). However, it
also contains ARB_explicit_attrib_location's location/index fields.
In the past, this has worked by simply returning the layout qualifier's
ast_type_qualifier and merging the other bits into it. However, that's
not obvious until you break it by switching $1 and $2.
Using merge_qualifier() copies them appropriately, and also properly
overrides layout qualifiers. It also checks for duplicate qualifiers,
which renders some of the checks in the previous patch unnecessary.
However, those checks provide better error messages, such as "Duplicate
interpolation qualifier", rather than just "duplicate qualifier".
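The bitfield-in-a-union layout and the reason a plain bitwise merge is not
enough can be sketched as follows. The struct, its field names, and this
simplified merge_qualifier() are hypothetical; in particular, the sketch
treats every repeated flag as a duplicate and does not model overriding of
layout qualifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Reduced stand-in for ast_type_qualifier: one-bit flags overlaid with an
 * unsigned so that all flags can be tested or combined at once. */
struct type_qualifier {
    union {
        struct {
            unsigned invariant:1;
            unsigned constant:1;
            unsigned smooth:1;
            unsigned explicit_location:1;
        } q;
        unsigned i;
    } flags;
    int location;  /* not a flag, so OR-ing the bits alone would lose it */
};

/* Zero every bit, including any padding in the union. */
static void qualifier_clear(struct type_qualifier *q)
{
    memset(q, 0, sizeof(*q));
}

/* Merge src into dst. Returns false on a duplicate qualifier. */
static bool merge_qualifier(struct type_qualifier *dst,
                            const struct type_qualifier *src)
{
    if (dst->flags.i & src->flags.i)
        return false;                  /* same flag given twice */
    dst->flags.i |= src->flags.i;
    if (src->flags.q.explicit_location)
        dst->location = src->location; /* copy the non-bitfield data too */
    return true;
}
```

Because the location is copied explicitly, the result no longer depends on
which operand happens to be the one returned.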
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes the compiler accept invariant, storage, layout, and
interpolation qualifiers in any order when ARB_shading_language_420pack
is enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GL_ARB_shading_language_420pack extension/GLSL 4.20 allow qualifiers
to be specified in (basically) any order. In order to support this, we
can't hardcode the ordering restrictions in the grammar.
This patch alters the grammar to accept invariant, storage, layout, and
interpolation qualifiers in any order, but adds C code to enforce the
ordering requirements. In the 420pack case, we should be able to simply
skip the error checks.
As a bonus, this also lets us generate decent error messages, rather
than Bison's awful "unexpected TOKEN" errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
"Auxiliary storage qualifiers" is the new term given to "centroid",
"patch", and "sample" by GLSL 4.20/GL_ARB_shading_language_420pack.
Even though we only support "centroid", it's useful to add this now
so that all auxiliary storage qualifiers get handled in the right places
once they're eventually supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it easy to check if any storage qualifiers are set.
"centroid" is not considered a storage qualifier. In the old language
rules, you can't specify "centroid" by itself; it's always "centroid
in", "centroid out", or "centroid varying." So one of the other storage
qualifiers will always be set; there's no need to specifically check for
centroid.
In the new 4.20 rules, centroid is an auxiliary storage qualifier, not a
storage qualifier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it easy to check if any layout qualifiers are set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All four URB packets need to be programmed together in order for the GPU
state to be valid. Putting them in separate BEGIN..ADVANCE blocks is
risky: if we're nearing the end of a batch, the batch could be flushed
in between two of the commands, causing the URB programming to be split
into two batchbuffers.
This -might- be okay with hardware contexts, but it offers no advantages
over keeping them together, and has a potential for hangs.
Putting them into a single BEGIN..ADVANCE block ensures they'll be kept
in the same batch, which seems wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
In blorp, change only the PS packet, because the VS packet is disabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Change from "not cacheable" to "cacheable" in L3.
Do so for the draw upload path and blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Some render types, such as floating-point, aren't valid with EGL.
Return NULL in those cases to drop them.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Mark __DRI_ATTRIB_FLOAT_MODE as deprecated, and introduce new flags to
__DRI_ATTRIB_RENDER_TYPE for float modes. Both signed float
(fbconfig_float) and unsigned (packed_float) are introduced. The old
attribute should be set for both float modes.
v2 (idr): Require that the render mode from the DRI attributes matches the
render mode of the config exactly. This is the behavior of the old code.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that init_fbconfig_for_chooser sets correct value of
drawableType for visual configs and fbconfigs.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Correctly handle the value of renderType in GLX context. In case of the
value being incorrect, context creation fails.
v2 (idr): indirect_create_context is just a memory allocator, so don't
validate the GLX_RENDER_TYPE there. Fixes regressions in several
GLX_ARB_create_context piglit tests.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2 (idr): Open-code the check for GLX_RENDER_TYPE.
dri2_convert_glx_attribs can't be called from here because that function
only exists in direct-rendering builds. Also add a stub version of
indirect_create_context_attribs to tests/fake_glx_screen.cpp to prevent
'make check' regressions.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Set the correct values of renderType in glXCreateContext and
init_fbconfig_for_chooser.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Correctly handle the value of renderType and drawableType in
fbconfig. Modify glXInitializeVisualConfigFromTags to read the parameter
value, or detect it if it's not there.
v2 (idr): If there was no GLX_RENDER_TYPE property, set the type based
purely on the rgbMode as the previous code did. It is impossible for
floatMode to be set at this point, so we can't have a float config. The
previous code regressed a large number of piglit GLX tests because those
tests don't set GLX_RENDER_TYPE in the glXChooseConfig call. Restoring
the old behavior for that case fixes those regressions.
Also fix handling of GLX_DONT_CARE for GLX_RENDER_TYPE. Fixes a
regression in glx-dont-care-mask.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that context creation routines are provided with the value of
RENDER_TYPE retrieved from GLX attribs.
v2 (idr): Minor formatting changes. Change type of
dri2_convert_glx_attribs render_type parameter to uint32_t to silence
some GCC warnings.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that renderType property value is stored in GLX context while
it's being created. Further patches will be provided to make the value
correspond to fbconfig's renderType.
v2 (idr): Move a hunk from the next patch to this patch to prevent a
build break.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The L3 controls are identical on all platforms, but LLC differs:
- Ivybridge has a "cache in LLC" flag
- Baytrail has no LLC, but instead has a snoop bit:
"data accesses in this page must be snooped in the CPU caches."
- Haswell has writeback/uncached flags for LLC and eLLC (eDRAM).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The current gen_matypes logic assumes that the host compiler will produce
information that is useful for the target compiler. Unfortunately, this
is not the case whenever cross-compiling.
When we detect that we're cross-compiling and using GCC, use the target
compiler to produce assembly from the gen_matypes.c source, then process
it with a shell script to create a usable header. This is similar to how
the linux kernel creates its asm-offsets.c file.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This is required in case a wrapper or symlink is used. This patch
has also been sent upstream, awaiting moderation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andreas Oberritter <obi@saftware.de>
Adds the dependencies of builtin_compiler as sources when cross
compiling instead of using libtool to share compilation with src/glsl.
The builtin_compiler executable is built for the host when cross
compiling so it doesn't make sense to share compilation with src/glsl
built for the target in this case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44618
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
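To see why the pre-blend clamp matters, here is a small scalar sketch.
It is not the llvmpipe code; the function names and the choice of a
modulate blend (src factor DST_COLOR, dst factor ZERO) are assumptions
made for the example.

```c
/* Clamp to [0, 1], the representable range of a UNORM channel. */
static float
clamp01(float v)
{
   if (v < 0.0f)
      return 0.0f;
   if (v > 1.0f)
      return 1.0f;
   return v;
}

/*
 * Hypothetical modulate blend evaluated in float and then stored to a
 * UNORM channel.  clamp_input selects whether the shader output is
 * clamped before blending.
 */
static float
blend_modulate_unorm(float src, float dst, int clamp_input)
{
   if (clamp_input)
      src = clamp01(src);

   /* The store to a fixed point buffer clamps the result as well. */
   return clamp01(src * dst);
}
```

With src = 2.0 and dst = 0.4 the unclamped path stores about 0.8, while
the clamped path stores 0.4, so skipping the input clamp changes the
result even though the output is clamped.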
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
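A worked example of the factor going negative, as a scalar sketch (the
function name is made up for illustration):

```c
/* src_alpha_saturate factor: min(As, 1 - Ad). */
static float
src_alpha_saturate_factor(float src_alpha, float dst_alpha)
{
   float inv_dst = 1.0f - dst_alpha;

   return src_alpha < inv_dst ? src_alpha : inv_dst;
}
```

With inputs clamped to 0/1 and Ad = 1 the factor is min(As, 0) = 0, so
ZERO is a valid replacement. With a float buffer holding Ad = 1.5 and
As = 0.5 the factor is min(0.5, -0.5) = -0.5, which ZERO does not
reproduce.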
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Adds H.264 and MPEG2 codec support via VP2, using firmware from the
blob. Acceleration is supported at the bitstream level for H.264 and
IDCT level for MPEG2.
Known issues:
- H.264 interlaced doesn't render properly
- H.264 shows very occasional artifacts on a small fraction of videos
- MPEG2 + VDPAU shows frequent but small artifacts, which aren't there
when using XvMC on the same videos
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use grep -w instead of the empty string escape sequences
which are less portable. Makes the grep tests
function as intended on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fixes this build error on OpenBSD 5.3.
In file included from ../../src/mesa/main/ff_fragment_shader.cpp:53:
./../glsl/ir_optimization.h:64: error: comma at end of enumerator list
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fixes these build errors on OpenBSD 5.3.
In file included from ../../src/mesa/main/errors.h:47,
from ../../src/mesa/main/imports.h:41,
from ../../src/mesa/main/ff_fragment_shader.cpp:32:
../../src/mesa/main/mtypes.h:3286: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3296: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3303: error: comma at end of enumerator list
../../src/mesa/main/mtypes.h:3356: error: comma at end of enumerator list
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Use "or" instead of "add" (this is a classic select sequence, which at
least newer llvm versions can actually recognize (3.2+?), and the "add"
might prevent that - and we really don't want an add instead of an or with
avx if it isn't recognized (even without avx logic ops might be cheaper)).
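For reference, the select sequence in scalar C. This is only an
illustration with made-up function names, not the gallivm code. Both
forms give the same value because the two masked operands never have a
bit in common, but only the "or" form is the pattern usually recognized
as a select.

```c
#include <stdint.h>

/* Classic select: take bits of a where mask is set, bits of b elsewhere. */
static uint32_t
select_or(uint32_t mask, uint32_t a, uint32_t b)
{
   return (a & mask) | (b & ~mask);
}

/* Same value, since (a & mask) and (b & ~mask) share no set bits. */
static uint32_t
select_add(uint32_t mask, uint32_t a, uint32_t b)
{
   return (a & mask) + (b & ~mask);
}
```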
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Instead of just ignoring the srgb/linear conversions, simply call the
corresponding conversion functions, for all of pack/unpack/fetch,
both for float and unorm8 versions (though some don't make a whole
lot of sense, i.e. unorm8/unorm8 srgb/linear combinations).
Refactored some functions a bit so we don't have to duplicate all the code
(there's a slight change for packing dxt1_rgb, as there will now be
always 4 components initialized and sent to the external compression
function so the same code can be used for all, the quite horrid and
ad-hoc interface (by now) should always have worked with that).
Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but it also tends to increase register usage, so an overall
positive effect on performance has yet to be proven by real benchmarks.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Every function but the above four uses explicitly sized types for their
src and dst arguments. Even fetch_rgba_{s,u}int follows the convention.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular
JIT has bit-rotted badly on ARM and doesn't exist on AArch64.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
Historically, we indented grammar production rules with a single 8-space
tab, but code inside of blocks used Mesa's 3-space indents.
This meant when editing code, you had to use an 8-space tab for the
first level of indentation, and 3-spaces after that. Unless you
specifically configure your editor to understand this, it will get the
indentation wrong on every single line you touch, which quickly devolves
into a colossal waste of time.
It's also inconsistent with every other file in the entire project.
This patch removes all tabs and moves to a consistent 3-space indent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When working on a parser, it's very easy to accidentally introduce
new shift/reduce conflicts. Failing the build guarantees they'll
be noticed and fixed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The single remaining shift/reduce conflict was the classic ELSE problem:
292 selection_rest_statement: statement . ELSE statement
293 | statement .
ELSE shift, and go to state 479
ELSE [reduce using rule 293 (selection_rest_statement)]
$default reduce using rule 293 (selection_rest_statement)
The correct behavior here is to shift, which is what happens by default.
However, resolving it explicitly will make it possible to fail the build
on new errors, making them much easier to detect.
The classic way to solve this is to use right associativity:
http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html
Since there is no THEN token in GLSL, we need to fake one. %right THEN
creates a new terminal symbol; the %prec directive says to use the
precedence of that terminal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
opt_return_value was not initialized if mode != ast_return.
Fixes "Uninitialized pointer field" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This should fix missing symbols in an osmesa built against a shared
glapi. All OpenGL exports that are defined in the static glapi were
missing, so link against both to fix this.
This is a candidate for the stable series.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47824
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
We always emit U,V,R coordinates for this message, but the sampler gets
very angry if we pass garbage in the R coordinate for at least some
texture formats.
Fill the remaining coordinates with zero instead.
Fixes broken rendering on GM45 in Source games, and in VDrift.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65236
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_tex_layout.c sets up the align_w/h fields, and has all the
appropriate spec references already.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The Sandybridge code had a citation for the range of the "Maximum Number
of Threads" field, and the Ivybridge code just mentioned the "BSpec" in
general. That's documented in the obvious place, so people can find it
without a spec reference.
The real value of the comment is to say "we tried zero, and it exploded,
so program it to a valid number even if pixel shading is off."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Unfortunately, the workaround text never made it into the Sandybridge
PRM, so we still have to refer to the BSpec.
It also wasn't obvious why we needed this workaround at all, since we
don't currently do VS passthrough - but BLORP can turn off the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, the Ivybridge PRM can't be cited, as it is missing the relevant
text for some reason. However, the Sandybridge PRM has the text Chad
originally quoted, and the modern BSpec has the same text.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
I cut and pasted these comments from the Gen4 code during Ivybridge
enabling, and didn't understand what they meant at the time.
The data cache is NOT the same as the sampler cache on Ivybridge.
The sampler cache has L1 and L2 caches in addition to the L3 cache,
while data port messages to the "data cache" hit L3 directly.
This means that the sampler domain is technically wrong, but we stopped
caring about read/write domains quite a while ago. The kernel just
flushes all the caches at the end of each batchbuffer, and our render to
texture code flushes the sampler caches when necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Presumably, this comment exists to justify the usage of
I915_GEM_DOMAIN_SAMPLER for this relocation. At one point, this was
necessary to ensure that the right flushing was done to keep caches
coherent. These days, the kernel just flushes everything, so I don't
think it matters.
Still, the comment is interesting, so leave it in place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The Ivybridge PRM adds new SFIDs and lists them in a different volume
than Sandybridge, so it's worth adding a reference.
I also removed the BSpec reference, as the section it referred to
was moved somewhere, and I couldn't find it. This leaves one Haswell
SFID without a citation, but we can add one once the PRMs are out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Just use the new conversion functions to do the work. The way it's plugged
into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.
This commit throws that code away and uses a real index buffer instead.
https://bugs.freedesktop.org/show_bug.cgi?id=66558
Cc: mesa-stable@lists.freedesktop.org
_mesa_ast_set_aggregate_type walks through declarations initialized with
C-style aggregate initializers and stops when it runs out of LHS
declarations or RHS expressions.
In the example
vec4 v = {{{1, 2, 3, 4}}};
_mesa_ast_set_aggregate_type would not recurse into the subexpressions
(since vec4s do not contain types that can be initialized with an
aggregate initializer) to set their <constructor_type>s. Later in ::hir
we would dereference the NULL pointer and segfault.
If <constructor_type> is NULL in ::hir we know that the LHS and RHS
were unbalanced and the code is illegal.
Arrays, structs, and matrices were unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we had a separate function for setting up the built-in
variables for each combination of shader stage and GLSL version
(e.g. generate_110_vs_variables to generate the built-in variables for
GLSL 1.10 vertex shaders). The functions called each other in ad-hoc
ways, leading to unexpected inconsistencies (for example,
generate_120_fs_variables was called for GLSL versions 1.20 and above,
but generate_130_fs_variables was called only for GLSL version 1.30).
In addition, it led to a lot of code duplication, since many varyings
had to be duplicated in both the FS and VS code paths. With the
advent of geometry shaders (and later, tessellation control and
tessellation evaluation shaders), this code duplication was going to
get a lot worse.
So this patch reworks things so that instead of having a separate
function for each shader type and GLSL version, we have a function for
constants, one for uniforms, one for varyings, and one for the special
variables that are specific to each shader type.
In addition, we use a class, builtin_variable_generator, to keep track
of the instruction exec_list, the GLSL parse state, commonly-used
types, and a few other variables, so that we don't have to pass them
around as function arguments. This makes the code a lot more compact.
Where it was feasible to do so without introducing compilation errors,
I've also gone ahead and introduced the variables needed for
{ARB,EXT}_geometry_shader4 style geometry shaders. This patch takes
care of everything except the GS variable gl_VerticesIn, the FS
variable gl_PrimitiveID, and GLSL 1.50 style geometry shader inputs
(using the gl_in interface block). Those remaining features will be
added later.
I've also made a slight nomenclature change: previously we used the
word "deprecated" to refer to variables which are marked in GLSL 1.40
as requiring the ARB_compatibility extension, and are marked in GLSL
1.50 onward as requiring the compatibility profile. This was
misleading, since not all deprecated variables require the
compatibility profile (for example gl_FragData and gl_FragColor, which
have been deprecated since GLSL 1.30, but do not require the
compatibility profile until GLSL 4.20). We now consistently use the
word "compatibility" to refer to these variables.
This patch doesn't introduce any functional changes (since geometry
shaders haven't been enabled yet).
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Rename "typ" -> "type". Add blank line between inline functions
and declarations in builtin_variable_generator class. Use the
standard comment "/* FALLTHROUGH */" for compatibility with static
code analysis tools.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In certain rare cases (such as those involving dereference of a
literal constant array of structs),
flatten_named_interface_blocks_declarations's rvalue visitor may be
invoked on an ir_dereference_record whose variable_referenced() method
returns NULL.
Check for this case to avoid a segfault.
Prevents crashes in piglit tests
{vs,fs}-deref-literal-array-of-structs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Vertex shader inputs are not allowed to be arrays until GLSL 1.50. We
were accidentally enabling them for GLSL 1.40 (although we haven't
written any tests for them, so it's not clear whether they actually
work).
NOTE: although this is a simple bug fix, it probably isn't sensible to
cherry-pick it to stable release branches, since its only effect is to
cause incorrectly-written shaders to fail to compile.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes undefined results if a back color is written, but the
corresponding front color is not, and only backfacing primitives are
drawn. Results are still undefined if a frontfacing primitive is drawn,
but that's OK.
The other reasonable way to fix this would have been to just pick
the one color slot that was populated, but that dilutes the value of
the tests.
On Gen6+, the fixed function clipper and triangle setup already take
care of this.
Fixes 11 piglits:
spec/glsl-1.10/execution/interpolation/interpolation-none-gl_Back*Color-*
NOTE: This is a candidate for stable branches.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some lame compilers can't do exp2f(), and as far as I can tell they can't
do exp2() (with doubles) either. Instead of providing a workaround for
that (which wouldn't actually be too bad: just replace it with pow), and
since it is only used with a constant, just use the precalculated constant.
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
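The arithmetic, as a minimal sketch (the helper name and parameters are
hypothetical, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

/*
 * With the index buffer always bound at offset 0, a byte offset into
 * it becomes a number of indices to skip, which is what gets added to
 * Start Vertex Location.
 */
static uint32_t
start_vertex_from_offset(uint32_t offset_bytes, uint32_t index_size)
{
   assert(index_size == 1 || index_size == 2 || index_size == 4);
   /* Assumes the offset is aligned to the index size. */
   assert(offset_bytes % index_size == 0);

   return offset_bytes / index_size;
}
```

For example, an offset of 24 bytes with 2-byte indices skips 12 indices.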
srgb-to-linear is using 3rd degree polynomial for now which should be _just_
good enough. Reverse is using some rational polynomials and is quite accurate,
though not hooked into llvmpipe's blend code yet and hence unused (untested).
Using a table might also be an option (for srgb-to-linear especially).
This does not enable any new features yet because EXT_texture_srgb was already
supported via util_format fallbacks, but performance was lacking probably due
to the external function call (the table used by the util_format_srgb code may
not be all that much slower on its own).
Some performance figures (taken from modified gloss, replaced both base and
sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge,
the numbers aren't terribly accurate):
normal gloss, aos, 8-wide: 47 fps
normal gloss, aos, 4-wide: 48 fps
normal gloss, forced to soa, 8-wide: 48 fps
normal gloss, forced to soa, 4-wide: 47 fps
patched gloss, old code, soa, 8-wide: 21 fps
patched gloss, old code, soa, 4-wide: 24 fps
patched gloss, new code, soa, 8-wide: 41 fps
patched gloss, new code, soa, 4-wide: 38 fps
So there's a performance hit but it seems acceptable, certainly better
than using the fallback.
Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will
continue to use the old util_format fallback, because I can't be bothered
to write code for formats no one uses anyway (as decoding is done as part of
lp_build_unpack_rgba_soa which can only handle block type width of 32).
Compressed srgb formats should get their own path though eventually (it is
going to be expensive in any case, first decompress, then convert).
No piglit regressions.
v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also
since keeping both linear to srgb functions for now make sure both are
compiled (since they share quite some code just integrate into the same
function).
v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb
path.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We had to disable fast rsqrt before because it wasn't precise enough etc.
However in situations when we know we're not going to need more precision
we can still use a fast rsqrt (which can be several times faster than
the quite expensive sqrt). Hence introduce a new helper which does exactly
that - it is probably not useful calling it in some situations if there's
no fast rsqrt available so make it queryable if it's available too.
v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation,
let rsqrt use fast_rsqrt.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gl_TexCoord was deprecated in GLSL 1.30. In GLSL 1.40 it was marked
as ARB_compatibility-only, and in GLSL 1.50 and above it was marked as
only appearing in the compatibility profile. It has never appeared in
GLSL ES.
However, Mesa erroneously included it in all desktop versions of GLSL,
even versions 1.40 and 1.50 (which do not currently support the
compatibility profile). This patch makes gl_TexCoord available in the
compatibility profile (and GLSL versions 1.30 and prior) only.
NOTE: although this is a simple bug fix, it probably isn't sensible to
cherry-pick it to stable release branches, since its only effect is to
cause incorrectly-written shaders to fail to compile.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and
can be eliminated in a release build in gen6_pipeline_end(). Move the call
into the assert().
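The pattern, sketched with stand-in functions (estimate_size and
emit_commands are hypothetical names, not the ilo code):

```c
#include <assert.h>

/* Hypothetical stand-in for a size estimate that has no side effects. */
static int
estimate_size(int num_commands)
{
   return num_commands * 4;
}

static int
emit_commands(int num_commands)
{
   int used = num_commands * 3;

   /*
    * Calling estimate_size() into a local and asserting on the local
    * would leave the call in place when NDEBUG is defined.  Inside the
    * assert() the whole expression disappears in release builds.
    */
   assert(used <= estimate_size(num_commands));

   return used;
}
```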
The AC_CHECK_FILE macro can't be used for cross compiling as it will
result in "error: cannot check for file existence when cross compiling".
Replace it with the AS_IF macro.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jonathan Liu <net147@gmail.com>
GLSL spec says that rsq is undefined for src<=0, but the D3D10
spec says it needs to be a NaN, so let's stop taking an absolute
value of the source which completely breaks that behavior. For
the gl program we can simply insert an extra abs instruction
which produces the desired behavior there.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional
kill (if any src component < 0). The latter was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.
This patch renames both opcodes:
TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)
Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.
I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
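The two semantics in scalar C, as a reading aid only (these helpers
are illustrative and are not part of any driver):

```c
#include <stdbool.h>

/* KILL_IF: discard the fragment if any src component is < 0. */
static bool
kill_if(const float src[4])
{
   for (int i = 0; i < 4; i++) {
      if (src[i] < 0.0f)
         return true;
   }
   return false;
}

/* KILL: discard the fragment unconditionally. */
static bool
kill(void)
{
   return true;
}
```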
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
KILP is really unconditional fragment kill.
We've had KIL and KILP transposed forever. I'll fix that next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code happened to work in the past since the (scalar) src args
effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so
whether you grab the X or Y component doesn't really matter. Just
fixing the code to make it look right.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This update fixes the problem with duplicated typedefs for
GLclampf and GLclampd in the previous version.
It also changes some parameter types for glDebugMessageCallbackARB()
and glTransformFeedbackVaryingsEXT().
Note we should someday update the glapi-gen code so that it
understands void pointer parameters. Currently, the Python code
only understands "GLvoid *" but not "void *". Luckily, the
compilers don't seem to complain about mixing GLvoid and void.
If the size argument isn't a multiple of four, we would have read/
copied uninitialized memory.
Fixes an issue reported by Myles C. Maxfield <myles.maxfield@gmail.com>
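One way to avoid that kind of over-read, as a generic sketch. This is
not necessarily how the fix is implemented, and the function name is
hypothetical: copy the whole dwords, then build the last dword from a
zeroed temporary so only bytes inside the source are read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'size' bytes from src into a dword buffer, zero-padding the
 * final dword.  Returns the number of dwords written.  dst must have
 * room for (size + 3) / 4 dwords.
 */
static size_t
copy_padded_dwords(uint32_t *dst, const void *src, size_t size)
{
   size_t full = size / 4;
   size_t tail = size % 4;

   memcpy(dst, src, full * 4);

   if (tail) {
      uint32_t last = 0;

      /* Read only the bytes that exist in the source. */
      memcpy(&last, (const uint8_t *)src + full * 4, tail);
      dst[full] = last;
      return full + 1;
   }

   return full;
}
```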
They are a non-standard GCC extension that's not widely supported by
other C/C++ compilers.
Use a dynamic array instead.
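Assuming the construct being removed is a variable-length array (the
subject line is not included here), the replacement looks roughly like
this. The function is a made-up example, not the code touched by this
commit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns the sum of i*i for i in [0, n), or -1 on allocation failure. */
static int
sum_of_squares(int n)
{
   int *tmp;
   int sum = 0;

   if (n <= 0)
      return 0;

   /* Instead of "int tmp[n];", allocate the scratch array on the heap. */
   tmp = malloc((size_t)n * sizeof *tmp);
   if (!tmp)
      return -1;

   for (int i = 0; i < n; i++)
      tmp[i] = i * i;
   for (int i = 0; i < n; i++)
      sum += tmp[i];

   free(tmp);
   return sum;
}
```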
Trivial. Should fix the MSVC build.
Required by GL_ARB_shading_language_420pack.
Parts based on work done by Todd Previte and Ken Graunke, implementing
basic support for C-style initializers of arrays.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will be used in a later commit to differentiate between a structure type
declaration and a variable declaration of a struct type. I.e., the
difference between
struct S { float x; }; (is_declaration = true)
and
S s; (is_declaration = false)
Also note that is_declaration = true for
struct S { float x; } s;
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will be used in a future commit. An ast_type_specifier is stored (rather
than an ast_struct_specifier) with the idea that we may have more
general uses for this in the future. struct names are prefixed with
'#ast.' to avoid collisions with the glsl_types in the symbol table.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code float a[2] = float[2]( 3.4, 4.2, 5.0 ); previously generated
this:
error: array constructor must have at least 2 parameters
when in fact it requires exactly two.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
libglslcore.la and libglcpp.la that are built with builtin_compiler are also
linked to by drivers not using libdricore. Since there is no public symbol in
them, it is better to mark all symbols hidden.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We mark ARB_uniform_buffer_object as enabled under ES 3 since it
contains that functionality, which tricked the compiler into tokenizing
"row_major".
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds support for some math optimizations that are generally
considered unsafe, which is why they are currently disabled for compute
shaders.
GL requirements are less strict, so they are enabled for GL shaders by
default. In case of any issues with applications that rely on higher
precision than guaranteed by GL, the 'sbsafemath' option in R600_DEBUG
allows disabling them.
v2 - always set proper src vector size for transformed instructions
- check for clamp modifier in the expr_handler::fold_assoc
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel >
Pixel Backend > MCS Buffer for Render Target(s) [DevIVB+]:
[DevHSW:GT3]: Clear rectangle must be aligned to two times
the number of pixels in the table shown below...
Observed no piglit, gles3conform regressions with this patch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65744
It seems __builtin_ia32_ldmxcsr is only available on gcc and only when
-msse is used. xmmintrin.h/pmmintrin.h provide portable intrinsics, but
these too are only available with gcc when -msse/-msse3 are set.
scons build always sets -msse on x86 builds, but autotools doesn't seem
to.
We could try to get this working on gcc x86 without -msse by emitting
assembly, but I believe that in this day and age we really should be
building Mesa with -msse and -msse2.
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few CPUs it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.
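For readers unfamiliar with the mechanism, a minimal sketch of
flushing denorms via MXCSR using the xmmintrin.h intrinsics. The helper
names and the HAVE_MXCSR guard are made up for this example and are not
the code in this commit; the bit positions are the architectural MXCSR
ones.

```c
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

#define MXCSR_FTZ 0x8000u /* flush-to-zero: denormal results become 0 */
#define MXCSR_DAZ 0x0040u /* denormals-are-zero: denormal inputs read as 0 */

/* Returns 1 if the mode was set, 0 if MXCSR is not available. */
static int
enable_denorm_flush(void)
{
#if HAVE_MXCSR
   _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
   return 1;
#else
   return 0;
#endif
}

/* Returns 1 if both FTZ and DAZ are currently set. */
static int
denorm_flush_enabled(void)
{
#if HAVE_MXCSR
   unsigned bits = MXCSR_FTZ | MXCSR_DAZ;

   return (_mm_getcsr() & bits) == bits;
#else
   return 0;
#endif
}
```

MXCSR is per-thread state, so a real implementation would also need to
save and restore the previous value around the shading code.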
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Things worked out in the past because both brw and intel share the same
memory address (by virtue of intel being the first member of brw).
However, brw is what actually gets rzalloc'd (brw_context.c:285), so
freeing that seems safer and more obvious.
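The layout in question, reduced to a sketch. The struct and function
names are hypothetical, and plain calloc/free stand in for the ralloc
calls.

```c
#include <stdlib.h>

struct base_context {
   int id;
};

struct derived_context {
   struct base_context base; /* first member: shares the struct's address */
   int extra;
};

static struct derived_context *
derived_create(void)
{
   return calloc(1, sizeof(struct derived_context));
}

/* Free the pointer that was actually allocated, not &d->base. */
static void
derived_destroy(struct derived_context *d)
{
   free(d);
}

/* True because C guarantees no padding before the first member. */
static int
same_address(const struct derived_context *d)
{
   return (const void *)d == (const void *)&d->base;
}
```

Freeing &d->base would happen to work here, but it silently depends on
base staying the first member.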
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
These files have forward declarations for intel_context. This makes
brw_context available in the same places without further #include
monkeying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
brw_context.h includes intel_context.h, but additionally makes the
brw_context structure available. Switching this allows us to start
using brw_context in more places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
brwCreateContext() has a lot of random things to do. Factoring out the
part that initializes ctx->Const values and shader compiler options
makes the main function a bit easier to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Technically, needs_ff_sync was set on Gen5+, but it was only consulted
in the clipper threads and quad/lineloop decomposition code, which are
both Gen4-5 only. So in reality it only identified Ironlake.
The named flag doesn't really clarify things, and seems like overkill.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
perf_debug() doesn't add a newline for you; without this, all the
INTEL_DEBUG=perf output was jumbled together.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Resolves the following gcc warning
opt_flip_matrices.cpp:84:32: warning: unused variable 'deref'
v2: keep the variable, but wrap it in an #ifndef NDEBUG block
(suggested by Ian)
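The v2 approach can be sketched as follows. The function and variable names here are hypothetical, not the ones in opt_flip_matrices.cpp; the point is only that a variable consulted solely by assert() is declared under the same condition that keeps assert() alive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: 'len' is consulted only by the assertion. */
static int
first_element(const int *values, size_t count)
{
#ifndef NDEBUG
   /* Declared only in builds where assert() expands to a real check. */
   const size_t len = count;
#else
   (void) count;
#endif

   /* With NDEBUG defined this expands to nothing, so 'len' is never named. */
   assert(len > 0);
   return values[0];
}
```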
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Resolves the following gcc warnings
warning: 'iface_type_name' may be used uninitialized in this function
warning: 'var_mode' may be used uninitialized in this function
Note: The variables are initialised to UNKNOWN and ir_var_auto
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
driver->ProgramStringNotify is only called for ARB programs, fixed
function vertex programs, and ir_to_mesa (which isn't used by the i965
back-end). Therefore, even after geometry shaders are added,
brwProgramStringNotify should only ever be called with a target of
GL_VERTEX_PROGRAM_ARB or GL_FRAGMENT_PROGRAM_ARB.
This patch adds an assertion to clarify that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's done automatically for vertex buffers, but not for constant buffers,
textures, and colorbuffers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This should increase performance if constant uploads are done with the CP DMA,
because only the cache that needs to be flushed is flushed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
also flushing any cache in evergreen_emit_cs_shader seems to be superfluous
(we don't flush caches when changing the other shaders either)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
1. flush SH with read caches
2. add flag for DB flushes
3. add flag for CB flushes
v2: flush all CBs, remove redundant emit_state variable.
v3: Marek: also set the new flags in r600_context_flush, the CP dma functions,
and texture_barrier, and rename them
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The winsys should do this, because it measures how much time we spend
in buffer_map doing synchronization, which can be viewed with the gallium
HUD.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.
This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Include src0 alpha in the RT write message when using MRT, so it is used
for the alpha test instead of the normal per-RT alpha value.
Fixes broken rendering in Dota2 under Wine [FDO #62647].
No Piglit regressions on Ivybridge.
V2: reuse (and simplify) existing sample_alpha_to_coverage flag in
the FS key, rather than adding another redundant one.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62647
NOTE: This is a candidate for the stable branches.
The logic for choosing number of lods was bogus.
(The code should ultimately handle the case of only one lod even with multiple
quads but currently can't.)
It is perfectly valid for the swizzle to be bigger than 2. For example the
texel offsets could be
SAMPLE ..., IMM[0].zzz
What is not correct is for chan_index to be bigger than 2.
Trivial.
Shaders need a lot of work still. Basic stuff generally works, so this
is basically just fine for gnome-shell, OA etc at this point.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The assertion was always broken but the code unused until enabling the
per-element lod code. Fixes piglit texelFetch vs isampler1D and similar
tests (only run with GL 3.0 version override).
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at least
if they are used from vertex or geometry shaders (so doesn't apply to lod
bias) this doesn't just affect neighboring pixels.
Some code was already there to handle this so fix it up and enable it.
There will no doubt be a performance hit, unfortunately. We could do better
if we knew we had a real vector shift instruction (with variable shift
count), but this requires AVX2 on x86 (or an AMD Bulldozer family CPU).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them either.
Likewise, the size query is still broken just the same.
v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The semantics for overflow detection are a bit tricky with
indexed rendering. If the base index in the elements array
overflows, then the index of the first element should be used,
if the index with bias overflows then it should be treated
like a normal overflow. Also overflows need to be checked for
in all paths that use either the bias or the starting index location.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The comparison, incorrectly, was greater-than-or-equal to
elt max.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The texture alignment unit functions are called from brw_tex_layout.c,
so it makes sense to put them there. Since the only caller of
intel_get_texture_alignment_unit() is in brw_tex_layout.c, it could be
made into a static function. However, this patch instead simply folds
it into the caller, as it's only two lines anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
intel_miptree_create_layout() calls intel_get_texture_alignment_unit()
and then immediately calls brw_miptree_layout(). There are no other
callers.
intel_get_texture_alignment_unit() populates the miptree's alignment
unit fields, which are used by brw_miptree_layout() to determine where
to place each miplevel. Since brw_miptree_layout() needs those to be
present, it makes sense to have it initialize them as the first step.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The driver is compiled in C99 mode, so this is not a problem. It's
slightly tidier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This uses Doxygen style for the file comments, and generally makes it
more consistent with the rest of the driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that both 2DArray and Cube layouts are taken care of by helper
functions, it's easy to just call the right function for each
generation. This is a little cleaner than falling through.
This also reworks the comments. Referencing "Volume 1" of the BSpec
isn't very helpful, since that's only available inside Intel, and it
doesn't even use volume numbers. Also, "Ironlake...finally" sounds a
bit strange considering that almost all hardware uses the 2D array
approach. At this point, Gen4 is the only special case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
maxBatchSize was only ever initialized to BATCH_SZ, and a few places
used BATCH_SZ directly anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_annotate_aub() is the only implementation of this function, so it
makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_debug_batch() is the only implementation of this function, so it
makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_render_target_supported() is the only implementation of this
function, so it makes sense to just call it directly.
Rather than adding an #include of brw_wm.h, this patch moves the
prototype to brw_context.h. Prototypes seem to be in rather arbitrary
places at the moment, and either place seems as good as the other.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
brw_is_hiz_depth_format() is the only implementation of this function,
so it makes sense to just call it directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These functions translate GLenum comparison operations into the hardware
enumerations. They should never be passed something other than a GL
comparison operator, or something is very broken.
Assertions seem more appropriate than fprintf.
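A rough sketch of the pattern, assuming hypothetical hardware enum names (HW_COMPARE_*) in place of the driver's real defines. The GL_* values are the standard ones from the OpenGL headers. An unexpected enum trips an assertion instead of printing a message and carrying on.

```c
#include <assert.h>

/* GL comparison enums, with their standard OpenGL values. */
#define GL_NEVER    0x0200
#define GL_LESS     0x0201
#define GL_EQUAL    0x0202
#define GL_LEQUAL   0x0203
#define GL_GREATER  0x0204
#define GL_NOTEQUAL 0x0205
#define GL_GEQUAL   0x0206
#define GL_ALWAYS   0x0207

/* Hypothetical hardware encodings, for illustration only. */
enum hw_compare {
   HW_COMPARE_ALWAYS,
   HW_COMPARE_NEVER,
   HW_COMPARE_LESS,
   HW_COMPARE_EQUAL,
   HW_COMPARE_LEQUAL,
   HW_COMPARE_GREATER,
   HW_COMPARE_NOTEQUAL,
   HW_COMPARE_GEQUAL
};

static enum hw_compare
translate_compare_func(unsigned func)
{
   switch (func) {
   case GL_NEVER:    return HW_COMPARE_NEVER;
   case GL_LESS:     return HW_COMPARE_LESS;
   case GL_EQUAL:    return HW_COMPARE_EQUAL;
   case GL_LEQUAL:   return HW_COMPARE_LEQUAL;
   case GL_GREATER:  return HW_COMPARE_GREATER;
   case GL_NOTEQUAL: return HW_COMPARE_NOTEQUAL;
   case GL_GEQUAL:   return HW_COMPARE_GEQUAL;
   case GL_ALWAYS:   return HW_COMPARE_ALWAYS;
   }

   /* Callers must pass a GL comparison operator; anything else is a bug. */
   assert(!"Invalid comparison function.");
   return HW_COMPARE_NEVER;
}
```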
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Both intel_context.h and brw_defines.h have #defines for comparison
functions, stencil ops, blending logic ops, and blending factors.
They're exactly the same values, so it makes sense to pick one.
brw_defines.h is the logical place for this kind of stuff, so this patch
converts intel_state.c to use the set defined there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The __DRI_USE_INVALIDATE extension was added on May 11th, 2010 by commit
4258e3a2e1. At this point, it's unlikely that anyone's using the
right mix of new and old components to hit this path. Deleting it
removes an untested code path and cleans up the driver a bit.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Keith Packard <keithp@keithp.com>
This wasn't called from anywhere; presumably it was used to examine
brw_regs when debugging shader assembly. However, it prints registers
in a different notation than brw_disasm.c which everyone is used
to...which means I doubt anyone will want to use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having a header file for a single prototype seems rather excessive.
Plus, the actual function is in brw_clear.c, not intel_clear.c, so
there isn't even the .c/.h filename symmetry one might expect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used for BOs backed by system memory on i915. With that
gone, there's nothing that even sets source to non-zero, so this is
purely dead code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Commit cf31a19300 removed support for BOs
backed by system memory, as it was only useful for i915. However, it
removed a little too much code: intel_bufferobj_buffer() used to call
intel_bufferobj_alloc_buffer(), and after that commit, it didn't.
This led to NULL pointer dereferences in several test cases, such as
es3conform's transform_feedback_state_variables test.
This commit restores the allocation, preserving the original behavior.
It may not be the cleanest approach, but tidying should come later.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66432
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This eliminates built-in varyings such as gl_Color, gl_SecondaryColor,
gl_TexCoord, and gl_FogFragCoord if they are unused by the next stage or
not written at all (e.g. gl_TexCoord elements). The gl_TexCoord array is
broken down into separate vec4s if needed.
v2: - use a switch statement in varying_info_visitor::visit(ir_variable*)
- use snprintf
- disable the optimization for GLES2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This ensures that inter-shader outputs and inputs are properly eliminated
across 3 or more shader stages. The behavior is unchanged with 2 or fewer
shader stages.
For example, elimination of unused FS inputs causes elimination of matching
GS outputs, which causes elimination of the GS inputs that were needed for
evaluation of the eliminated GS outputs, which causes elimination of
matching VS outputs. An unused FS input is all that's needed to trigger
this chain reaction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
See my explanation in mtypes.h.
v2: don't do this in gallium
v3: also updated the comment at the gl_shader_type definition
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds texture() for isamplerCubeArray and usamplerCubeArray,
which were entirely missing.
It also makes texture() with a LOD bias fragment shader specific. The
main GLSL specification explicitly says that texturing with LOD bias
should not be allowed for vertex shaders.
Affects Piglit's ARB_texture_cube_map_array/compiler/tex_bias-01.vert,
which tries to use bias in a vertex shader. Currently, it expects this
to pass (so this patch regresses the test), but I've sent a patch to
reverse the expected behavior (so this patch would fix the updated test):
http://lists.freedesktop.org/archives/piglit/2013-June/006123.html
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If reg->Register.Indirect is true then the immediate is not truly a
constant LLVM expression.
There is no performance regression in using LLVMBuildBitCast, as it will
fallback to LLVMConstBitCast internally when the argument is a constant.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Current implementation of ext_framebuffer_multisample_blit_scaled in
i965/blorp uses nearest filtering for multisample scaled blits. Using
nearest filtering produces blocky artifacts and negates the benefits
of MSAA. That is the reason why extension was not enabled on i965.
This patch implements the bilinear filtering of samples in blorp engine.
Images generated with this patch are free from blocky artifacts and show
big improvement in visual quality.
Observed no piglit and gles3 regressions.
V3:
- Algorithm used for filtering assumes a rectangular grid of samples
roughly corresponding to sample locations.
- Test the boundary conditions on the edges of texture.
V4:
- Clip texcoords and use conditional MOVs.
- Send texture dimensions as push constants.
- Remove the optimization in case of scaled multisample blits.
V5:
- Move mcs_fetch() inside the 'for' loop after computing pixel coordinates.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were incorrectly computing the buffer offset when using the
instances. The buffer offset is always equal to:
start_instance * stride + (instance_num / instance_divisor) *
stride
We were completely ignoring the start instance, quite
often producing instances that were completely wrong, e.g. if
start instance = 5, instance divisor = 2, then on the first
iteration it should be:
5 * stride, not (5/2) * stride as we'd have currently, and if
start instance = 1, instance divisor = 3, then on the first
iteration it should be:
1 * stride, not 0 as we'd have.
This fixes it and adjusts all the code to the changes.
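The formula above, written out as a small C helper. The function and parameter names are hypothetical, and the arithmetic is widened to 64 bits only to keep the sketch simple; it is not how the generated fetch code is structured.

```c
#include <stdint.h>

/* Byte offset of the per-instance data for 'instance_num', counted from
 * zero within the draw. 'instance_divisor' must be non-zero. */
static uint64_t
instance_buffer_offset(uint32_t start_instance, uint32_t instance_num,
                       uint32_t instance_divisor, uint32_t stride)
{
   /* The divisor applies only to the running instance number; the start
    * instance is added undivided. */
   uint64_t index = (uint64_t) start_instance +
                    instance_num / instance_divisor;

   return index * stride;
}
```

With a stride of 16, start instance 5 and divisor 2 give 80 on the first iteration (5 * stride), and start instance 1 with divisor 3 gives 16 (1 * stride), matching the two cases described above.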
Signed-off-by: Zack Rusin <zackr@vmware.com>
Clipper invocations are computed earlier (before the
emission, of course), so this code was adding bogus
numbers to already computed clipper invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Integers could easily overflow if the starting instance
was large enough. Instead of letting bogus counts through,
set the instance to max if it overflowed and let our
regular buffer overflow computation handle it.
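A minimal sketch of the clamping idea in plain C, with a hypothetical helper name. The real change operates on generated code; this only illustrates saturating to the maximum value so that a later bounds check rejects the fetch.

```c
#include <stdint.h>

/* Adds the start instance to the running instance number, saturating to
 * UINT32_MAX instead of wrapping around. */
static uint32_t
saturating_instance(uint32_t start_instance, uint32_t instance_num)
{
   uint32_t sum = start_instance + instance_num;

   /* Unsigned addition wraps; a wrapped sum is smaller than an operand. */
   if (sum < start_instance)
      return UINT32_MAX;

   return sum;
}
```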
Signed-off-by: Zack Rusin <zackr@vmware.com>
Our buffer overflow arithmetic was susceptible to integer
overflows, which caused the buffer overflow logic to break.
Let's use the llvm overflow intrinsics to check for integer
overflows while computing the stride/needed buffer size.
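For illustration only, the same kind of check can be written in C with the GCC/Clang overflow builtins. This swaps in compiler builtins for the LLVM IR intrinsics the commit emits, and the helper name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes index * stride + offset in 32 bits. Returns false if either
 * step wraps around, in which case *result must not be used. */
static bool
checked_fetch_offset(uint32_t index, uint32_t stride, uint32_t offset,
                     uint32_t *result)
{
   uint32_t scaled;

   if (__builtin_mul_overflow(index, stride, &scaled))
      return false;

   if (__builtin_add_overflow(scaled, offset, result))
      return false;

   return true;
}
```

A caller would treat a false return the same way as an out-of-bounds fetch, rather than trusting a wrapped offset that happens to look valid.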
Signed-off-by: Zack Rusin <zackr@vmware.com>
We weren't taking into account the size of element
that is to be fetched, which meant that it was possible
to overflow the buffer reads if the stride was very
close to the end of the buffer, e.g. stride = 3, buffer
size = 4, and the element to be read = 4. This should
be properly detected as an overflow.
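A small sketch of the corrected bounds test, with a hypothetical helper name. It reads the example above as a 4-byte element fetched at byte offset 3 from a 4-byte buffer; that reading is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the whole element, not just its first byte, lies inside the
 * buffer. */
static bool
fetch_in_bounds(uint32_t offset, uint32_t element_size, uint32_t buffer_size)
{
   /* Widen before adding so the sum itself cannot wrap. */
   return (uint64_t) offset + element_size <= buffer_size;
}
```

Checking only `offset < buffer_size` would accept the offset-3 fetch even though it reads three bytes past the end.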
Signed-off-by: Zack Rusin <zackr@vmware.com>
In the generic Unix case use the "unsigned long" type instead of 32-bit
integers so that the type sizes are consistent on 64-bit machines between X11
and not-X11.
Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
eglplatform.h defaults to X11 on Unix unless told otherwise, so if we're doing a
build without any X11 support tell it so that we don't try including headers
that don't exist.
Also set GL_PC_FLAGS so that the definition is in egl.pc, so that applications
using EGL don't try to pull in X11 headers on systems where EGL was configured
without X11 support.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64959
Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The only reason the checks existed was paranoia: when I first
wrote the code I wasn't sure it was correct. Now that I am, and
since the asserts triggered when XBMC was dropping frames, remove them.
NOTE: This is a candidate for the 9.1 branch.
The assembly parser can be used to load r300 assembly dumps
and run them through any of the r300 compiler passes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a CMASK is required, we have to be able to create a clear
color value for the format and the texture mustn't contain multiple
images. Technically, it should be possible to support array textures
and cubemaps if all images are attached to the framebuffer,
but this does not appear to be common.
v2: fix fast clear check
v3: Marek: - disable fast clear with 128-bit formats, which are unsupported
- set tex->dirty_level_mask in r600_clear, so that the driver knows
the resource must be decompressed/expanded
- return early from r600_clear if there's nothing else to do
Signed-off-by: Marek Olšák <maraeo@gmail.com>
b04a295a4a removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
not GetQuery time which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
While i915 does have hardware contexts, we don't expect there
to ever be SW support for it (given that support hasn't even made it back
to gen5 or gen4).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Every driver left in Mesa that enables one also enables the other.
There's no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In Mesa, this extension is implemented purely in software. Drivers may
*optionally* provide optimized paths. If a driver enables
GL_ARB_texture_multisample, it gets GL_ARB_texture_storage_multisample
for free.
NOTE: This has the side effect of enabling the extension in Gallium
drivers that enable GL_ARB_texture_multisample.
v2 (Ken): Still prevent multisample texture targets in TexParameter for
implementations that don't support multisampling.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In Mesa, this extension is implemented purely in software. Drivers may
*optionally* provide optimized paths.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
v2: Minor whitespace tidying (suggested by Brian).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This extension just provides some of the most basic software framework
for GLSL. Without GL_ARB_vertex_shader or GL_ARB_fragment_shader,
applications still cannot use GLSL. There's no value in
conditionalizing support for this extension.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This extension just provides some of the most basic software framework
for GLSL. Without GL_ARB_vertex_shader or GL_ARB_fragment_shader,
applications still cannot use GLSL. There's no value in
conditionalizing support for this extension.
NOTE: This has the side effect of enabling the extension in the radeon,
r200, and nouveau drivers.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver left in Mesa enables this extension all the time. There's
no reason to let it be optional.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit bab755a made the implementation a no-op, and it was only ever
enabled by software rasterizers.
v2: Move the spec into docs/specs/OLD since it's now obsolete
(squashed patch from Andreas Boll)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_enable_sw_extensions enables all the extensions (and more) that
the others enable. Also, don't duplicate the DXTn checks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This copy of the source file is only used for GEN >= 4, so extensions
that are enabled for GEN >= 4 are always enabled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This copy of the source file is only used for GEN >= 4, so extensions
that are enabled for GEN >= 3 are always enabled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This copy of the source file is only used for GEN <= 3, so remove the
dead code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_wm_surface_state.c has gotten rather large and unwieldy. At this
point, it consists of two separate portions:
1. Surface format code
This includes the giant table of surface formats and what features
they support on each generation, as well as the code to translate
between Mesa formats and hardware formats.
This is used across all generations.
2. Binding table (SURFACE_STATE) related code.
This is the code to generate SURFACE_STATE entries for renderbuffers,
textures, transform feedback buffers, constant buffers, and so on, as
well as the code to assemble them into binding tables.
This is only used on Gen4-6; gen7_surface_state.c has Gen7+ code.
Since the two are logically separate, and one is reused on every
generation while the other is not, it makes a lot of sense to split
them out. It should also make finding code easier.
No code is changed by this patch. I simply copied the file then deleted
portions of both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On CIK, DB switches back to using per-surface tiling
parameters rather than the tile index used on SI.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assertions are not sufficient to check for null pointers as they don't
show up in release builds. So, return ZeroVec/dummyReg instead of NULL
pointer in get_{src,dst}_register_pointer(). This should calm down the
warnings from the static analysis tool.
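A simplified sketch of the idea, using hypothetical types and names rather than the real program-execution structures. An out-of-range index yields a pointer to a static zero vector, so callers never see NULL even when assertions are compiled out.

```c
#include <stddef.h>

#define NUM_REGS_SKETCH 4

/* Hypothetical register file; the real machine state is much richer. */
struct machine_sketch {
   float regs[NUM_REGS_SKETCH][4];
};

/* Safe fallback that callers can read from without crashing. */
static const float zero_vec[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

static const float *
get_src_register_pointer_sketch(const struct machine_sketch *m, int index)
{
   /* A release build has no assert() to catch this, so handle it here. */
   if (index < 0 || index >= NUM_REGS_SKETCH)
      return zero_vec;

   return m->regs[index];
}
```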
Note: This is a candidate for the 9.1 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If allocation fails in intel_miptree_create_layout(), don't proceed to
dereference the miptree. Return an early NULL.
Fixes static analysis error reported by Klocwork.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This was just ignored (unless for some reason like unfilled polys draw was
handling this).
I'm not convinced of that code, putting the float for the clamp in the key
isn't really a good idea. Then again the other floats for depth bias are
already in there too anyway (should probably have a jit_context for the
setup function), so this is just a quick fix.
Also, the "minimum resolvable depth difference" used isn't really right as it
should be calculated according to the z values of the current primitive
and not be a constant (of course, this only makes a difference for float
depth buffers), at least for d3d10, so depth biasing is still not quite right.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If there are queries active the opaque optimization resetting the bin needs to
be disabled.
(Not really tested since the bug was discovered by code inspection not
an actual test failure.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes segfaults observed when enabling the post processing
features. When the format is not supported, or a texture cannot be
created, the code must gracefully handle failure and report the error to
the calling code for proper failure handling.
To accomplish this the following changes were made to the filters.h
prototypes:
- bool return for pp_init_func
- Added pp_free_func for filter specific resource destruction
Fixes segfaults from backtraces:
* util_destroy_blit
pp_free
* u_transfer_inline_write_vtbl
pp_jimenezmlaa_init_run
pp_init
This patch also uses tgsi_alloc_tokens to allocate temporary tokens in
pp_tgsi_to_state, instead of allocating the array on the stack. This
fixes the following stack corruption segfault in pp_run.c:
* _int_free
aaline_delete_fs_state
pp_free
Bug Number: 1021843
Reviewed-by: Brian Paul <brianp@vmware.com>
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries
still active at the end of a scene, which is also why I cheat a bit
and limit the amount of simultaneously active queries to (arbitrary)
16 (simplifies things because we don't have to deal with a real list
that way). I can't think of a reason why you'd really want large
numbers of overlapping/nested queries so it is hopefully fine.
(This only affects queries which need to be binned.)
v2: don't copy remainder of array when deleting an entry simply replace
the deleted entry with the last one (order doesn't matter).
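The v2 deletion strategy, sketched with a hypothetical fixed-size array of query ids (the real code tracks query objects, not integers). Because order doesn't matter, removal overwrites the deleted slot with the last entry instead of shifting the tail.

```c
#define MAX_ACTIVE_QUERIES_SKETCH 16

struct active_queries_sketch {
   int ids[MAX_ACTIVE_QUERIES_SKETCH];
   unsigned count;
};

/* Returns 0 on success, -1 when the fixed-size array is already full. */
static int
add_query_sketch(struct active_queries_sketch *a, int id)
{
   if (a->count >= MAX_ACTIVE_QUERIES_SKETCH)
      return -1;

   a->ids[a->count++] = id;
   return 0;
}

/* Returns 0 on success, -1 when 'id' is not in the array. */
static int
remove_query_sketch(struct active_queries_sketch *a, int id)
{
   unsigned i;

   for (i = 0; i < a->count; i++) {
      if (a->ids[i] == id) {
         /* Replace the deleted entry with the last one; no copying of
          * the remainder is needed. */
         a->ids[i] = a->ids[a->count - 1];
         a->count--;
         return 0;
      }
   }

   return -1;
}
```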
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously lp_rast_begin_query commands were always inserted into each bin,
and re-issued if the scene was restarted, while lp_rast_end_query commands
were executed for each still active query at the end of tile rasterization.
Also, the ps_invocations and vis_counter were set to zero when the respective
command was encountered.
This however cannot work for multiple queries of the same type (note that
occlusion counter and occlusion predicate, while of different types, were also
affected).
So, change the logic to always set the ps_invocations and vis_counter to zero
at the start of tile rasterization, and then use "start" and "end" per-thread
query values when encountering the begin/end query commands instead, which
should work for multiple queries of the same type. This also means queries do
not have to be reissued in a new scene, however they still need to be finished
at end of tile rasterization, so a list of queries still active at the end of
a scene needs to be maintained.
Also while here don't bin the queries which don't do anything in rasterization.
(This change does not actually handle multiple queries of the same type yet,
as the list of active queries is just a simple fixed array and setup can still
only have one query active per type.)
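One possible reading of the new counter logic, as a hedged sketch with hypothetical structs and function names. The per-thread counter is zeroed once per tile; each query snapshots it at begin and accumulates the difference at end, so overlapping queries no longer clobber each other by resetting the shared counter.

```c
#include <stdint.h>

/* Hypothetical per-thread rasterizer state. */
struct tile_thread_sketch {
   uint64_t vis_counter;
};

/* Hypothetical per-query state. */
struct query_sketch {
   uint64_t start;   /* counter value seen at begin */
   uint64_t result;  /* accumulated across tiles */
};

/* Called once at the start of tile rasterization. */
static void
begin_tile_sketch(struct tile_thread_sketch *t)
{
   t->vis_counter = 0;
}

/* Record where the counter stood instead of resetting it. */
static void
begin_query_sketch(const struct tile_thread_sketch *t, struct query_sketch *q)
{
   q->start = t->vis_counter;
}

/* Add only what was counted while this query was active. */
static void
end_query_sketch(const struct tile_thread_sketch *t, struct query_sketch *q)
{
   q->result += t->vis_counter - q->start;
}
```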
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Now that i915's forked off, they don't need to live in a shared directory.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Adam Jackson <ajax@redhat.com>
(and I hear second hand that idr is OK with it, too)
Of the 15000 lines of code in intel/, we've identified 4000 lines that
are trivially unnecessary for i915, and another 1000 that are pointless for
i965, and expect to find more as time goes on. Split the i915 driver off,
so that we can continue active development on i965 without worrying about
breaking i915.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Adam Jackson <ajax@redhat.com>
(and I hear second hand that idr is OK with it, too)
This has the (intended!) side effect that vertex shader inputs and
fragment shader outputs will appear in the IR in the same order that
they appeared in the shader code. This results in the locations being
assigned in the declared order. Many (arguably buggy) applications
depend on this behavior, and it matches what nearly all other drivers
do.
Fixes the (new) piglit test attrib-assignments.
NOTE: This is a candidate for stable release branches (and requires the
previous commit to prevent a regression in OpenGL ES 2.0 conformance
test stencil_plane_operation).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The checks to determine when the data can be uploaded in an interleaved
fashion can be tricked by certain data layouts. For example,
float data[...];
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 16, &data[0]);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 16, &data[4]);
glDrawArrays(GL_POINTS, 0, 1);
will hit the interleaved path with an incorrect size (16 bytes instead
of 32 bytes). As a result, the data for attribute 1 never gets
uploaded. The single element draw case is the only sensible case I can
think of for non-interleaved-that-looks-like-interleaved data, but there
may be others as well.
To fix this, make sure that the end of the element in the array being
checked is within the stride "window." Previously the code would check
that the beginning of the element was within the window.
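As a rough illustration of the difference between the two checks, here is a
minimal C sketch. The function names and the simplified byte-offset arguments
are hypothetical and are not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers. "offset" is the byte distance of an attribute's
 * first element from the lowest attribute pointer, and "stride" is the
 * stride shared by all attributes. */

/* Old-style check: only the beginning of the element is tested. */
int start_in_window(size_t offset, size_t stride)
{
   return offset <= stride;
}

/* New-style check: the end of the element must fit in the window too. */
int end_in_window(size_t offset, size_t elem_size, size_t stride)
{
   assert(elem_size > 0);
   return offset + elem_size <= stride;
}
```

For the example above, attribute 1 has offset 16, size 16 and stride 16, so
start_in_window(16, 16) accepts it while end_in_window(16, 16, 16) rejects it.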
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The 20130624 version of glext.h changed this to match the
glMultiDrawElements() function which already had the extra const
qualifier.
Fixes warnings/errors that seem to vary from one compiler to the next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
vec4_visitor::generate_code() switches on vec4_instruction::opcode and
calls into the brw_eu_emit.c layer to generate code for some of them.
It then has a default case which calls generate_vec4_instruction() to
handle the rest...which switches on opcode and handles the rest of the
cases.
The split apparently is that generate_code() handles the actual hardware
opcodes (BRW_OPCODE_*) while generate_vec4_instruction() handles the
virtual opcodes (SHADER_OPCODE_* and VS_OPCODE_*). But this looks
fairly arbitrary, and it makes more sense to combine the two switches.
This patch moves the cases from generate_code() into the helper function
so that generate_code() isn't as large.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 526ffdfc03 attempted to generalize
the source register type assertions to allow D and UD. However, the
src1 and src2 assertions actually checked src0.type against D and UD due
to a copy and paste bug.
It also began setting the source and destination register types based on
dest.type, ignoring src0/src1/src2.type completely. BFE and BFI2 may
actually pass mixed D/UD types and expect them to be ignored, which is
arguably a bit sloppy, but not too crazy either.
This patch simply removes the source register assertions as those values
aren't used anyway. It also clarifies the comment above the block that
sets the register types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit 526ffdfc03 relaxed the type
assertions in brw_alu3 to allow D/UD types (required by BFE and BFI2).
This lost us the strict type checking for MAD and LRP, which require
all four types to be float.
This patch adds a new ALU3F wrapper which checks these once again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Over the last few years, the compiler has grown to support 7 different
language versions and 6 extensions that add new built-in types. With
more and more features being added, some of our core code has devolved
into an unmaintainable spaghetti of sorts.
A few problems with the old code:
1. Built-in types are declared...where exactly?
The types in builtin_types.h were organized in arrays by the language
version or extension they were introduced in. It's factored out to
avoid duplicates---every type only exists in one array. But that
means that sampler1D is declared in 110, sampler2D is in core types,
sampler3D is a unique global not in a list...and so on.
2. Spaghetti call-chains with weird parameters:
generate_300ES_types calls generate_130_types which calls
generate_120_types and generate_EXT_texture_array_types, which calls
generate_110_types, which calls generate_100ES_types...and more
Except that ES doesn't want 1D types, so we have a skip_1d parameter.
add_deprecated also falls into this category.
3. Missing type accessors.
Common types have convenience pointers (like glsl_type::vec4_type),
but others may not be accessible at all without a symbol table (for
example, sampler types).
4. Global variable declarations in a header file?
#include "builtin_types.h" in two C++ files would break the build.
The new code addresses these problems. All built-in types are declared
together in a single table, independent of when they were introduced.
The macro that declares a new built-in type also creates a convenience
pointer, so every type is available and it won't get out of sync.
The code to populate a symbol table with the appropriate types for a
particular language version and set of extensions is now a single
table-driven function. The table lists the type name and GL/ES versions
when it was introduced (similar to how the lexer handles reserved
words). A single loop adds types based on the language version.
Explicit extension checks then add additional types. If they were
already added based on the language version, glsl_symbol_table simply
ignores the request to add them a second time, meaning we don't need
to worry about duplicates and can simply list types where they belong.
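A table-driven loop of this kind might look roughly like the following C
sketch. The struct, the table contents, and count_visible_types() are
hypothetical stand-ins; the real code adds each type to a symbol table
rather than counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: a type name plus the first desktop GL and ES
 * language versions that provide it (0 means "never in that API"). */
struct builtin_type_entry {
   const char *name;
   unsigned min_gl;
   unsigned min_es;
};

static const struct builtin_type_entry builtin_types[] = {
   { "float",     110, 100 },
   { "sampler1D", 110,   0 },
   { "uvec2",     130, 300 },
};

/* Count the types visible for a given language version. A real
 * implementation would add each match to the symbol table instead. */
unsigned count_visible_types(unsigned version, int is_es)
{
   unsigned n = 0;
   size_t i;

   for (i = 0; i < sizeof(builtin_types) / sizeof(builtin_types[0]); i++) {
      unsigned min = is_es ? builtin_types[i].min_es
                           : builtin_types[i].min_gl;
      if (min != 0 && version >= min)
         n++;
   }
   return n;
}
```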
v2: Mark uvecs and shadow samplers as ES3 only, and 1DArrayShadow as
unsupported in ES entirely. Add a touch more doxygen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using a random glsl_type convenience pointer as an array is a really bad
idea, for all the reasons mentioned in the previous commit.
The new glsl_type::bvec() function is simpler anyway.
Prevents breakage in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, vector types are linked together closely: the glsl_type
objects for float, vec2, vec3, and vec4 are all elements of the same
array, in that exact order. This makes it possible to obtain vector
types via pointer arithmetic on the scalar type's convenience pointer.
For example, float_type + (3 - 1) = vec3.
However, relying on this is extremely fragile. There's no particular
reason the underlying type objects need to be stored in an array. They
could be individual class members, possibly with padding between them.
Then the pointer arithmetic would break, and we'd get bad pointers to
non-heap allocated data, causing subtle breakage that can't be detected
by valgrind. Cue insanity.
Or someone could simply reorder the type variables, causing us to get
the wrong type entirely. Also cue insanity.
Writing this explicitly is much safer. With the new helper functions,
it's a bit less code even.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch introduces new functions to quickly grab a pointer to a
vector type. For example:
glsl_type::bvec(4) returns glsl_type::bvec4_type
glsl_type::ivec(3) returns glsl_type::ivec3_type
glsl_type::uvec(2) returns glsl_type::uvec2_type
glsl_type::vec(1) returns glsl_type::float_type
This is less wordy than glsl_type::get_instance(GLSL_TYPE_BOOL, 4, 1),
which can help avoid extra word wrapping.
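The idea of an explicit lookup, as opposed to pointer arithmetic on a
convenience pointer, can be sketched in plain C as below. The struct and
function names are hypothetical, and returning NULL for an unsupported
component count is an assumption of this sketch, not the behavior of the
real glsl_type helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a type object. */
struct type_info {
   const char *name;
};

/* Deliberately separate objects: nothing requires them to be adjacent. */
static const struct type_info float_type = { "float" };
static const struct type_info vec2_type  = { "vec2" };
static const struct type_info vec3_type  = { "vec3" };
static const struct type_info vec4_type  = { "vec4" };

/* Explicit mapping from component count to type. */
const struct type_info *vec_type(unsigned components)
{
   switch (components) {
   case 1: return &float_type;
   case 2: return &vec2_type;
   case 3: return &vec3_type;
   case 4: return &vec4_type;
   default: return NULL; /* assumption: signal an invalid count */
   }
}
```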
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In glapi_priv.h we always need the typedef for the GLclampx type
since GL_OES_fixed_point is now defined in glext.h but the
GLclampx type is not. GLclampx is not used by anything in glext.h
but we need it for GL ES dispatch.
This is a huge patch because the structure of the file has been
changed.
The following extensions are new, however:
GL_AMD_interleaved_elements
GL_AMD_shader_trinary_minmax
GL_IBM_static_data
GL_INTEL_map_texture
GL_NV_compute_program5
GL_NV_deep_texture3D
GL_NV_draw_texture
GL_NV_shader_atomic_counters
GL_NV_shader_storage_buffer_object
GL_NVX_conditional_render
GL_OES_byte_coordinates
GL_OES_compressed_paletted_texture
GL_OES_fixed_point
GL_OES_query_matrix
GL_OES_single_precision
And these extensions were removed:
GL_FfdMaskSGIX
GL_INGR_palette_buffer
GL_INTEL_texture_scissor
GL_SGI_depth_pass_instrument
GL_SGIX_fog_scale
GL_SGIX_impact_pixel_texture
GL_SGIX_texture_select
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This prevents trampling beyond the end of the command stream during flushes.
NOTE: This is a candidate for the stable branches.
Reported-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
max_threads cannot be greater than 28. It is either 21 or 28.
Fixes "Logically dead code" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
We want to access the user buffer, if available, when primitive restart is
enabled and the restart index/primitive type is not natively supported.
And since we are handling index buffer uploads in the driver with this change,
we can also work around misalignment of index buffer offsets.
Rename ilo_finalize_states() to ilo_finalize_3d_states(), and bind
pipe_draw_info to the context when it is called. This saves us from having to
pass pipe_draw_info around in several places.
The polygon offset math used for triangles by the WM is "OffsetUnits * 2 *
MRD + OffsetFactor * m" where 'MRD' is the minimum resolvable difference
for the depth buffer (~1/(1<<16) or ~1/(1<<24)), 'm' is the approximated
slope from the GL spec, and '2' is this magic number from the original
i965 code dump that we deviate from the GL spec by because "it makes glean
work" (except that it doesn't, because of some hilarity with 0.5 *
approximately 2.0 != 1.0. go glean!).
This clipper code for unfilled polygons, on the other hand, was doing
"OffsetUnits * garbage + OffsetFactor * m", where garbage was MRD in the
case of a 16-bit depth visual (regardless of the FBO's depth resolution), or
128 * MRD for a 24-bit depth visual.
This change just makes the unfilled polygons behavior match the WM's
filled polygons behavior.
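The WM formula quoted above can be written out as a small C sketch. The
function names are hypothetical, and MRD is approximated as 1/(1 << bits)
as in the description, so this is only an illustration of the arithmetic.

```c
#include <assert.h>

/* Approximate minimum resolvable difference for a depth buffer with the
 * given number of bits, i.e. 1 / (1 << bits). */
double depth_mrd(unsigned depth_bits)
{
   assert(depth_bits > 0 && depth_bits < 32);
   return 1.0 / (double)(1u << depth_bits);
}

/* OffsetUnits * 2 * MRD + OffsetFactor * m, where m is the approximated
 * slope and 2 is the magic number discussed above. */
double polygon_offset(double units, double factor, double m,
                      unsigned depth_bits)
{
   return units * 2.0 * depth_mrd(depth_bits) + factor * m;
}
```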
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no reason to care about the window system visual's depth for
handling polygon offset in an FBO, and it could only lead to pain.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The separate function for the fallback checks wasn't particularly
clarifying things, so I put the improved checks in the caller. (Note that
the dropped _mesa_update_state() had already happened once at the start of
the caller)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think we've all added instrumentation at one point or another to see
what's being called in blorp. Now you can quickly get output like:
Testing glCopyPixels(depth).
intel_hiz_exec depth clear to mt 0x16d9160 level 0 layer 0
intel_hiz_exec depth resolve to mt 0x16d9160 level 0 layer 0
intel_hiz_exec hiz ambiguate to mt 0x16d9160 level 0 layer 0
intel_hiz_exec depth resolve to mt 0x16d9160 level 0 layer 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 551c991606 tried to avoid spilling
registers that were trivially colorable. But since we do optimistic
coloring, the top of the stack also contains nodes that are not trivially
colorable, so we need to consider them for spilling (since they are some
of our best candidates).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58384
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63674
NOTE: This is a candidate for the 9.1 branch.
It should never happen, but it does, and at this point, you're going to
_mesa_problem() and abort() (unless it's just in precompile). Give the
developer something to look at.
Some shells do not set variables sequentially in a statement, i.e. "a=X
b=${a}" won't set "b" to "X" but to an empty value.
This patch introduces ";" to make sure "mo" is set properly before the "lang"
assignment.
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=471302
The last piece of code with an effect was flagging _NEW_BUFFERS. Only,
that is already flagged from everything that calls this function: Mesa GL
state updates flag it before even calling down into the driver, and the
calls from the DRI2 window system framebuffer update path end up flagging
it as part of the ResizeBuffers() hook.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The computed fields are updated appropriately as part of the normal draw
call path due to _NEW_BUFFERS being set.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For winsys FBOs, the bounds are appropriately updated immediately upon
_mesa_resize_framebuffer(). For user FBOs, they're updated as part of the
normal draw path state update due to _NEW_BUFFERS having been flagged.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Of the places noting a _NEW_DEPTH dependency, all were already checking
for _NEW_BUFFERS if appropriate.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2/3 packets depending on Stencil._Enabled already checked for
_NEW_BUFFERS, so just add _NEW_BUFFERS to the remaining one.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The viewport (ctx->Viewport._WindowMap) doesn't change with drawable size
changes, and we update scissor (ctx->DrawBuffer->_Xmin and friends) on
_NEW_BUFFERS in things like brw_sf_state.c.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Things like brw_sf.c that need to know about orientation are already
recomputing on _NEW_BUFFERS.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_resize_framebuffer(), the default value of the ResizeBuffers hook,
already checks for a window system framebuffer and walks the renderbuffers
calling AllocStorage().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_resize_framebuffer(), the default value of the ResizeBuffers hook,
already checks for a window system framebuffer and walks the renderbuffers
calling AllocStorage().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This existed to tell the core not to call GetBufferSize, except that even
if you didn't set it nothing happened because nobody had a GetBufferSize.
v2: Remove two more instances of setting the field (from Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only the GDI driver set it to non-NULL any more, and that driver has a
Viewport hook that should keep it limping along as well as it ever has.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 26d86d26f9 added
gl_shader_program::UniformLocationBaseScale. According to the code
comments in that commit, UniformLocationBaseScale "must be >=1".
UniformLocationBaseScale is of type unsigned. Coverity reported a "Macro
compares unsigned to 0" defect as well.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function does array bounds checking. Note, this exposes a
bug in the svga_mark_surface_dirty() function: we're calling
svga_age_texture_view() with a texture slice instead of mipmap
level. This can lead to a failed assertion. That'll be fixed next.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
- use mgwhelp -- the successor for bfdhelp which does not have a hard
dependency on BFD, and works on 64bits.
- use a macro instead of hand-typing to dispatch DbgHelp functions
- dump line numbers
- dump module names when symbols are not available
- support 64bits.
- add comments
Reviewed-by: Brian Paul <brianp@vmware.com>
Because our code couldn't handle it we were skipping rendering
if we detected overflows. According to the spec we should
still render but with all 0 vertices, which is what the llvm
code already does. So for the llvm paths let's enable processing
even if an overflow condition has been detected.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously we could easily overflow if start+count > max integer. To
avoid it we can just iterate over the count. This makes sure
that we never crash, since most of the overflow conditions
are already handled.
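A minimal C sketch of iterating over the count instead of computing
start+count up front is shown below. The function name and the max_index
parameter are hypothetical; the point is only that each index is checked
before it is formed, so the sum never wraps.

```c
#include <assert.h>
#include <limits.h>

/* Count how many of the indices start, start+1, ... (count of them) are
 * representable and do not exceed max_index. The end value start + count
 * is never computed, so it cannot overflow. */
unsigned count_valid_indices(unsigned start, unsigned count,
                             unsigned max_index)
{
   unsigned valid = 0;
   unsigned i;

   for (i = 0; i < count; i++) {
      if (i > UINT_MAX - start)
         break; /* start + i would wrap around */
      if (start + i <= max_index)
         valid++;
   }
   return valid;
}
```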
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make pass_render_condition() available for blitter, and check for render
condition in (and only in) clear(), clear_render_target(), and
clear_depth_stencil().
Add ilo_shader_select_kernel_routing() to construct 3DSTATE_SBE. It is called
in ilo_finalize_states(), rather than in create_fs_state(), as it depends on
VS/GS and rasterizer states.
With this change, ilo_shader_internal.h is no longer needed for
ilo_gpe_gen6.c.
This allows us to remove ilo_shader_internal.h from ilo_gpe_gen7.c. The
unfinished code in 3DSTATE_DS, 3DSTATE_HS, and INTERFACE_DESCRIPTOR_DATA is
partly or entirely removed.
The unmodified pipe_stream_output_info describes its outputs as if they are in
TGSI_FILE_OUTPUT. Remap the register indices to where they appear in the VUE.
TGSI_SEMANTIC_PSIZE needs a little care because it is at the W channel.
When a new VS kernel is generated, a newly added function,
ilo_gpe_init_vs_cso(), is called to construct 3DSTATE_VS command in
ilo_shader_cso. When the command needs to be emitted later, we copy the
command from the CSO instead of constructing it dynamically.
Add ilo_shader_get_type() to query the type (PIPE_SHADER_x) of the shader.
Add ilo_shader_get_kernel_offset() and ilo_shader_get_kernel_param() to query
the cache offset and various kernel parameters of the selected kernel.
Add ilo_shader_select_kernel() to replace the dependency table,
ilo_shader_variant_init(), and ilo_shader_state_use_variant().
With the changes, we no longer need to include ilo_shader_internal.h in
ilo_state.c.
Replace ilo_shader_state_create() by
ilo_shader_create_vs()
ilo_shader_create_gs()
ilo_shader_create_fs()
ilo_shader_create_cs()
Rename ilo_shader_state_destroy() to ilo_shader_destroy(). The old
ilo_shader_destroy() is renamed to ilo_shader_destroy_kernel().
To prevent segfaults in the AA line module, the code will check for a
valid pointer to the aaline_stage in the draw context.
Fixes segfault from backtrace:
* aaline_stage_from_pipe
aaline_delete_fs_state
Reviewed-by: Brian Paul <brianp@vmware.com>
swrastGetImage rounds the pitch up to 4 bytes for compatibility reasons
that are explained in drisw_glx.c:bytes_per_line, so drisw_update_tex_buffer
must do the same.
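Rounding a pitch up to a multiple of 4 bytes can be sketched in C as
follows. The function name and parameters are hypothetical and only
illustrate the arithmetic, not the actual drisw code.

```c
#include <assert.h>

/* Bytes per scanline, rounded up to the next multiple of 4. */
unsigned aligned_pitch(unsigned width, unsigned bytes_per_pixel)
{
   unsigned pitch = width * bytes_per_pixel;

   assert(bytes_per_pixel > 0);
   return (pitch + 3u) & ~3u;
}
```

On a 16-bit screen (2 bytes per pixel), an odd width such as 5 gives 10
bytes of pixel data but a 12-byte pitch, which is where an unpadded copy
would skew.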
Fixes window skew seen while running firefox over vnc on a 16-bit screen.
NOTE: This is a candidate for the stable branches.
[ajax: fixed typo in comment]
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Squashed commit of the following:
commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix big-endian machines
This adds a bit-shift count to the format table, and adds the concept of
vector or bitwise alignment on gathers.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
llvmpipe: Fix convert_to_blend_type on big-endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
util: Convert color pack to packed formats
This fixes them on big-endian.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
graw-xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
format: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/dri: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gbm: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
tests: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 13:07:37 2013 -0400
gallium: Document packed formats
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallium: Introduce 32-bit packed format names
These are for interacting with buffers natively described in terms of
bit shifts, like X11 visuals:
uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);
Define these in terms of (endian-dependent) aliases to the array-style
format names.
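To make the endian dependence concrete, here is a small C sketch. Both
function names are hypothetical. The packed value is the same on every
host, but which channel lands in the first byte in memory depends on the
host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four 8-bit channels using the bit shifts from the text above. */
uint32_t pack_xyzw8888(uint8_t x, uint8_t y, uint8_t z, uint8_t w)
{
   return ((uint32_t)x << 0) | ((uint32_t)y << 8) |
          ((uint32_t)z << 16) | ((uint32_t)w << 24);
}

/* Report which channel occupies the lowest memory address:
 * 'x' on a little-endian host, 'w' on a big-endian host. */
char first_byte_channel(void)
{
   uint32_t v = pack_xyzw8888(1, 2, 3, 4);
   uint8_t bytes[4];

   memcpy(bytes, &v, sizeof(bytes));
   return bytes[0] == 1 ? 'x' : 'w';
}
```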
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <ajax@redhat.com>
Date: Mon Jun 3 12:10:32 2013 -0400
gallium: Document format name conventions
v2:
- Fix a channel name thinko (Michel Dänzer)
- Elaborate on SCALED versus INT
- Add links to DirectX and FOURCC docs
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallivm: Remove all notion of byte-swapping
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
The result isn't always 0 in this case (depends on query type),
so instead of special casing this just use the ordinary path (should result
in correct values thanks to initialization in query_begin/end), just
skipping the fence wait.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes the bytestream parsing of mpeg-1 stream, but still leaves
open a number of issues with the interpretation:
- IDCT mismatch control is not correct for MPEG-1.
- Slices do not have to start and end on the same horizontal row of macroblocks.
- picture_coding_type = 4 (D-pictures) is not handled.
- full_pel_*_vector is not handled.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
The results of a bary.f do not appear to be immediately available, but
there is no explicit sync bit. Instead the compiler must just ensure
that there are a minimum number of instructions following the bary
before use of the result of the bary. We aren't clever enough for that
so just throw in some nops.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we are accumulating result into tmp.x, and need a mov to final
destination, we want to move the .x component into all of the components
enabled from the read dest's writemask, ie. we want:
MOV dst.xyzw tmp.xxxx
rather than:
MOV dst.xyzw tmp.xyzw
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This code had no relation to ir_to_mesa.cpp, since it was also used by
intel and state_tracker, and most of it was duplicated with the standalone
compiler (which has periodically drifted from the Mesa copy).
v2: Split from the ir_to_mesa to shaderapi.c changes.
Acked-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was nothing ir_to_mesa-specific about this code, but it's not
exactly part of the compiler's core turning-source-into-IR job either.
v2: Split from the ir_to_mesa to glsl/ commit, avoid renaming the sh
variable.
Acked-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I noticed this while trying to merge code with the builtin compiler, which
does set it.
Note that this causes two regressions in piglit in
default-precision-sampler.* which try to link without a vertex or fragment
shader, due to being run under the desktop glslparsertest binary (using
ARB_ES3_compatibility) that doesn't know about this requirement.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were duplicating this code all over the place, and they all would need
updating for the next set of shader targets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We have ir->print() to do the old declaration of a visitor and having the
IR accept the visitor (yuck!). And now you can call _mesa_print_ir()
safely anywhere that you know what an ir_instruction is.
A couple of missing printf("\n")s are added in error paths -- when an
expression is handed to the visitor, it doesn't print '\n' (since it might
be a step in printing a whole expression tree).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No more forgetting to #include "ir_print_visitor.h" when doing temporary
debug code, or forgetting and leaving it in after removing your temporary
debug code. Also, available from C code so you don't need to move the
caller to C++ just to call it (see also: ir_to_mesa.cpp).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Based on the code from the good old python state tracker.
Extremely handy to diagnose regressions in state trackers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Not used yet but there are a couple of places in llvmpipe which should use this
(occlusion count is currently very inefficient if there's no cpu popcnt
instruction).
Handle PIPE_QUERY_GPU_FINISHED and PIPE_QUERY_TIMESTAMP_DISJOINT, and
also fill out the ps_invocations and c_primitives from the
PIPE_QUERY_PIPELINE_STATISTICS (the others in there should already
be handled). Note that ps_invocations isn't pixel exact, just 16 pixel
exact but I guess it's better than nothing.
Doesn't really seem to work correctly but there are probably bugs elsewhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The driver can do render_condition but wasn't handling the occlusion
and so_overflow predicates (though the latter might not work yet due
to gs support).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The semantics didn't really make sense, matching neither d3d9
(though the docs are all broken there) nor d3d10. So make it match d3d10
semantics, which actually gives meaning to the "disjoint" part.
Drivers are fixed up in a very primitive way, I have no idea what could
actually cause the counter to become unreliable so just always return
FALSE for the disjoint part.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Move some functions from the svga_tgsi_insn.h header into the
svga_tgsi_insn.c file since they're only used there. Plus, add
comments and fix formatting.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The new code makes the shader cache manage all shaders and lets it upload
all of them to a caller-provided bo as a whole.
Previously, we uploaded only the bound shaders. When a different set of
shaders is bound, we had to allocate a new kernel bo to upload if the current
one is busy.
When doing blit using the 3D engine, the rasterizer cso may be NULL.
Ported from nvc0 commit 8aa8b0539.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We need to set up a handler for the global_remove event that gets sent
out when a global gets removed. Without the handler we end up calling
a NULL pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=65910
NOTE: This is a candidate for the stable branches.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When rendering to a texture with BaseLevel set, the miptree may be laid
out such that BaseLevel is in level 0 of the miptree (to avoid wasting
memory on unused levels between 0 and BaseLevel-1). In that case, we
have to shift our render target's level down to the appropriate level of
the smaller miptree.
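The level shift described above amounts to a subtraction, sketched here in
C. The function and parameter names are hypothetical; first_level stands
for the texture level that the miptree stores as its level 0.

```c
#include <assert.h>

/* Translate a texture level into a level of a miptree whose level 0
 * holds the texture's first_level (for example, BaseLevel). */
unsigned miptree_level(unsigned texture_level, unsigned first_level)
{
   assert(texture_level >= first_level);
   return texture_level - first_level;
}
```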
The WebGL test in combination with meta code relating to
glGenerateMipmap also triggered a similar failure scenario.
This GPU hang regression was introduced by c754f7a8.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65324
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This reverts commit 41966fdb3b.
While it's a lot cleaner it causes regressions because
the draw interface is always called from the draw functions
of the drivers (because the buffers need to be mapped) which
means that the stream output buffers end up being cleared on
every draw rather than on setting.
Signed-off-by: Zack Rusin <zackr@vmware.com>
honor render_condition for clear_render_target and clear_depth_stencil.
Also add minimal support for occlusion predicate, though it can't be active
at the same time as an occlusion query yet.
While here also switchify some large if-else (actually just mutually
exclusive if-if-if...) constructs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For conditional rendering this makes it possible to skip rendering
if either the predicate is true or false, as supported by d3d10
(in fact previously it was sort of implied skip rendering if predicate
is false for occlusion predicate, and true for so_overflow predicate).
There's no cap bit for this as presumably all drivers could do it trivially
(but this patch does not implement it for the drivers using true
hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL
functionality).
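One way to read the new condition value is sketched below in C. The
function name is hypothetical, and the rule "skip when the predicate equals
the condition" is this sketch's interpretation of the description above,
not a quote of any driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: rendering is skipped when the predicate result
 * equals the requested condition, and performed otherwise. */
bool should_render(bool predicate, bool condition)
{
   return predicate != condition;
}
```

Under this reading, condition == false reproduces the old occlusion
predicate behavior (skip when the predicate is false), and condition ==
true reproduces the old so_overflow behavior (skip when it is true).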
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add ilo_gpe_init_zs_surface() to construct
3DSTATE_DEPTH_BUFFER
3DSTATE_STENCIL_BUFFER
3DSTATE_HIER_DEPTH_BUFFER
at surface creation time. This allows fast state emission in draw_vbo().
This gets us support for blitting to attachment types other than
textures.
v2: fix up comments from review by Kenneth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Now any caller (such as glCopyPixels()) can benefit from it, and it only
changes the correct subset of the destination instead of a whole teximage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Apparently we don't have any piglit tests for this, because it would have
assertion failed in a debug build, or just rendered wrong in a non-debug
build if the destination wasn't covering whole tiles.
v2: Use the new macros.
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We're going to add more BCS_SWCTRL setup instances soon, and you have to
be careful to have the set and restore atomic with the rendering that's
done, so that our state doesn't leak out to other rendering processes.
v2: Rewrite the patch to have batch begin/advance macros so that magic
numbers don't get sprinkled around (and so you don't mix up your
do-I-need-to-reset vs what-do-I-reset-to logic, which I nearly did in
the next patch when first writing it)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Intel had brokenness here, and I'd like to continue moving Mesa toward
hiding 1D_ARRAY's ridiculousness inside of the core, like we did with
MapTextureImage. Fixes copyteximage 1D_ARRAY on intel.
There's still an impedance mismatch in meta when falling back to read and
texsubimage, since texsubimage expects coordinates into 1D_ARRAY as
(width, slice, 0) instead of (width, 0, slice).
v2: Fix offset of scanline reads from the source. (Thanks Brian!), replace
dd.h comment with Paul's text and replace early exit with an assert.
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
I noticed this code didn't work as advertised while doing some passing around
of TGSI shaders and trying to reparse them, and things failing.
This seems to fix it here for at least the small test case I hacked into a
graw test.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 1f82bf12ed inadvertently broke it, checking for __IEEE_FLOAT on all
Alpha machines instead of only on VMS as before.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Sven Joachim <svenjoac@gmx.de>
Fixes window skew seen while running gnome on a 16-bit screen over vnc.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Fixes a crash seen while running gnome on a 16-bit screen over vnc.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
byteswap.h and bswap_32 aren't portable, replace them with calls to
gallium's util_bswap32 as suggested by Mark Kettenis. Lets these files
build on OpenBSD.
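As a rough illustration of what a portable 32-bit byte swap looks like without byteswap.h, here is a minimal sketch. The name portable_bswap32 is hypothetical and this is not the actual util_bswap32 source; it only shows the shift-and-mask technique that works on any C99 compiler.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only; not gallium's util_bswap32.
 * Reverses the byte order of a 32-bit value using shifts and masks, so it
 * does not depend on <byteswap.h> or any platform-specific builtin. */
static inline uint32_t
portable_bswap32(uint32_t n)
{
   return (n >> 24) |                  /* byte 3 -> byte 0 */
          ((n >> 8) & 0x0000ff00u) |   /* byte 2 -> byte 1 */
          ((n << 8) & 0x00ff0000u) |   /* byte 1 -> byte 2 */
          (n << 24);                   /* byte 0 -> byte 3 */
}
```

For example, portable_bswap32(0x11223344) yields 0x44332211, and applying it twice returns the original value.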
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
gl can use elts without setting indices, in which case
our eltMax was set to 0 and always invoked the overflow
condition. So by default set eltMax to maximum, it will
be curbed by draw_set_indexes (if it ever comes) and if
not then it will let gl's glVertexPointer/glDrawArrays
work correctly. Fixes piglit's
triangle-rasterization-overdraw test.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Moves clearing of the draw so target buffers to the draw
module. They had to be cleared in the drivers before
which was quite messy.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Port BLT code in ilo_blit.c to BLT-based blitting methods of ilo_blitter. Add
BLT-based clears. The latter is verified with util_clear(), but it is not in
use yet.
Primitive restart with an arbitrary cut index was first supported as of
Haswell. It's very doubtful that they'd take that away in future
hardware, so we may as well alter the check now.
The PRM suggests a larger layout, mostly to support having
gl_ClipDistance[] somewhere predictable for the fixed-function clipper
-- but it didn't actually arrive in Gen5.
Just use the same layout for both Gen4 and Gen5.
No Piglit regressions.
Improves performance in CS:S Video Stress Test by ~3%.
V2: - Remove now-useless function for determining the SF URB read offset
- Remove now-unused BRW_VARYING_SLOT_POS_DUPLICATE
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Required by ARB_shading_language_420pack. Note that the 420pack spec
incorrectly specifies their values as (Min, Max) = (-7, 8) when they
should be (-8, 7) as listed in the GLSL 4.30 and ESSL 3.0 specs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Assert that we do not support user vertex/index/constant buffers. Issue a
warning when a sampler view is created for a resource without
PIPE_BIND_SAMPLER_VIEW.
The temporary texture should have either PIPE_BIND_RENDER_TARGET or
PIPE_BIND_DEPTH_STENCIL set in addition to PIPE_BIND_SAMPLER_VIEW.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
There are strict limits on those registers. Define the maximums
and use them instead of magic numbers. Also allows us to add
some extra sanity checks.
Suggested by Brian.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't need the clamped variable, because we can just
return early. We should also do the regular culling after
the distance culling passes.
All spotted by Brian.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a resource is busy and is mapped with
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE, the underlying bo is replaced. We need
to mark states affected by the resource dirty.
With this change, we no longer have to emit vertex buffers and index buffer
unconditionally.
Even with hardware contexts, since we do not pin resources, we have to re-emit
the states so that the resources are referenced (by cp->bo) and their offsets
are updated in case they are moved. This also allows us to eliminate cp flush
in is_bo_busy().
Problem: The IEEE float optimized version of UNCLAMPED_FLOAT_TO_UBYTE
in macros.h computed incorrect results for inputs in the range
0x3f7f0000 (=0.99609375) to 0x3f7f7f80 (=0.99803924560546875)
inclusive. 0x3f7f7f80 is the IEEE float value that results in 254.5
when multiplied by 255. With rounding mode "round to closest even
integer", this is the largest float in the range 0.0-1.0 that is
converted to 254 by the generic implementation of
UNCLAMPED_FLOAT_TO_UBYTE. The IEEE float optimized version
incorrectly defined the cut-off for mapping to 255 as 0x3f7f0000
(=255.0/256.0). The same bug was present in the function
float_to_ubyte in u_math.h.
Fix: The proposed fix replaces the incorrect cut-off value by
0x3f800000, which is the IEEE float representation of 1.0f. 0x3f7f7f81
(or any value in between) would also work, but 1.0f is probably
cleaner.
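The effect of the cut-off can be reproduced with a small model of the optimized path. This is a sketch under assumptions: the function names are hypothetical, the "add 32768.0f" mantissa trick is a common way to write such a conversion and is assumed here rather than quoted from macros.h, and the cut-off is passed as a parameter so the old and new values can be compared side by side.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a signed 32-bit integer
 * (assumes IEEE 754 binary32 floats). */
static int32_t
float_bits(float f)
{
   int32_t i;
   memcpy(&i, &f, sizeof i);
   return i;
}

/* Build a float from a raw bit pattern such as 0x3f7f0000. */
static float
bits_to_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof f);
   return f;
}

/* Hypothetical model of the optimized conversion. cutoff_bits is the bit
 * pattern at or above which the input is mapped straight to 255. */
static uint8_t
float_to_ubyte_with_cutoff(float f, int32_t cutoff_bits)
{
   int32_t i = float_bits(f);

   if (i < 0)
      return 0;               /* sign bit set: clamp to 0 */
   if (i >= cutoff_bits)
      return 255;             /* at or above the cut-off: clamp to 255 */

   /* One ulp of a float near 32768.0 is 1/256, so after this addition the
    * low 8 mantissa bits hold f * 255 rounded to the nearest integer. */
   f = f * (255.0f / 256.0f) + 32768.0f;
   return (uint8_t) float_bits(f);
}
```

With the old cut-off, float_to_ubyte_with_cutoff(bits_to_float(0x3f7f0000), 0x3f7f0000) returns 255 even though 0.99609375 * 255 is about 254.004. With the cut-off at 0x3f800000 the same input falls through to the arithmetic path and returns 254, while 1.0f still maps to 255.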
The patch does not regress piglit on llvmpipe and on i965 on sandy
bridge.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Page flipping generates an invalidate event every frame, causing reallocations
of all private resources (MSAA and depth-stencil).
Reusing the resources may improve performance (especially under memory
pressure).
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to use pipe->blit, not resource_copy_region, so that the read buffer
is resolved if it's multisampled. I also removed the CPU-based copying,
which just did format conversion (obsoleted by the blit).
Also, the layer/slice/face of the read buffer is taken into account (this was
ignored).
Last but not least, the format choosing is improved to take float and integer
read buffers into account.
Reviewed-by: Brian Paul <brianp@vmware.com>
There were 2 issues with it:
- resource_copy_region doesn't allow src and dst to have different sample
counts, which can occur if we blit between a window and a FBO, and
the window has an MSAA colorbuffer and the FBO doesn't.
(this was the main motivation for using pipe->blit)
- blitting from or to a non-zero layer/slice/face was broken, because
rtt_face and rtt_slice were ignored.
blit_copy_pixels is now used even if the formats and orientation of
framebuffers don't match.
Reviewed-by: Brian Paul <brianp@vmware.com>
We did downsample (=resolve) MSAA resources to make ReadPixels work with MSAA
GLX visuals, which was enough for read-only color-only transfers.
This commit makes write color transfers and depth-stencil transfers work
in a similar manner. It does downsampling in transfer_map and upsampling
in transfer_unmap.
Reviewed-by: Brian Paul <brianp@vmware.com>
There isn't any difference between 32_FLOAT and 32_*INT in vertex fetching.
Neither of them does any format conversion.
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we would generate uniform locations as (slot << 16) +
array_index. We do this to handle applications that assume the location
of a[2] will be +1 from the location of a[1]. This resulted in every
uniform location being at least 0x10000. The OpenGL 4.3 spec was
amended to require this behavior, but previous versions did not require
locations of array (or structure) members be sequential.
We've now encountered two applications that assume uniform values will
be "small." As far as we can tell, these applications store the GLint
returned by glGetUniformLocation in a int16_t or possibly an int8_t.
THIS BEHAVIOR IS NOT GUARANTEED OR IMPLIED BY ANY VERSION OF OpenGL.
Other implementations happen to have both these behaviors (sequential
array elements and small values) since OpenGL 2.0, so let's just match
their behavior.
Fixes "3D Bowling" on Android.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
This is used by _mesa_uniform_merge_location_offset and
_mesa_uniform_split_location_offset to determine how the base and offset
are packed. Previously, this value was hard coded as (1U<<16) in those
functions via the shift and mask contained therein. The value is still
(1U<<16), but it can be changed in the future.
The next patch dynamically generates this value.
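To make the packing concrete, here is a minimal sketch of merging and splitting a location with a configurable stride. The function names and the stride parameter are hypothetical stand-ins, not the signatures of _mesa_uniform_merge_location_offset and _mesa_uniform_split_location_offset; with a stride of (1U<<16) the merge is equivalent to (base << 16) + offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of packing a uniform's base slot and array
 * offset into a single location. stride plays the role of the value that
 * was previously hard coded as (1U << 16). */
static int32_t
merge_location_offset(uint32_t stride, uint32_t base, uint32_t offset)
{
   assert(offset < stride);   /* otherwise the offset would spill into base */
   return (int32_t) (base * stride + offset);
}

/* Inverse of merge_location_offset(): recover base and offset. */
static void
split_location_offset(uint32_t stride, int32_t location,
                      uint32_t *base, uint32_t *offset)
{
   *base = (uint32_t) location / stride;
   *offset = (uint32_t) location % stride;
}
```

With a stride of 1U<<16, base 1 and offset 2 merge to 0x10002, which does not fit in an int16_t. A smaller stride, for example 4, yields 6 for the same pair, which shows why making the stride adjustable lets locations stay small.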
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-and-tested-by: Chad Versace <chad.versace@linux.intel.com>
Use new util_fill_box helper for util_clear_render_target.
(Also fix off-by-one map error.)
v2: handle non-zero z correctly in new helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds code to place mcs_state into INTEL_MCS_STATE_RESOLVED
for miptrees that are capable of supporting fast color clears. This
will have no effect on buffers that don't undergo a fast color clear;
however, for buffers that do undergo a fast color clear, an MCS
miptree will be allocated (at the time of the first fast clear), and
will be used thereafter.
Reviewed-by: Eric Anholt <eric@anholt.net>
In certain circumstances the memory region underlying a miptree is
shared with other miptrees, or with other code outside Mesa's control.
This happens, for instance, when an extension like GL_OES_EGL_image or
GLX_EXT_texture_from_pixmap is used to associate a miptree
with an image existing outside of Mesa.
When this happens, we need to disable fast color clears on the miptree
in question, since there's no good synchronization mechanism to ensure
that deferred clear writes get performed by the time the buffer is
examined from the other miptree, or from outside of Mesa.
Fortunately, this should not be a performance hit for most
applications, since most applications that use these extensions use
them for importing textures into Mesa, rather than for exporting
rendered images out of Mesa. So most of the time the miptrees
involved will never experience a clear.
v2: Rework based on the fact that we have decided not to use an
accessor function to protect access to the region.
Reviewed-by: Eric Anholt <eric@anholt.net>
Resolve color buffers that have been fast-color cleared:
1. before texturing from the buffer (brw_predraw_resolve_buffers())
2. before using the buffer as the source in a blorp blit
(brw_blorp_blit_miptrees())
3. before mapping the buffer's miptree (intel_miptree_map_raw(),
intel_texsubimage_tiled_memcpy())
4. before accessing the buffer using the hardware blitter
(intel_miptree_blit(), do_blit_bitmap())
v2: Rework based on the fact that we have decided not to use an
accessor function to protect access to the region.
Reviewed-by: Eric Anholt <eric@anholt.net>
We already had code in intel_downsample_for_dri2_flush() for
downsampling front and back buffers when multisampling was in use.
This patch extends that function to perform fast color clear resolves
when necessary.
To account for the additional functionality, the function is renamed
to simply intel_resolve_for_dri2_flush().
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch implements the "render target resolve" blorp operation.
This will be needed when a buffer that has experienced a fast color
clear is later used for a purpose other than as a render target
(texturing, glReadPixels, or swapped to the screen). It resolves any
remaining deferred clear operation that was not taken care of during
normal rendering.
Fortunately not much work is necessary; all we need to do is scale
down the size of the rectangle primitive being emitted, run the
fragment shader with the "Render Target Resolve Enable" bit set, and
ensure that the fragment shader writes to the render target using the
"replicated color" message. We already have a fragment shader that
does that (the shader that we use for fast color clears), so for
simplicity we re-use it.
Reviewed-by: Eric Anholt <eric@anholt.net>
The fragment shaders that do color clears will be re-used to
perform so-called "render target resolves" (the resolves associated
with fast color clears). To prepare for that, this patch expands the
class hierarchy for blorp params by adding
brw_blorp_const_color_params (which will be used for all blorp
operations where the fragment shader outputs a constant color).
Some other data structures and functions were also renamed to use
"const_color" nomenclature where appropriate.
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we defer allocation of the MCS miptree until the time of the
fast clear operation, this patch also implements creation of the MCS
miptree.
In addition, this patch adds the field
intel_mipmap_tree::fast_clear_color_value, which holds the most recent
fast color clear value, if any. We use it to set the SURFACE_STATE's
clear color for render targets.
v2: Flag BRW_NEW_SURFACES when allocating the MCS miptree. Generate a
perf_debug message if clearing to a color that isn't compatible with
fast color clear. Fix "control reaches end of non-void function"
build warning.
Reviewed-by: Eric Anholt <eric@anholt.net>
On Gen7+, MCS buffers are used both for compressed multisampled color
buffers and for "fast clear" of single-sampled color buffers.
Previous to this patch series, we didn't support fast clear, so we
only used MCS with multisampled color buffers.
As a first step to implementing fast clears, this patch modifies the
code that sets up SURFACE_STATE so that it configures the MCS buffer
whenever it is present, regardless of whether we are multisampling or
not.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch includes code to update the fast color clear state
appropriately when rendering occurs. The state will also need to be
updated when a fast clear or a resolve operation is performed; those
state updates will be added when the fast clear and resolve operations
are added.
v2: Create a new function, intel_miptree_used_for_rendering() to
handle updating the fast color clear state when rendering occurs.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch ifdefs out intel_mipmap_tree::mcs_mt when building the i915
(pre-Gen4) driver (MCS buffers aren't supported until Gen7, so there
is no need for this field in the i915 driver). This should make it a
bit easier to implement fast color clears without undue risk to i915.
Reviewed-by: Eric Anholt <eric@anholt.net>
When processing a buffer received from the X server,
intel_process_dri2_buffer() examines intel_region::name to determine
whether it's received a brand new buffer, or the same buffer it
received from the X server the last time it made a request.
However, this didn't work properly, because in the call to
intel_miptree_create_for_dri2_buffer(), we create a fresh intel_region
object to represent the buffer, and this was causing us to forget the
buffer's previous name.
This patch fixes things by copying over the region name when creating
the fresh intel_region object.
At the moment, this is just a minor performance optimization.
However, when fast color clears are added, it will be necessary to
ensure that the fast color clear state for a buffer doesn't get
discarded the next time we receive that buffer from the X server.
Reviewed-by: Eric Anholt <eric@anholt.net>
There is really nothing in struct intel_bo, and having it alias drm_intel_bo
makes the winsys impose almost zero overhead.
We can eliminate the overhead completely by making the functions static
inline, if needed.
The motivation is to kill tiling and pitch in struct intel_bo. That requires
us to make tiling and pitch not queryable, and be passed around as function
parameters.
AC_MSG_ERROR([Could not find clang internal header stddef.h in $CLANG_RESOURCE_DIR Use --with-clang-libdir to specify the correct path to the clang libraries.]))
[AC_MSG_ERROR([Could not find clang internal header stddef.h in $CLANG_RESOURCE_DIR Use --with-clang-libdir to specify the correct path to the clang libraries.])])
fi
else
MESA_LLVM=0
LLVM_VERSION_INT=0
MESA_LLVM=0
LLVM_VERSION_INT=0
fi
else
MESA_LLVM=0
@@ -1687,7 +1722,7 @@ gallium_check_st() {
gallium_require_llvm() {
if test "x$MESA_LLVM" = x0; then
case "$host_cpu" in
i*86|x86_64) AC_MSG_ERROR([LLVM is required to build $1 on x86 and x86_64]);;
i*86|x86_64|amd64) AC_MSG_ERROR([LLVM is required to build $1 on x86 and x86_64]);;
esac
fi
}
@@ -1709,7 +1744,7 @@ radeon_llvm_check() {
if test "$LLVM_VERSION_INT" -lt "${LLVM_REQUIRED_VERSION_MAJOR}0${LLVM_REQUIRED_VERSION_MINOR}"; then
AC_MSG_ERROR([LLVM $LLVM_REQUIRED_VERSION_MAJOR.$LLVM_REQUIRED_VERSION_MINOR or newer is required for r600g and radeonsi.])
fi
if test true && $LLVM_CONFIG --targets-built | grep -qv '\<R600\>' ; then
if test true && $LLVM_CONFIG --targets-built | grep -qvw 'R600' ; then
AC_MSG_ERROR([LLVM R600 Target not enabled. You can enable it when building the LLVM
sources with the --enable-experimental-targets=R600
configure flag])
@@ -1846,7 +1881,7 @@ if test "x$MESA_LLVM" != x0; then
if test "x$with_llvm_shared_libs" = xyes; then
dnl We can't use $LLVM_VERSION because it has 'svn' stripped out,
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65173">Bug 65173</a> - segfault in _mesa_get_format_datatype and _mesa_get_color_read_type when state dumping with glretrace</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-9.1.3..mesa-9.1.4
</pre>
<p>Alan Coopersmith (2):</p>
<ul>
<li>integer overflow in XF86DRIOpenConnection() [CVE-2013-1993 1/2]</li>
<li>integer overflow in XF86DRIGetClientDriverName() [CVE-2013-1993 2/2]</li>
</ul>
<p>Alex Deucher (3):</p>
<ul>
<li>radeonsi: add support for hainan chips</li>
<li>radeonsi: add Hainan pci ids</li>
<li>winsys/radeon: add env var to disable VM on Cayman/Trinity</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>glapi: Add some missing static_dispatch="false" annotations to es_EXT.xml</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>intel: Add a null pointer check before dereferencing the pointer</li>
</ul>
<p>Armin K (1):</p>
<ul>
<li>gallivm: Fix build with LLVM 3.3</li>
</ul>
<p>Brian Paul (9):</p>
<ul>
<li>mesa: fix the compressed TexSubImage size checking code</li>
<li>st/mesa: generate GL_OUT_OF_MEMORY if we can't create the index buffer</li>
<li>mesa: fix error checking of DXT sRGB formats in _mesa_base_tex_format()</li>
<li>st/glx/xlib: check for null ctx pointer in glXIsDirect()</li>
<li>xlib: check for null ctx pointer in glXIsDirect()</li>
<li>st/glx: add null ctx check in glXDestroyContext()</li>
<li>xlib: add null ctx check in glXDestroyContext()</li>
<li>meta: move vertex array enables for mipmap generation</li>
<li>mesa: handle missing read buffer in _mesa_get_color_read_format/type()</li>
</ul>
<p>Bryan Cain (1):</p>
<ul>
<li>nv50: initialize kick_notify callback in nv50_create</li>
</ul>
<p>Chad Versace (3):</p>
<ul>
<li>egl/android: Fix error condition for EGL_ANDROID_image_native_buffer</li>
<li>i965: Fix glColorPointer(GL_FIXED)</li>
<li>intel: Return early if miptree allocation fails</li>
</ul>
<p>Chia-I Wu (1):</p>
<ul>
<li>u_vbuf: fix index buffer leak</li>
</ul>
<p>Chris Forbes (8):</p>
<ul>
<li>mesa: add accessor for effective stencil ref</li>
<li>intel: Use accessor for stencil reference values</li>
<li>nouveau: Use accessor for stencil reference values</li>
<li>radeon: Use accessor for stencil reference values</li>
<li>st: Use accessor for stencil reference values</li>
<li>swrast: Use accessor for stencil reference values</li>
<li>mesa: Stop clamping stencil reference value at specification time</li>
<li>mesa: Use accessor for stencil reference values in glGet</li>
</ul>
<p>Chí-Thanh Christopher Nguyễn (1):</p>
<ul>
<li>targets/dri-i915: Force c++ linker in all cases</li>
</ul>
<p>Daniel Martin (1):</p>
<ul>
<li>Fix build of swrast only without libdrm</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>i965: fix problem with constant out of bounds access (v3)</li>
</ul>
<p>Eric Anholt (10):</p>
<ul>
<li>mesa: Make core Mesa allocate the texture renderbuffer wrapper.</li>
<li>mesa: Make gl_renderbuffers backed by EGL images use FinishRenderTexture.</li>
<li>i965/fs: Bake regs_written into the IR instead of recomputing it later.</li>
<li>i965/vs: Fix implied_mrf_writes() for integer division pre-gen6.</li>
<li>intel: Add support for writing to our linear-temporary-CPU-map case.</li>
<li>intel: Do temporary CPU maps of textures that are too big to GTT map.</li>
<li>intel: Avoid making tiled miptrees we won't be able to blit.</li>
<li>intel: Fix MRT handling of glBitmap().</li>
<li>intel: Fix format handling of blit glBitmap()</li>
<li>i965: Shut up the last release build warning.</li>
</ul>
<p>Fabian Bieler (2):</p>
<ul>
<li>mesa/st: Don't copy propagate from swizzles.</li>
<li>mesa/program: Don't copy propagate from swizzles.</li>
</ul>
<p>Frank Henigman (1):</p>
<ul>
<li>intel: initialize fs_visitor::params_remap in constructor</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>docs: Add 9.1.3 release md5sums</li>
<li>mesa: Bump version to 9.1.4</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>scons: Fix implicit python dependency discovery on Windows.</li>
</ul>
<p>Kenneth Graunke (17):</p>
<ul>
<li>mesa: Add i965 varying index patches to .cherry-ignore.</li>
<li>i965: Turn brw->urb.vs_size and gs_size into local variables.</li>
<li>i965: Use a variable for the push constant size in kB.</li>
<li>i965: Update URB partitioning code for Haswell's GT3 variant.</li>
<li>i965: Add chipset limits for the Haswell GT3 variant.</li>
<li>i965: Enable the Bay Trail platform.</li>
<li>mesa: Add a reverted commit to cherry-ignore.</li>
<li>vbo: Ignore PRIMITIVE_RESTART_FIXED_INDEX for glDrawArrays().</li>
<li>mesa: Add a helper function for determining the restart index.</li>
<li>vbo: Use the new primitive restart index helper function.</li>
<li>i965: Use the correct restart index for fixed index mode on Haswell.</li>
<li>mesa: Cherry-ignore a patch that got picked but squashed.</li>
<li>i965: Fix can_cut_index_handle_restart_index() for byte/short types.</li>
<li>st/mesa: Go back to using ctx->Array.RestartIndex, not _RestartIndex.</li>
<li>mesa: Ignore fixed-index primitive restart in ArrayElement().</li>
<li>mesa: Delete the ctx->Array._RestartIndex derived state.</li>
<li>glsl: Bail on parsing if the #version directive is bogus.</li>
</ul>
<p>Lauri Kasanen (1):</p>
<ul>
<li>r600g: Correctly initialize the shader key, v2</li>
</ul>
<p>Maarten Lankhorst (4):</p>
<ul>
<li>nvc0: fix up video buffer alignment requirements</li>
<li>nvc0: kill assert in ppp code</li>
<li>nvc0: set rsvd_kick correctly</li>
<li>nvc0: allow frame dropping in h264</li>
</ul>
<p>Marek Olšák (7):</p>
<ul>
<li>radeonsi: increase array size for shader inputs and outputs</li>
<li>vbo: fix possible use-after-free segfault after a VAO is deleted</li>
<li>glsl: fix the value of gl_MaxFragmentUniformVectors</li>
<li>st/mesa: initialize all program constants and UBO limits</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=47824">Bug 47824</a> - osmesa using --enable-shared-glapi depends on libgl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=62362">Bug 62362</a> - Crash when using Wayland EGL platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=63435">Bug 63435</a> - [Regression since 9.0] Flickering in EGL OpenGL full-screen window with swap interval 1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=64087">Bug 64087</a> - Webgl conformance shader-with-non-reserved-words crash when mesa is compiled without --enable-debug</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65236">Bug 65236</a> - [i965] Rendering artifacts in VDrift/GL2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66558">Bug 66558</a> - RS690: 3D artifacts when playing SuperTuxKart</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66847">Bug 66847</a> - compilation broken with llvm 3.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66850">Bug 66850</a> - glGenerateMipmap crashes when using GL_TEXTURE_2D_ARRAY with compressed internal format</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66921">Bug 66921</a> - [r300g] Heroes of Newerth: HiZ related corruption</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=67283">Bug 67283</a> - VDPAU doesn't work on hybrid laptop through DRI_PRIME</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-9.1.5..mesa-9.1.6
</pre>
<p>Andreas Boll (1):</p>
<ul>
<li>configure.ac: Require llvm-3.2 for r600g/radeonsi llvm backends</li>
</ul>
<p>Brian Paul (4):</p>
<ul>
<li>mesa: handle 2D texture arrays in get_tex_rgba_compressed()</li>
<li>meta: handle 2D texture arrays in decompress_texture_image()</li>
<li>mesa: implement mipmap generation for compressed 2D array textures</li>
<li>mesa: improve free() cleanup in generate_mipmap_compressed()</li>
</ul>
<p>Carl Worth (7):</p>
<ul>
<li>docs: Add 9.1.5 release md5sums</li>
<li>Merge 'origin/9.1' into stable</li>
<li>cherry-ignore: Drop 13 patches from the pick list</li>
<li>get-pick-list.sh: Include commits mentionining "CC: mesa-stable..." in pick list</li>
<li>get-pick-list: Allow for non-whitespace between "CC:" and "mesa-stable"</li>
<li>get-pick-list: Ignore commits which CC mesa-stable unless they say "9.1"</li>
<li>Bump version to 9.1.6</li>
</ul>
<p>Chris Forbes (5):</p>
<ul>
<li>i965/Gen4: Zero extra coordinates for ir_tex</li>
<li>i965/vs: Fix flaky texture swizzling</li>
<li>i965/vs: set up sampler state pointer for Gen4/5.</li>
<li>i965/vs: Put lod parameter in the correct place for Gen4</li>
<li>i965/vs: Gen4/5: enable front colors if back colors are written</li>
</ul>
<p>Christoph Bumiller (1):</p>
<ul>
<li>nv50,nvc0: s/uint16/uint32 for constant buffer offset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66779">Bug 66779</a> - Use of uninitialized stack variable with brw_search_cache()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68233">Bug 68233</a> - Valgrind errors in mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68250">Bug 68250</a> - Automatic mipmap generation with texture compression produces borders that fade to black</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=44618">Bug 44618</a> - Cross-compilation broken by glsl builtin_compiler</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=46632">Bug 46632</a> - Make the alignment checks for the readpixel blit fastpath a bit more lenient</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=47116">Bug 47116</a> - Enemy territory freezes with rs880 and commit fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=47248">Bug 47248</a> - autogen missing dependency on flex and bison, causes infinite loop in glsl build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=50655">Bug 50655</a> - [r600g][RV670 HD3870] Ioquake games causes GPU lockup (waiting for 0x00003039 last fence id 0x00003030)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=51471">Bug 51471</a> - [965gm] Corrupted graphics in corners of screen with pixel shaders enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=51782">Bug 51782</a> - mesa-8.0.3: fails to compile against uclibc</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=58680">Bug 58680</a> - [IVB] Graphical glitches in 0 A.D</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=58872">Bug 58872</a> - Mac OS X configure: error: Couldn't find clock_gettime</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=59322">Bug 59322</a> - r300g MSAA breaks Half-Life 2 in Wine</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=59364">Bug 59364</a> - [bisected] Mesa build fails: clientattrib.c:33:22: fatal error: indirect.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=59439">Bug 59439</a> - glCopyPixels generates no fragments (occlusion_query_meta_fragments test fails)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=59440">Bug 59440</a> - glBitmap generates no fragments (occlusion_query_meta_fragments test fails)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=60086">Bug 60086</a> - Wayland platform backend crashes if there's no back buffer during dri2_swap_buffers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=60633">Bug 60633</a> - EXT_texture_sRGB does not work in game The Cave on IvyBridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=60737">Bug 60737</a> - In GLSL ES, a missing FS precision qualifier does not generate an error</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=60866">Bug 60866</a> - GLSL performance issues for uniform buffer objects</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=61036">Bug 61036</a> - Shader fails to build in LLVMpipe, aborts program</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=61200">Bug 61200</a> - insufficient linking of libxatracker.so</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=61635">Bug 61635</a> - glVertexAttribPointer(id, GL_UNSIGNED_BYTE, GL_FALSE,...) does not work</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=62466">Bug 62466</a> - r600g hyperz lockups with KSP 0.19</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=62669">Bug 62669</a> - HyperZ freeze when playing PrBoom-Plus demo with lots of monsters</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=62721">Bug 62721</a> - GPU lockup in Minecraft 1.5.1 with HyperZ</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=64959">Bug 64959</a> - Cannot build against EGL without X11</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65112">Bug 65112</a> - glcpp hangs parsing line continuations</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65958">Bug 65958</a> - GPU Lockup on Trinity 7500G</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66450">Bug 66450</a> - JUNIPER UVD accelerated playback of MPEG 1/2 streams does not work</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66606">Bug 66606</a> - [i965 bisected]GLBenchmark 2.5.1/2.7.0 sometimes render error with gnome-session enabling SNA</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66713">Bug 66713</a> - Team Fortress 2 crashes with r600-sb on HD4850</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=67354">Bug 67354</a> - glsl_parser.cpp is broken with bison 3.0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=67548">Bug 67548</a> - glGetAttribLocation seems to be broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=67927">Bug 67927</a> - R600_DEBUG=sb: Celestia show 2 earths, one wrongly rendered</li>
Some files were not shown because too many files have changed in this diff